From 98ee802023a4db76879c761e7ce3677eb4555871 Mon Sep 17 00:00:00 2001
From: Ikram Ali
Date: Tue, 18 Aug 2020 01:04:29 +0500
Subject: [PATCH] [model_cards] Add model cards for Urduhack model (roberta-urdu-small) (#6536)

* [model_cards] roberta-urdu-small added.

* [model_cards] typo fixed.

* Tweak license format (yaml expects a simple string)

Co-authored-by: Ikram Ali
Co-authored-by: Julien Chaumond
---
 .../urduhack/roberta-urdu-small/README.md | 30 +++++++++++++++++++
 1 file changed, 30 insertions(+)
 create mode 100644 model_cards/urduhack/roberta-urdu-small/README.md

diff --git a/model_cards/urduhack/roberta-urdu-small/README.md b/model_cards/urduhack/roberta-urdu-small/README.md
new file mode 100644
index 0000000000..b0d6cfbf40
--- /dev/null
+++ b/model_cards/urduhack/roberta-urdu-small/README.md
@@ -0,0 +1,30 @@
+---
+language: ur
+thumbnail: https://raw.githubusercontent.com/urduhack/urduhack/master/docs/_static/urduhack.png
+tags:
+- roberta-urdu-small
+- urdu
+- transformers
+license: mit
+---
+## roberta-urdu-small
+
+[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/urduhack/urduhack/blob/master/LICENSE)
+### Overview
+**Language model:** roberta-urdu-small
+**Model size:** 125M
+**Language:** Urdu
+**Training data:** News data from Urdu news sources in Pakistan
+### About roberta-urdu-small
+roberta-urdu-small is a language model for the Urdu language.
+```python
+from transformers import pipeline
+fill_mask = pipeline("fill-mask", model="urduhack/roberta-urdu-small", tokenizer="urduhack/roberta-urdu-small")
+```
+## Training procedure
+roberta-urdu-small was trained on an Urdu news corpus. The training data was normalized using the normalization module
+from Urduhack to eliminate characters from other languages, such as Arabic.
+
+### About Urduhack
+Urduhack is a Natural Language Processing (NLP) library for the Urdu language.
+GitHub: https://github.com/urduhack/urduhack
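The model card's training procedure says the corpus was normalized with Urduhack to eliminate characters from other languages, such as Arabic. In Urdu text processing, one common form of this step maps Arabic letter variants to the code points that Urdu conventionally uses. The sketch below assumes that interpretation. The function `normalize_urdu_chars` and its mapping table are hypothetical. They cover only a few letters and are not Urduhack's actual API or its full normalization pipeline.

```python
# Hypothetical sketch, NOT the Urduhack API: replaces a few Arabic letter
# variants with the Unicode code points conventionally used in Urdu text.
ARABIC_TO_URDU = {
    "\u0643": "\u06a9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
    "\u064a": "\u06cc",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0649": "\u06cc",  # ARABIC LETTER ALEF MAKSURA -> ARABIC LETTER FARSI YEH
    "\u0647": "\u06c1",  # ARABIC LETTER HEH -> ARABIC LETTER HEH GOAL
}

# Build the translation table once; str.translate applies it per character.
_TRANSLATION_TABLE = str.maketrans(ARABIC_TO_URDU)


def normalize_urdu_chars(text: str) -> str:
    """Return `text` with the mapped Arabic letters replaced by Urdu forms.

    Characters not in the table (other Urdu letters, digits, Latin text,
    whitespace) pass through unchanged.
    """
    return text.translate(_TRANSLATION_TABLE)


if __name__ == "__main__":
    # "kitab" spelled with ARABIC LETTER KAF instead of KEHEH.
    sample = "\u0643\u062a\u0627\u0628"
    print(normalize_urdu_chars(sample) == "\u06a9\u062a\u0627\u0628")  # True
```

A table-driven `str.translate` keeps the step to a single pass over each string. A production normalizer may also need to handle digits, diacritics, and punctuation, and this sketch deliberately leaves those out.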