[model_cards] Add model cards for Urduhack model (roberta-urdu-small) (#6536)
* [model_cards] roberta-urdu-small added.
* [model_cards] typo fixed.
* Tweak license format (yaml expects a simple string)

Co-authored-by: Ikram Ali <mrikram1989>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
model_cards/urduhack/roberta-urdu-small/README.md
---
language: ur
thumbnail: https://raw.githubusercontent.com/urduhack/urduhack/master/docs/_static/urduhack.png
tags:
- roberta-urdu-small
- urdu
- transformers
license: mit
---

## roberta-urdu-small

[License](https://github.com/urduhack/urduhack/blob/master/LICENSE)

### Overview

- **Language model:** roberta-urdu-small
- **Model size:** 125M
- **Language:** Urdu
- **Training data:** News data from Urdu news sources in Pakistan

### About roberta-urdu-small

roberta-urdu-small is a RoBERTa-based masked language model for the Urdu language. It can be loaded with the `fill-mask` pipeline from Hugging Face `transformers`:

```python
from transformers import pipeline

# Load the model and its tokenizer from the Hugging Face model hub
fill_mask = pipeline("fill-mask", model="urduhack/roberta-urdu-small", tokenizer="urduhack/roberta-urdu-small")
```

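For a single masked position, the pipeline returns a list of candidate fills. The sketch below shows one way to query it and read the predicted tokens. `top_predictions` and the `[MASK]` placeholder are hypothetical helpers, not part of `transformers`, and passing `top_k` at call time assumes a recent `transformers` version. The function accepts any callable with the pipeline's call signature, so it can be tried without downloading the model.

```python
from typing import Any, Callable, Dict, List

# Hypothetical placeholder used in templates; it is swapped for the
# tokenizer's real mask token before the text reaches the pipeline.
PLACEHOLDER = "[MASK]"


def top_predictions(
    fill_mask: Callable[..., List[Dict[str, Any]]],
    template: str,
    mask_token: str,
    top_k: int = 5,
) -> List[str]:
    """Return predicted token strings for one masked position.

    `fill_mask` is assumed to behave like a transformers fill-mask
    pipeline: given text with exactly one mask token, it returns a list
    of dicts with "score", "token", "token_str" and "sequence" keys.
    """
    if template.count(PLACEHOLDER) != 1:
        raise ValueError("template must contain exactly one placeholder")
    text = template.replace(PLACEHOLDER, mask_token)
    results = fill_mask(text, top_k=top_k)
    # Results come back ordered by descending score; strip the leading
    # space that byte-level BPE tokens may carry in `token_str`.
    return [r["token_str"].strip() for r in results]
```

With the pipeline loaded as above, a call would look like `top_predictions(fill_mask, text, fill_mask.tokenizer.mask_token)`, where `text` is an Urdu sentence containing `[MASK]`. Reading the mask token from the tokenizer (`<mask>` for RoBERTa tokenizers) avoids hard-coding it.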
### Training procedure

roberta-urdu-small was trained on an Urdu news corpus. The training data was normalized with Urduhack's normalization module to eliminate characters from other languages, such as Arabic.

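The card does not list which characters the Urduhack normalizer changes. Purely as an illustration, here is a minimal sketch of one common kind of Urdu normalization: mapping Arabic code points for letters shared by both scripts onto their Urdu counterparts. The mapping table and the `normalize_urdu` name are hypothetical; this is not Urduhack's implementation.

```python
# Hypothetical, illustrative mapping (NOT Urduhack's actual table):
# Arabic-specific code points -> the forms used in Urdu text.
ARABIC_TO_URDU = str.maketrans({
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0649": "\u06CC",  # ARABIC LETTER ALEF MAKSURA -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
})


def normalize_urdu(text: str) -> str:
    """Replace Arabic variants of shared letters with their Urdu forms."""
    return text.translate(ARABIC_TO_URDU)
```

`str.translate` leaves every character not in the table unchanged, so Urdu text and non-Arabic characters pass through untouched.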
### About Urduhack

Urduhack is a Natural Language Processing (NLP) library for the Urdu language.

GitHub: https://github.com/urduhack/urduhack