zuBERTa model card (#5536)
* Create README
* Update README.md

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
model_cards/MoseliMotsoehli/zuBERTa/README.md
---
language: zu
---

# zuBERTa

zuBERTa is a RoBERTa-style transformer language model trained on Zulu text.

## Intended uses & limitations

The model can be used to extract contextual embeddings for downstream tasks such as question answering.

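The card does not say how to turn the model's per-token outputs into a single embedding. The sketch below assumes masked mean pooling over the last hidden state, which is one common choice and not a method prescribed by the author. `masked_mean_pool` and `embed_sentence` are hypothetical helper names; `embed_sentence` assumes `torch` and `transformers` are installed and that the Hub checkpoint loads with `AutoModel`.

```python
from typing import List, Sequence


def masked_mean_pool(
    hidden_states: Sequence[Sequence[float]], attention_mask: Sequence[int]
) -> List[float]:
    """Average per-token vectors, skipping positions where the mask is 0 (padding)."""
    if len(hidden_states) != len(attention_mask):
        raise ValueError("hidden_states and attention_mask must have the same length")
    kept = [vec for vec, m in zip(hidden_states, attention_mask) if m]
    if not kept:
        raise ValueError("attention_mask does not select any token")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def embed_sentence(text: str, model_id: str = "MoseliMotsoehli/zuBERTa") -> List[float]:
    """Hypothetical helper: one fixed-size vector for `text` (downloads the model)."""
    # Imported lazily so masked_mean_pool works without torch/transformers installed.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    model.eval()
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # outputs[0] is the last hidden state, shape (batch=1, seq_len, hidden_size).
    hidden = outputs[0][0].tolist()
    mask = inputs["attention_mask"][0].tolist()
    return masked_mean_pool(hidden, mask)
```

For example, `embed_sentence("Abafika eNkandla bafika sebeholwa inkosi uMpongo kaZingelwayo.")` returns a list whose length equals the model's hidden size. The attention mask here also covers the `<s>` and `</s>` tokens; drop those positions before pooling if you prefer to average only over the sentence's own subword tokens.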
#### How to use

```python
>>> from transformers import pipeline
>>> from transformers import AutoTokenizer, AutoModelForMaskedLM

>>> tokenizer = AutoTokenizer.from_pretrained("MoseliMotsoehli/zuBERTa")
>>> model = AutoModelForMaskedLM.from_pretrained("MoseliMotsoehli/zuBERTa")
>>> unmasker = pipeline('fill-mask', model=model, tokenizer=tokenizer)
>>> unmasker("Abafika eNkandla bafika sebeholwa <mask> uMpongo kaZingelwayo.")

[
  {
    "sequence": "<s>Abafika eNkandla bafika sebeholwa khona uMpongo kaZingelwayo.</s>",
    "score": 0.050459690392017365,
    "token": 555,
    "token_str": "Ġkhona"
  },
  {
    "sequence": "<s>Abafika eNkandla bafika sebeholwa inkosi uMpongo kaZingelwayo.</s>",
    "score": 0.03668094798922539,
    "token": 2321,
    "token_str": "Ġinkosi"
  },
  {
    "sequence": "<s>Abafika eNkandla bafika sebeholwa ubukhosi uMpongo kaZingelwayo.</s>",
    "score": 0.028774697333574295,
    "token": 5101,
    "token_str": "Ġubukhosi"
  }
]
```
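
In the fill-mask output, each `token_str` starts with `Ġ`: RoBERTa's byte-level BPE vocabulary encodes a preceding space as this character, so it is not part of the Zulu word. Depending on the transformers version, the pipeline may already return decoded strings, so the sketch below strips the marker only when it is present. It hard-codes the three predictions printed in the example rather than calling the model; `clean_predictions`, `PREDICTIONS` and `BPE_SPACE_MARKER` are hypothetical names.

```python
from typing import Dict, List, Tuple, Union

# The three fill-mask predictions printed in the usage example (copied, not recomputed).
PREDICTIONS: List[Dict[str, Union[str, float, int]]] = [
    {"sequence": "<s>Abafika eNkandla bafika sebeholwa khona uMpongo kaZingelwayo.</s>",
     "score": 0.050459690392017365, "token": 555, "token_str": "\u0120khona"},
    {"sequence": "<s>Abafika eNkandla bafika sebeholwa inkosi uMpongo kaZingelwayo.</s>",
     "score": 0.03668094798922539, "token": 2321, "token_str": "\u0120inkosi"},
    {"sequence": "<s>Abafika eNkandla bafika sebeholwa ubukhosi uMpongo kaZingelwayo.</s>",
     "score": 0.028774697333574295, "token": 5101, "token_str": "\u0120ubukhosi"},
]

# Byte-level BPE (as used by RoBERTa) marks a preceding space with "Ġ" (U+0120).
BPE_SPACE_MARKER = "\u0120"


def clean_predictions(preds: List[Dict]) -> List[Tuple[str, float]]:
    """Return (word, score) pairs, best first, with any leading space marker removed."""
    cleaned = []
    for pred in preds:
        word = str(pred["token_str"])
        if word.startswith(BPE_SPACE_MARKER):
            word = word[len(BPE_SPACE_MARKER):]
        cleaned.append((word, float(pred["score"])))
    return sorted(cleaned, key=lambda pair: pair[1], reverse=True)


for word, score in clean_predictions(PREDICTIONS):
    print(f"{word}\t{score:.3f}")
```

The same helper can be applied directly to the list returned by `unmasker(...)`.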

## Training data

1. 30k sentences from the 2018 Zulu corpus of the [Leipzig Corpora Collection](https://wortschatz.uni-leipzig.de/en/download), collected from news articles and creative writing.
2. ~7,500 articles of human-generated translations, scraped from the Zulu [Wikipedia](https://zu.wikipedia.org/wiki/Special:AllPages).

### BibTeX entry and citation info

```bibtex
@inproceedings{motsoehli2020zuberta,
  author = {Moseli Motsoehli},
  title = {Towards transformation of Southern African language models through transformers.},
  year = {2020}
}
```