zuBERTa model card (#5536)
* Create README
* Update README.md

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
56
model_cards/MoseliMotsoehli/zuBERTa/README.md
Normal file
@@ -0,0 +1,56 @@
---
language: zulu
---

# zuBERTa
zuBERTa is a RoBERTa-style transformer language model trained on Zulu text.

## Intended uses & limitations
The model can be used to obtain embeddings for downstream tasks such as question answering.
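
As a rough illustration of the embedding use case, the sketch below mean-pools the final hidden states into one vector per sentence. It assumes PyTorch is installed and that mean pooling is an acceptable sentence representation; the model card does not prescribe a pooling strategy. `mean_pool` and `embed_sentences` are hypothetical helper names, not part of `transformers`.

```python
import torch


def mean_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token vectors per sentence, ignoring padding positions."""
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)  # (batch, seq, 1)
    summed = (last_hidden_state * mask).sum(dim=1)                   # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1.0)                          # avoid division by zero
    return summed / counts


def embed_sentences(sentences, model_name="MoseliMotsoehli/zuBERTa"):
    """Hypothetical helper: download the model and return one vector per sentence."""
    # Imported lazily so mean_pool can be used without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)  # base encoder, no LM head
    model.eval()
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**batch)
    # outputs[0] holds the final hidden states, shape (batch, seq, hidden).
    return mean_pool(outputs[0], batch["attention_mask"])
```

Calling `embed_sentences(["Abafika eNkandla bafika sebeholwa khona uMpongo kaZingelwayo."])` would return a tensor with one row per input sentence, which can then be fed to a task-specific classifier or similarity measure.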

#### How to use

```python
>>> from transformers import pipeline
>>> from transformers import AutoTokenizer, AutoModelWithLMHead

>>> tokenizer = AutoTokenizer.from_pretrained("MoseliMotsoehli/zuBERTa")
>>> model = AutoModelWithLMHead.from_pretrained("MoseliMotsoehli/zuBERTa")
>>> unmasker = pipeline('fill-mask', model=model, tokenizer=tokenizer)
>>> unmasker("Abafika eNkandla bafika sebeholwa <mask> uMpongo kaZingelwayo.")

[
  {
    "sequence": "<s>Abafika eNkandla bafika sebeholwa khona uMpongo kaZingelwayo.</s>",
    "score": 0.050459690392017365,
    "token": 555,
    "token_str": "Ġkhona"
  },
  {
    "sequence": "<s>Abafika eNkandla bafika sebeholwa inkosi uMpongo kaZingelwayo.</s>",
    "score": 0.03668094798922539,
    "token": 2321,
    "token_str": "Ġinkosi"
  },
  {
    "sequence": "<s>Abafika eNkandla bafika sebeholwa ubukhosi uMpongo kaZingelwayo.</s>",
    "score": 0.028774697333574295,
    "token": 5101,
    "token_str": "Ġubukhosi"
  }
]
```
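
The `Ġ` prefix in `token_str` is the byte-level BPE marker for a leading space. A small post-processing step can turn the pipeline output into plain `(token, score)` pairs. The sketch below is one possible way to do this; `top_predictions` is a hypothetical helper name, and the sample data is copied from the fill-mask output shown in this README.

```python
def top_predictions(results):
    """Return (token, score) pairs from fill-mask output, highest score first."""
    pairs = []
    for item in results:
        # "Ġ" marks a leading space in byte-level BPE vocabularies;
        # strip() also drops a plain leading space if one is present.
        token = item["token_str"].replace("Ġ", " ").strip()
        pairs.append((token, item["score"]))
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Sample entries copied from the fill-mask output in this README.
sample = [
    {"score": 0.050459690392017365, "token": 555, "token_str": "Ġkhona"},
    {"score": 0.03668094798922539, "token": 2321, "token_str": "Ġinkosi"},
    {"score": 0.028774697333574295, "token": 5101, "token_str": "Ġubukhosi"},
]
print([token for token, _ in top_predictions(sample)])  # ['khona', 'inkosi', 'ubukhosi']
```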

## Training data

1. 30k sentences from the 2018 Zulu corpus of the [Leipzig Corpora Collection](https://wortschatz.uni-leipzig.de/en/download), collected from news articles and creative writing.
2. ~7,500 articles of human-generated translations scraped from the Zulu [Wikipedia](https://zu.wikipedia.org/wiki/Special:AllPages).

### BibTeX entry and citation info

```bibtex
@inproceedings{author = {Moseli Motsoehli},
  title = {Towards transformation of Southern African language models through transformers.},
  year = {2020}
}
```