From d6b6ab11f04eec306a7386d2b52bd31018902088 Mon Sep 17 00:00:00 2001 From: Bashar Talafha Date: Wed, 8 Jul 2020 23:07:48 +0300 Subject: [PATCH] Create README.md (#5601) --- .../multi-dialect-bert-base-arabic/README.md | 63 +++++++++++++++++++ 1 file changed, 63 insertions(+) create mode 100644 model_cards/bashar-talafha/multi-dialect-bert-base-arabic/README.md diff --git a/model_cards/bashar-talafha/multi-dialect-bert-base-arabic/README.md b/model_cards/bashar-talafha/multi-dialect-bert-base-arabic/README.md new file mode 100644 index 0000000000..7cc5397dd1 --- /dev/null +++ b/model_cards/bashar-talafha/multi-dialect-bert-base-arabic/README.md @@ -0,0 +1,63 @@ +# Multi-dialect-Arabic-BERT +This is a repository of Multi-dialect Arabic BERT model. + +By [Mawdoo3-AI](https://ai.mawdoo3.com/). + +

+
+ Background reference: http://www.qfi.org/wp-content/uploads/2018/02/Qfi_Infographic_Mother-Language_Final.pdf +
+

+ + + +### About our Multi-dialect-Arabic-BERT model +Instead of training the Multi-dialect Arabic BERT model from scratch, we initialized the weights of the model using [Arabic-BERT](https://github.com/alisafaya/Arabic-BERT) and trained it on 10M arabic tweets from the unlabled data of [The Nuanced Arabic Dialect Identification (NADI) shared task](https://sites.google.com/view/nadi-shared-task). + +### To cite this work +We haven't published any paper on this work. Please cite this repository: +``` +@inproceedings{talafha-etal-2020-nadi, + title ={{Multi-dialect Arabic BERT for Country-level Dialect Identification}}, + author = {Talafha, Bashar, Ali, Mohammad, Za'ter, Muhy Eddin, Seelawi, Haitham, Tuffaha, Ibraheem, Samir, Mostafa, Farhan, Wael and Al-Natsheh, Hussein}, + booktitle ={{Proceedings of the Fifth Arabic Natural Language Processing Workshop (WANLP2020)}}, + year = {2020}, + address = {Barcelona, Spain} +} +``` +We will update the BibTeX once the paper published. + +### Usage +The model weights can be loaded using `transformers` library by HuggingFace. + +```python +from transformers import AutoTokenizer, AutoModel + +tokenizer = AutoTokenizer.from_pretrained("bashar-talafha/multi-dialect-bert-base-arabic") +model = AutoModel.from_pretrained("bashar-talafha/multi-dialect-bert-base-arabic") +``` + +Example using `pipeline`: + +```python +from transformers import pipeline + +fill_mask = pipeline( + "fill-mask", + model="bashar-talafha/multi-dialect-bert-base-arabic ", + tokenizer="bashar-talafha/multi-dialect-bert-base-arabic " +) + +fill_mask(" سافر الرحالة من مطار [MASK] ") +``` +``` +[{'sequence': '[CLS] سافر الرحالة من مطار الكويت [SEP]', 'score': 0.08296813815832138, 'token': 3226}, + {'sequence': '[CLS] سافر الرحالة من مطار دبي [SEP]', 'score': 0.05123933032155037, 'token': 4747}, + {'sequence': '[CLS] سافر الرحالة من مطار مسقط [SEP]', 'score': 0.046838656067848206, 'token': 13205}, + {'sequence': '[CLS] سافر الرحالة من مطار القاهرة [SEP]', 'score': 0.03234650194644928, 'token': 4003}, + {'sequence': '[CLS] سافر الرحالة من مطار الرياض [SEP]', 'score': 0.02606341242790222, 'token': 2200}] +``` +### Repository +Please check the [original repository](https://github.com/mawdoo3/Multi-dialect-Arabic-BERT) for more information. + +