Create model card (#3890)

Model: TinyBERT-spanish-uncased-finetuned-ner
2020-04-22 20:56:43 +02:00
parent d698b87f20
commit cb3c2212c7
1 changed files with 106 additions and 0 deletions
--- a/model_cards/mrm8488/TinyBERT-spanish-uncased-finetuned-ner/README.md
+++ b/model_cards/mrm8488/TinyBERT-spanish-uncased-finetuned-ner/README.md
@@ -0,0 +1,106 @@
+---
+language: spanish
+thumbnail:
+---
+
+# Spanish TinyBERT + NER
+
+This model is a [Spanish Tiny Bert](https://huggingface.co/mrm8488/es-tinybert-v1-1) model, which I created using *distillation*, fine-tuned on [NER-C](https://www.kaggle.com/nltkdata/conll-corpora) for the **NER** downstream task. The **size** of the model is **55 MB**.
+
+## Details of the downstream task (NER) - Dataset
+
+- [Dataset:  CONLL Corpora ES](https://www.kaggle.com/nltkdata/conll-corpora) 
+
+I preprocessed the dataset and split it into train / dev sets (80/20):
+
+| Dataset                | # Examples |
+| ---------------------- | ----- |
+| Train                  | 8.7 K |
+| Dev                    | 2.2 K |
+
+
+- Fine-tuning script: [`run_ner.py`, provided by Hugging Face](https://github.com/huggingface/transformers/blob/master/examples/run_ner.py)
+
+- Labels covered:
+
+```
+B-LOC
+B-MISC
+B-ORG
+B-PER
+I-LOC
+I-MISC
+I-ORG
+I-PER
+O
+```
+
+## Metrics on evaluation set:
+
+| Metric    | Score     |
+| :-------: | :-------: |
+| F1        | **70.00** |
+| Precision | **67.83** |
+| Recall    | **71.46** |
+
+## Comparison:
+
+| Model | F1 score | Size (MB) |
+| :---: | :------: | :-------- |
+| bert-base-spanish-wwm-cased (BETO) | 88.43 | 421 |
+| [bert-spanish-cased-finetuned-ner](https://huggingface.co/mrm8488/bert-spanish-cased-finetuned-ner) | **90.17** | 420 |
+| Best Multilingual BERT | 87.38 | 681 |
+| TinyBERT-spanish-uncased-finetuned-ner (this one) | 70.00 | **55** |
+
+## Model in action
+
+
+Example of usage:
+
+```python
+import torch
+from transformers import AutoModelForTokenClassification, AutoTokenizer
+
+id2label = {
+    "0": "B-LOC",
+    "1": "B-MISC",
+    "2": "B-ORG",
+    "3": "B-PER",
+    "4": "I-LOC",
+    "5": "I-MISC",
+    "6": "I-ORG",
+    "7": "I-PER",
+    "8": "O"
+}
+
+tokenizer = AutoTokenizer.from_pretrained('mrm8488/TinyBERT-spanish-uncased-finetuned-ner')
+model = AutoModelForTokenClassification.from_pretrained('mrm8488/TinyBERT-spanish-uncased-finetuned-ner')
+text = "Mis amigos están pensando viajar a Londres este verano."
+input_ids = torch.tensor(tokenizer.encode(text)).unsqueeze(0)
+
+outputs = model(input_ids)
+logits = outputs[0]  # per-token label scores, shape (1, seq_len, num_labels)
+
+words = text.split(" ")  # naive alignment: index 0 is [CLS], then one token per word is assumed
+for index, scores in enumerate(logits[0]):
+    if 0 < index <= len(words):
+        print(words[index - 1] + ": " + id2label[str(torch.argmax(scores).item())])
+
+'''
+Output:
+--------
+Mis: O
+amigos: O
+están: O
+pensando: O
+viajar: O
+a: O
+Londres: B-LOC
+este: O
+verano.: O
+'''
+```
+
+> Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488)
+
+> Made with <span style="color: #e25555;">&hearts;</span> in Spain
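
A note on the usage example in this model card: it prints one BIO tag per word, but downstream code usually wants whole entity spans. The following is a minimal sketch of how the printed tags could be merged into spans. It is not part of the commit; `group_entities` is a hypothetical helper name, and treating a stray `I-` tag as the start of a new span is an assumption made here for leniency.

```python
from typing import List, Optional, Tuple


def group_entities(words: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge BIO-tagged words into (entity_type, text) spans.

    Hypothetical helper, not part of the model card or of transformers.
    """
    entities: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_words: List[str] = []

    for word, tag in zip(words, tags):
        # "B-LOC" -> ("B", "LOC"); "O" -> ("O", "")
        prefix, _, label = tag.partition("-")

        # An I- tag that continues the open span extends it.
        if prefix == "I" and label == current_type:
            current_words.append(word)
            continue

        # Anything else closes the open span, if there is one.
        if current_type is not None:
            entities.append((current_type, " ".join(current_words)))

        if prefix in ("B", "I"):
            # Assumption: a stray I- tag is treated like B- and starts a span.
            current_type, current_words = label, [word]
        else:
            current_type, current_words = None, []

    # Flush a span that runs to the end of the sentence.
    if current_type is not None:
        entities.append((current_type, " ".join(current_words)))
    return entities


# Words and tags copied from the "Output" block of the model card.
words = "Mis amigos están pensando viajar a Londres este verano.".split(" ")
tags = ["O", "O", "O", "O", "O", "O", "B-LOC", "O", "O"]
print(group_entities(words, tags))  # [('LOC', 'Londres')]
```

The helper works on whitespace-separated words, so it inherits the one-token-per-word assumption of the card's example; a word that the tokenizer splits into several pieces would need its tags aligned first.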