From b3f272ffcbc319b3dd1279a6771cefe5557a8546 Mon Sep 17 00:00:00 2001 From: Manuel Romero Date: Mon, 27 Apr 2020 10:15:15 +0200 Subject: [PATCH] Create model card --- .../README.md | 72 +++++++++++++++++++ 1 file changed, 72 insertions(+) create mode 100644 model_cards/mrm8488/bert-small-finetuned-typo-detection/README.md diff --git a/model_cards/mrm8488/bert-small-finetuned-typo-detection/README.md b/model_cards/mrm8488/bert-small-finetuned-typo-detection/README.md new file mode 100644 index 0000000000..242894d75b --- /dev/null +++ b/model_cards/mrm8488/bert-small-finetuned-typo-detection/README.md @@ -0,0 +1,72 @@ +--- +language: english +thumbnail: +--- + +# BERT SMALL + Typo Detection βœβŒβœβœ” + +[BERT SMALL](https://huggingface.co/google/bert_uncased_L-4_H-512_A-8) fine-tuned on [GitHub Typo Corpus](https://github.com/mhagiwara/github-typo-corpus) for **typo detection** (using *NER* style) + +## Details of the downstream task (Typo detection as NER) + +- Dataset: [GitHub Typo Corpus](https://github.com/mhagiwara/github-typo-corpus) πŸ“š + +- [Fine-tune script on NER dataset provided by Huggingface](https://github.com/huggingface/transformers/blob/master/examples/run_ner.py) πŸ‹οΈβ€β™‚οΈ + +## Metrics on test set πŸ“‹ + +| Metric | # score | +| :-------: | :-------: | +| F1 | **89.12** | +| Precision | **93.82** | +| Recall | **84.87** | + +## Model in action πŸ”¨ + +Fast usage with **pipelines** πŸ§ͺ + +```python +from transformers import pipeline + +typo_checker = pipeline( + "ner", + model="mrm8488/bert-small-finetuned-typo-detection", + tokenizer="mrm8488/bert-small-finetuned-typo-detection" +) + +result = typo_checker("here there is an error in coment") +result[1:-1] + +# Output: +[{'entity': 'ok', 'score': 0.9021041989326477, 'word': 'here'}, + {'entity': 'ok', 'score': 0.7975626587867737, 'word': 'there'}, + {'entity': 'ok', 'score': 0.8596242070198059, 'word': 'is'}, + {'entity': 'ok', 'score': 0.7071516513824463, 'word': 'an'}, + {'entity': 'ok', 'score': 0.943381130695343, 'word': 'error'}, + {'entity': 'ok', 'score': 0.8047608733177185, 'word': 'in'}, + {'entity': 'ok', 'score': 0.8240702152252197, 'word': 'come'}, + {'entity': 'typo', 'score': 0.5004884004592896, 'word': '##nt'}] +``` + +It worksπŸŽ‰! we typed ```coment``` instead of ```comment``` + +Let's try with another example + +```python +result = typo_checker("Adddd validation midelware") +result[1:-1] + +# Output: +[{'entity': 'ok', 'score': 0.7128152847290039, 'word': 'add'}, + {'entity': 'typo', 'score': 0.5388424396514893, 'word': '##dd'}, + {'entity': 'ok', 'score': 0.94792640209198, 'word': 'validation'}, + {'entity': 'typo', 'score': 0.5839331746101379, 'word': 'mid'}, + {'entity': 'ok', 'score': 0.5195121765136719, 'word': '##el'}, + {'entity': 'ok', 'score': 0.7222476601600647, 'word': '##ware'}] +``` +Yeah! We typed wrong ```Add and middleware``` + + +> Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488) + +> Made with in Spain