From 55b932a8180e1823285f6a29f853f600463a5006 Mon Sep 17 00:00:00 2001
From: Manuel Romero
Date: Fri, 3 Jul 2020 12:19:49 +0200
Subject: [PATCH] Create model card (#5464)

Create model card for electra-small-discriminator fine-tuned on SQUAD v2.0
---
 .../electra-small-finetuned-squadv2/README.md | 86 +++++++++++++++++++
 1 file changed, 86 insertions(+)
 create mode 100644 model_cards/mrm8488/electra-small-finetuned-squadv2/README.md

diff --git a/model_cards/mrm8488/electra-small-finetuned-squadv2/README.md b/model_cards/mrm8488/electra-small-finetuned-squadv2/README.md
new file mode 100644
index 0000000000..0a3404b5b0
--- /dev/null
+++ b/model_cards/mrm8488/electra-small-finetuned-squadv2/README.md
@@ -0,0 +1,86 @@
+---
+language: english
+---
+
+# Electra small ⚡ + SQuAD v2 ❓
+
+[Electra-small-discriminator](https://huggingface.co/google/electra-small-discriminator) fine-tuned on the [SQuAD v2.0 dataset](https://rajpurkar.github.io/SQuAD-explorer/explore/v2.0/dev/) for the **Q&A** downstream task.
+
+## Details of the downstream task (Q&A) - Model 🧠
+
+**ELECTRA** is a new method for self-supervised language representation learning. It can be used to pre-train transformer networks using relatively little compute. ELECTRA models are trained to distinguish "real" input tokens from "fake" input tokens generated by another neural network, similar to the discriminator of a [GAN](https://arxiv.org/pdf/1406.2661.pdf). At small scale, ELECTRA achieves strong results even when trained on a single GPU. At large scale, ELECTRA achieves state-of-the-art results on the [SQuAD 2.0](https://rajpurkar.github.io/SQuAD-explorer/) dataset.
+
+
+## Details of the downstream task (Q&A) - Dataset 📚
+
+**SQuAD2.0** combines the 100,000 questions in SQuAD1.1 with over 50,000 unanswerable questions written adversarially by crowdworkers to look similar to answerable ones.
To do well on SQuAD2.0, systems must not only answer questions when possible, but also determine when no answer is supported by the paragraph and abstain from answering.
+
+## Model training 🏋️
+
+The model was trained on a machine with a Tesla P100 GPU and 25 GB of RAM, using the following command:
+
+```bash
+python transformers/examples/question-answering/run_squad.py \
+  --model_type electra \
+  --model_name_or_path 'google/electra-small-discriminator' \
+  --do_eval \
+  --do_train \
+  --do_lower_case \
+  --train_file '/content/dataset/train-v2.0.json' \
+  --predict_file '/content/dataset/dev-v2.0.json' \
+  --per_gpu_train_batch_size 16 \
+  --learning_rate 3e-5 \
+  --num_train_epochs 10 \
+  --max_seq_length 384 \
+  --doc_stride 128 \
+  --output_dir '/content/output' \
+  --overwrite_output_dir \
+  --save_steps 1000 \
+  --version_2_with_negative
+```
+
+## Dev set Results 🧾
+
+| Metric   | Value     |
+| -------- | --------- |
+| **EM**   | **69.71** |
+| **F1**   | **73.44** |
+| **Size** | **50 MB** |
+
+
+```json
+{
+  "exact": 69.71279373368147,
+  "f1": 73.4439546123672,
+  "total": 11873,
+  "HasAns_exact": 69.92240215924427,
+  "HasAns_f1": 77.39542393937836,
+  "HasAns_total": 5928,
+  "NoAns_exact": 69.50378469301934,
+  "NoAns_f1": 69.50378469301934,
+  "NoAns_total": 5945,
+  "best_exact": 69.71279373368147,
+  "best_exact_thresh": 0.0,
+  "best_f1": 73.44395461236732,
+  "best_f1_thresh": 0.0
+}
+```
+
+### Model in action 🚀
+
+Quick usage with **pipelines**:
+
+```python
+from transformers import pipeline
+QnA_pipeline = pipeline('question-answering', model='mrm8488/electra-small-finetuned-squadv2')
+QnA_pipeline({
+    'context': 'A new strain of flu that has the potential to become a pandemic has been identified in China by scientists.',
+    'question': 'What has been discovered by scientists from China ?'
+})
+# Output:
+# {'answer': 'A new strain of flu', 'end': 19, 'score': 0.8650811568752914, 'start': 0}
+```
+
+> Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488) | [LinkedIn](https://www.linkedin.com/in/manuel-romero-cs/)
+
+> Made with ♥ in Spain
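
As a sanity check on the evaluation numbers reported in the model card, the overall `exact` and `f1` scores should equal the average of the `HasAns` and `NoAns` scores weighted by question count. The sketch below recomputes them from the values in the results JSON. The helper name `weighted_mean` is illustrative only and is not part of `run_squad.py`.

```python
# Values copied from the evaluation output shown in the model card.
results = {
    "total": 11873,
    "exact": 69.71279373368147,
    "f1": 73.4439546123672,
    "HasAns_exact": 69.92240215924427,
    "HasAns_f1": 77.39542393937836,
    "HasAns_total": 5928,
    "NoAns_exact": 69.50378469301934,
    "NoAns_f1": 69.50378469301934,
    "NoAns_total": 5945,
}


def weighted_mean(metric: str, scores: dict) -> float:
    """Combine the HasAns/NoAns scores for `metric`, weighted by question count.

    Hypothetical helper written for this check, not a library function.
    """
    has_ans = scores[f"HasAns_{metric}"] * scores["HasAns_total"]
    no_ans = scores[f"NoAns_{metric}"] * scores["NoAns_total"]
    return (has_ans + no_ans) / scores["total"]


for metric in ("exact", "f1"):
    recomputed = weighted_mean(metric, results)
    print(f"{metric}: reported={results[metric]:.4f} recomputed={recomputed:.4f}")
```

`NoAns_exact` and `NoAns_f1` are identical because an unanswerable question scores 1 only when the model abstains, so both metrics reduce to the same 0/1 value per question.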