From 6c61c0801e28ef348b780bfa260d0b8217b76dfc Mon Sep 17 00:00:00 2001 From: Manuel Romero Date: Wed, 11 Mar 2020 00:33:05 +0100 Subject: [PATCH] Create README.md --- .../README.md | 78 +++++++++++++++++++ 1 file changed, 78 insertions(+) create mode 100644 model_cards/mrm8488/bert-multi-cased-finedtuned-xquad-tydiqa-goldp/README.md diff --git a/model_cards/mrm8488/bert-multi-cased-finedtuned-xquad-tydiqa-goldp/README.md b/model_cards/mrm8488/bert-multi-cased-finedtuned-xquad-tydiqa-goldp/README.md new file mode 100644 index 0000000000..45a08bd6dd --- /dev/null +++ b/model_cards/mrm8488/bert-multi-cased-finedtuned-xquad-tydiqa-goldp/README.md @@ -0,0 +1,78 @@ +--- +language: multilingual +thumbnail: +--- + +# A fine-tuned model on GoldP task from Tydi QA dataset + +This model uses [bert-multi-cased-finetuned-xquadv1](https://huggingface.co/mrm8488/bert-multi-cased-finetuned-xquadv1) and fine-tuned on [Tydi QA](https://github.com/google-research-datasets/tydiqa) dataset for Gold Passage task [(GoldP)](https://github.com/google-research-datasets/tydiqa#the-tasks) + +## Details of the language model +The base language model [(bert-multi-cased-finetuned-xquadv1)](https://huggingface.co/mrm8488/bert-multi-cased-finetuned-xquadv1) is a fine-tuned version of [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) for the **Q&A** downstream task + + +## Details of the Tydi QA dataset + +TyDi QA contains 200k human-annotated question-answer pairs in 11 Typologically Diverse languages, written without seeing the answer and without the use of translation, and is designed for the **training and evaluation** of automatic question answering systems. This repository provides evaluation code and a baseline system for the dataset. https://ai.google.com/research/tydiqa + + +## Details of the downstream task (Gold Passage or GoldP aka the secondary task) + +Given a passage that is guaranteed to contain the answer, predict the single contiguous span of characters that answers the question. the gold passage task differs from the [primary task](https://github.com/google-research-datasets/tydiqa/blob/master/README.md#the-tasks) in several ways: +* only the gold answer passage is provided rather than the entire Wikipedia article; +* unanswerable questions have been discarded, similar to MLQA and XQuAD; +* we evaluate with the SQuAD 1.1 metrics like XQuAD; and +* Thai and Japanese are removed since the lack of whitespace breaks some tools. + + +## Model training + +The model was fine-tuned on a Tesla P100 GPU and 25GB of RAM. +The script is the following: + +```python +python run_squad.py \ + --model_type bert \ + --model_name_or_path mrm8488/bert-multi-cased-finetuned-xquadv1 \ + --do_train \ + --do_eval \ + --train_file /content/dataset/train.json \ + --predict_file /content/dataset/dev.json \ + --per_gpu_train_batch_size 24 \ + --per_gpu_eval_batch_size 24 \ + --learning_rate 3e-5 \ + --num_train_epochs 2.5 \ + --max_seq_length 384 \ + --doc_stride 128 \ + --output_dir /content/model_output \ + --overwrite_output_dir \ + --save_steps 5000 \ + --threads 40 + ``` + +## Global Results (dev set): + +| Metric | # Value | +| --------- | ----------- | +| **Exact** | **71.06** | +| **F1** | **82.16** | + +## Specific Results (per language): + +| Language | # Samples | # Exact | # F1 | +| --------- | ----------- |--------| ------ | +| Arabic | 1314 | 73.29 | 84.72 | +| Bengali | 180 | 64.60 | 77.84 | +| English | 654 | 72.12 | 82.24 | +| Finnish | 1031 | 70.14 | 80.36 | +| Indonesian| 773 | 77.25 | 86.36 | +| Korean | 414 | 68.92 | 70.95 | +| Russian | 1079 | 62.65 | 78.55 | +| Swahili | 596 | 80.11 | 86.18 | +| Telegu | 874 | 71.00 | 84.24 | + + + +> Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488) + +> Made with in Spain