Merge branch 'master' into pr/2115
This commit is contained in:
@@ -24,8 +24,6 @@ pip install -r ./examples/requirements.txt
|
||||
| [Multiple Choice](#multiple-choice) | Examples running BERT/XLNet/RoBERTa on the SWAG/RACE/ARC tasks.
|
||||
| [Named Entity Recognition](#named-entity-recognition) | Using BERT for Named Entity Recognition (NER) on the CoNLL 2003 dataset, examples with distributed training. |
|
||||
| [XNLI](#xnli) | Examples running BERT/XLM on the XNLI benchmark. |
|
||||
| [Abstractive summarization](#abstractive-summarization) | Using the BertAbs
|
||||
model finetuned on the CNN/DailyMail dataset to generate summaries. |
|
||||
|
||||
## TensorFlow 2.0 Bert models on GLUE
|
||||
|
||||
@@ -45,7 +43,7 @@ Quick benchmarks from the script (no other modifications):
|
||||
| Titan V | AMP | 26s | 0.8281/0.8568/0.8411 |
|
||||
| V100 | FP32 | 35s | 0.8646/0.8359/0.8464 |
|
||||
| V100 | AMP | 22s | 0.8646/0.8385/0.8411 |
|
||||
| 1080 Ti | FP32 | 55s | - |
|
||||
| 1080 Ti | FP32 | 55s | - |
|
||||
|
||||
Mixed precision (AMP) reduces the training time considerably for the same hardware and hyper-parameters (same batch size was used).
|
||||
|
||||
@@ -359,9 +357,9 @@ eval_loss = 0.44457291918821606
|
||||
|
||||
Based on the script [`run_squad.py`](https://github.com/huggingface/transformers/blob/master/examples/run_squad.py).
|
||||
|
||||
#### Fine-tuning on SQuAD
|
||||
#### Fine-tuning BERT on SQuAD1.0
|
||||
|
||||
This example code fine-tunes BERT on the SQuAD dataset. It runs in 24 min (with BERT-base) or 68 min (with BERT-large)
|
||||
This example code fine-tunes BERT on the SQuAD1.0 dataset. It runs in 24 min (with BERT-base) or 68 min (with BERT-large)
|
||||
on a single tesla V100 16GB. The data for SQuAD can be downloaded with the following links and should be saved in a
|
||||
$SQUAD_DIR directory.
|
||||
|
||||
@@ -369,6 +367,12 @@ $SQUAD_DIR directory.
|
||||
* [dev-v1.1.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json)
|
||||
* [evaluate-v1.1.py](https://github.com/allenai/bi-att-flow/blob/master/squad/evaluate-v1.1.py)
|
||||
|
||||
And for SQuAD2.0, you need to download:
|
||||
|
||||
- [train-v2.0.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v2.0.json)
|
||||
- [dev-v2.0.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v2.0.json)
|
||||
- [evaluate-v2.0.py](https://worksheets.codalab.org/rest/bundles/0x6b567e1cf2e041ec80d7098f031c5c9e/contents/blob/)
|
||||
|
||||
```bash
|
||||
export SQUAD_DIR=/path/to/SQUAD
|
||||
|
||||
@@ -398,7 +402,7 @@ exact_match = 81.22
|
||||
#### Distributed training
|
||||
|
||||
|
||||
Here is an example using distributed training on 8 V100 GPUs and Bert Whole Word Masking uncased model to reach a F1 > 93 on SQuAD:
|
||||
Here is an example using distributed training on 8 V100 GPUs and Bert Whole Word Masking uncased model to reach a F1 > 93 on SQuAD1.0:
|
||||
|
||||
```bash
|
||||
python -m torch.distributed.launch --nproc_per_node=8 run_squad.py \
|
||||
@@ -430,7 +434,9 @@ This fine-tuned model is available as a checkpoint under the reference
|
||||
|
||||
#### Fine-tuning XLNet on SQuAD
|
||||
|
||||
This example code fine-tunes XLNet on the SQuAD dataset. See above to download the data for SQuAD .
|
||||
This example code fine-tunes XLNet on both SQuAD1.0 and SQuAD2.0 dataset. See above to download the data for SQuAD .
|
||||
|
||||
##### Command for SQuAD1.0:
|
||||
|
||||
```bash
|
||||
export SQUAD_DIR=/path/to/SQUAD
|
||||
@@ -453,7 +459,32 @@ python /data/home/hlu/transformers/examples/run_squad.py \
|
||||
--save_steps 5000
|
||||
```
|
||||
|
||||
Training with the previously defined hyper-parameters yields the following results:
|
||||
##### Command for SQuAD2.0:
|
||||
|
||||
```bash
|
||||
export SQUAD_DIR=/path/to/SQUAD
|
||||
|
||||
python run_squad.py \
|
||||
--model_type xlnet \
|
||||
--model_name_or_path xlnet-large-cased \
|
||||
--do_train \
|
||||
--do_eval \
|
||||
--version_2_with_negative \
|
||||
--train_file $SQUAD_DIR/train-v2.0.json \
|
||||
--predict_file $SQUAD_DIR/dev-v2.0.json \
|
||||
--learning_rate 3e-5 \
|
||||
--num_train_epochs 4 \
|
||||
--max_seq_length 384 \
|
||||
--doc_stride 128 \
|
||||
--output_dir ./wwm_cased_finetuned_squad/ \
|
||||
--per_gpu_eval_batch_size=2 \
|
||||
--per_gpu_train_batch_size=2 \
|
||||
--save_steps 5000
|
||||
```
|
||||
|
||||
Larger batch size may improve the performance while costing more memory.
|
||||
|
||||
##### Results for SQuAD1.0 with the previously defined hyper-parameters:
|
||||
|
||||
```python
|
||||
{
|
||||
@@ -466,10 +497,28 @@ Training with the previously defined hyper-parameters yields the following resul
|
||||
}
|
||||
```
|
||||
|
||||
##### Results for SQuAD2.0 with the previously defined hyper-parameters:
|
||||
|
||||
```python
|
||||
{
|
||||
"exact": 80.4177545691906,
|
||||
"f1": 84.07154997729623,
|
||||
"total": 11873,
|
||||
"HasAns_exact": 76.73751686909581,
|
||||
"HasAns_f1": 84.05558584352873,
|
||||
"HasAns_total": 5928,
|
||||
"NoAns_exact": 84.0874684608915,
|
||||
"NoAns_f1": 84.0874684608915,
|
||||
"NoAns_total": 5945
|
||||
}
|
||||
```
|
||||
|
||||
|
||||
|
||||
## Named Entity Recognition
|
||||
|
||||
Based on the scripts [`run_ner.py`](https://github.com/huggingface/transformers/blob/master/examples/run_ner.py) for Pytorch and
|
||||
[`run_tf_ner.py`(https://github.com/huggingface/transformers/blob/master/examples/run_tf_ner.py)] for Tensorflow 2.
|
||||
[`run_tf_ner.py`](https://github.com/huggingface/transformers/blob/master/examples/run_tf_ner.py) for Tensorflow 2.
|
||||
This example fine-tune Bert Multilingual on GermEval 2014 (German NER).
|
||||
Details and results for the fine-tuning provided by @stefan-it.
|
||||
|
||||
@@ -646,34 +695,6 @@ micro avg 0.8722 0.8774 0.8748 13869
|
||||
macro avg 0.8712 0.8774 0.8740 13869
|
||||
```
|
||||
|
||||
## Abstractive summarization
|
||||
|
||||
Based on the script
|
||||
[`run_summarization_finetuning.py`](https://github.com/huggingface/transformers/blob/master/examples/run_summarization_finetuning.py).
|
||||
|
||||
Before running this script you should download **both** CNN and Daily Mail
|
||||
datasets from [Kyunghyun Cho's website](https://cs.nyu.edu/~kcho/DMQA/) (the
|
||||
links next to "Stories") in the same folder. Then uncompress the archives by running:
|
||||
|
||||
```bash
|
||||
tar -xvf cnn_stories.tgz && tar -xvf dailymail_stories.tgz
|
||||
```
|
||||
|
||||
note that the finetuning script **will not work** if you do not download both
|
||||
datasets. We will refer as `$DATA_PATH` the path to where you uncompressed both
|
||||
archive.
|
||||
|
||||
```bash
|
||||
export DATA_PATH=/path/to/dataset/
|
||||
|
||||
python run_summarization_finetuning.py \
|
||||
--output_dir=output \
|
||||
--model_type=bert2bert \
|
||||
--model_name_or_path=bert2bert \
|
||||
--do_train \
|
||||
--data_path=$DATA_PATH \
|
||||
```
|
||||
|
||||
## XNLI
|
||||
|
||||
Based on the script [`run_xnli.py`](https://github.com/huggingface/transformers/blob/master/examples/run_xnli.py).
|
||||
|
||||
Reference in New Issue
Block a user