small fix and updates to readme

This commit is contained in:
thomwolf
2019-06-19 09:38:38 +02:00
parent f7e2ac01ea
commit 68ab9599ce
5 changed files with 53 additions and 18 deletions


@@ -1322,12 +1322,14 @@ python $SQUAD_DIR/evaluate-v1.1.py $SQUAD_DIR/dev-v1.1.json /tmp/debug_squad/pre
{"f1": 88.52381567990474, "exact_match": 81.22043519394512}
```
Here is an example using distributed training on 8 V100 GPUs and Bert Whole Word Masking model to reach an F1 > 93 on SQuAD:
**distributed training**
Here is an example using distributed training on 8 V100 GPUs and Bert Whole Word Masking uncased model to reach an F1 > 93 on SQuAD:
```bash
python -m torch.distributed.launch --nproc_per_node=8 \
run_squad.py \
--bert_model bert-large-cased-whole-word-masking \
--bert_model bert-large-uncased-whole-word-masking \
--do_train \
--do_predict \
--do_lower_case \
@@ -1337,17 +1339,31 @@ python -m torch.distributed.launch --nproc_per_node=8 \
--num_train_epochs 2 \
--max_seq_length 384 \
--doc_stride 128 \
--output_dir ../models/train_squad_large_cased_wwm/ \
--output_dir ../models/wwm_uncased_finetuned_squad/ \
--train_batch_size 24 \
--gradient_accumulation_steps 12
```
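The command above sets `--max_seq_length 384` and `--doc_stride 128`. SQuAD contexts longer than the window are split into overlapping spans. The sketch below shows how such a sliding-window split works. It assumes that each input is laid out as `[CLS] question [SEP] context [SEP]`, so the question and three special tokens share the 384-token budget with the context. It also assumes the window advances by at most `doc_stride` tokens each time. The function name `doc_spans` and the token counts in the example are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple


def doc_spans(num_doc_tokens: int, num_query_tokens: int,
              max_seq_length: int = 384, doc_stride: int = 128) -> List[Tuple[int, int]]:
    """Hypothetical helper: (start, length) windows over the context tokens.

    Assumes inputs look like [CLS] query [SEP] context [SEP], so three
    positions are reserved for special tokens.
    """
    max_tokens_for_doc = max_seq_length - num_query_tokens - 3
    if max_tokens_for_doc <= 0:
        raise ValueError("question too long for max_seq_length")
    spans = []
    start = 0
    while start < num_doc_tokens:
        length = min(num_doc_tokens - start, max_tokens_for_doc)
        spans.append((start, length))
        if start + length == num_doc_tokens:
            break  # this window already reaches the end of the context
        start += min(length, doc_stride)  # slide forward by at most doc_stride tokens
    return spans


# A 700-token context with a 20-token question gives four overlapping windows.
print(doc_spans(num_doc_tokens=700, num_query_tokens=20))
# [(0, 361), (128, 361), (256, 361), (384, 316)]
```

Overlapping windows mean an answer near a window boundary still appears whole in at least one span. The cost is that some tokens are encoded more than once.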
Training with these hyper-parameters gave us the following results:
```bash
python $SQUAD_DIR/evaluate-v1.1.py $SQUAD_DIR/dev-v1.1.json ../models/train_squad_large_cased_wwm/predictions.json
python $SQUAD_DIR/evaluate-v1.1.py $SQUAD_DIR/dev-v1.1.json ../models/wwm_uncased_finetuned_squad/predictions.json
{"exact_match": 86.91579943235573, "f1": 93.1532499015869}
```
This is the model provided as `bert-large-uncased-whole-word-masking-finetuned-squad`.
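The distributed command combines 8 processes, `--train_batch_size 24` and `--gradient_accumulation_steps 12`. It helps to spell out what that means per GPU and per optimizer step. This is a minimal sketch built on two assumptions. First, `run_squad.py` divides `--train_batch_size` by `--gradient_accumulation_steps` inside each process. Second, `torch.distributed.launch` starts one process per GPU (`--nproc_per_node=8`). The helper `batch_arithmetic` is hypothetical and not part of the repository.

```python
def batch_arithmetic(train_batch_size: int, grad_accum_steps: int, n_gpus: int) -> dict:
    """Hypothetical helper: batch sizes for one optimizer step.

    Assumes each process runs micro-batches of train_batch_size // grad_accum_steps
    examples, accumulates gradients over grad_accum_steps micro-batches before
    stepping the optimizer, and that there is one process per GPU.
    """
    micro_batch = train_batch_size // grad_accum_steps  # examples per forward/backward pass
    per_gpu_batch = micro_batch * grad_accum_steps      # examples per optimizer step on one GPU
    global_batch = per_gpu_batch * n_gpus               # examples per optimizer step on all GPUs
    return {"micro_batch": micro_batch, "per_gpu_batch": per_gpu_batch, "global_batch": global_batch}


print(batch_arithmetic(train_batch_size=24, grad_accum_steps=12, n_gpus=8))
# {'micro_batch': 2, 'per_gpu_batch': 24, 'global_batch': 192}
```

Under these assumptions, each V100 only holds 2 sequences of 384 tokens at a time. That is what makes the large whole-word-masking model fit in memory, while each optimizer update still sees 192 examples.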
And here is the training command for the model provided as `bert-large-cased-whole-word-masking-finetuned-squad`:
```bash
python -m torch.distributed.launch --nproc_per_node=8 \
run_squad.py \
--bert_model bert-large-cased-whole-word-masking \
--do_train \
--do_predict \
--do_lower_case \
--train_file $SQUAD_DIR/train-v1.1.json \
--predict_file $SQUAD_DIR/dev-v1.1.json \
--learning_rate 3e-5 \
--num_train_epochs 2 \
--max_seq_length 384 \
--doc_stride 128 \
--output_dir ../models/wwm_cased_finetuned_squad/ \
--train_batch_size 24 \
--gradient_accumulation_steps 12
```
Training with these hyper-parameters gave us the following results:
```bash
python $SQUAD_DIR/evaluate-v1.1.py $SQUAD_DIR/dev-v1.1.json ../models/wwm_cased_finetuned_squad/predictions.json
{"exact_match": 84.18164616840113, "f1": 91.58645594850135}
```
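The `exact_match` and `f1` numbers above are produced by `evaluate-v1.1.py`. Below is a minimal re-implementation sketch of the two metrics, for checking a few predictions by hand. It follows our understanding of how the SQuAD v1.1 script computes them:

- Answers are normalized: lowercased, punctuation and the articles a/an/the removed, whitespace collapsed.
- Each prediction is scored against its best-matching gold answer.
- Scores are averaged and scaled to 0-100.

The function names and toy data are ours, and this is not the official script. Use `evaluate-v1.1.py` for any reported numbers.

```python
import collections
import re
import string


def normalize_answer(s: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction: str, truth: str) -> float:
    return float(normalize_answer(prediction) == normalize_answer(truth))


def f1(prediction: str, truth: str) -> float:
    pred_tokens = normalize_answer(prediction).split()
    true_tokens = normalize_answer(truth).split()
    # Multiset intersection: shared tokens, counted with multiplicity.
    common = collections.Counter(pred_tokens) & collections.Counter(true_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(true_tokens)
    return 2 * precision * recall / (precision + recall)


def score(predictions: dict, answers: dict) -> dict:
    """predictions: qid -> predicted text; answers: qid -> list of gold answers."""
    em_total = f1_total = 0.0
    for qid, golds in answers.items():
        pred = predictions.get(qid, "")  # a missing prediction scores 0
        em_total += max(exact_match(pred, g) for g in golds)
        f1_total += max(f1(pred, g) for g in golds)
    n = len(answers)
    return {"exact_match": 100.0 * em_total / n, "f1": 100.0 * f1_total / n}


toy = score({"q1": "the Eiffel Tower", "q2": "Paris France"},
            {"q1": ["Eiffel Tower"], "q2": ["Paris"]})
print({k: round(v, 2) for k, v in toy.items()})
# {'exact_match': 50.0, 'f1': 83.33}
```

The toy run shows why F1 sits above exact match in the results above. A prediction such as "Paris France" gets no exact-match credit but still earns partial F1 for the overlapping token.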
#### SWAG
The data for SWAG can be downloaded by cloning the following [repository](https://github.com/rowanz/swagaf).