Change to the README file to add SWAG results.

2018-12-10 15:34:19 +01:00
parent 150f3cd9fa
commit 0876b77f7f
1 changed files with 13 additions and 1 deletions
--- a/README.md
+++ b/README.md
@@ -441,13 +441,25 @@ python run_swag.py \
  --do_train \
  --do_eval \
  --data_dir $SWAG_DIR/data \
-  --train_batch_size 10 \
+  --train_batch_size 4 \
  --learning_rate 2e-5 \
  --num_train_epochs 3.0 \
  --max_seq_length 80 \
  --output_dir /tmp/swag_output/
 ```
 Training with the previous hyper-parameters gave us the following results:
 ```
 eval_accuracy = 0.7776167149855043
 eval_loss = 1.006812262735175
 global_step = 55161
 loss = 0.282251750624779
 ```
 The gap relative to the `81.6%` accuracy reported in the BERT paper
 is probably due to the different `train_batch_size` (4 here versus 16
 in the paper).
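As a quick sanity check on the reported `global_step`, the arithmetic below is a minimal sketch. It assumes the SWAG training split contains 73,546 examples (a figure not stated in this README), that one optimizer update happens per batch, and that the last partial batch of each epoch is kept. The helper name `expected_global_step` is hypothetical and is not part of `run_swag.py`.

```python
import math

# Assumption: size of the SWAG training split (not stated in this README).
NUM_TRAIN_EXAMPLES = 73546


def expected_global_step(num_examples, batch_size, num_epochs):
    """Number of optimizer updates, assuming one update per batch
    and that the final partial batch of each epoch is not dropped."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * int(num_epochs)


# Batch size used in the run above.
steps_bs4 = expected_global_step(NUM_TRAIN_EXAMPLES, 4, 3.0)
# Batch size reported for the paper's setting.
steps_bs16 = expected_global_step(NUM_TRAIN_EXAMPLES, 16, 3.0)

print(steps_bs4)   # 55161, matching the global_step reported above
print(steps_bs16)  # 13791
```

Under these assumptions, the run with `--train_batch_size 4` performs roughly four times as many parameter updates as a run with batch size 16 at the same learning rate, which is one plausible reason the two settings are not directly comparable.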
 ## Fine-tuning BERT-large on GPUs
 The options listed above make it fairly easy to fine-tune BERT-large on GPU(s) rather than on the TPU used by the original implementation.