From 0876b77f7fbda110d5e64c03880e34123f2cea88 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gr=C3=A9gory=20Ch=C3=A2tel?= <chatel.gregory@gmail.com>
Date: Mon, 10 Dec 2018 15:34:19 +0100
Subject: [PATCH] Change to the README file to add SWAG results.

---
 README.md | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index d443ba7a07..23cd315c29 100644
--- a/README.md
+++ b/README.md
@@ -441,13 +441,25 @@ python run_swag.py \
   --do_train \
   --do_eval \
   --data_dir $SWAG_DIR/data
-  --train_batch_size 10 \
+  --train_batch_size 4 \
   --learning_rate 2e-5 \
   --num_train_epochs 3.0 \
   --max_seq_length 80 \
   --output_dir /tmp/swag_output/
 ```
 
+Training with the hyper-parameters above gave us the following results:
+```
+eval_accuracy = 0.7776167149855043
+eval_loss = 1.006812262735175
+global_step = 55161
+loss = 0.282251750624779
+```
+
+The gap from the `81.6%` accuracy reported in the BERT paper is
+probably due to the different `train_batch_size` (4 here, versus 16
+in the paper).
+
 ## Fine-tuning BERT-large on GPUs
 
 The options we list above allow to fine-tune BERT-large rather easily on GPU(s) instead of the TPU used by the original implementation.