From 278fd28a32e50a31d7f9fb20c120618cc6370ee6 Mon Sep 17 00:00:00 2001
From: Thomas Wolf <thomwolf@users.noreply.github.com>
Date: Tue, 13 Nov 2018 09:34:49 +0100
Subject: [PATCH] added results for 16-bit fine-tuning in readme

---
 README.md | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/README.md b/README.md
index c5b56a869d..cd4c7c97dd 100644
--- a/README.md
+++ b/README.md
@@ -236,3 +236,31 @@ python ./run_squad.py \
   --gradient_accumulation_steps 2 \
   --optimize_on_cpu
 ```
+
+If you have a recent GPU (NVIDIA Volta series or newer), you should try **16-bit fine-tuning** (FP16).
+
+Here is an example of hyper-parameters for an FP16 run we tried:
+```bash
+python ./run_squad.py \
+  --vocab_file $BERT_LARGE_DIR/vocab.txt \
+  --bert_config_file $BERT_LARGE_DIR/bert_config.json \
+  --init_checkpoint $BERT_LARGE_DIR/pytorch_model.bin \
+  --do_lower_case \
+  --do_train \
+  --do_predict \
+  --train_file $SQUAD_TRAIN \
+  --predict_file $SQUAD_EVAL \
+  --learning_rate 3e-5 \
+  --num_train_epochs 2 \
+  --max_seq_length 384 \
+  --doc_stride 128 \
+  --output_dir $OUTPUT_DIR \
+  --train_batch_size 24 \
+  --fp16 \
+  --loss_scale 128
+```
+
+The results were similar to the above FP32 results (actually slightly higher):
+```json
+{"exact_match": 84.65468306527909, "f1": 91.238669287002}
+```