From 278fd28a32e50a31d7f9fb20c120618cc6370ee6 Mon Sep 17 00:00:00 2001
From: Thomas Wolf
Date: Tue, 13 Nov 2018 09:34:49 +0100
Subject: [PATCH] added results for 16-bit fine-tuning in readme

---
 README.md | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/README.md b/README.md
index c5b56a869d..cd4c7c97dd 100644
--- a/README.md
+++ b/README.md
@@ -236,3 +236,31 @@ python ./run_squad.py \
   --gradient_accumulation_steps 2 \
   --optimize_on_cpu
 ```
+
+If you have a recent GPU (starting from NVIDIA Volta series), you should try **16-bit fine-tuning** (FP16).
+
+Here is an example of hyper-parameters for a FP16 run we tried:
+```bash
+python ./run_squad.py \
+  --vocab_file $BERT_LARGE_DIR/vocab.txt \
+  --bert_config_file $BERT_LARGE_DIR/bert_config.json \
+  --init_checkpoint $BERT_LARGE_DIR/pytorch_model.bin \
+  --do_lower_case \
+  --do_train \
+  --do_predict \
+  --train_file $SQUAD_TRAIN \
+  --predict_file $SQUAD_EVAL \
+  --learning_rate 3e-5 \
+  --num_train_epochs 2 \
+  --max_seq_length 384 \
+  --doc_stride 128 \
+  --output_dir $OUTPUT_DIR \
+  --train_batch_size 24 \
+  --fp16 \
+  --loss_scale 128
+```
+
+The results were similar to the above FP32 results (actually slightly higher):
+```bash
+{"exact_match": 84.65468306527909, "f1": 91.238669287002}
+```