From 5882c442e52921d6e8755efccd7e11a2ae405bbe Mon Sep 17 00:00:00 2001
From: erenup
Date: Mon, 16 Sep 2019 22:38:08 +0800
Subject: [PATCH] add example usage

---
 examples/README.md | 37 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/examples/README.md b/examples/README.md
index c47dc41433..3253e5481c 100644
--- a/examples/README.md
+++ b/examples/README.md
@@ -8,7 +8,8 @@ similar API between the different models.
 | [Language Model fine-tuning](#language-model-fine-tuning) | Fine-tuning the library models for language modeling on a text dataset. Causal language modeling for GPT/GPT-2, masked language modeling for BERT/RoBERTa. |
 | [Language Generation](#language-generation) | Conditional text generation using the auto-regressive models of the library: GPT, GPT-2, Transformer-XL and XLNet. |
 | [GLUE](#glue) | Examples running BERT/XLM/XLNet/RoBERTa on the 9 GLUE tasks. Examples feature distributed training as well as half-precision. |
-| [SQuAD](#squad) | Using BERT for question answering, examples with distributed training. |
+| [SQuAD](#squad) | Using BERT for question answering, examples with distributed training. |
+| [Multiple Choice](#multiple-choice) | Examples running BERT/XLNet/RoBERTa on the SWAG/RACE/ARC tasks. |
 
 ## Language model fine-tuning
 
@@ -282,6 +283,40 @@ The results are the following:
 loss = 0.04755385363816904
 ```
 
+## Multiple Choice
+
+Based on the script [`run_multiple_choice.py`]().
+
+#### Fine-tuning on SWAG
+Download the [SWAG](https://github.com/rowanz/swagaf/tree/master/data) data.
+
+```
+# Training on 4 Tesla V100 (16GB) GPUs
+export SWAG_DIR=/path/to/swag_data_dir
+python ./examples/single_model_scripts/run_multiple_choice.py \
+--model_type roberta \
+--task_name swag \
+--model_name_or_path roberta-base \
+--do_train \
+--do_eval \
+--do_lower_case \
+--data_dir $SWAG_DIR \
+--learning_rate 5e-5 \
+--num_train_epochs 3 \
+--max_seq_length 80 \
+--output_dir models_bert/swag_base \
+--per_gpu_eval_batch_size=16 \
+--per_gpu_train_batch_size=16 \
+--gradient_accumulation_steps 2 \
+--overwrite_output
+```
+Training with the defined hyper-parameters yields the following results:
+```
+***** Eval results *****
+eval_acc = 0.8338998300509847
+eval_loss = 0.44457291918821606
+```
+
 ## SQuAD
 
 Based on the script [`run_squad.py`](https://github.com/huggingface/pytorch-transformers/blob/master/examples/run_squad.py).