Merge pull request #388 from ananyahjha93/master
Added remaining GLUE tasks to 'run_classifier.py'
51
README.md
@@ -927,11 +927,60 @@ Where `$THIS_MACHINE_INDEX` is an sequential index assigned to each of your mach

We showcase several fine-tuning examples based on (and extended from) [the original implementation](https://github.com/google-research/bert/):

- a *sequence-level classifier* on the MRPC classification corpus,
- a *sequence-level classifier* on nine different GLUE tasks,
- a *token-level classifier* on the question answering dataset SQuAD,
- a *sequence-level multiple-choice classifier* on the SWAG classification corpus, and
- a *BERT language model* on another target corpus.

#### GLUE results on dev set

We get the following results on the dev set of the GLUE benchmark with an uncased BERT base
model. All experiments were run on a P100 GPU with a batch size of 32.

| Task | Metric | Result |
|-|-|-|
| CoLA | Matthews corr. | 57.29 |
| SST-2 | accuracy | 93.00 |
| MRPC | F1/accuracy | 88.85/83.82 |
| STS-B | Pearson/Spearman corr. | 89.70/89.37 |
| QQP | accuracy/F1 | 90.72/87.41 |
| MNLI | matched acc./mismatched acc. | 83.95/84.39 |
| QNLI | accuracy | 89.04 |
| RTE | accuracy | 61.01 |
| WNLI | accuracy | 53.52 |

Some of these results are significantly different from the ones reported on the test set
of the GLUE benchmark on the website. For QQP and WNLI, please refer to [FAQ #12](https://gluebenchmark.com/faq) on the website.
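
As a rough illustration of the metrics named in the results table, here is a minimal pure-Python sketch of accuracy, binary F1, the Matthews correlation coefficient, and Pearson correlation. It is not taken from `run_classifier.py`, which may compute these differently (for example through a numerical library); the function names and the choice of `1` as the positive label are assumptions made for this example.

```python
import math


def _confusion_counts(preds, labels, positive=1):
    """Return (tp, tn, fp, fn) for binary predictions."""
    tp = sum(1 for p, y in zip(preds, labels) if p == positive and y == positive)
    tn = sum(1 for p, y in zip(preds, labels) if p != positive and y != positive)
    fp = sum(1 for p, y in zip(preds, labels) if p == positive and y != positive)
    fn = sum(1 for p, y in zip(preds, labels) if p != positive and y == positive)
    return tp, tn, fp, fn


def accuracy(preds, labels):
    """Fraction of predictions that match the labels."""
    return sum(1 for p, y in zip(preds, labels) if p == y) / len(labels)


def f1_binary(preds, labels, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp, _, fp, fn = _confusion_counts(preds, labels, positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def matthews_corr(preds, labels, positive=1):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    tp, tn, fp, fn = _confusion_counts(preds, labels, positive)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def pearson_corr(xs, ys):
    """Pearson correlation between two equally long sequences of numbers."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

These functions return values in the range [0, 1] (or [-1, 1] for the correlations); the numbers in the table are presumably the same quantities scaled by 100.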

Before running any one of these GLUE tasks you should download the
[GLUE data](https://gluebenchmark.com/tasks) by running
[this script](https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e)
and unpack it to some directory `$GLUE_DIR`.

```shell
export GLUE_DIR=/path/to/glue
export TASK_NAME=MRPC

python run_classifier.py \
  --task_name $TASK_NAME \
  --do_train \
  --do_eval \
  --do_lower_case \
  --data_dir $GLUE_DIR/$TASK_NAME \
  --bert_model bert-base-uncased \
  --max_seq_length 128 \
  --train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3.0 \
  --output_dir /tmp/$TASK_NAME/
```

where the task name can be one of CoLA, SST-2, MRPC, STS-B, QQP, MNLI, QNLI, RTE, or WNLI.
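
If you want to queue up all nine tasks, one option is to generate the command above once per task name. The sketch below only builds and prints the commands; it does not run them. The helper `build_command`, its `output_root` parameter, and the `GLUE_TASKS` list are names invented for this example, while the flags and values are the ones shown in the command above.

```python
import os
import shlex

# Task names as listed in this README.
GLUE_TASKS = ["CoLA", "SST-2", "MRPC", "STS-B", "QQP", "MNLI", "QNLI", "RTE", "WNLI"]


def build_command(task_name, glue_dir, output_root="/tmp"):
    """Return the run_classifier.py invocation for one task as an argument list."""
    return [
        "python", "run_classifier.py",
        "--task_name", task_name,
        "--do_train",
        "--do_eval",
        "--do_lower_case",
        "--data_dir", f"{glue_dir.rstrip('/')}/{task_name}",
        "--bert_model", "bert-base-uncased",
        "--max_seq_length", "128",
        "--train_batch_size", "32",
        "--learning_rate", "2e-5",
        "--num_train_epochs", "3.0",
        "--output_dir", f"{output_root.rstrip('/')}/{task_name}/",
    ]


if __name__ == "__main__":
    glue_dir = os.environ.get("GLUE_DIR", "/path/to/glue")
    for task in GLUE_TASKS:
        # Quote each argument so the printed line can be pasted into a shell.
        print(" ".join(shlex.quote(arg) for arg in build_command(task, glue_dir)))
```

Building an argument list rather than a single string keeps each flag and value separate, so the same list could later be handed to `subprocess.run` without extra quoting.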

The dev set results are written to the text file `eval_results.txt` in the specified `output_dir`. For MNLI, which has two separate dev sets (matched and mismatched), there will be a separate output folder called `/tmp/MNLI-MM/` in addition to `/tmp/MNLI/`.
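
To collect the results afterwards, something like the following sketch could be used. It rests on two assumptions that are not stated above: that each line of `eval_results.txt` has the form `key = value`, and that the mismatched MNLI folder is always the output directory with `-MM` appended (as in the `/tmp/MNLI/` and `/tmp/MNLI-MM/` example). The helper names are hypothetical.

```python
def parse_eval_results(text):
    """Parse 'key = value' lines into a dict of strings (assumed file format)."""
    results = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank or unrecognised lines
        key, _, value = line.partition("=")
        results[key.strip()] = value.strip()
    return results


def eval_result_paths(task_name, output_dir):
    """Return the result file path(s) for a task.

    MNLI gets a second path in a sibling '-MM' directory (assumed naming).
    """
    base = output_dir.rstrip("/")
    paths = [base + "/eval_results.txt"]
    if task_name.upper() == "MNLI":
        paths.append(base + "-MM/eval_results.txt")
    return paths


if __name__ == "__main__":
    for path in eval_result_paths("MNLI", "/tmp/MNLI/"):
        print(path)
```

Values are kept as strings on purpose, since the exact set of keys and their types depends on the task and is not specified here.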

Half-precision training with apex has only been tested on MRPC, MNLI, CoLA, and SST-2. The following section provides details on how to run half-precision training with MRPC. That said, half-precision training should also work with the remaining GLUE tasks, since the data processor for each task inherits from the base class `DataProcessor`.
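
The reasoning behind that expectation can be illustrated with a simplified stand-in. The classes below are not the ones in `run_classifier.py`; the method names, label sets, and row layouts are assumptions chosen for the example. The point is only that code downstream of the processor sees the same interface whatever the task, so a training option such as half precision never needs to know which task produced the examples.

```python
class DataProcessor:
    """Simplified stand-in for a shared base class (hypothetical interface)."""

    def get_labels(self):
        raise NotImplementedError

    def parse_row(self, row):
        """Return (text_a, text_b, label) for one row of fields."""
        raise NotImplementedError


class SingleSentenceProcessor(DataProcessor):
    """Hypothetical single-sentence task with rows of (sentence, label)."""

    def get_labels(self):
        return ["0", "1"]

    def parse_row(self, row):
        sentence, label = row
        return sentence, None, label


class SentencePairProcessor(DataProcessor):
    """Hypothetical sentence-pair task with rows of (premise, hypothesis, label)."""

    def get_labels(self):
        return ["entailment", "not_entailment"]

    def parse_row(self, row):
        premise, hypothesis, label = row
        return premise, hypothesis, label


def load_examples(processor, rows):
    """Task-agnostic loader: relies only on the DataProcessor interface."""
    label_ids = {label: i for i, label in enumerate(processor.get_labels())}
    examples = []
    for row in rows:
        text_a, text_b, label = processor.parse_row(row)
        examples.append((text_a, text_b, label_ids[label]))
    return examples
```

`load_examples` never checks which subclass it received, which is the property the paragraph above relies on.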

#### MRPC

This example code fine-tunes BERT on the Microsoft Research Paraphrase