Update dev results on GLUE (bert-base-uncased) w/ median on 5 runs

VictorSanh
2019-08-21 03:43:29 +00:00
parent 07681b6b58
commit 6f877d9daf

@@ -68,7 +68,9 @@ GLUE results on dev set
 ~~~~~~~~~~~~~~~~~~~~~~~
 We get the following results on the dev set of GLUE benchmark with an uncased BERT base
-model. All experiments were run on a P100 GPU with a batch size of 32.
+model (`bert-base-uncased`). All experiments ran on 8 V100 GPUs with a total train batch size of 24. Some of
+these tasks have a small dataset and training can lead to high variance in the results between different runs.
+We report the median on 5 runs (with different seeds) for each of the metrics.
 .. list-table::
    :header-rows: 1
@@ -78,31 +80,31 @@ model. All experiments were run on a P100 GPU with a batch size of 32.
      - Result
    * - CoLA
      - Matthew's corr.
-     - 57.29
+     - 55.75
    * - SST-2
      - accuracy
-     - 93.00
+     - 92.09
    * - MRPC
      - F1/accuracy
-     - 88.85/83.82
+     - 90.48/86.27
    * - STS-B
      - Pearson/Spearman corr.
-     - 89.70/89.37
+     - 89.03/88.64
    * - QQP
      - accuracy/F1
-     - 90.72/87.41
+     - 90.92/87.72
    * - MNLI
      - matched acc./mismatched acc.
-     - 83.95/84.39
+     - 83.74/84.06
    * - QNLI
      - accuracy
-     - 89.04
+     - 91.07
    * - RTE
      - accuracy
-     - 61.01
+     - 68.59
    * - WNLI
      - accuracy
-     - 53.52
+     - 43.66
 Some of these results are significantly different from the ones reported on the test set