Update dev results on GLUE (bert-base-uncased) w/ median on 5 runs
@@ -68,7 +68,9 @@ GLUE results on dev set
 ~~~~~~~~~~~~~~~~~~~~~~~
 
 We get the following results on the dev set of GLUE benchmark with an uncased BERT base
-model. All experiments were run on a P100 GPU with a batch size of 32.
+model (`bert-base-uncased`). All experiments ran on 8 V100 GPUs with a total train batch size of 24. Some of
+these tasks have a small dataset and training can lead to high variance in the results between different runs.
+We report the median on 5 runs (with different seeds) for each of the metrics.
 
 .. list-table::
    :header-rows: 1
@@ -78,31 +80,31 @@ model. All experiments were run on a P100 GPU with a batch size of 32.
      - Result
    * - CoLA
      - Matthew's corr.
-     - 57.29
+     - 55.75
    * - SST-2
      - accuracy
-     - 93.00
+     - 92.09
    * - MRPC
      - F1/accuracy
-     - 88.85/83.82
+     - 90.48/86.27
    * - STS-B
      - Pearson/Spearman corr.
-     - 89.70/89.37
+     - 89.03/88.64
    * - QQP
      - accuracy/F1
-     - 90.72/87.41
+     - 90.92/87.72
    * - MNLI
      - matched acc./mismatched acc.
-     - 83.95/84.39
+     - 83.74/84.06
    * - QNLI
      - accuracy
-     - 89.04
+     - 91.07
    * - RTE
      - accuracy
-     - 61.01
+     - 68.59
    * - WNLI
      - accuracy
-     - 53.52
+     - 43.66
 
 
 Some of these results are significantly different from the ones reported on the test set
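
The updated text says each metric is the median over 5 runs with different seeds. A minimal sketch of that aggregation follows, assuming each run yields a dict of metric scores. The function name ``median_per_metric`` and the scores are hypothetical placeholders, not taken from this commit or from the repository's scripts.

```python
from statistics import median


def median_per_metric(runs):
    """Return the median of each metric across runs.

    `runs` is a list of dicts (one per seed) mapping metric name -> score.
    All runs are assumed to report the same metric names.
    """
    metric_names = runs[0].keys()
    return {name: median(run[name] for run in runs) for name in metric_names}


# Hypothetical placeholder scores for five seeds (not real GLUE results).
example_runs = [
    {"f1": 88.0, "accuracy": 83.0},
    {"f1": 90.5, "accuracy": 86.0},
    {"f1": 89.0, "accuracy": 84.5},
    {"f1": 91.0, "accuracy": 87.0},
    {"f1": 90.0, "accuracy": 85.0},
]

result = median_per_metric(example_runs)
print(result)
```

Because the median is taken per metric, the two numbers reported for a task such as MRPC (F1/accuracy) need not come from the same run.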