Integrate DeBERTa v2(the 1.5B model surpassed human performance on Su… (#10018)

* Integrate DeBERTa v2(the 1.5B model surpassed human performance on SuperGLUE); Add DeBERTa v2 900M,1.5B models;

* DeBERTa-v2

* Fix v2 model loading issue (#10129)

* Doc members

* Update src/transformers/models/deberta/modeling_deberta.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Address Sylvain's comments

* Address Patrick's comments

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Style

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
This commit is contained in:
Pengcheng He
2021-02-19 15:34:44 -08:00
committed by GitHub
parent f6e53e3c2b
commit 9a7e63729f
19 changed files with 3012 additions and 73 deletions

View File

@@ -443,15 +443,30 @@ For the full list, refer to `https://huggingface.co/models <https://huggingface.
| | | |
| | | (see `details <https://github.com/microsoft/unilm/tree/master/layoutlm>`__) |
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| DeBERTa | ``microsoft/deberta-base`` | | 12-layer, 768-hidden, 12-heads, ~125M parameters |
| DeBERTa | ``microsoft/deberta-base`` | | 12-layer, 768-hidden, 12-heads, ~140M parameters |
| | | | DeBERTa using the BERT-base architecture |
| | | |
| | | (see `details <https://github.com/microsoft/DeBERTa>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``microsoft/deberta-large`` | | 24-layer, 1024-hidden, 16-heads, ~390M parameters |
| | ``microsoft/deberta-large`` | | 24-layer, 1024-hidden, 16-heads, ~400M parameters |
| | | | DeBERTa using the BERT-large architecture |
| | | |
| | | (see `details <https://github.com/microsoft/DeBERTa>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``microsoft/deberta-xlarge`` | | 48-layer, 1024-hidden, 16-heads, ~750M parameters |
| | | | DeBERTa XLarge with similar BERT architecture |
| | | |
| | | (see `details <https://github.com/microsoft/DeBERTa>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``microsoft/deberta-xlarge-v2`` | | 24-layer, 1536-hidden, 24-heads, ~900M parameters |
| | | | DeBERTa XLarge V2 with similar BERT architecture |
| | | |
| | | (see `details <https://github.com/microsoft/DeBERTa>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``microsoft/deberta-xxlarge-v2`` | | 48-layer, 1536-hidden, 24-heads, ~1.5B parameters |
| | | | DeBERTa XXLarge V2 with similar BERT architecture |
| | | |
| | | (see `details <https://github.com/microsoft/DeBERTa>`__) |
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| SqueezeBERT | ``squeezebert/squeezebert-uncased`` | | 12-layer, 768-hidden, 12-heads, 51M parameters, 4.3x faster than bert-base-uncased on a smartphone. |
| | | | SqueezeBERT architecture pretrained from scratch on masked language model (MLM) and sentence order prediction (SOP) tasks. |