Add DeBERTa model (#5929)

* Add DeBERTa model

* Remove dependency of deberta

* Address comments

* Patch DeBERTa
Documentation
Style

* Add final tests

* Style

* Enable tests + nitpicks

* position IDs

* BERT -> DeBERTa

* Quality

* Style

* Tokenization

* Last updates.

* @patrickvonplaten's comments

* Not everything can be a copy

* Apply most of @sgugger's review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Last reviews

* DeBERTa -> Deberta

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
This commit is contained in:
Pengcheng He
2020-09-30 04:07:30 -07:00
committed by GitHub
parent 44a93c981f
commit 7a0cf0ec93
16 changed files with 2350 additions and 3 deletions

View File

@@ -415,4 +415,15 @@ For a list that includes community-uploaded models, refer to `https://huggingfac
| | ``microsoft/layoutlm-large-uncased`` | | 24 layers, 1024-hidden, 16-heads, 343M parameters |
| | | |
| | | (see `details <https://github.com/microsoft/unilm/tree/master/layoutlm>`__) |
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| DeBERTa | ``microsoft/deberta-base`` | | 12-layer, 768-hidden, 12-heads, ~125M parameters |
| | | | DeBERTa using the BERT-base architecture |
| | | |
| | | (see `details <https://github.com/microsoft/DeBERTa>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``microsoft/deberta-large`` | | 24-layer, 1024-hidden, 16-heads, ~390M parameters |
| | | | DeBERTa using the BERT-large architecture |
| | | |
| | | (see `details <https://github.com/microsoft/DeBERTa>`__) |
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+