add task_type_id to BERT to support ERNIE-2.0 and ERNIE-3.0 models (#18686)
* add_ernie
* remove Tokenizer in ernie
* polish code
* format code style
* polish code
* fix style
* update doc
* make fix-copies
* change model name
* change model name
* fix dependency
* add more copied from
* rename ErnieLMHeadModel to ErnieForCausalLM do not expose ErnieLayer update doc
* fix
* make style
* polish code
* polish code
* fix
* fix
* fix
* fix
* fix
* final fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
@@ -295,6 +295,7 @@ Current number of checkpoints:
 1. **[DPT](https://huggingface.co/docs/transformers/master/model_doc/dpt)** (from Intel Labs) released with the paper [Vision Transformers for Dense Prediction](https://arxiv.org/abs/2103.13413) by René Ranftl, Alexey Bochkovskiy, Vladlen Koltun.
 1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
 1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
+1. **[ERNIE](https://huggingface.co/docs/transformers/main/model_doc/ernie)** (from Baidu) released with the paper [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu.
 1. **[FlauBERT](https://huggingface.co/docs/transformers/model_doc/flaubert)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
 1. **[FLAVA](https://huggingface.co/docs/transformers/model_doc/flava)** (from Facebook AI) released with the paper [FLAVA: A Foundational Language And Vision Alignment Model](https://arxiv.org/abs/2112.04482) by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela.
 1. **[FNet](https://huggingface.co/docs/transformers/model_doc/fnet)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.
@@ -247,6 +247,7 @@ Flax, PyTorch, TensorFlow 설치 페이지에서 이들을 conda로 설치하는
 1. **[DPT](https://huggingface.co/docs/transformers/master/model_doc/dpt)** (from Intel Labs) released with the paper [Vision Transformers for Dense Prediction](https://arxiv.org/abs/2103.13413) by René Ranftl, Alexey Bochkovskiy, Vladlen Koltun.
 1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
 1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
+1. **[ERNIE](https://huggingface.co/docs/transformers/main/model_doc/ernie)** (from Baidu) released with the paper [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu.
 1. **[FlauBERT](https://huggingface.co/docs/transformers/model_doc/flaubert)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
 1. **[FLAVA](https://huggingface.co/docs/transformers/model_doc/flava)** (from Facebook AI) released with the paper [FLAVA: A Foundational Language And Vision Alignment Model](https://arxiv.org/abs/2112.04482) by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela.
 1. **[FNet](https://huggingface.co/docs/transformers/model_doc/fnet)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.
@@ -271,6 +271,7 @@ conda install -c huggingface transformers
 1. **[DPT](https://huggingface.co/docs/transformers/master/model_doc/dpt)** (来自 Intel Labs) 伴随论文 [Vision Transformers for Dense Prediction](https://arxiv.org/abs/2103.13413) 由 René Ranftl, Alexey Bochkovskiy, Vladlen Koltun 发布。
 1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (来自 Google Research/Stanford University) 伴随论文 [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) 由 Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning 发布。
 1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (来自 Google Research) 伴随论文 [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) 由 Sascha Rothe, Shashi Narayan, Aliaksei Severyn 发布。
+1. **[ERNIE](https://huggingface.co/docs/transformers/main/model_doc/ernie)** (来自 Baidu) 伴随论文 [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) 由 Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu 发布。
 1. **[FlauBERT](https://huggingface.co/docs/transformers/model_doc/flaubert)** (来自 CNRS) 伴随论文 [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) 由 Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab 发布。
 1. **[FLAVA](https://huggingface.co/docs/transformers/model_doc/flava)** (来自 Facebook AI) 伴随论文 [FLAVA: A Foundational Language And Vision Alignment Model](https://arxiv.org/abs/2112.04482) 由 Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela 发布。
 1. **[FNet](https://huggingface.co/docs/transformers/model_doc/fnet)** (来自 Google Research) 伴随论文 [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) 由 James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon 发布。
@@ -283,6 +283,7 @@ conda install -c huggingface transformers
 1. **[DPT](https://huggingface.co/docs/transformers/master/model_doc/dpt)** (from Intel Labs) released with the paper [Vision Transformers for Dense Prediction](https://arxiv.org/abs/2103.13413) by René Ranftl, Alexey Bochkovskiy, Vladlen Koltun.
 1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
 1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
+1. **[ERNIE](https://huggingface.co/docs/transformers/main/model_doc/ernie)** (from Baidu) released with the paper [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu.
 1. **[FlauBERT](https://huggingface.co/docs/transformers/model_doc/flaubert)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
 1. **[FLAVA](https://huggingface.co/docs/transformers/model_doc/flava)** (from Facebook AI) released with the paper [FLAVA: A Foundational Language And Vision Alignment Model](https://arxiv.org/abs/2112.04482) by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela.
 1. **[FNet](https://huggingface.co/docs/transformers/model_doc/fnet)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.
@@ -237,6 +237,8 @@
   title: ELECTRA
 - local: model_doc/encoder-decoder
   title: Encoder Decoder Models
+- local: model_doc/ernie
+  title: ERNIE
 - local: model_doc/flaubert
   title: FlauBERT
 - local: model_doc/fnet
@@ -87,6 +87,7 @@ The documentation is organized into five sections:
 1. **[DPT](master/model_doc/dpt)** (from Intel Labs) released with the paper [Vision Transformers for Dense Prediction](https://arxiv.org/abs/2103.13413) by René Ranftl, Alexey Bochkovskiy, Vladlen Koltun.
 1. **[ELECTRA](model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
 1. **[EncoderDecoder](model_doc/encoder-decoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
+1. **[ERNIE](model_doc/ernie)** (from Baidu) released with the paper [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu.
 1. **[FlauBERT](model_doc/flaubert)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
 1. **[FLAVA](model_doc/flava)** (from Facebook AI) released with the paper [FLAVA: A Foundational Language And Vision Alignment Model](https://arxiv.org/abs/2112.04482) by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela.
 1. **[FNet](model_doc/fnet)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.
@@ -230,6 +231,7 @@ Flax), PyTorch, and/or TensorFlow.
 | DPT | ❌ | ❌ | ✅ | ❌ | ❌ |
 | ELECTRA | ✅ | ✅ | ✅ | ✅ | ✅ |
 | Encoder decoder | ❌ | ❌ | ✅ | ✅ | ✅ |
+| ERNIE | ❌ | ❌ | ✅ | ❌ | ❌ |
 | FairSeq Machine-Translation | ✅ | ❌ | ✅ | ❌ | ❌ |
 | FlauBERT | ✅ | ❌ | ✅ | ✅ | ❌ |
 | FLAVA | ❌ | ❌ | ✅ | ❌ | ❌ |
102
docs/source/en/model_doc/ernie.mdx
Normal file
@@ -0,0 +1,102 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# ERNIE

## Overview
ERNIE is a series of powerful models proposed by Baidu that are particularly strong on Chinese tasks,
including [ERNIE1.0](https://arxiv.org/abs/1904.09223), [ERNIE2.0](https://ojs.aaai.org/index.php/AAAI/article/view/6428),
[ERNIE3.0](https://arxiv.org/abs/2107.02137), [ERNIE-Gram](https://arxiv.org/abs/2010.12148), [ERNIE-health](https://arxiv.org/abs/2110.07244), etc.

These models were contributed by [nghuyong](https://huggingface.co/nghuyong); the official code can be found in [PaddleNLP](https://github.com/PaddlePaddle/PaddleNLP) (implemented in PaddlePaddle).

### How to use
Take `ernie-1.0-base-zh` as an example:

```python
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("nghuyong/ernie-1.0-base-zh")
model = AutoModel.from_pretrained("nghuyong/ernie-1.0-base-zh")
```

### Supported Models

| Model Name | Language | Description |
|:-------------------:|:--------:|:-------------------------------:|
| ernie-1.0-base-zh | Chinese | Layer:12, Heads:12, Hidden:768 |
| ernie-2.0-base-en | English | Layer:12, Heads:12, Hidden:768 |
| ernie-2.0-large-en | English | Layer:24, Heads:16, Hidden:1024 |
| ernie-3.0-base-zh | Chinese | Layer:12, Heads:12, Hidden:768 |
| ernie-3.0-medium-zh | Chinese | Layer:6, Heads:12, Hidden:768 |
| ernie-3.0-mini-zh | Chinese | Layer:6, Heads:12, Hidden:384 |
| ernie-3.0-micro-zh | Chinese | Layer:4, Heads:12, Hidden:384 |
| ernie-3.0-nano-zh | Chinese | Layer:4, Heads:12, Hidden:312 |
| ernie-health-zh | Chinese | Layer:12, Heads:12, Hidden:768 |
| ernie-gram-zh | Chinese | Layer:12, Heads:12, Hidden:768 |

You can find all the supported models on the Hugging Face model hub: [huggingface.co/nghuyong](https://huggingface.co/nghuyong), and model details in Paddle's official
repos: [PaddleNLP](https://paddlenlp.readthedocs.io/zh/latest/model_zoo/transformers/ERNIE/contents.html)
and [ERNIE](https://github.com/PaddlePaddle/ERNIE/blob/repro).

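The commit title says a `task_type_id` input was added so that ERNIE 2.0 and ERNIE 3.0 checkpoints can be supported. As a rough illustration of what such an input can do, the sketch below sums a task-type embedding with the usual word, position and token-type embeddings. It is plain Python rather than the library's code: the tables, their values, the helper names `add` and `embed`, the `use_task_id` switch and the default of task id 0 are all assumptions made for this example.

```python
from typing import List, Optional

Vector = List[float]

# Toy embedding tables: every value here is invented for illustration.
WORD = {0: [0.1, 0.2], 1: [0.3, 0.4]}
POSITION = {0: [0.01, 0.02], 1: [0.03, 0.04]}
TOKEN_TYPE = {0: [0.0, 0.0], 1: [0.5, 0.5]}
TASK_TYPE = {0: [0.0, 0.0], 1: [1.0, -1.0]}


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum of two equally sized vectors."""
    return [x + y for x, y in zip(a, b)]


def embed(
    input_ids: List[int],
    token_type_ids: List[int],
    task_type_ids: Optional[List[int]] = None,
    use_task_id: bool = False,
) -> List[Vector]:
    """Sum word, position and token-type embeddings, plus task-type if enabled."""
    if use_task_id and task_type_ids is None:
        # Assumed default for this sketch: every token gets task id 0.
        task_type_ids = [0] * len(input_ids)
    output = []
    for position, (token, segment) in enumerate(zip(input_ids, token_type_ids)):
        vector = add(add(WORD[token], POSITION[position]), TOKEN_TYPE[segment])
        if use_task_id:
            vector = add(vector, TASK_TYPE[task_type_ids[position]])
        output.append(vector)
    return output


plain = embed([0, 1], [0, 0])
with_task = embed([0, 1], [0, 0], task_type_ids=[1, 1], use_task_id=True)
```

With `use_task_id=False` the result is the ordinary three-way sum, so in this sketch a checkpoint without task ids would be unaffected by the extra input.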
## ErnieConfig

[[autodoc]] ErnieConfig
- all

## Ernie specific outputs

[[autodoc]] models.ernie.modeling_ernie.ErnieForPreTrainingOutput

## ErnieModel

[[autodoc]] ErnieModel
- forward

## ErnieForPreTraining

[[autodoc]] ErnieForPreTraining
- forward

## ErnieForCausalLM

[[autodoc]] ErnieForCausalLM
- forward

## ErnieForMaskedLM

[[autodoc]] ErnieForMaskedLM
- forward

## ErnieForNextSentencePrediction

[[autodoc]] ErnieForNextSentencePrediction
- forward

## ErnieForSequenceClassification

[[autodoc]] ErnieForSequenceClassification
- forward

## ErnieForMultipleChoice

[[autodoc]] ErnieForMultipleChoice
- forward

## ErnieForTokenClassification

[[autodoc]] ErnieForTokenClassification
- forward

## ErnieForQuestionAnswering

[[autodoc]] ErnieForQuestionAnswering
- forward
@@ -67,6 +67,7 @@ Ready-made configurations include the following architectures:
 - DETR
 - DistilBERT
 - ELECTRA
+- ERNIE
 - FlauBERT
 - GPT Neo
 - GPT-J
@@ -203,6 +203,10 @@ _import_structure = {
     "models.dpt": ["DPT_PRETRAINED_CONFIG_ARCHIVE_MAP", "DPTConfig"],
     "models.electra": ["ELECTRA_PRETRAINED_CONFIG_ARCHIVE_MAP", "ElectraConfig", "ElectraTokenizer"],
     "models.encoder_decoder": ["EncoderDecoderConfig"],
+    "models.ernie": [
+        "ERNIE_PRETRAINED_CONFIG_ARCHIVE_MAP",
+        "ErnieConfig",
+    ],
     "models.flaubert": ["FLAUBERT_PRETRAINED_CONFIG_ARCHIVE_MAP", "FlaubertConfig", "FlaubertTokenizer"],
     "models.flava": [
         "FLAVA_PRETRAINED_CONFIG_ARCHIVE_MAP",
@@ -1168,6 +1172,21 @@ else:
         ]
     )
     _import_structure["models.encoder_decoder"].append("EncoderDecoderModel")
+    _import_structure["models.ernie"].extend(
+        [
+            "ERNIE_PRETRAINED_MODEL_ARCHIVE_LIST",
+            "ErnieForCausalLM",
+            "ErnieForMaskedLM",
+            "ErnieForMultipleChoice",
+            "ErnieForNextSentencePrediction",
+            "ErnieForPreTraining",
+            "ErnieForQuestionAnswering",
+            "ErnieForSequenceClassification",
+            "ErnieForTokenClassification",
+            "ErnieModel",
+            "ErniePreTrainedModel",
+        ]
+    )
     _import_structure["models.flaubert"].extend(
         [
             "FLAUBERT_PRETRAINED_MODEL_ARCHIVE_LIST",
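The two `_import_structure` hunks above follow a register-then-extend pattern: configuration names are listed unconditionally, and model classes are appended inside an `else:` branch. A minimal sketch of that pattern follows, assuming the branch guards an optional backend. `register_backend_symbols`, `find_submodule` and the `backend_available` flag are hypothetical names for illustration, and the library's real lazy loader is more involved.

```python
from typing import Dict, List

# Symbols that need no deep-learning backend are always registered.
import_structure: Dict[str, List[str]] = {
    "models.ernie": ["ERNIE_PRETRAINED_CONFIG_ARCHIVE_MAP", "ErnieConfig"],
}


def register_backend_symbols(structure: Dict[str, List[str]], backend_available: bool) -> None:
    """Add model classes only when the (hypothetical) backend check passes."""
    if not backend_available:
        return
    structure["models.ernie"].extend(["ErnieForMaskedLM", "ErnieModel", "ErniePreTrainedModel"])


def find_submodule(structure: Dict[str, List[str]], name: str) -> str:
    """Return the submodule that declares `name`, so it can be imported on first access."""
    for submodule, names in structure.items():
        if name in names:
            return submodule
    raise AttributeError(f"no submodule exports {name!r}")


register_backend_symbols(import_structure, backend_available=True)
owner = find_submodule(import_structure, "ErnieModel")
```

Keeping only names in the table means nothing heavy has to be imported until a symbol is actually requested, which is presumably why the commit adds strings here and real imports only under `TYPE_CHECKING`.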
@@ -3066,6 +3085,7 @@ if TYPE_CHECKING:
     from .models.dpt import DPT_PRETRAINED_CONFIG_ARCHIVE_MAP, DPTConfig
     from .models.electra import ELECTRA_PRETRAINED_CONFIG_ARCHIVE_MAP, ElectraConfig, ElectraTokenizer
     from .models.encoder_decoder import EncoderDecoderConfig
+    from .models.ernie import ERNIE_PRETRAINED_CONFIG_ARCHIVE_MAP, ErnieConfig
     from .models.flaubert import FLAUBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, FlaubertConfig, FlaubertTokenizer
     from .models.flava import (
         FLAVA_PRETRAINED_CONFIG_ARCHIVE_MAP,
@@ -3879,6 +3899,19 @@ if TYPE_CHECKING:
             load_tf_weights_in_electra,
         )
         from .models.encoder_decoder import EncoderDecoderModel
+        from .models.ernie import (
+            ERNIE_PRETRAINED_MODEL_ARCHIVE_LIST,
+            ErnieForCausalLM,
+            ErnieForMaskedLM,
+            ErnieForMultipleChoice,
+            ErnieForNextSentencePrediction,
+            ErnieForPreTraining,
+            ErnieForQuestionAnswering,
+            ErnieForSequenceClassification,
+            ErnieForTokenClassification,
+            ErnieModel,
+            ErniePreTrainedModel,
+        )
         from .models.flaubert import (
             FLAUBERT_PRETRAINED_MODEL_ARCHIVE_LIST,
             FlaubertForMultipleChoice,
@@ -57,6 +57,7 @@ from . import (
     dpt,
     electra,
     encoder_decoder,
+    ernie,
     flaubert,
     flava,
     fnet,
@@ -61,6 +61,7 @@ CONFIG_MAPPING_NAMES = OrderedDict(
         ("dpt", "DPTConfig"),
         ("electra", "ElectraConfig"),
         ("encoder-decoder", "EncoderDecoderConfig"),
+        ("ernie", "ErnieConfig"),
         ("flaubert", "FlaubertConfig"),
         ("flava", "FlavaConfig"),
         ("fnet", "FNetConfig"),
@@ -188,6 +189,7 @@ CONFIG_ARCHIVE_MAP_MAPPING_NAMES = OrderedDict(
         ("dpr", "DPR_PRETRAINED_CONFIG_ARCHIVE_MAP"),
         ("dpt", "DPT_PRETRAINED_CONFIG_ARCHIVE_MAP"),
         ("electra", "ELECTRA_PRETRAINED_CONFIG_ARCHIVE_MAP"),
+        ("ernie", "ERNIE_PRETRAINED_CONFIG_ARCHIVE_MAP"),
         ("flaubert", "FLAUBERT_PRETRAINED_CONFIG_ARCHIVE_MAP"),
         ("flava", "FLAVA_PRETRAINED_CONFIG_ARCHIVE_MAP"),
         ("fnet", "FNET_PRETRAINED_CONFIG_ARCHIVE_MAP"),
@@ -316,6 +318,7 @@ MODEL_NAMES_MAPPING = OrderedDict(
         ("dpt", "DPT"),
         ("electra", "ELECTRA"),
         ("encoder-decoder", "Encoder decoder"),
+        ("ernie", "ERNIE"),
         ("flaubert", "FlauBERT"),
         ("flava", "FLAVA"),
         ("fnet", "FNet"),
@@ -60,6 +60,7 @@ MODEL_MAPPING_NAMES = OrderedDict(
         ("dpr", "DPRQuestionEncoder"),
         ("dpt", "DPTModel"),
         ("electra", "ElectraModel"),
+        ("ernie", "ErnieModel"),
         ("flaubert", "FlaubertModel"),
         ("flava", "FlavaModel"),
         ("fnet", "FNetModel"),
@@ -165,6 +166,7 @@ MODEL_FOR_PRETRAINING_MAPPING_NAMES = OrderedDict(
         ("deberta-v2", "DebertaV2ForMaskedLM"),
         ("distilbert", "DistilBertForMaskedLM"),
         ("electra", "ElectraForPreTraining"),
+        ("ernie", "ErnieForPreTraining"),
         ("flaubert", "FlaubertWithLMHeadModel"),
         ("flava", "FlavaForPreTraining"),
         ("fnet", "FNetForPreTraining"),
@@ -223,6 +225,7 @@ MODEL_WITH_LM_HEAD_MAPPING_NAMES = OrderedDict(
         ("distilbert", "DistilBertForMaskedLM"),
         ("electra", "ElectraForMaskedLM"),
         ("encoder-decoder", "EncoderDecoderModel"),
+        ("ernie", "ErnieForMaskedLM"),
         ("flaubert", "FlaubertWithLMHeadModel"),
         ("fnet", "FNetForMaskedLM"),
         ("fsmt", "FSMTForConditionalGeneration"),
@@ -284,6 +287,7 @@ MODEL_FOR_CAUSAL_LM_MAPPING_NAMES = OrderedDict(
         ("ctrl", "CTRLLMHeadModel"),
         ("data2vec-text", "Data2VecTextForCausalLM"),
         ("electra", "ElectraForCausalLM"),
+        ("ernie", "ErnieForCausalLM"),
         ("gpt2", "GPT2LMHeadModel"),
         ("gpt_neo", "GPTNeoForCausalLM"),
         ("gpt_neox", "GPTNeoXForCausalLM"),
@@ -413,6 +417,7 @@ MODEL_FOR_MASKED_LM_MAPPING_NAMES = OrderedDict(
         ("deberta-v2", "DebertaV2ForMaskedLM"),
         ("distilbert", "DistilBertForMaskedLM"),
         ("electra", "ElectraForMaskedLM"),
+        ("ernie", "ErnieForMaskedLM"),
         ("flaubert", "FlaubertWithLMHeadModel"),
         ("fnet", "FNetForMaskedLM"),
         ("funnel", "FunnelForMaskedLM"),
@@ -502,6 +507,7 @@ MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES = OrderedDict(
         ("deberta-v2", "DebertaV2ForSequenceClassification"),
         ("distilbert", "DistilBertForSequenceClassification"),
         ("electra", "ElectraForSequenceClassification"),
+        ("ernie", "ErnieForSequenceClassification"),
         ("flaubert", "FlaubertForSequenceClassification"),
         ("fnet", "FNetForSequenceClassification"),
         ("funnel", "FunnelForSequenceClassification"),
@@ -558,6 +564,7 @@ MODEL_FOR_QUESTION_ANSWERING_MAPPING_NAMES = OrderedDict(
         ("deberta-v2", "DebertaV2ForQuestionAnswering"),
         ("distilbert", "DistilBertForQuestionAnswering"),
         ("electra", "ElectraForQuestionAnswering"),
+        ("ernie", "ErnieForQuestionAnswering"),
         ("flaubert", "FlaubertForQuestionAnsweringSimple"),
         ("fnet", "FNetForQuestionAnswering"),
         ("funnel", "FunnelForQuestionAnswering"),
@@ -627,6 +634,7 @@ MODEL_FOR_TOKEN_CLASSIFICATION_MAPPING_NAMES = OrderedDict(
         ("deberta-v2", "DebertaV2ForTokenClassification"),
         ("distilbert", "DistilBertForTokenClassification"),
         ("electra", "ElectraForTokenClassification"),
+        ("ernie", "ErnieForTokenClassification"),
         ("flaubert", "FlaubertForTokenClassification"),
         ("fnet", "FNetForTokenClassification"),
         ("funnel", "FunnelForTokenClassification"),
@@ -668,6 +676,7 @@ MODEL_FOR_MULTIPLE_CHOICE_MAPPING_NAMES = OrderedDict(
         ("deberta-v2", "DebertaV2ForMultipleChoice"),
         ("distilbert", "DistilBertForMultipleChoice"),
         ("electra", "ElectraForMultipleChoice"),
+        ("ernie", "ErnieForMultipleChoice"),
         ("flaubert", "FlaubertForMultipleChoice"),
         ("fnet", "FNetForMultipleChoice"),
         ("funnel", "FunnelForMultipleChoice"),
@@ -695,6 +704,7 @@ MODEL_FOR_MULTIPLE_CHOICE_MAPPING_NAMES = OrderedDict(
 MODEL_FOR_NEXT_SENTENCE_PREDICTION_MAPPING_NAMES = OrderedDict(
     [
         ("bert", "BertForNextSentencePrediction"),
+        ("ernie", "ErnieForNextSentencePrediction"),
         ("fnet", "FNetForNextSentencePrediction"),
         ("megatron-bert", "MegatronBertForNextSentencePrediction"),
         ("mobilebert", "MobileBertForNextSentencePrediction"),
@@ -121,6 +121,7 @@ else:
                 ),
             ),
             ("electra", ("ElectraTokenizer", "ElectraTokenizerFast" if is_tokenizers_available() else None)),
+            ("ernie", ("BertTokenizer", "BertTokenizerFast" if is_tokenizers_available() else None)),
             ("flaubert", ("FlaubertTokenizer", None)),
             ("fnet", ("FNetTokenizer", "FNetTokenizerFast" if is_tokenizers_available() else None)),
             ("fsmt", ("FSMTTokenizer", None)),
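Review note: the hunks above register the `ernie` model type in each auto-class name mapping. As a rough illustration of what that registration enables, the following minimal sketch looks up a class name by model type. The mapping is a trimmed copy of entries shown in the diff; the `resolve_class_name` helper is hypothetical and only mimics the idea, it is not the library's actual resolution code.

```python
from collections import OrderedDict
from typing import Mapping

# A trimmed copy of mapping entries shown in the diff above.
MODEL_FOR_MASKED_LM_MAPPING_NAMES = OrderedDict(
    [
        ("electra", "ElectraForMaskedLM"),
        ("ernie", "ErnieForMaskedLM"),
        ("flaubert", "FlaubertWithLMHeadModel"),
    ]
)


def resolve_class_name(model_type: str, mapping: Mapping[str, str]) -> str:
    """Hypothetical helper: return the class name registered for `model_type`."""
    try:
        return mapping[model_type]
    except KeyError:
        raise ValueError(f"Unrecognized model type: {model_type!r}") from None


print(resolve_class_name("ernie", MODEL_FOR_MASKED_LM_MAPPING_NAMES))
```

Without the added `("ernie", ...)` entries, a lookup for the `ernie` model type would fail in the same way as the `ValueError` branch here.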
74
src/transformers/models/ernie/__init__.py
Normal file
@@ -0,0 +1,74 @@
+# flake8: noqa
+# There's no way to ignore "F401 '...' imported but unused" warnings in this
+# module, but to preserve other warnings. So, don't check this module at all.
+
+# Copyright 2022 The HuggingFace Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import TYPE_CHECKING
+
+from ...utils import OptionalDependencyNotAvailable, _LazyModule, is_tensorflow_text_available, is_torch_available
+
+
+_import_structure = {
+    "configuration_ernie": ["ERNIE_PRETRAINED_CONFIG_ARCHIVE_MAP", "ErnieConfig", "ErnieOnnxConfig"],
+}
+
+try:
+    if not is_torch_available():
+        raise OptionalDependencyNotAvailable()
+except OptionalDependencyNotAvailable:
+    pass
+else:
+    _import_structure["modeling_ernie"] = [
+        "ERNIE_PRETRAINED_MODEL_ARCHIVE_LIST",
+        "ErnieForCausalLM",
+        "ErnieForMaskedLM",
+        "ErnieForMultipleChoice",
+        "ErnieForNextSentencePrediction",
+        "ErnieForPreTraining",
+        "ErnieForQuestionAnswering",
+        "ErnieForSequenceClassification",
+        "ErnieForTokenClassification",
+        "ErnieModel",
+        "ErniePreTrainedModel",
+    ]
+
+if TYPE_CHECKING:
+    from .configuration_ernie import ERNIE_PRETRAINED_CONFIG_ARCHIVE_MAP, ErnieConfig, ErnieOnnxConfig
+
+    try:
+        if not is_torch_available():
+            raise OptionalDependencyNotAvailable()
+    except OptionalDependencyNotAvailable:
+        pass
+    else:
+        from .modeling_ernie import (
+            ERNIE_PRETRAINED_MODEL_ARCHIVE_LIST,
+            ErnieForCausalLM,
+            ErnieForMaskedLM,
+            ErnieForMultipleChoice,
+            ErnieForNextSentencePrediction,
+            ErnieForPreTraining,
+            ErnieForQuestionAnswering,
+            ErnieForSequenceClassification,
+            ErnieForTokenClassification,
+            ErnieModel,
+            ErniePreTrainedModel,
+        )
+
+else:
+    import sys
+
+    sys.modules[__name__] = _LazyModule(__name__, globals()["__file__"], _import_structure, module_spec=__spec__)
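Review note: the `__init__.py` above declares an `_import_structure` dictionary and hands it to `_LazyModule`, so submodules are imported only when one of their names is first accessed. The sketch below illustrates that general idea with a simplified, hypothetical `LazyModule` class; it is an assumption-laden stand-in, not the library's `_LazyModule`, and it uses standard-library modules in place of `configuration_ernie` and `modeling_ernie`.

```python
import importlib
import types


class LazyModule(types.ModuleType):
    """Simplified, hypothetical stand-in for a lazily importing module."""

    def __init__(self, name, import_structure):
        super().__init__(name)
        # Map each exported attribute to the module that defines it.
        self._attr_to_module = {
            attr: module for module, attrs in import_structure.items() for attr in attrs
        }

    def __getattr__(self, attr):
        # Only reached when normal attribute lookup fails, i.e. on first access.
        if attr not in self._attr_to_module:
            raise AttributeError(f"module {self.__name__!r} has no attribute {attr!r}")
        module = importlib.import_module(self._attr_to_module[attr])
        value = getattr(module, attr)
        setattr(self, attr, value)  # cache so later lookups bypass __getattr__
        return value


# Standard-library modules stand in for the real submodules.
lazy = LazyModule("demo", {"json": ["dumps"], "math": ["sqrt"]})
print(lazy.sqrt(16.0))
```

The design choice this illustrates is that importing the package stays cheap: heavy dependencies are touched only when a name that needs them is actually used.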
169
src/transformers/models/ernie/configuration_ernie.py
Normal file
@@ -0,0 +1,169 @@
+# coding=utf-8
+# Copyright 2022 The Google AI Language Team Authors and The HuggingFace Inc. team.
+# Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+""" ERNIE model configuration"""
+from collections import OrderedDict
+from typing import Mapping
+
+from ...configuration_utils import PretrainedConfig
+from ...onnx import OnnxConfig
+from ...utils import logging
+
+
+logger = logging.get_logger(__name__)
+
+ERNIE_PRETRAINED_CONFIG_ARCHIVE_MAP = {
+    "nghuyong/ernie-1.0-base-zh": "https://huggingface.co/nghuyong/ernie-1.0-base-zh/resolve/main/config.json",
+    "nghuyong/ernie-2.0-base-en": "https://huggingface.co/nghuyong/ernie-2.0-base-en/resolve/main/config.json",
+    "nghuyong/ernie-2.0-large-en": "https://huggingface.co/nghuyong/ernie-2.0-large-en/resolve/main/config.json",
+    "nghuyong/ernie-3.0-base-zh": "https://huggingface.co/nghuyong/ernie-3.0-base-zh/resolve/main/config.json",
+    "nghuyong/ernie-3.0-medium-zh": "https://huggingface.co/nghuyong/ernie-3.0-medium-zh/resolve/main/config.json",
+    "nghuyong/ernie-3.0-mini-zh": "https://huggingface.co/nghuyong/ernie-3.0-mini-zh/resolve/main/config.json",
+    "nghuyong/ernie-3.0-micro-zh": "https://huggingface.co/nghuyong/ernie-3.0-micro-zh/resolve/main/config.json",
+    "nghuyong/ernie-3.0-nano-zh": "https://huggingface.co/nghuyong/ernie-3.0-nano-zh/resolve/main/config.json",
+    "nghuyong/ernie-gram-zh": "https://huggingface.co/nghuyong/ernie-gram-zh/resolve/main/config.json",
+    "nghuyong/ernie-health-zh": "https://huggingface.co/nghuyong/ernie-health-zh/resolve/main/config.json",
+}
+
+
+class ErnieConfig(PretrainedConfig):
+    r"""
+    This is the configuration class to store the configuration of a [`ErnieModel`] or a [`TFErnieModel`]. It is used to
+    instantiate an ERNIE model according to the specified arguments, defining the model architecture. Instantiating a
+    configuration with the defaults will yield a similar configuration to that of the ERNIE
+    [nghuyong/ernie-3.0-base-zh](https://huggingface.co/nghuyong/ernie-3.0-base-zh) architecture.
+
+    Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
+    documentation from [`PretrainedConfig`] for more information.
+
+
+    Args:
+        vocab_size (`int`, *optional*, defaults to 30522):
+            Vocabulary size of the ERNIE model. Defines the number of different tokens that can be represented by the
+            `input_ids` passed when calling [`ErnieModel`] or [`TFErnieModel`].
+        hidden_size (`int`, *optional*, defaults to 768):
+            Dimensionality of the encoder layers and the pooler layer.
+        num_hidden_layers (`int`, *optional*, defaults to 12):
+            Number of hidden layers in the Transformer encoder.
+        num_attention_heads (`int`, *optional*, defaults to 12):
+            Number of attention heads for each attention layer in the Transformer encoder.
+        intermediate_size (`int`, *optional*, defaults to 3072):
+            Dimensionality of the "intermediate" (often named feed-forward) layer in the Transformer encoder.
+        hidden_act (`str` or `Callable`, *optional*, defaults to `"gelu"`):
+            The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
+            `"relu"`, `"silu"` and `"gelu_new"` are supported.
+        hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
+            The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
+        attention_probs_dropout_prob (`float`, *optional*, defaults to 0.1):
+            The dropout ratio for the attention probabilities.
+        max_position_embeddings (`int`, *optional*, defaults to 512):
+            The maximum sequence length that this model might ever be used with. Typically set this to something large
+            just in case (e.g., 512 or 1024 or 2048).
+        type_vocab_size (`int`, *optional*, defaults to 2):
+            The vocabulary size of the `token_type_ids` passed when calling [`ErnieModel`] or [`TFErnieModel`].
+        task_type_vocab_size (`int`, *optional*, defaults to 3):
+            The vocabulary size of the `task_type_ids` for ERNIE 2.0/ERNIE 3.0 models.
+        use_task_id (`bool`, *optional*, defaults to `False`):
+            Whether or not the model supports `task_type_ids`.
+        initializer_range (`float`, *optional*, defaults to 0.02):
+            The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
+        layer_norm_eps (`float`, *optional*, defaults to 1e-12):
+            The epsilon used by the layer normalization layers.
+        position_embedding_type (`str`, *optional*, defaults to `"absolute"`):
+            Type of position embedding. Choose one of `"absolute"`, `"relative_key"`, `"relative_key_query"`. For
+            positional embeddings use `"absolute"`. For more information on `"relative_key"`, please refer to
+            [Self-Attention with Relative Position Representations (Shaw et al.)](https://arxiv.org/abs/1803.02155).
+            For more information on `"relative_key_query"`, please refer to *Method 4* in [Improve Transformer Models
+            with Better Relative Position Embeddings (Huang et al.)](https://arxiv.org/abs/2009.13658).
+        use_cache (`bool`, *optional*, defaults to `True`):
+            Whether or not the model should return the last key/values attentions (not used by all models). Only
+            relevant if `config.is_decoder=True`.
+        classifier_dropout (`float`, *optional*):
+            The dropout ratio for the classification head.
+
+    Examples:
+
+    ```python
+    >>> from transformers import ErnieModel, ErnieConfig
+
+    >>> # Initializing an ERNIE nghuyong/ernie-3.0-base-zh style configuration
+    >>> configuration = ErnieConfig()
+
+    >>> # Initializing a model from the nghuyong/ernie-3.0-base-zh style configuration
+    >>> model = ErnieModel(configuration)
+
+    >>> # Accessing the model configuration
+    >>> configuration = model.config
+    ```"""
+    model_type = "ernie"
+
+    def __init__(
+        self,
+        vocab_size=30522,
+        hidden_size=768,
+        num_hidden_layers=12,
+        num_attention_heads=12,
+        intermediate_size=3072,
+        hidden_act="gelu",
+        hidden_dropout_prob=0.1,
+        attention_probs_dropout_prob=0.1,
+        max_position_embeddings=512,
+        type_vocab_size=2,
+        task_type_vocab_size=3,
+        use_task_id=False,
+        initializer_range=0.02,
+        layer_norm_eps=1e-12,
+        pad_token_id=0,
+        position_embedding_type="absolute",
+        use_cache=True,
+        classifier_dropout=None,
+        **kwargs
+    ):
+        super().__init__(pad_token_id=pad_token_id, **kwargs)
+
+        self.vocab_size = vocab_size
+        self.hidden_size = hidden_size
+        self.num_hidden_layers = num_hidden_layers
+        self.num_attention_heads = num_attention_heads
+        self.hidden_act = hidden_act
+        self.intermediate_size = intermediate_size
+        self.hidden_dropout_prob = hidden_dropout_prob
+        self.attention_probs_dropout_prob = attention_probs_dropout_prob
+        self.max_position_embeddings = max_position_embeddings
+        self.type_vocab_size = type_vocab_size
+        self.task_type_vocab_size = task_type_vocab_size
+        self.use_task_id = use_task_id
+        self.initializer_range = initializer_range
+        self.layer_norm_eps = layer_norm_eps
+        self.position_embedding_type = position_embedding_type
+        self.use_cache = use_cache
+        self.classifier_dropout = classifier_dropout
+
+
+class ErnieOnnxConfig(OnnxConfig):
+    @property
+    def inputs(self) -> Mapping[str, Mapping[int, str]]:
+        if self.task == "multiple-choice":
+            dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
+        else:
+            dynamic_axis = {0: "batch", 1: "sequence"}
+        return OrderedDict(
+            [
+                ("input_ids", dynamic_axis),
+                ("attention_mask", dynamic_axis),
+                ("token_type_ids", dynamic_axis),
+                ("task_type_ids", dynamic_axis),
+            ]
+        )
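Review note: `ErnieOnnxConfig.inputs` above declares which axes of each model input are dynamic at export time, and it adds `task_type_ids` to the usual BERT-style inputs. The following sketch mirrors that logic as a standalone function so it can be run without the library; the free function `ernie_onnx_inputs` is a hypothetical name introduced only for illustration.

```python
from collections import OrderedDict
from typing import Mapping


def ernie_onnx_inputs(task: str) -> Mapping[str, Mapping[int, str]]:
    """Standalone mirror of the `inputs` property above (hypothetical free function)."""
    if task == "multiple-choice":
        # Multiple-choice batches carry an extra "choice" dimension.
        dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
    else:
        dynamic_axis = {0: "batch", 1: "sequence"}
    return OrderedDict(
        (name, dynamic_axis)
        for name in ("input_ids", "attention_mask", "token_type_ids", "task_type_ids")
    )


for name, axes in ernie_onnx_inputs("multiple-choice").items():
    print(name, axes)
```

All four inputs share the same axis mapping, which is why the property builds one `dynamic_axis` dictionary and reuses it.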
1830
src/transformers/models/ernie/modeling_ernie.py
Normal file
File diff suppressed because it is too large
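Review note: because the modeling diff is not shown here, the exact way `task_type_ids` enters the model cannot be confirmed from this page. The sketch below is only an assumption, following the BERT-style convention of summing embedding vectors: when a task-type table is supplied (playing the role of `use_task_id=True`), one extra row is added to the sum. The function name and the tiny two-dimensional vectors are hypothetical, and layer normalization and dropout are omitted.

```python
from typing import List, Optional, Sequence


def embed_one_token(
    word_vec: Sequence[float],
    position_vec: Sequence[float],
    token_type_vec: Sequence[float],
    task_type_table: Optional[Sequence[Sequence[float]]] = None,
    task_type_id: int = 0,
) -> List[float]:
    """Sum the per-token embeddings; add a task-type row only when a table is given."""
    total = [w + p + t for w, p, t in zip(word_vec, position_vec, token_type_vec)]
    if task_type_table is not None:  # plays the role of `use_task_id=True` (assumption)
        task_vec = task_type_table[task_type_id]
        total = [x + k for x, k in zip(total, task_vec)]
    return total


# Three rows, matching the default `task_type_vocab_size=3` from the config.
task_table = [[0.0, 0.0], [0.25, 0.25], [0.5, 0.5]]
print(embed_one_token([1.0, 2.0], [0.5, 0.5], [0.0, 1.0]))
print(embed_one_token([1.0, 2.0], [0.5, 0.5], [0.0, 1.0], task_table, task_type_id=1))
```

With no table the result is the plain BERT-style sum, which is consistent with `use_task_id` defaulting to `False` in `ErnieConfig`.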
@@ -1875,6 +1875,79 @@ class EncoderDecoderModel(metaclass=DummyObject):
         requires_backends(self, ["torch"])
 
 
+ERNIE_PRETRAINED_MODEL_ARCHIVE_LIST = None
+
+
+class ErnieForCausalLM(metaclass=DummyObject):
+    _backends = ["torch"]
+
+    def __init__(self, *args, **kwargs):
+        requires_backends(self, ["torch"])
+
+
+class ErnieForMaskedLM(metaclass=DummyObject):
+    _backends = ["torch"]
+
+    def __init__(self, *args, **kwargs):
+        requires_backends(self, ["torch"])
+
+
+class ErnieForMultipleChoice(metaclass=DummyObject):
+    _backends = ["torch"]
+
+    def __init__(self, *args, **kwargs):
+        requires_backends(self, ["torch"])
+
+
+class ErnieForNextSentencePrediction(metaclass=DummyObject):
+    _backends = ["torch"]
+
+    def __init__(self, *args, **kwargs):
+        requires_backends(self, ["torch"])
+
+
+class ErnieForPreTraining(metaclass=DummyObject):
+    _backends = ["torch"]
+
+    def __init__(self, *args, **kwargs):
+        requires_backends(self, ["torch"])
+
+
+class ErnieForQuestionAnswering(metaclass=DummyObject):
+    _backends = ["torch"]
+
+    def __init__(self, *args, **kwargs):
+        requires_backends(self, ["torch"])
+
+
+class ErnieForSequenceClassification(metaclass=DummyObject):
+    _backends = ["torch"]
+
+    def __init__(self, *args, **kwargs):
+        requires_backends(self, ["torch"])
+
+
+class ErnieForTokenClassification(metaclass=DummyObject):
+    _backends = ["torch"]
+
+    def __init__(self, *args, **kwargs):
+        requires_backends(self, ["torch"])
+
+
+class ErnieModel(metaclass=DummyObject):
+    _backends = ["torch"]
+
+    def __init__(self, *args, **kwargs):
+        requires_backends(self, ["torch"])
+
+
+class ErniePreTrainedModel(metaclass=DummyObject):
+    _backends = ["torch"]
+
+    def __init__(self, *args, **kwargs):
+        requires_backends(self, ["torch"])
+
+
 FLAUBERT_PRETRAINED_MODEL_ARCHIVE_LIST = None
 
 
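Review note: the dummy classes above let `from transformers import ErnieModel` succeed even when PyTorch is absent, and defer the failure to the moment the class is used. The sketch below shows that placeholder idea in a self-contained form. The names `backend_available`, `requires_backends_sketch`, and `PlaceholderModel` are hypothetical, the backend name is deliberately one that is not installed, and the real classes additionally use the `DummyObject` metaclass, which this sketch omits.

```python
import importlib.util
from typing import Sequence


def backend_available(name: str) -> bool:
    """Return True if a top-level module called `name` can be found."""
    return importlib.util.find_spec(name) is not None


def requires_backends_sketch(obj: object, backends: Sequence[str]) -> None:
    """Hypothetical check: raise ImportError naming any backend that is missing."""
    missing = [b for b in backends if not backend_available(b)]
    if missing:
        name = type(obj).__name__
        raise ImportError(f"{name} requires the following backends: {', '.join(missing)}")


class PlaceholderModel:
    # A made-up backend name so the failure path can be demonstrated anywhere.
    _backends = ["a_backend_that_is_not_installed"]

    def __init__(self, *args, **kwargs):
        requires_backends_sketch(self, self._backends)


try:
    PlaceholderModel()
except ImportError as err:
    print(err)
```

Importing the placeholder never fails; only instantiating it does, which keeps optional dependencies optional.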
0
tests/models/ernie/__init__.py
Normal file
577
tests/models/ernie/test_modeling_ernie.py
Normal file
@@ -0,0 +1,577 @@
+# coding=utf-8
+# Copyright 2022 The HuggingFace Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import os
+import tempfile
+import unittest
+
+from transformers import ErnieConfig, is_torch_available
+from transformers.models.auto import get_values
+from transformers.testing_utils import require_torch, require_torch_gpu, slow, torch_device
+
+from ...generation.test_generation_utils import GenerationTesterMixin
+from ...test_configuration_common import ConfigTester
+from ...test_modeling_common import ModelTesterMixin, floats_tensor, ids_tensor, random_attention_mask
+
+
+if is_torch_available():
+    import torch
+
+    from transformers import (
+        MODEL_FOR_PRETRAINING_MAPPING,
+        ErnieForCausalLM,
+        ErnieForMaskedLM,
+        ErnieForMultipleChoice,
+        ErnieForNextSentencePrediction,
+        ErnieForPreTraining,
+        ErnieForQuestionAnswering,
+        ErnieForSequenceClassification,
+        ErnieForTokenClassification,
+        ErnieModel,
+    )
+    from transformers.models.ernie.modeling_ernie import ERNIE_PRETRAINED_MODEL_ARCHIVE_LIST
+
+
+class ErnieModelTester:
+    def __init__(
+        self,
+        parent,
+        batch_size=13,
+        seq_length=7,
+        is_training=True,
+        use_input_mask=True,
+        use_token_type_ids=True,
+        use_labels=True,
+        vocab_size=99,
+        hidden_size=32,
+        num_hidden_layers=5,
+        num_attention_heads=4,
+        intermediate_size=37,
+        hidden_act="gelu",
+        hidden_dropout_prob=0.1,
+        attention_probs_dropout_prob=0.1,
+        max_position_embeddings=512,
+        type_vocab_size=16,
+        type_sequence_label_size=2,
+        initializer_range=0.02,
+        num_labels=3,
+        num_choices=4,
+        scope=None,
+    ):
+        self.parent = parent
+        self.batch_size = batch_size
+        self.seq_length = seq_length
+        self.is_training = is_training
+        self.use_input_mask = use_input_mask
+        self.use_token_type_ids = use_token_type_ids
+        self.use_labels = use_labels
+        self.vocab_size = vocab_size
+        self.hidden_size = hidden_size
+        self.num_hidden_layers = num_hidden_layers
+        self.num_attention_heads = num_attention_heads
+        self.intermediate_size = intermediate_size
+        self.hidden_act = hidden_act
+        self.hidden_dropout_prob = hidden_dropout_prob
+        self.attention_probs_dropout_prob = attention_probs_dropout_prob
+        self.max_position_embeddings = max_position_embeddings
+        self.type_vocab_size = type_vocab_size
+        self.type_sequence_label_size = type_sequence_label_size
+        self.initializer_range = initializer_range
+        self.num_labels = num_labels
+        self.num_choices = num_choices
+        self.scope = scope
+
+    def prepare_config_and_inputs(self):
+        input_ids = ids_tensor([self.batch_size, self.seq_length], self.vocab_size)
+
+        input_mask = None
+        if self.use_input_mask:
+            input_mask = random_attention_mask([self.batch_size, self.seq_length])
+
+        token_type_ids = None
+        if self.use_token_type_ids:
+            token_type_ids = ids_tensor([self.batch_size, self.seq_length], self.type_vocab_size)
+
+        sequence_labels = None
+        token_labels = None
+        choice_labels = None
+        if self.use_labels:
+            sequence_labels = ids_tensor([self.batch_size], self.type_sequence_label_size)
+            token_labels = ids_tensor([self.batch_size, self.seq_length], self.num_labels)
+            choice_labels = ids_tensor([self.batch_size], self.num_choices)
+
+        config = self.get_config()
+
+        return config, input_ids, token_type_ids, input_mask, sequence_labels, token_labels, choice_labels
+
+    def get_config(self):
+        """
+        Returns a tiny configuration by default.
+        """
+        return ErnieConfig(
+            vocab_size=self.vocab_size,
+            hidden_size=self.hidden_size,
+            num_hidden_layers=self.num_hidden_layers,
+            num_attention_heads=self.num_attention_heads,
+            intermediate_size=self.intermediate_size,
+            hidden_act=self.hidden_act,
+            hidden_dropout_prob=self.hidden_dropout_prob,
+            attention_probs_dropout_prob=self.attention_probs_dropout_prob,
+            max_position_embeddings=self.max_position_embeddings,
+            type_vocab_size=self.type_vocab_size,
+            is_decoder=False,
+            initializer_range=self.initializer_range,
+        )
+
+    def prepare_config_and_inputs_for_decoder(self):
+        (
+            config,
+            input_ids,
+            token_type_ids,
+            input_mask,
+            sequence_labels,
+            token_labels,
+            choice_labels,
+        ) = self.prepare_config_and_inputs()
+
+        config.is_decoder = True
+        encoder_hidden_states = floats_tensor([self.batch_size, self.seq_length, self.hidden_size])
+        encoder_attention_mask = ids_tensor([self.batch_size, self.seq_length], vocab_size=2)
+
+        return (
+            config,
+            input_ids,
+            token_type_ids,
+            input_mask,
+            sequence_labels,
+            token_labels,
+            choice_labels,
+            encoder_hidden_states,
+            encoder_attention_mask,
+        )
+
+    def create_and_check_model(
+        self, config, input_ids, token_type_ids, input_mask, sequence_labels, token_labels, choice_labels
+    ):
+        model = ErnieModel(config=config)
+        model.to(torch_device)
+        model.eval()
+        result = model(input_ids, attention_mask=input_mask, token_type_ids=token_type_ids)
+        result = model(input_ids, token_type_ids=token_type_ids)
+        result = model(input_ids)
+        self.parent.assertEqual(result.last_hidden_state.shape, (self.batch_size, self.seq_length, self.hidden_size))
+        self.parent.assertEqual(result.pooler_output.shape, (self.batch_size, self.hidden_size))
+
+    def create_and_check_model_as_decoder(
+        self,
+        config,
+        input_ids,
+        token_type_ids,
+        input_mask,
+        sequence_labels,
+        token_labels,
+        choice_labels,
+        encoder_hidden_states,
+        encoder_attention_mask,
+    ):
+        config.add_cross_attention = True
+        model = ErnieModel(config)
+        model.to(torch_device)
+        model.eval()
+        result = model(
+            input_ids,
+            attention_mask=input_mask,
+            token_type_ids=token_type_ids,
+            encoder_hidden_states=encoder_hidden_states,
+            encoder_attention_mask=encoder_attention_mask,
+        )
+        result = model(
+            input_ids,
+            attention_mask=input_mask,
+            token_type_ids=token_type_ids,
+            encoder_hidden_states=encoder_hidden_states,
+        )
+        result = model(input_ids, attention_mask=input_mask, token_type_ids=token_type_ids)
+        self.parent.assertEqual(result.last_hidden_state.shape, (self.batch_size, self.seq_length, self.hidden_size))
+        self.parent.assertEqual(result.pooler_output.shape, (self.batch_size, self.hidden_size))
+
|
    def create_and_check_for_causal_lm(
        self,
        config,
        input_ids,
        token_type_ids,
        input_mask,
        sequence_labels,
        token_labels,
        choice_labels,
        encoder_hidden_states,
        encoder_attention_mask,
    ):
        model = ErnieForCausalLM(config=config)
        model.to(torch_device)
        model.eval()
        result = model(input_ids, attention_mask=input_mask, token_type_ids=token_type_ids, labels=token_labels)
        self.parent.assertEqual(result.logits.shape, (self.batch_size, self.seq_length, self.vocab_size))

    def create_and_check_for_masked_lm(
        self, config, input_ids, token_type_ids, input_mask, sequence_labels, token_labels, choice_labels
    ):
        model = ErnieForMaskedLM(config=config)
        model.to(torch_device)
        model.eval()
        result = model(input_ids, attention_mask=input_mask, token_type_ids=token_type_ids, labels=token_labels)
        self.parent.assertEqual(result.logits.shape, (self.batch_size, self.seq_length, self.vocab_size))

    def create_and_check_model_for_causal_lm_as_decoder(
        self,
        config,
        input_ids,
        token_type_ids,
        input_mask,
        sequence_labels,
        token_labels,
        choice_labels,
        encoder_hidden_states,
        encoder_attention_mask,
    ):
        config.add_cross_attention = True
        model = ErnieForCausalLM(config=config)
        model.to(torch_device)
        model.eval()
        result = model(
            input_ids,
            attention_mask=input_mask,
            token_type_ids=token_type_ids,
            labels=token_labels,
            encoder_hidden_states=encoder_hidden_states,
            encoder_attention_mask=encoder_attention_mask,
        )
        result = model(
            input_ids,
            attention_mask=input_mask,
            token_type_ids=token_type_ids,
            labels=token_labels,
            encoder_hidden_states=encoder_hidden_states,
        )
        self.parent.assertEqual(result.logits.shape, (self.batch_size, self.seq_length, self.vocab_size))

    def create_and_check_decoder_model_past_large_inputs(
        self,
        config,
        input_ids,
        token_type_ids,
        input_mask,
        sequence_labels,
        token_labels,
        choice_labels,
        encoder_hidden_states,
        encoder_attention_mask,
    ):
        config.is_decoder = True
        config.add_cross_attention = True
        model = ErnieForCausalLM(config=config).to(torch_device).eval()

        # first forward pass
        outputs = model(
            input_ids,
            attention_mask=input_mask,
            encoder_hidden_states=encoder_hidden_states,
            encoder_attention_mask=encoder_attention_mask,
            use_cache=True,
        )
        past_key_values = outputs.past_key_values

        # create several hypothetical next tokens and extend them to next_input_ids
        next_tokens = ids_tensor((self.batch_size, 3), config.vocab_size)
        next_mask = ids_tensor((self.batch_size, 3), vocab_size=2)

        # append the new tokens and their mask to input_ids and the attention mask
        next_input_ids = torch.cat([input_ids, next_tokens], dim=-1)
        next_attention_mask = torch.cat([input_mask, next_mask], dim=-1)

        output_from_no_past = model(
            next_input_ids,
            attention_mask=next_attention_mask,
            encoder_hidden_states=encoder_hidden_states,
            encoder_attention_mask=encoder_attention_mask,
            output_hidden_states=True,
        )["hidden_states"][0]
        output_from_past = model(
            next_tokens,
            attention_mask=next_attention_mask,
            encoder_hidden_states=encoder_hidden_states,
            encoder_attention_mask=encoder_attention_mask,
            past_key_values=past_key_values,
            output_hidden_states=True,
        )["hidden_states"][0]

        # select random slice
        random_slice_idx = ids_tensor((1,), output_from_past.shape[-1]).item()
        output_from_no_past_slice = output_from_no_past[:, -3:, random_slice_idx].detach()
        output_from_past_slice = output_from_past[:, :, random_slice_idx].detach()

        self.parent.assertTrue(output_from_past_slice.shape[1] == next_tokens.shape[1])

        # test that outputs are equal for slice
        self.parent.assertTrue(torch.allclose(output_from_past_slice, output_from_no_past_slice, atol=1e-3))

    def create_and_check_for_next_sequence_prediction(
        self, config, input_ids, token_type_ids, input_mask, sequence_labels, token_labels, choice_labels
    ):
        model = ErnieForNextSentencePrediction(config=config)
        model.to(torch_device)
        model.eval()
        result = model(
            input_ids,
            attention_mask=input_mask,
            token_type_ids=token_type_ids,
            labels=sequence_labels,
        )
        self.parent.assertEqual(result.logits.shape, (self.batch_size, 2))

    def create_and_check_for_pretraining(
        self, config, input_ids, token_type_ids, input_mask, sequence_labels, token_labels, choice_labels
    ):
        model = ErnieForPreTraining(config=config)
        model.to(torch_device)
        model.eval()
        result = model(
            input_ids,
            attention_mask=input_mask,
            token_type_ids=token_type_ids,
            labels=token_labels,
            next_sentence_label=sequence_labels,
        )
        self.parent.assertEqual(result.prediction_logits.shape, (self.batch_size, self.seq_length, self.vocab_size))
        self.parent.assertEqual(result.seq_relationship_logits.shape, (self.batch_size, 2))

    def create_and_check_for_question_answering(
        self, config, input_ids, token_type_ids, input_mask, sequence_labels, token_labels, choice_labels
    ):
        model = ErnieForQuestionAnswering(config=config)
        model.to(torch_device)
        model.eval()
        result = model(
            input_ids,
            attention_mask=input_mask,
            token_type_ids=token_type_ids,
            start_positions=sequence_labels,
            end_positions=sequence_labels,
        )
        self.parent.assertEqual(result.start_logits.shape, (self.batch_size, self.seq_length))
        self.parent.assertEqual(result.end_logits.shape, (self.batch_size, self.seq_length))

    def create_and_check_for_sequence_classification(
        self, config, input_ids, token_type_ids, input_mask, sequence_labels, token_labels, choice_labels
    ):
        config.num_labels = self.num_labels
        model = ErnieForSequenceClassification(config)
        model.to(torch_device)
        model.eval()
        result = model(input_ids, attention_mask=input_mask, token_type_ids=token_type_ids, labels=sequence_labels)
        self.parent.assertEqual(result.logits.shape, (self.batch_size, self.num_labels))

    def create_and_check_for_token_classification(
        self, config, input_ids, token_type_ids, input_mask, sequence_labels, token_labels, choice_labels
    ):
        config.num_labels = self.num_labels
        model = ErnieForTokenClassification(config=config)
        model.to(torch_device)
        model.eval()
        result = model(input_ids, attention_mask=input_mask, token_type_ids=token_type_ids, labels=token_labels)
        self.parent.assertEqual(result.logits.shape, (self.batch_size, self.seq_length, self.num_labels))

    def create_and_check_for_multiple_choice(
        self, config, input_ids, token_type_ids, input_mask, sequence_labels, token_labels, choice_labels
    ):
        config.num_choices = self.num_choices
        model = ErnieForMultipleChoice(config=config)
        model.to(torch_device)
        model.eval()
        multiple_choice_inputs_ids = input_ids.unsqueeze(1).expand(-1, self.num_choices, -1).contiguous()
        multiple_choice_token_type_ids = token_type_ids.unsqueeze(1).expand(-1, self.num_choices, -1).contiguous()
        multiple_choice_input_mask = input_mask.unsqueeze(1).expand(-1, self.num_choices, -1).contiguous()
        result = model(
            multiple_choice_inputs_ids,
            attention_mask=multiple_choice_input_mask,
            token_type_ids=multiple_choice_token_type_ids,
            labels=choice_labels,
        )
        self.parent.assertEqual(result.logits.shape, (self.batch_size, self.num_choices))

    def prepare_config_and_inputs_for_common(self):
        config_and_inputs = self.prepare_config_and_inputs()
        (
            config,
            input_ids,
            token_type_ids,
            input_mask,
            sequence_labels,
            token_labels,
            choice_labels,
        ) = config_and_inputs
        inputs_dict = {"input_ids": input_ids, "token_type_ids": token_type_ids, "attention_mask": input_mask}
        return config, inputs_dict


@require_torch
class ErnieModelTest(ModelTesterMixin, GenerationTesterMixin, unittest.TestCase):
    all_model_classes = (
        (
            ErnieModel,
            ErnieForCausalLM,
            ErnieForMaskedLM,
            ErnieForMultipleChoice,
            ErnieForNextSentencePrediction,
            ErnieForPreTraining,
            ErnieForQuestionAnswering,
            ErnieForSequenceClassification,
            ErnieForTokenClassification,
        )
        if is_torch_available()
        else ()
    )
    all_generative_model_classes = (ErnieForCausalLM,) if is_torch_available() else ()
    fx_compatible = False

    # special case for ForPreTraining model
    def _prepare_for_class(self, inputs_dict, model_class, return_labels=False):
        inputs_dict = super()._prepare_for_class(inputs_dict, model_class, return_labels=return_labels)

        if return_labels:
            if model_class in get_values(MODEL_FOR_PRETRAINING_MAPPING):
                inputs_dict["labels"] = torch.zeros(
                    (self.model_tester.batch_size, self.model_tester.seq_length), dtype=torch.long, device=torch_device
                )
                inputs_dict["next_sentence_label"] = torch.zeros(
                    self.model_tester.batch_size, dtype=torch.long, device=torch_device
                )
        return inputs_dict

    def setUp(self):
        self.model_tester = ErnieModelTester(self)
        self.config_tester = ConfigTester(self, config_class=ErnieConfig, hidden_size=37)

    def test_config(self):
        self.config_tester.run_common_tests()

    def test_model(self):
        config_and_inputs = self.model_tester.prepare_config_and_inputs()
        self.model_tester.create_and_check_model(*config_and_inputs)

    def test_model_various_embeddings(self):
        config_and_inputs = self.model_tester.prepare_config_and_inputs()
        # use a descriptive loop variable instead of shadowing the built-in `type`
        for embedding_type in ["absolute", "relative_key", "relative_key_query"]:
            config_and_inputs[0].position_embedding_type = embedding_type
            self.model_tester.create_and_check_model(*config_and_inputs)

    def test_model_as_decoder(self):
        config_and_inputs = self.model_tester.prepare_config_and_inputs_for_decoder()
        self.model_tester.create_and_check_model_as_decoder(*config_and_inputs)

    def test_model_as_decoder_with_default_input_mask(self):
        # This regression test was failing with PyTorch < 1.3
        (
            config,
            input_ids,
            token_type_ids,
            input_mask,
            sequence_labels,
            token_labels,
            choice_labels,
            encoder_hidden_states,
            encoder_attention_mask,
        ) = self.model_tester.prepare_config_and_inputs_for_decoder()

        input_mask = None

        self.model_tester.create_and_check_model_as_decoder(
            config,
            input_ids,
            token_type_ids,
            input_mask,
            sequence_labels,
            token_labels,
            choice_labels,
            encoder_hidden_states,
            encoder_attention_mask,
        )

    def test_for_causal_lm(self):
        config_and_inputs = self.model_tester.prepare_config_and_inputs_for_decoder()
        self.model_tester.create_and_check_for_causal_lm(*config_and_inputs)

    def test_for_masked_lm(self):
        config_and_inputs = self.model_tester.prepare_config_and_inputs()
        self.model_tester.create_and_check_for_masked_lm(*config_and_inputs)

    def test_for_causal_lm_decoder(self):
        config_and_inputs = self.model_tester.prepare_config_and_inputs_for_decoder()
        self.model_tester.create_and_check_model_for_causal_lm_as_decoder(*config_and_inputs)

    def test_decoder_model_past_with_large_inputs(self):
        config_and_inputs = self.model_tester.prepare_config_and_inputs_for_decoder()
        self.model_tester.create_and_check_decoder_model_past_large_inputs(*config_and_inputs)

    def test_for_multiple_choice(self):
        config_and_inputs = self.model_tester.prepare_config_and_inputs()
        self.model_tester.create_and_check_for_multiple_choice(*config_and_inputs)

    def test_for_next_sequence_prediction(self):
        config_and_inputs = self.model_tester.prepare_config_and_inputs()
        self.model_tester.create_and_check_for_next_sequence_prediction(*config_and_inputs)

    def test_for_pretraining(self):
        config_and_inputs = self.model_tester.prepare_config_and_inputs()
        self.model_tester.create_and_check_for_pretraining(*config_and_inputs)

    def test_for_question_answering(self):
        config_and_inputs = self.model_tester.prepare_config_and_inputs()
        self.model_tester.create_and_check_for_question_answering(*config_and_inputs)

    def test_for_sequence_classification(self):
        config_and_inputs = self.model_tester.prepare_config_and_inputs()
        self.model_tester.create_and_check_for_sequence_classification(*config_and_inputs)

    def test_for_token_classification(self):
        config_and_inputs = self.model_tester.prepare_config_and_inputs()
        self.model_tester.create_and_check_for_token_classification(*config_and_inputs)

    @slow
    def test_model_from_pretrained(self):
        for model_name in ERNIE_PRETRAINED_MODEL_ARCHIVE_LIST[:1]:
            model = ErnieModel.from_pretrained(model_name)
            self.assertIsNotNone(model)

    @slow
    @require_torch_gpu
    def test_torchscript_device_change(self):
        config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()
        for model_class in self.all_model_classes:
            # ErnieForMultipleChoice behaves incorrectly in JIT environments.
            if model_class == ErnieForMultipleChoice:
                return

            config.torchscript = True
            model = model_class(config=config)

            inputs_dict = self._prepare_for_class(inputs_dict, model_class)
            traced_model = torch.jit.trace(
                model, (inputs_dict["input_ids"].to("cpu"), inputs_dict["attention_mask"].to("cpu"))
            )

            with tempfile.TemporaryDirectory() as tmp:
                torch.jit.save(traced_model, os.path.join(tmp, "ernie.pt"))
                loaded = torch.jit.load(os.path.join(tmp, "ernie.pt"), map_location=torch_device)
                loaded(inputs_dict["input_ids"].to(torch_device), inputs_dict["attention_mask"].to(torch_device))