Implement Roberta PreLayerNorm (#20305)

* Copy RoBERTa

* formatting

* implement RoBERTa with prelayer normalization

* update test expectations

* add documentation

* add convertion script for DinkyTrain weights

* update checkpoint repo

Unfortunately the original checkpoints assumes a hacked roberta model

* add to RoBERTa-PreLayerNorm docs to toc

* run utils/check_copies.py

* lint files

* remove unused import

* fix check_repo reporting wrongly a test is missing

* fix import error, caused by rebase

* run make fix-copies

* add RobertaPreLayerNormConfig to ROBERTA_EMBEDDING_ADJUSMENT_CONFIGS

* Fix documentation <Facebook> -> Facebook

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fixup: Fix documentation <Facebook> -> Facebook

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Add missing Flax header

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* expected_slice -> EXPECTED_SLICE

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* update copies after rebase

* add missing copied from statements

* make fix-copies

* make prelayernorm explicit in code

* fix checkpoint path for the original implementation

* add flax integration tests

* improve docs

* update utils/documentation_tests.txt

* lint files

* Remove Copyright notice

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* make fix-copies

* Remove EXPECTED_SLICE calculation comments

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
This commit is contained in:
Andreas Madsen
2022-12-19 03:30:17 -05:00
committed by GitHub
parent 7032e02032
commit b4b613b102
34 changed files with 7143 additions and 12 deletions

View File

@@ -112,6 +112,7 @@ IGNORE_NON_TESTED = PRIVATE_MODELS.copy() + [
"TFDPREncoder", # Building part of bigger (tested) model.
"TFElectraMainLayer", # Building part of bigger (tested) model (should it be a TFPreTrainedModel ?)
"TFRobertaForMultipleChoice", # TODO: fix
"TFRobertaPreLayerNormForMultipleChoice", # TODO: fix
"TrOCRDecoderWrapper", # Building part of bigger (tested) model.
"TFWhisperEncoder", # Building part of bigger (tested) model.
"TFWhisperDecoder", # Building part of bigger (tested) model.

View File

@@ -146,6 +146,9 @@ src/transformers/models/resnet/modeling_tf_resnet.py
src/transformers/models/roberta/configuration_roberta.py
src/transformers/models/roberta/modeling_roberta.py
src/transformers/models/roberta/modeling_tf_roberta.py
src/transformers/models/roberta_prelayernorm/configuration_roberta_prelayernorm.py
src/transformers/models/roberta_prelayernorm/modeling_roberta_prelayernorm.py
src/transformers/models/roberta_prelayernorm/modeling_tf_roberta_prelayernorm.py
src/transformers/models/roc_bert/modeling_roc_bert.py
src/transformers/models/roc_bert/tokenization_roc_bert.py
src/transformers/models/segformer/modeling_segformer.py