Implement Roberta PreLayerNorm (#20305)
* Copy RoBERTa
* formatting
* implement RoBERTa with prelayer normalization
* update test expectations
* add documentation
* add conversion script for DinkyTrain weights
* update checkpoint repo
  Unfortunately, the original checkpoints assume a hacked RoBERTa model
* add RoBERTa-PreLayerNorm docs to toc
* run utils/check_copies.py
* lint files
* remove unused import
* fix check_repo wrongly reporting that a test is missing
* fix import error caused by rebase
* run make fix-copies
* add RobertaPreLayerNormConfig to ROBERTA_EMBEDDING_ADJUSMENT_CONFIGS
* Fix documentation <Facebook> -> Facebook
* fixup: Fix documentation <Facebook> -> Facebook
* Add missing Flax header
* expected_slice -> EXPECTED_SLICE
* update copies after rebase
* add missing copied from statements
* make fix-copies
* make prelayernorm explicit in code
* fix checkpoint path for the original implementation
* add flax integration tests
* improve docs
* update utils/documentation_tests.txt
* lint files
* Remove Copyright notice
* make fix-copies
* Remove EXPECTED_SLICE calculation comments

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
@@ -112,6 +112,7 @@ IGNORE_NON_TESTED = PRIVATE_MODELS.copy() + [
     "TFDPREncoder",  # Building part of bigger (tested) model.
     "TFElectraMainLayer",  # Building part of bigger (tested) model (should it be a TFPreTrainedModel ?)
     "TFRobertaForMultipleChoice",  # TODO: fix
+    "TFRobertaPreLayerNormForMultipleChoice",  # TODO: fix
     "TrOCRDecoderWrapper",  # Building part of bigger (tested) model.
     "TFWhisperEncoder",  # Building part of bigger (tested) model.
     "TFWhisperDecoder",  # Building part of bigger (tested) model.
@@ -146,6 +146,9 @@ src/transformers/models/resnet/modeling_tf_resnet.py
 src/transformers/models/roberta/configuration_roberta.py
 src/transformers/models/roberta/modeling_roberta.py
 src/transformers/models/roberta/modeling_tf_roberta.py
+src/transformers/models/roberta_prelayernorm/configuration_roberta_prelayernorm.py
+src/transformers/models/roberta_prelayernorm/modeling_roberta_prelayernorm.py
+src/transformers/models/roberta_prelayernorm/modeling_tf_roberta_prelayernorm.py
 src/transformers/models/roc_bert/modeling_roc_bert.py
 src/transformers/models/roc_bert/tokenization_roc_bert.py
 src/transformers/models/segformer/modeling_segformer.py