* Refactor checkpoint name in ALBERT and ALBERT_tf
* Refactor checkpoint name in BART and BART_tf
* Refactor checkpoint name in BERT generation
* Refactor checkpoint name in Blenderbot_tf
* Refactor checkpoint name in Blenderbot_small_tf
* Refactor checkpoint name in ConvBERT AND CONVBERT_TF
* Refactor checkpoint name in CTRL AND CTRL_TF
* Refactor checkpoint name in DistilBERT AND DistilBERT_TF
* Refactor checkpoint name in DistilBERT redo
* Refactor checkpoint name in Electra and Electra_tf
* Refactor checkpoint name in FlauBERT and FlauBERT_tf
* Refactor checkpoint name in FSMT
* Refactor checkpoint name in GPT2 and GPT2_tf
* Refactor checkpoint name in IBERT
* Refactor checkpoint name in LED and LED_tf
* Refactor checkpoint name in Longformer and Longformer_tf
* Refactor checkpoint name in Lxmert and Lxmert_tf
* Refactor checkpoint name in Marian_tf
* Refactor checkpoint name in MBART and MBART_tf
* Refactor checkpoint name in MobileBERT and MobileBERT_tf
* Refactor checkpoint name in mpnet and mpnet_tf
* Refactor checkpoint name in openai and openai_tf
* Refactor checkpoint name in pegasus_tf
* Refactor checkpoint name in reformer
* Refactor checkpoint name in Roberta and Roberta_tf
* Refactor checkpoint name in SqueezeBert
* Refactor checkpoint name in Transformer_xl and Transformer_xl_tf
* Refactor checkpoint name in XLM and XLM_tf
* Refactor checkpoint name in XLNET and XLNET_tf
* Refactor checkpoint name in BERT_tf
* run make tests, style, quality, fixup
* Fix computation of attention_probs when head_mask is provided.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Apply changes to the template
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* Update past_key_values in gpt2 (#9391)
* Update generation_utils, and rename some items
* Update modeling_gpt2 to avoid an error in gradient_checkpointing
* Remove 'reorder_cache' from util and add variations to XLNet, TransfoXL, GPT-2
* Change the location of '_reorder_cache' in modeling files
* Add '_reorder_cache' in modeling_ctrl
* Fix a bug of my last commit in CTRL
* Add '_reorder_cache' to GPT2DoubleHeadsModel
* Manage 'use_cache' in config of test_modeling_gpt2
* Clean up the doc string
* Update src/transformers/models/gpt2/modeling_gpt2.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fix the doc string (GPT-2, CTRL)
* improve gradient_checkpointing_behavior
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Main init work
* Add version
* Change from absolute to relative imports
* Fix imports
* One more typo
* More typos
* Styling
* Make quality script pass
* Add necessary replace in template
* Fix typos
* Spaces are ignored in replace for some reason
* Forgot one models.
* Fixes for import
Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
* Add documentation
* Styling
Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
* add past_key_values
* add use_cache option
* make mask before cutting ids
* adjust position_ids according to past_key_values
* flatten past_key_values
* fix positional embeds
* fix _reorder_cache
* set use_cache to false when not decoder, fix attention mask init
* add test for caching
* add past_key_values for Roberta
* fix position embeds
* add caching test for roberta
* add doc
* make style
* doc, fix attention mask, test
* small fixes
* adress patrick's comments
* input_ids shouldn't start with pad token
* use_cache only when decoder
* make consistent with bert
* make copies consistent
* add use_cache to encoder
* add past_key_values to tapas attention
* apply suggestions from code review
* make coppies consistent
* add attn mask in tests
* remove copied from longformer
* apply suggestions from code review
* fix bart test
* nit
* simplify model outputs
* fix doc
* fix output ordering
* Resize the biases in same time than the embeddings
* Trigger CI
* Biases are not reset anymore
* Remove get_output_embeddings + better LM model detection in generation utils
* Apply style
* First test on BERT
* Update docstring + new name
* Apply the new resizing logic to all the models
* fix tests
* Apply style
* Update the template
* Fix naming
* Fix naming
* Apply style
* Apply style
* Remove unused import
* Revert get_output_embeddings
* Trigger CI
* Update num parameters
* Restore get_output_embeddings in TFPretrainedModel and add comments
* Style
* Add decoder resizing
* Style
* Fix tests
* Separate bias and decoder resize
* Fix tests
* Fix tests
* Apply style
* Add bias resizing in MPNet
* Trigger CI
* Apply style
* Remove "Model" suffix from Flax models to look more 🤗
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Initial working (forward + backward) for Flax MLM training example.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Simply code
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Addressing comments, using module and moving to LM task.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Restore parameter name "module" wrongly renamed model.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Restore correct output ordering...
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Actually commit the example 😅
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Add FlaxBertModelForMaskedLM after rebasing.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Make it possible to initialize the training from scratch
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Reuse flax linen example of cross entropy loss
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Added specific data collator for flax
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Remove todo for data collator
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Added evaluation step
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Added ability to provide dtype to support bfloat16 on TPU
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Enable flax tensorboard output
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Enable jax.pmap support.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Ensure batches are correctly sized to be dispatched with jax.pmap
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Enable bfloat16 with --fp16 cmdline args
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Correctly export metrics to tensorboard
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Added dropout and ability to use it.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Effectively enable & disable during training and evaluation steps.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Oops.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Enable specifying kernel initializer scale
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Style.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Added warmup step to the learning rate scheduler.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Fix typo.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Print training loss
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Make style
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* fix linter issue (flake8)
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Fix model matching
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Fix dummies
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Fix non default dtype on Flax models
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Use the same create_position_ids_from_input_ids for FlaxRoberta
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Make Roberta attention as Bert
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* fix copy
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Wording.
Co-authored-by: Marc van Zee <marcvanzee@gmail.com>
Co-authored-by: Marc van Zee <marcvanzee@gmail.com>
* fix mems in xlnet
* fix use_mems
* fix use_mem_len
* fix use mems
* clean docs
* fix tf typo
* make xlnet tf for generation work
* fix tf test
* refactor use cache
* add use cache for missing models
* correct use_cache in generate
* correct use cache in tf generate
* fix tf
* correct getattr typo
* make sylvain happy
* change in docs as well
* do not apply to cookie cutter statements
* fix tf test
* make pytorch model fully backward compatible
* Support BERT relative position embeddings
* Fix typo in README.md
* Address review comment
* Fix failing tests
* [tiny] Fix style_doc.py check by adding an empty line to configuration_bert.py
* make fix copies
* fix configs of electra and albert and fix longformer
* remove copy statement from longformer
* fix albert
* fix electra
* Add bert variants forward tests for various position embeddings
* [tiny] Fix style for test_modeling_bert.py
* improve docstring
* [tiny] improve docstring and remove unnecessary dependency
* [tiny] Remove unused import
* re-add to ALBERT
* make embeddings work for ALBERT
* add test for albert
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Put models in subfolders
* Styling
* Fix imports in tests
* More fixes in test imports
* Sneaky hidden imports
* Fix imports in doc files
* More sneaky imports
* Finish fixing tests
* Fix examples
* Fix path for copies
* More fixes for examples
* Fix dummy files
* More fixes for example
* More model import fixes
* Is this why you're unhappy GitHub?
* Fix imports in conver command