* Feed forward chunking for Distilbert & Albert
* Added ff chunking for many other models
* Change model signature
* Added chunking for XLM
* Cleaned up by removing some variables.
* remove test_chunking flag
Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
* [testing] switch to a new TestCasePlus + get_auto_remove_tmp_dir() for auto-removal of tmp dirs
* respect after=True for tempfile, simplify code
* comments
* comment fix
* put `before` last in args, so can make debug even faster
Currently with the bug introduced we're taking two optimizer steps per
batch: one global one, where `xm.optimizer_step` injects a CRS between
all cores in training, and one without. This has been affecting training
accuracy (for example, XLNet GLUE on MNLI is not converging, etc.).
* Generation doc
* MBartForConditionalGeneration (#6441)
* add MBartForConditionalGeneration
* style
* rebase and fixes
* add mbart test in TEST_FILES_WITH_NO_COMMON_TESTS
* fix docs
* don't ignore mbart
* doc
* fix mbart fairseq link
* put mbart before bart
* apply doc suggestions
* Use hash to clean the test dirs (#6475)
* Use hash to clean the test dirs
* Use hash to clean the test dirs
* Use hash to clean the test dirs
* fix
* [EncoderDecoder] Add Cross Attention for GPT2 (#6415)
* add cross attention layers for gpt2
* make gpt2 cross attention work
* finish bert2gpt2
* add explicit comments
* remove attention mask since not yet supported
* revert attn mask in pipeline
* Update src/transformers/modeling_gpt2.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/modeling_encoder_decoder.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Sort unique_no_split_tokens to make it deterministic (#6461)
* change unique_no_split_tokens's type to set
* use sorted list instead of set
* style
* Import accuracy_score (#6480)
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Address comments
* Styling
* Generation doc
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Address comments
* Styling
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
Co-authored-by: gijswijnholds <gijswijnholds@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Data collator with padding
* Add type annotation
* Support tensors as well
* Add comment
* Fix for labels wrong shape
* Data collator with padding
* Add type annotation
* Support tensors as well
* Add comment
* Fix for labels wrong shape
* Remove changes rendered unnecessary
* allow using tokenizer.pad as a collate_fn in pytorch
* allow using tokenizer.pad as a collate_fn in pytorch
* Add documentation and tests
* Make attention mask the right shape
* Better test
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
* [wip] add get_polynomial_decay_schedule_with_warmup
* style
* add assert
* change lr_end to a much smaller default number
* check for exact equality
* [model_cards] electra-base-turkish-cased-ner (#6350)
* for electra-base-turkish-cased-ner
* Add metadata
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* Temporarily de-activate TPU CI
* Update modeling_tf_utils.py (#6372)
fix typo: ckeckpoint->checkpoint
* the test now works again (#6371)
* correct pl link in readme (#6364)
* refactor almost identical tests (#6339)
* refactor almost identical tests
* important to add a clear assert error message
* make the assert error even more descriptive than the original bt
* Small docfile fixes (#6328)
* Patch models (#6326)
* TFAlbertFor{TokenClassification, MultipleChoice}
* Patch models
* BERT and TF BERT info
s
* Update check_repo
* Ci GitHub caching (#6382)
* Cache Github Actions CI
* Remove useless file
* Colab button (#6389)
* Add colab button
* Add colab link for tutorials
* Fix links for open in colab (#6391)
* Update src/transformers/optimization.py
consistently use lr_end=1e-7 default
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* [wip] add get_polynomial_decay_schedule_with_warmup
* style
* add assert
* change lr_end to a much smaller default number
* check for exact equality
* Update src/transformers/optimization.py
consistently use lr_end=1e-7 default
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* remove dup (leftover from merge)
* convert the test into the new refactored format
* stick to using the current_step as is, without ++
Co-authored-by: M. Yusuf Sarıgöz <yusufsarigoz@gmail.com>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Alexander Measure <ameasure@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Optimized banned token masking
* Avoid duplicate EOS masking if in bad_words_id
* Updated mask generation to handle empty banned token list
* Addition of unit tests for the updated bad_words_ids masking
* Updated timeout handling in `test_postprocess_next_token_scores_large_bad_words_list` unit test
* Updated timeout handling in `test_postprocess_next_token_scores_large_bad_words_list` unit test (timeout does not work on Windows)
* Moving Marian import to the test context to allow TF only environments to run
* Moving imports to torch_available test
* Updated operations device and test
* Updated operations device and test
* Added docstring and comment for in-place scores modification
* Moving test to own test_generation_utils, use of lighter models for testing
* removed unneded imports in test_modeling_common
* revert formatting change for ModelTesterMixin
* Updated caching, simplified eos token id test, removed unnecessary @require_torch
* formatting compliance
* Warn if debug requested without TPU fixes (#6308)
Check whether a PyTorch compatible TPU is available before attempting to print TPU metrics after training has completed. This way, users who apply `--debug` without reading the documentation aren't suprised by a stacktrace.
* Style
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* Chunked feed forward for Bert
This is an initial implementation to test applying feed forward chunking for BERT.
Will need additional modifications based on output and benchmark results.
* Black and cleanup
* Feed forward chunking in BertLayer class.
* Isort
* add chunking for all models
* fix docs
* Fix typo
Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
* improve names and tests longformer
* more and better tests for longformer
* add first tf test
* finalize tf basic op functions
* fix merge
* tf shape test passes
* narrow down discrepancies
* make longformer local attn tf work
* correct tf longformer
* add first global attn function
* add more global longformer func
* advance tf longformer
* finish global attn
* upload big model
* finish all tests
* correct false any statement
* fix common tests
* make all tests pass except keras save load
* fix some tests
* fix torch test import
* finish tests
* fix test
* fix torch tf tests
* add docs
* finish docs
* Update src/transformers/modeling_longformer.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Update src/transformers/modeling_tf_longformer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* apply Lysandres suggestions
* reverse to assert statement because function will fail otherwise
* applying sylvains recommendations
* Update src/transformers/modeling_longformer.py
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
* Update src/transformers/modeling_tf_longformer.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>