* working on LongformerForSequenceClassification
* add TFLongformerForMultipleChoice
* add TFLongformerForTokenClassification
* use add_start_docstrings_to_model_forward
* test TFLongformerForSequenceClassification
* test TFLongformerForMultipleChoice
* test TFLongformerForTokenClassification
* remove test from repo
* add test and doc for TFLongformerForSequenceClassification, TFLongformerForTokenClassification, TFLongformerForMultipleChoice
* add requested classes to modeling_tf_auto.py
update dummy_tf_objects
fix tests
fix bugs in requested classes
* pass all tests except test_inputs_embeds
* sync with master
* pass all tests except test_inputs_embeds
* pass all tests
* pass all tests
* work on test_inputs_embeds
* fix style and quality
* make multi choice work
* fix TFLongformerForTokenClassification signature
* fix TFLongformerForMultipleChoice, TFLongformerForSequenceClassification signature
* fix mult choice
* fix mc hint
* fix input embeds
* fix input embeds
* refactor input embeds
* fix copy issue
* apply sylvains changes and clean more
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Updated the Extractive Question Answering code snippets
The Extractive Question Answering code snippets do not work anymore since the models return task-specific output objects. This commit fixes the pytorch and tensorflow examples but adding `.values()` to the model call.
* Update task_summary.rst
Modified Model in Action section. The class `AutoModelWithLMHead` is deprecated so changed it to `AutoModelForSeq2SeqLM` for encoder-decoder models. Removed duplicate eos token.
* Adding PrefixConstrainedLogitsProcessor
* fixing RAG and style_doc
* fixing black (v20 instead of v19)
* Improving doc in generation_logits_process.py
* Improving docs and typing in generation_utils.py
* docs improvement
* adding test and fixing doc typo
* fixing doc_len
* isort on test
* fixed test
* improve docstring a bit
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* make tr_loss regular float
* Revert "make tr_loss regular float"
This reverts commit c9d7ccfaf0c4387187b0841694f01ec0ffd5f4ba.
* reset loss at each logging step
* keep track of total loss with _total_loss_scalar
* add remaining tr_loss at the end
* Tokenizers should be framework agnostic
* Run the slow tests
* Not testing
* Fix documentation
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* <small>tiny typo</small>
* Tokenizers: ability to load from model subfolder
* use subfolder for local files as well
* Uniformize model shortcut name => model id
* from s3 => from huggingface.co
Co-authored-by: Quentin Lhoest <lhoest.q@gmail.com>
* Put models in subfolders
* Styling
* Fix imports in tests
* More fixes in test imports
* Sneaky hidden imports
* Fix imports in doc files
* More sneaky imports
* Finish fixing tests
* Fix examples
* Fix path for copies
* More fixes for examples
* Fix dummy files
* More fixes for example
* More model import fixes
* Is this why you're unhappy GitHub?
* Fix imports in conver command
* Use the CI to identify failing tests
* Remove from all examples and tests
* More default switch
* Fixes
* More test fixes
* More fixes
* Last fixes hopefully
* Use the CI to identify failing tests
* Remove from all examples and tests
* More default switch
* Fixes
* More test fixes
* More fixes
* Last fixes hopefully
* Run on the real suite
* Fix slow tests
* Fix passing token_type_ids during GPT2DoubleHeadsModel.generate() if used
and for GPT2LMHeadModel too
* Update tests to check token_type_ids usage in GPT2 models
* Simply insert T5Tokenizer's prepare_seq2seq_batch
* Update/Add some 'import'
* fix RunTimeError caused by '.view'
* Moves .view related error avoidance from seq2seq_trainer to inside prophetnet
* Update test_tokenization_prophetnet.py
* Format the test code with black
* Re-format the test code
* Update test_tokenization_prophetnet.py
* Add importing require_torch in the test code
* Add importing BatchEncoding in the test code
* Re-format the test code on Colab
* Fixing roberta for slow-fast tests
* WIP getting equivalence on pipelines
* slow-to-fast equivalence - working on question-answering pipeline
* optional FAISS tests
* Pipeline Q&A
* Move pipeline tests to their own test job again
* update tokenizer to add sequence id methods
* update to tokenizers 0.9.4
* set sentencepiecce as optional
* clean up squad
* clean up pipelines to use sequence_ids
* style/quality
* wording
* Switch to use_fast = True by default
* update tests for use_fast at True by default
* fix rag tokenizer test
* removing protobuf from required dependencies
* fix NER test for use_fast = True by default
* fixing example tests (Q&A examples use slow tokenizers for now)
* protobuf in main deps extras["sentencepiece"] and example deps
* fix protobug install test
* try to fix seq2seq by switching to slow tokenizers for now
* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Update some tests
* Small update
* Apply style
* Use max_position_embeddings
* Create a fake attribute
* Create a fake attribute
* Update wrong name
* Wrong TransfoXL model file
* Keep the common tests agnostic
* Update deploy-docs dependencies on CI to enable Flax
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Added pair of ""
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* First addition of Flax/Jax documentation
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* make style
* Ensure input order match between Bert & Roberta
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Install dependencies "all" when building doc
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* wraps build_doc deps with ""
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Addressing @sgugger comments.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Use list to highlight JAX features.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Make style.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Let's not look to much into the future for now.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Style
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* Create modeling_tf_dpr.py
* Add TFDPR
* Add back TFPegasus, TFMarian, TFMBart, TFBlenderBot
last commit accidentally deleted these 4 lines, so I recover them back
* Add TFDPR
* Add TFDPR
* clean up some comments, add TF input-style doc string
* Add TFDPR
* Make return_dict=False as default
* Fix return_dict bug (in .from_pretrained)
* Add get_input_embeddings()
* Create test_modeling_tf_dpr.py
The current version is already passed all 27 tests!
Please see the test run at :
https://colab.research.google.com/drive/1czS_m9zy5k-iSJbzA_DP1k1xAAC_sdkf?usp=sharing
* fix quality
* delete init weights
* run fix copies
* fix repo consis
* del config_class, load_tf_weights
They shoud be 'pytorch only'
* add config_class back
after removing it, test failed ... so totally only removing "use_tf_weights = None" on Lysandre suggestion
* newline after .. note::
* import tf, np (Necessary for ModelIntegrationTest)
* slow_test from_pretrained with from_pt=True
At the moment we don't have TF weights (since we don't have official official TF model)
Previously, I did not run slow test, so I missed this bug
* Add simple TFDPRModelIntegrationTest
Note that this is just a test that TF and Pytorch gives approx. the same output.
However, I could not test with the official DPR repo's output yet
* upload correct tf model
* remove position_ids as missing keys
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: patrickvonplaten <patrick@huggingface.co>
The new run_ner.py script tries to run prediction on the input
test set `datasets["test"]`, but it should be the tokenized set
`tokenized_datasets["test"]`
* Add next sentence prediction loss computation
* Apply style
* Fix tests
* Add forgotten import
* Add forgotten import
* Use a new parameter
* Remove kwargs and use positional arguments
* [testing utils] get_auto_remove_tmp_dir default change
Now that I have been using `get_auto_remove_tmp_dir default change` for a while, I realized that the defaults aren't most optimal.
99% of the time we want the tmp dir to be empty at the beginning of the test - so changing the default to `before=True` - this shouldn't impact any tests since this feature is used only during debug.
* simplify things
* update docs
* fix doc layout
* style
* Update src/transformers/testing_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* better 3-state doc
* style
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* s/tmp/temporary/ + style
* correct the statement
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fix typo
* rm use_cdn & references, and implement new hf_bucket_url
* I'm pretty sure we don't need to `read` this file
* same here
* [BIG] file_utils.networking: do not gobble up errors anymore
* Fix CI 😇
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Tiny doc tweak
* Add doc + pass kwarg everywhere
* Add more tests and explain
cc @sshleifer let me know if better
Co-Authored-By: Sam Shleifer <sshleifer@gmail.com>
* Also implement revision in pipelines
In the case where we're passing a task name or a string model identifier
* Fix CI 😇
* Fix CI
* [hf_api] new methods + command line implem
* make style
* Final endpoints post-migration
* Fix post-migration
* Py3.6 compat
cc @stefan-it
Thank you @stas00
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
* add a multi-gpu job for all example tests
* run only ported tests
* rename
* explain why env is re-activated on each step
* mark all unported/checked tests with @require_torch_non_multigpu_but_fix_me
* style
* Apply suggestions from code review
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
* add training tests
* correct longformer
* fix docs
* fix some tests
* fix some more train tests
* remove ipdb
* fix multiple edge case model training
* fix funnel and prophetnet
* clean gpt models
* undo renaming of albert
* Add new token classification example
* Remove txt file
* Add test
* With actual testing done
* Less warmup is better
* Update examples/token-classification/run_ner_new.py
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
* Address review comments
* Fix test
* Make Lysandre happy
* Last touches and rename
* Rename in tests
* Address review comments
* More run_ner -> run_ner_old
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
* Output cross-attention with decoder attention output
* Update src/transformers/modeling_bert.py
* add cross-attention for t5 and bart as well
* fix tests
* correct typo in docs
* add sylvains and sams comments
* correct typo
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* use decorator
* remove hardcoded paths
* make the test use more data and do real quality tests
* shave off 10 secs
* add --eval_beams 2, reformat
* reduce train size, use smaller custom dataset
* Make Trainer evaluation handle dynamic seq_length
* Document behavior.
* Fix test
* Better fix
* Fixes for realsies this time
* Address review comments
* Without forgetting to save...
* Output global_attentions in Longformer models
* make style
* small refactoring
* fix tests
* make fix-copies
* add for tf as well
* remove comments in test
* make fix-copies
* make style
* add docs
* make docstring pretty
Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
* change TokenClassificationTask class methods to static methods
Since we do not require self in the class methods of TokenClassificationTask we should probably switch to static methods. Also, since the class TokenClassificationTask does not contain a constructor it is currently unusable as is. By switching to static methods this fixes the issue of having to document the intent of the broken class.
Also, since the get_labels and read_examples_from_file methods are ought to be implemented. Static method definitions are unchanged even after inheritance, which means that it can be overridden, similar to other class methods.
* Trigger Build
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
- The issue is that with previous code we would have the following:
```python
qa_pipeline = (...)
qa_pipeline(question="Where was he born ?", context="")
-> IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
```
The goal here is to improve this to actually return a ValueError
wherever possible.
While at it, I tried to simplify QuestionArgumentHandler's code to
make it smaller and more compat while keeping backward compat.
* Bug fix: NER pipeline shouldn't group separate entities of same type
* style fix
* [Bug Fix] Shouldn't group entities that are both 'B' even if they are same type
(B-type1 B-type1) != (B-type1 I-type1)
[Bug Fix] add an option `ignore_subwords` to ignore subsequent ##wordpieces in predictions. Because some models train on only the first token of a word and not on the subsequent wordpieces (BERT NER default). So it makes sense doing the same thing at inference time.
The simplest fix is to just group the subwords with the first wordpiece.
[TODO] how to handle ignored scores? just set them to 0 and calculate zero invariant mean ?
[TODO] handle different wordpiece_prefix ## ? possible approaches:
get it from tokenizer? but currently most tokenizers dont have a wordpiece_prefix property?
have an _is_subword(token)
[Feature add] added option to `skip_special_tokens`. Cause It was harder to remove them after grouping.
[Additional Changes] remove B/I prefix on returned grouped_entities
[Feature Request/TODO] Return indexes?
[Bug TODO] can't use fast tokenizer with grouped_entities ('BertTokenizerFast' object has no attribute 'convert_tokens_to_string')
* use offset_mapping to fix [UNK] token problem
* ignore score for subwords
* modify ner_pipeline test
* modify ner_pipeline test
* modify ner_pipeline test
* ner_pipeline change ignore_subwords default to true
* add ner_pipeline ignore_subword=False test case
* fix offset_mapping index
* fix style again duh
* change is_subword and convert_tokens_to_string logic
* merge tests with new test structure
* change test names
* remove old tests
* ner tests for fast tokenizer
* fast tokenizers have convert_tokens_to_string
* Fix the incorrect merge
Co-authored-by: Ceyda Cinarel <snu-ceyda@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* make it possible to invoke testconf.py in both test suites without crashing on having the same option added
* perl -pi -e 's|--make_reports|--make-reports|' to be consistent with other opts
* add `pytest --make-reports` to all CIs (and artifacts)
* fix
* Updated ConversationalPipeline to work with encoder-decoder models (e.g. BlenderBot)
* Addition of integration test for EncoderDecoder conversation model
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* [FIX] TextGenerationPipeline is currently broken.
It's most likely due to #8180.
What's missing is a multi vs single string handler at the beginning of
the pipe.
And also there was no testing of this pipeline.
* Fixing Conversational tests too.
* first draft
* show design proposition for new generate method
* up
* make better readable
* make first version
* gpt2 tests pass
* make beam search for gpt2 work
* add first encoder-decoder code
* delete typo
* make t5 work
* save indermediate
* make bart work with beam search
* finish beam search bart / t5
* add default kwargs
* make more tests pass
* fix no bad words sampler
* some fixes and tests for all distribution processors
* fix test
* fix rag slow tests
* merge to master
* add nograd to generate
* make all slow tests pass
* speed up generate
* fix edge case bug
* small fix
* correct typo
* add type hints and docstrings
* fix typos in tests
* add beam search tests
* add tests for beam scorer
* fix test rag
* finish beam search tests
* move generation tests in seperate file
* fix generation tests
* more tests
* add aggressive generation tests
* fix tests
* add gpt2 sample test
* add more docstring
* add more docs
* finish doc strings
* apply some more of sylvains and sams comments
* fix some typos
* make fix copies
* apply lysandres and sylvains comments
* final corrections on examples
* small fix for reformer
* Make line by line optional in run_mlm
* Add option to disable dynamic padding
* Add option to plm too and update README
* Typos
* More typos
* Even more typos
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* make sure that logging_first_step evaluates
* fix bug with incorrect loss on logging_first_step
* fix style
* logging_first_step only logs, not evals
* Minor style improvements:
1. Use `@nn.compact` rather than `@compact` (as to not make it seem
like compact is a standard Python decorator.
2. Move attribute docstrings from two `__call__` methods to comments
on the attributes themselves. (This was probably a remnant from
the pre-Linen version where the attributes were arguments to
`call`.)
* Use black on the Flax modeling code
* Replace swish with silu
* revert nn.silu to nn.swish due to older version
* simplify optimized silu conditional and fix format
* Update activations.py
* Update activations_tf.py
* Update modeling_flax_utils.py
* Update modeling_openai.py
* add swish testcase
* add pytorch swish testcase
* Add more robust python version check
* more formatting fixes
Co-authored-by: TFUsers <TFUsers@gmail.com>
* Test TF GPU CI
* Change cache
* Fix missing torch requirement
* Fix some model tests
Style
* LXMERT
* MobileBERT
* Longformer skip test
* XLNet
* The rest of the tests
* RAG goes OOM in multi gpu setup
* YAML test files
* Last fixes
* Skip doctests
* Fill mask tests
* Yaml files
* Last test fix
* Style
* Update cache
* Change ONNX tests to slow + use tiny model
* Add a template for example scripts and apply it to mlm
* Formatting
* Fix test
* Add plm script
* Add a template for example scripts and apply it to mlm
* Formatting
* Fix test
* Add plm script
* Add a template for example scripts and apply it to mlm
* Formatting
* Fix test
* Add plm script
* Styling
* Add model card for Gujarati-XLM-R-Base
* Update README.md
Add the model card for the Gujarati-XLM-R-Base.
* Apply suggestions from code review
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* move the helper code into testing_utils
* port test_trainer_distributed to work with pytest
* improve docs
* simplify notes
* doc
* doc
* style
* doc
* further improvements
* torch might not be available
* real fix
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* New run_clm script
* Formatting
* More comments
* Remove unused imports
* Apply suggestions from code review
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
* Address review comments
* Change link to the hub
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
* first attempt to add AzureML callbacks
* func arg fix
* var name fix, but still won't fix error...
* fixing as in https://discuss.huggingface.co/t/how-to-integrate-an-azuremlcallback-for-logging-in-azure/1713/2
* Avoid lint check of azureml import
* black compliance
* Make isort happy
* Fix point typo in docs
* Add AzureML to Callbacks docs
* Attempt to make sphinx happy
* Format callback docs
* Make documentation style happy
* Make docs compliant to style
Co-authored-by: Davide Fiocco <davide.fiocco@frontiersin.net>
* better reports
* a whole bunch of reports in their own files
* clean up
* improvements
* github artifacts experiment
* style
* complete the report generator with multiple improvements/fixes
* fix
* save all reports under one dir to easy upload
* can remove temp failing tests
* doc fix
* some cleanup
* Fix comet_ml import and add ensure availability
* Make isort happy
* Make flake8 happy
* Don't show comet_ml warn if COMET_MODE=DISABLED
* Make isort happy
* Important files
* Styling them all
* Revert "Styling them all"
This reverts commit 7d029395fdae8513b8281cbc2a6c239f8093503e.
* Syling them for realsies
* Fix syntax error
* Fix benchmark_utils
* More fixes
* Fix modeling auto and script
* Remove new line
* Fixes
* More fixes
* Fix more files
* Style
* Add FSMT
* More fixes
* More fixes
* More fixes
* More fixes
* Fixes
* More fixes
* More fixes
* Last fixes
* Make sphinx happy
* mc for new cross lingual sentence model
* fat text
* url spelling fix
* more url spelling fixes
* slight thanks change
* small improvements in text
* multilingual word xchange
* change colab link
* xval fold number
* add model links
* line break in model names
* Update README.md
* Update README.md
* new examples link
* new examples link
* add evaluation dataset name
* add more about multi lingual
* typo fix
* typo
* typos
* hyperparameter typos
* hyperparameter typo
* add metadata
* add metadata
* Update README.md
* typo fix
* Small improvement
* Add MLflow integration class
Add integration code for MLflow in integrations.py along with the code
that checks that MLflow is installed.
* Add MLflowCallback import
Add import of MLflowCallback in trainer.py
* Handle model argument
Allow the callback to handle model argument and store model config items as hyperparameters.
* Log parameters to MLflow in batches
MLflow cannot log more than a hundred parameters at once.
Code added to split the parameters into batches of 100 items and log the batches one by one.
* Fix style
* Add docs on MLflow callback
* Fix issue with unfinished runs
The "fluent" api used in MLflow integration allows only one run to be active at any given moment. If the Trainer is disposed off and a new one is created, but the training is not finished, it will refuse to log the results when the next trainer is created.
* Add MLflow integration class
Add integration code for MLflow in integrations.py along with the code
that checks that MLflow is installed.
* Add MLflowCallback import
Add import of MLflowCallback in trainer.py
* Handle model argument
Allow the callback to handle model argument and store model config items as hyperparameters.
* Log parameters to MLflow in batches
MLflow cannot log more than a hundred parameters at once.
Code added to split the parameters into batches of 100 items and log the batches one by one.
* Fix style
* Add docs on MLflow callback
* Fix issue with unfinished runs
The "fluent" api used in MLflow integration allows only one run to be active at any given moment. If the Trainer is disposed off and a new one is created, but the training is not finished, it will refuse to log the results when the next trainer is created.
* Make Seq2Seq Trainer more similar to Trainer
* fix typo
* fix seq2seq trainer
* remove from tests
* remove lock
* remove train files
* delete test files
* correct typo
* check at init
* make sure trainer is not slowed down on TPU
* correct isort
* remove use cache
* fix use cache
* add last use chache = false
* model card German Sentence Embeddings V2
- for German RoBERTa for Sentence Embeddings V2
- marked old as outdated
* small correction
* small improvement in description
* small spelling fix
* spelling fix
* add evaluation results
* spearman explanation
* add number of trials
Updating the run_squad training script to handle the "longformer" `model_type`. The longformer is trained in the same was as RoBERTa, so I've added the "longformer" `model_type` (that's the right hugginface name for the LongFormer model, right?) everywhere there was a "roberta" `model_type` reference. The longformer (like RoBERTa) doesn't use `token_type_ids` (as I understand from looking at the [longformer notebook](https://github.com/patil-suraj/Notebooks/blob/master/longformer_qa_training.ipynb), which is what gets updated after this change.
This fix might be related to [this issue](https://github.com/huggingface/transformers/issues/7249) with SQuAD training when using run_squad.py
* WIP refactoring pipeline tests - switching to fast tokenizers
* fix dialog pipeline and fill-mask
* refactoring pipeline tests backbone
* make large tests slow
* fix tests (tf Bart inactive for now)
* fix doc...
* clean up for merge
* fixing tests - remove bart from summarization until there is TF
* fix quality and RAG
* Add new translation pipeline tests - fix JAX tests
* only slow for dialog
* Fixing the missing TF-BART imports in modeling_tf_auto
* spin out pipeline tests in separate CI job
* adding pipeline test to CI YAML
* add slow pipeline tests
* speed up tf and pt join test to avoid redoing all the standalone pt and tf tests
* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
* Update src/transformers/pipelines.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/pipelines.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Update src/transformers/testing_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* add require_torch and require_tf in is_pt_tf_cross_test
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Start simplification
* More progress
* Finished script
* Address comments and update tests instructions
* Wrong test
* Accept files as inputs and fix test
* Update src/transformers/trainer_utils.py
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* Fix labels and add combined score
* Add special labels
* Update TPU command
* Revert to old label strategy
* Use model labels
* Fix for STT-B
* Styling
* Apply suggestions from code review
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
* Code styling
* Fix review comments
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
* Actually make the "translation", "translation_XX_to_YY" task behave correctly.
Background:
- Currently "translation_cn_to_ar" does not work. (only 3 pairs are
supported)
- Some models, contain in their config the correct values for the (src,
tgt) pair they can translate. It's usually just one pair, and we can
infer it automatically from the `model.config.task_specific_params`. If
it's not defined we can still probably load the TranslationPipeline
nevertheless.
Proposed fix:
- A simplified version of what could become more general which is
a `parametrized` task. "translation" + (src, tgt) in this instance
it what we need in the general case. The way we go about it for now
is simply parsing "translation_XX_to_YY". If cases of parametrized task arise
we should preferably go in something closer to what `datasets` propose
which is having a secondary argument `task_options`? that will be close
to what that task requires.
- Should be backward compatible in all cases for instance
`pipeline(task="translation_en_to_de") should work out of the box.
- Should provide a warning when a specific translation pair has been
selected on behalf of the user using
`model.config.task_specific_params`.
* Update src/transformers/pipelines.py
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* fix config save
* add test
* add config class variable and another test
* line break
* fix fsmt and typo
* god am I making many errors today :-/
* Update src/transformers/configuration_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Looking at the current community notebooks, it seems that few are targeted for absolute beginners and even fewer are written with TensorFlow. This notebook describes absolutely everything a beginner would need to know, including how to save/load their model and use it for new predictions (this is often omitted in tutorials)
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* slow tests should be slow
* exception note
* style
* integrate LysandreJik's notes with some expansions
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* another slow test
* fix link, and prose
* clarify.
* note from Sam
* typo
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* make the save_load special key tests common
* handle mbart
* cleaner solution
* fix
* move test_save_load_missing_keys back into fstm for now
* restore
* style
* add marian
* add pegasus
* blenderbot
* revert - no static embed
I'm using transformers 3.3.1 and run a training script with `--save_total_limit 3`. I hit the exception below, and after debugging the code found that it wrongly tries to index into the `best_model_checkpoint`'s *str* rather than the `sorted_checkpoints` array. When running without the fix I got this exception:
```
Traceback (most recent call last):
File "/<HOME>/.conda/envs/transformers/lib/python3.7/site-packages/transformers/trainer.py", line 921, in _save_training
self._rotate_checkpoints(use_mtime=True)
File "/<HOME>/.conda/envs/transformers/lib/python3.7/site-packages/transformers/trainer.py", line 1283, in _rotate_checkpoints
checkpoints_sorted = self._sorted_checkpoints(use_mtime=use_mtime)
File "/<HOME>/.conda/envs/transformers/lib/python3.7/site-packages/transformers/trainer.py", line 1274, in _sorted_checkpoints
checkpoints_sorted[best_model_index],
TypeError: 'str' object does not support item assignment
```
Seeing error when sending `decoder_config` as a parameter while initializing a encoder-decoder model from pretrained.
fixed "UnboundLocalError: local variable 'decoder_config' referenced before assignment"
* add CustomHFIndex
* typo in config
* update tests
* add custom dataset example
* clean script
* update test data
* minor in test
* docs
* docs
* style
* fix imports
* allow to pass the indexed dataset directly
* update tests
* use multiset DPR
* address thom and patrick's comments
* style
* update dpr tokenizer
* add output_dir flag in use_own_knowledge_dataset.py
* allow custom datasets in examples/rag/finetune.py
* add test for custom dataset in distributed rag retriever
* fix 5990
* accomodate iterable dataset without predefined length
* set it as 1 use case: provide max_steps, and NO num_epochs
* Is a merge of master and PR 5995
* fix trainer test under TF
* fix only for torch
* TF trainer untouched
* trainer tests are skipped when no torch
* address comments
* fix quality checks
* remove torch.dataset from test_trainer
* unnecessary inheritance
* RegressionDataset implements all needed methods __len__ and __getitem__
* fix quality checks
* restore RegressionDataset
* was wrongly under is_torch_available()
* WIP flax bert
* Initial commit Bert Jax/Flax implementation.
* Embeddings working and equivalent to PyTorch.
* Move embeddings in its own module BertEmbeddings
* Added jax.jit annotation on forward call
* BertEncoder on par with PyTorch ! :D
* Add BertPooler on par with PyTorch !!
* Working Jax+Flax implementation of BertModel with < 1e-5 differences on the last layer.
* Fix pooled output to take only the first token of the sequence.
* Refactoring to use BertConfig from transformers.
* Renamed FXBertModel to FlaxBertModel
* Model is now initialized in FlaxBertModel constructor and reused.
* WIP JaxPreTrainedModel
* Cleaning up the code of FlaxBertModel
* Added ability to load Flax model saved through save_pretrained()
* Added ability to convert Pytorch Bert model to FlaxBert
* FlaxBert can now load every Pytorch Bert model with on-the-fly conversion
* Fix hardcoded shape values in conversion scripts.
* Improve the way we handle LayerNorm conversion from PyTorch to Flax.
* Added positional embeddings as parameter of BertModel with default to np.arange.
* Let's roll FlaxRoberta !
* Fix missing position_ids parameters on predict for Bert
* Flax backend now supports batched inputs
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Make it possible to load msgpacked model on convert from pytorch in last resort.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Moved save_pretrained to Jax base class along with more constructor parameters.
* Use specialized, model dependent conversion functio.
* Expose `is_flax_available` in file_utils.
* Added unittest for Flax models.
* Added run_tests_flax to the CI.
* Introduce FlaxAutoModel
* Added more unittests
* Flax model reference the _MODEL_ARCHIVE_MAP from PyTorch model.
* Addressing review comments.
* Expose seed in both Bert and Roberta
* Fix typo suggested by @stefan-it
Co-Authored-By: Stefan Schweter <stefan@schweter.it>
* Attempt to make style
* Attempt to make style in tests too
* Added jax & jaxlib to the flax optional dependencies.
* Attempt to fix flake8 warnings ...
* Redo black again and again
* When black and flake8 fight each other for a space ... 💥💥💥
* Try removing trailing comma to make both black and flake happy!
* Fix invalid is_<framework>_available call, thanks @LysandreJik 🎉
* Fix another invalid import in flax_roberta test
* Bump and pin flax release to 0.1.0.
* Make flake8 happy, remove unused jax import
* Change the type of the catch for msgpack.
* Remove unused import.
* Put seed as optional constructor parameter.
* trigger ci again
* Fix too much parameters in BertAttention.
* Formatting.
* Simplify Flax unittests to avoid machine crashes.
* Fix invalid number of arguments when raising issue for an unknown model.
* Address @bastings comment in PR, moving jax.jit decorated outside of __call__
* Fix incorrect path to require_flax/require_pytorch functions.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Attempt to make style.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Correct rebasing of circle-ci dependencies
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Fix import sorting.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Fix unused imports.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Again import sorting...
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Installing missing nlp dependency for flax unittests.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Fix laoding of model for Flax implementations.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* jit the inner function call to make JAX-compatible
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Format !
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Flake one more time 🎶
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Rewrites BERT in Flax to the new Linen API (#7211)
* Rewrite Flax HuggingFace PR to Linen
* Some fixes
* Fix tests
* Fix CI with change of name of nlp (#7054)
* nlp -> datasets
* More nlp -> datasets
* Woopsie
* More nlp -> datasets
* One last
* Expose `is_flax_available` in file_utils.
* Added run_tests_flax to the CI.
* Attempt to make style
* trigger ci again
* Fix import sorting.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Revert "Rewrites BERT in Flax to the new Linen API (#7211)"
This reverts commit 23703a5eb3364e26a1cbc3ee34b4710d86a674b0.
* Remove jnp.lax references
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Make style.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Reintroduce Linen changes ...
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Make style.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Use jax native's gelu function.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Renaming BertModel to BertModule to highlight the fact this is the Flax Module object.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Rewrite FlaxAutoModel test to not rely on pretrained_model_archive_map
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Remove unused variable in BertModule.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Remove unused variable in BertModule again
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Attempt to have is_flax_available working again.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Introduce JAX TensorType
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Improve ImportError message when trying to convert to various TensorType format.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Makes Flax model jittable.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Ensure flax models are jittable in unittests.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Remove unused imports.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Ensure jax imports are guarded behind is_flax_available.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Make style.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Make style again
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Make style again again
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Make style again again again
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Update src/transformers/file_utils.py
Co-authored-by: Marc van Zee <marcvanzee@gmail.com>
* Bump flax to it's latest version
Co-authored-by: Marc van Zee <marcvanzee@gmail.com>
* Bump jax version to at least 0.2.0
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Style.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Update the unittest to use TensorType.JAX
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* isort import in tests.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Match new flax parameters name "params"
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Remove unused imports.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Add flax models to transformers __init__
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Attempt to address all CI related comments.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Correct circle.yml indent.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Correct circle.yml indent (2)
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Remove coverage from flax tests
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Addressing many naming suggestions from comments
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Simplify for loop logic to interate over layers in FlaxBertLayerCollection
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* use f-string syntax for formatting logs.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Use config property from FlaxPreTrainedModel.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* use "cls_token" instead of "first_token" variable name.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* use "hidden_state" instead of "h" variable name.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Correct class reference in docstring to link to Flax related modules.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Added HF + Google Flax team copyright.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Make Roberta independent from Bert
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Move activation functions to flax_utils.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Move activation functions to flax_utils for bert.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Added docstring for BERT
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Update import for Bert and Roberta tokenizers
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Make style.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* fix-copies
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Correct FlaxRobertaLayer to match PyTorch.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Use the same store_artifact for flax unittest
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Style.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Make sure gradient are disabled only locally for flax unittest using torch equivalence.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Use relative imports
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
Co-authored-by: Stefan Schweter <stefan@schweter.it>
Co-authored-by: Marc van Zee <marcvanzee@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Propagating n_docs as parameter to all RagModel's related functions that defaults to self.config.n_docs
* Making n_docs parameter's default value to None in marginalize function
* Fixing code quality issues
* Handle the special case when generator is of T5PreTrainedModel instance type. T5PreTrainedModel do not have n_docs as parameter
* T5PreTrainedModel do not have n_docs as parameter
* Addressing review comment
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Correcting comment by addressing review comment
* Adding assert statement verifying that n_docs is correctly set. n_docs should be the same for both retriever and generator.
* Fixing flake8 reported issue
* Correcting test datasets for rag
* Using doc_scores instead of context_input_ids to check assert as in RagSequenceForGeneration context_input_ids can be null
* doc_scores second dimension have number of retrieved docs
* Changing assert comment
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* splitting fast and slow tokenizers [WIP]
* [WIP] splitting sentencepiece and tokenizers dependencies
* update dummy objects
* add name_or_path to models and tokenizers
* prefix added to file names
* prefix
* styling + quality
* spliting all the tokenizer files - sorting sentencepiece based ones
* update tokenizer version up to 0.9.0
* remove hard dependency on sentencepiece 🎉
* and removed hard dependency on tokenizers 🎉
* update conversion script
* update missing models
* fixing tests
* move test_tokenization_fast to main tokenization tests - fix bugs
* bump up tokenizers
* fix bert_generation
* update ad fix several tokenizers
* keep sentencepiece in deps for now
* fix funnel and deberta tests
* fix fsmt
* fix marian tests
* fix layoutlm
* fix squeezebert and gpt2
* fix T5 tokenization
* fix xlnet tests
* style
* fix mbart
* bump up tokenizers to 0.9.2
* fix model tests
* fix tf models
* fix seq2seq examples
* fix tests without sentencepiece
* fix slow => fast conversion without sentencepiece
* update auto and bert generation tests
* fix mbart tests
* fix auto and common test without tokenizers
* fix tests without tokenizers
* clean up tests lighten up when tokenizers + sentencepiece are both off
* style quality and tests fixing
* add sentencepiece to doc/examples reqs
* leave sentencepiece on for now
* style quality split hebert and fix pegasus
* WIP Herbert fast
* add sample_text_no_unicode and fix hebert tokenization
* skip FSMT example test for now
* fix style
* fix fsmt in example tests
* update following Lysandre and Sylvain's comments
* Update src/transformers/testing_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/testing_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* HerBERT transformer model for Polish language understanding.
* HerbertTokenizerFast generated with HerbertConverter
* Herbert base and large model cards
* Herbert model cards with tags
* Herbert tensorflow models
* Herbert model tests based on Bert test suit
* src/transformers/tokenization_herbert.py edited online with Bitbucket
* src/transformers/tokenization_herbert.py edited online with Bitbucket
* docs/source/model_doc/herbert.rst edited online with Bitbucket
* Herbert tokenizer tests and bug fixes
* src/transformers/configuration_herbert.py edited online with Bitbucket
* Copyrights and tests for TFHerbertModel
* model_cards/allegro/herbert-base-cased/README.md edited online with Bitbucket
* model_cards/allegro/herbert-large-cased/README.md edited online with Bitbucket
* Bug fixes after testing
* Reformat modified_only_fixup
* Proper order of configuration
* Herbert proper documentation formatting
* Formatting with make modified_only_fixup
* Dummies fixed
* Adding missing models to documentation
* Removing HerBERT model as it is a simple extension of BERT
* Update model_cards/allegro/herbert-base-cased/README.md
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* Update model_cards/allegro/herbert-large-cased/README.md
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* HerbertTokenizer deprecated configuration removed
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
in `tests/test_utils_check_copies.py` I was getting intermittently:
```
utils/check_copies.py:52
/mnt/nvme1/code/transformers-comet/utils/check_copies.py:52: DeprecationWarning: invalid escape sequence \s
while line_index < len(lines) and re.search(f"^{indent}(class|def)\s+{name}", lines[line_index]) is None:
```
So this should fix it.
* model card for bert-base-NER
* add meta data up top
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
- TFAutoModelForCausalLM
- TFAutoModelForMaskedLM
- TFAutoModelForSeq2SeqLM
as per deprecation warning. No tests as it simply removes current
warnings from tests.
* Improving Pipelines by defaulting to framework='tf' when
pytorch seems unavailable.
* Actually changing the default resolution order to account for model
defaults
Adding a new tests for each pipeline to check that pipeline(task) works
too without manually adding the framework too.
* use DDP no_sync when possible
* fix is_nlp_available addition mistake
* reformat trainer.py
* reformat trainer.py
* drop support for pytorch < 1.2
* return support for pytorch < 1.2
* Add Documentation for GPT-1 Classification
* Add GPT-1 with Classification head
* Add tests for GPT-1 Classification
* Add GPT-1 For Classification to auto models
* Remove authorized missing keys, change checkpoint to openai-gpt
Added is_torch_tpu_available() to the condition
for saving a model as xla model. "xla_device"
property of config can also be True on a non-xla
device, when loading a checkpointthat was trained
on xla before.
Resolves#7695
* Import intergration libraries first
* isort and black happiness
* flake8 happiness
* Add a test
* Black reformat
* Ignore import order in tests
* A heavy-handed method of disabling comet for tests
* Remove comet_ml tests
* Run black on setup.py
* Reintroduce clean_text call which was removed by mistake in #4723
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Added unittest for clean_text parameter on Bert tokenizer.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Better unittest name.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Adapt unittest to use untrained tokenizer.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Code quality + update test
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* [WIP] SP tokenizers
* fixing tests for T5
* WIP tokenizers
* serialization
* update T5
* WIP T5 tokenization
* slow to fast conversion script
* Refactoring to move tokenzier implementations inside transformers
* Adding gpt - refactoring - quality
* WIP adding several tokenizers to the fast world
* WIP Roberta - moving implementations
* update to dev4 switch file loading to in-memory loading
* Updating and fixing
* advancing on the tokenizers - updating do_lower_case
* style and quality
* moving forward with tokenizers conversion and tests
* MBart, T5
* dumping the fast version of transformer XL
* Adding to autotokenizers + style/quality
* update init and space_between_special_tokens
* style and quality
* bump up tokenizers version
* add protobuf
* fix pickle Bert JP with Mecab
* fix newly added tokenizers
* style and quality
* fix bert japanese
* fix funnel
* limite tokenizer warning to one occurence
* clean up file
* fix new tokenizers
* fast tokenizers deep tests
* WIP adding all the special fast tests on the new fast tokenizers
* quick fix
* adding more fast tokenizers in the fast tests
* all tokenizers in fast version tested
* Adding BertGenerationFast
* bump up setup.py for CI
* remove BertGenerationFast (too early)
* bump up tokenizers version
* Clean old docstrings
* Typo
* Update following Lysandre comments
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
* Replaced torch.load for loading the pretrained vocab of TransformerXL to pickle.load
* Replaced torch.save with pickle.dump when saving the vocabulary
* updating transformer-xl
* uploaded on S3 - compatibility
* fix tests
* style
* Address review comments
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* [Model card] SinhalaBERTo model.
This is the model card for keshan/SinhalaBERTo model.
* Update model_cards/keshan/SinhalaBERTo/README.md
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* Initial callback proposal
* Finish various callbacks
* Post-rebase conflicts
* Fix tests
* Don't use something that's not set
* Documentation
* Remove unwanted print.
* Document all models can work
* Add tests + small fixes
* Update docs/source/internal/trainer_utils.rst
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Address review comments
* Fix TF tests
* Real fix this time
* This one should work
* Fix typo
* Really fix typo
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
- Use cuda:10.2 image instead of 10.1 (to address version mismatch
warning with pytorch)
- Use devel version that is built on the runtime and includes headers
and development tools (was otherwise failing to build apex)
* Create README.md
Model description for all LEGAL-BERT models, published as part of "LEGAL-BERT: The Muppets straight out of Law School". Chalkidis et al., 2018, In Findings of EMNLP 2020
* Update model_cards/nlpaueb/legal-bert-base-uncased/README.md
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* Fixing top_k and min_length assertions, and a typo fix
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* PoC on RAG
* Format class name/obj name
* Better name in message
* PoC on one TF model
* Add PyTorch and TF dummy objects + script
* Treat scikit-learn
* Bad copy pastes
* Typo
'The class `AutoModelWithLMHead` is deprecated and will be removed in a future version. Please use `AutoModelForCausalLM` for causal language models, `AutoModelForMaskedLM` for masked language models and `AutoModelForSeq2SeqLM` for encoder-decoder models.'
I dont know how to change the 'How to use this model directly from the 🤗/transformers library:' part since it is not part of the model-paper
* 🚩 Add `power` argument for TF PolynomialDecay
* 🚩 Create default optimizer with power
* 🚩 Add argument to training args
* 🚨 Clean code format
* 🚨 Fix black warning
* 🚨 Fix code format
* configuration_squeezebert.py
thin wrapper around bert tokenizer
fix typos
wip sb model code
wip modeling_squeezebert.py. Next step is to get the multi-layer-output interface working
set up squeezebert to use BertModelOutput when returning results.
squeezebert documentation
formatting
allow head mask that is an array of [None, ..., None]
docs
docs cont'd
path to vocab
docs and pointers to cloud files (WIP)
line length and indentation
squeezebert model cards
formatting of model cards
untrack modeling_squeezebert_scratchpad.py
update aws paths to vocab and config files
get rid of stub of NSP code, and advise users to pretrain with mlm only
fix rebase issues
redo rebase of modeling_auto.py
fix issues with code formatting
more code format auto-fixes
move squeezebert before bert in tokenization_auto.py and modeling_auto.py because squeezebert inherits from bert
tests for squeezebert modeling and tokenization
fix typo
move squeezebert before bert in modeling_auto.py to fix inheritance problem
disable test_head_masking, since squeezebert doesn't yet implement head masking
fix issues exposed by the test_modeling_squeezebert.py
fix an issue exposed by test_tokenization_squeezebert.py
fix issue exposed by test_modeling_squeezebert.py
auto generated code style improvement
issue that we inherited from modeling_xxx.py: SqueezeBertForMaskedLM.forward() calls self.cls(), but there is no self.cls, and I think the goal was actually to call self.lm_head()
update copyright
resolve failing 'test_hidden_states_output' and remove unused encoder_hidden_states and encoder_attention_mask
docs
add integration test. rename squeezebert-mnli --> squeezebert/squeezebert-mnli
autogenerated formatting tweaks
integrate feedback from patrickvonplaten and sgugger to programming style and documentation strings
* tiny change to order of imports
* LayoutLM: add exception handling for bbox values
To replicate unhandled error:
- In `test_modelling_layoutlm.py` set `range_bbox=1025`, i.e. greater 1024
- Run `pytest tests/test_modeling_layoutlm.py`
Requirement for bbox values to be within the range 0-1000 is documented
but if it is violated then it isa not clear what is the issue from error
message.
* Update src/transformers/modeling_layoutlm.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Trainer should not modify its TrainingArguments
* Trainer should not modify its TrainingArguments
* Trainer should not modify its TrainingArguments
* Add test of resumed training
* Fixes
* Non multiGPU test
* Clean Trainer state
* Add more to the state
* Documentation
* One last test
* Make resume training test more complete
* Unwanted changes
* t5 t5 community notebook added
* author link updated
* t5 t5 community notebook added
* author link updated
* new colab link updated
Co-authored-by: harris <muhammad.harris@visionx.io>
* GPT2 gradient checkpointing
* find_unused_parameters removed if checkpointing
* find_unused_parameters removed if checkpointing
* Update src/transformers/configuration_gpt2.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Added a test for generation with checkpointing
* Update src/transformers/configuration_gpt2.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* [makefile] check/fix only modified since branching files
* fix phonies
* parametrize dirs
* have only one source for dirs to check
* look ma, no autoformatters here
* [code quality] merge style and quality targets
Any reason why we don't run `flake8` in `make style`? I find myself needing to run `make style` and `make quality` all the time, but I need the latter just for the last 2 checks. Since we have no control over the source code why bother with separating checking and fixing - let's just have one target that fixes and then performs the remaining checks, as we know the first two have been done already.
This PR suggests to merge the 2 targets into one efficient target.
I will edit the docs if this change resonates with the team.
* move checks into style, re-use target
* better name
* add fixup target
* document new target
Previously, the TFTrainingArguments object did a check to see if XLA was enabled, but did this by referencing `self.args.xla`, when it should be `self.xla`, because it is the args object. This can be verified a few lines above, where the XLA field is set.
* Changed name to all no_... arguments and all references to them, inverting the boolean condition
* Change benchmark tests to use new Benchmark Args
* Update src/transformers/benchmark/benchmark_args_utils.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/benchmark/benchmark.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Fix Style. Add --no options in help
* fix some part of tests
* Update src/transformers/benchmark/benchmark_args_utils.py
* Update src/transformers/benchmark/benchmark_args_utils.py
* Update src/transformers/benchmark/benchmark_args_utils.py
* fix all tests
* make style
* add backwards compability
* make backwards compatible
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: fmcurti <fcurti@DESKTOP-RRQURBM.localdomain>
* Ensure that intergrations are imported before transformers or ml libs
* Black reformatter wanted a newline
* isort requests
* black requests
* flake8 requests
* Clean up model documentation
* Formatting
* Preparation work
* Long lines
* Main work on rst files
* Cleanup all config files
* Syntax fix
* Clean all tokenizers
* Work on first models
* Models beginning
* FaluBERT
* All PyTorch models
* All models
* Long lines again
* Fixes
* More fixes
* Update docs/source/model_doc/bert.rst
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Update docs/source/model_doc/electra.rst
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Last fixes
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* skip decorators: docs, tests, bugs
* another important note
* style
* bloody style
* add @pytest.mark.parametrize
* add note
* no idea what it wants :(
* fix confused flake
We run `black --target-version py35 ...` but flake8 doesn't know that, so currently with py38 flake8 fails suggesting that black should have reformatted 63 files. Indeed if I run:
```
black --line-length 119 --target-version py38 examples templates tests src utils
```
it indeed reformats 63 files.
The only solution I found is to create a black config file as explained at https://github.com/psf/black#configuration-format, which is what this PR adds.
Now flake8 knows that py35 is the standard and no longer gets confused regardless of the user's python version.
* adjust the other files that will now rely on black's config file
* Add dataloader_num_workers to TrainingArguments
This argument is meant to be used to set the
number of workers for the PyTorch DataLoader.
* Pass num_workers argument on DataLoader init
* added rag WIP
* path fix
* Formatting / renaming prior to actual work
* added rag WIP
* path fix
* Formatting / renaming prior to actual work
* added rag WIP
* path fix
* Formatting / renaming prior to actual work
* added rag WIP
* Formatting / renaming prior to actual work
* First commit
* improve comments
* Retrieval evaluation scripts
* refactor to include modeling outputs + MPI retriever
* Fix rag-token model + refactor
* Various fixes + finetuning logic
* use_bos fix
* Retrieval refactor
* Finetuning refactoring and cleanup
* Add documentation and cleanup
* Remove set_up_rag_env.sh file
* Fix retrieval wit HF index
* Fix import errors
* Fix quality errors
* Refactor as per suggestions in https://github.com/huggingface/transformers/pull/6813#issuecomment-687208867
* fix quality
* Fix RAG Sequence generation
* minor cleanup plus initial tests
* fix test
* fix tests 2
* Comments fix
* post-merge fixes
* Improve readme + post-rebase refactor
* Extra dependencied for tests
* Fix tests
* Fix tests 2
* Refactor test requirements
* Fix tests 3
* Post-rebase refactor
* rename nlp->datasets
* RAG integration tests
* add tokenizer to slow integration test and allow retriever to run on cpu
* add tests; fix position ids warning
* change structure
* change structure
* add from encoder generator
* save working solution
* make all integration tests pass
* add RagTokenizer.save/from_pretrained and RagRetriever.save/from_pretrained
* don't save paths
* delete unnecessary imports
* pass config to AutoTokenizer.from_pretrained for Rag tokenizers
* init wiki_dpr only once
* hardcode legacy index and passages paths (todo: add the right urls)
* finalize config
* finalize retriver api and config api
* LegacyIndex index download refactor
* add dpr to autotokenizer
* make from pretrained more flexible
* fix ragfortokengeneration
* small name changes in tokenizer
* add labels to models
* change default index name
* add retrieval tests
* finish token generate
* align test with previous version and make all tests pass
* add tests
* finalize tests
* implement thoms suggestions
* add first version of test
* make first tests work
* make retriever platform agnostic
* naming
* style
* add legacy index URL
* docstrings + simple retrieval test for distributed
* clean model api
* add doc_ids to retriever's outputs
* fix retrieval tests
* finish model outputs
* finalize model api
* fix generate problem for rag
* fix generate for other modles
* fix some tests
* save intermediate
* set generate to default
* big refactor generate
* delete rag_api
* correct pip faiss install
* fix auto tokenization test
* fix faiss install
* fix test
* move the distributed logic to examples
* model page
* docs
* finish tests
* fix dependencies
* fix import in __init__
* Refactor eval_rag and finetune scripts
* start docstring
* add psutil to test
* fix tf test
* move require torch to top
* fix retrieval test
* align naming
* finish automodel
* fix repo consistency
* test ragtokenizer save/load
* add rag model output docs
* fix ragtokenizer save/load from pretrained
* fix tokenizer dir
* remove torch in retrieval
* fix docs
* fixe finetune scripts
* finish model docs
* finish docs
* remove auto model for now
* add require torch
* remove solved todos
* integrate sylvains suggestions
* sams comments
* correct mistake on purpose
* improve README
* Add generation test cases
* fix rag token
* clean token generate
* fix test
* add note to test
* fix attention mask
* add t5 test for rag
* Fix handling prefix in finetune.py
* don't overwrite index_name
Co-authored-by: Patrick Lewis <plewis@fb.com>
Co-authored-by: Aleksandra Piktus <piktus@devfair0141.h2.fair>
Co-authored-by: Aleksandra Piktus <piktus@learnfair5102.h2.fair>
Co-authored-by: Aleksandra Piktus <piktus@learnfair5067.h2.fair>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Quentin Lhoest <lhoest.q@gmail.com>
* Copy code from Bert to Roberta and add safeguard script
* Fix docstring
* Comment code
* Formatting
* Update src/transformers/modeling_roberta.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Add test and fix bugs
* Fix style and make new comand
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* fix USE_CUDA, add pipeline
* USE_CUDA fix
* recode SinusoidalPositionalEmbedding into nn.Embedding subclass
was needed for torchscript to work - this is now part of the state_dict, so will have to remove these keys during save_pretrained
* back out (ci debug)
* restore
* slow last?
* facilitate not saving certain keys and test
* remove no longer used keys
* style
* fix logging import
* cleanup
* Update src/transformers/modeling_utils.py
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
* fix bug in max_positional_embeddings
* rename keys to keys_to_never_save per suggestion, improve the setup
* Update src/transformers/modeling_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Two new pre-trained models "vinai/bertweet-covid19-base-cased" and "vinai/bertweet-covid19-base-uncased" are resulted by further pre-training the pre-trained model "vinai/bertweet-base" on a corpus of 23M COVID-19 English Tweets for 40 epochs.
* Add BERTweet and PhoBERT models
* Update modeling_auto.py
Re-add `bart` to LM_MAPPING
* Update tokenization_auto.py
Re-add `from .configuration_mobilebert import MobileBertConfig`
not sure why it's replaced by `from transformers.configuration_mobilebert import MobileBertConfig`
* Add BERTweet and PhoBERT to pretrained_models.rst
* Update tokenization_auto.py
Remove BertweetTokenizer and PhobertTokenizer out of tokenization_auto.py (they are currently not supported by AutoTokenizer.
* Update BertweetTokenizer - without nltk
* Update model card for BERTweet
* PhoBERT - with Auto mode - without import fastBPE
* PhoBERT - with Auto mode - without import fastBPE
* BERTweet - with Auto mode - without import fastBPE
* Add PhoBERT and BERTweet to TF modeling auto
* Improve Docstrings for PhobertTokenizer and BertweetTokenizer
* Update PhoBERT and BERTweet model cards
* Fixed a merge conflict in tokenization_auto
* Used black to reformat BERTweet- and PhoBERT-related files
* Used isort to reformat BERTweet- and PhoBERT-related files
* Reformatted BERTweet- and PhoBERT-related files based on flake8
* Updated test files
* Updated test files
* Updated tf test files
* Updated tf test files
* Updated tf test files
* Updated tf test files
* Update commits from huggingface
* Delete unnecessary files
* Add tokenizers to auto and init files
* Add test files for tokenizers
* Revised model cards
* Update save_vocabulary function in BertweetTokenizer and PhobertTokenizer and test files
* Revised test files
* Update orders of Phobert and Bertweet tokenizers in auto tokenization file
* [model cards] ported allenai Deep Encoder, Shallow Decoder models
* typo
* fix references
* add allenai/wmt19-de-en-6-6 model cards
* fill-in the missing info for the build script as provided by the searcher.
* ready for PR
* cleanup
* correct FSMT_PRETRAINED_MODEL_ARCHIVE_LIST
* fix
* perfectionism
* revert change from another PR
* odd, already committed this one
* non-interactive upload workaround
* backup the failed experiment
* store langs in config
* workaround for localizing model path
* doc clean up as in https://github.com/huggingface/transformers/pull/6956
* style
* back out debug mode
* document: run_eval.py --num_beams 10
* remove unneeded constant
* typo
* re-use bart's Attention
* re-use EncoderLayer, DecoderLayer from bart
* refactor
* send to cuda and fp16
* cleanup
* revert (moved to another PR)
* better error message
* document run_eval --num_beams
* solve the problem of tokenizer finding the right files when model is local
* polish, remove hardcoded config
* add a note that the file is autogenerated to avoid losing changes
* prep for org change, remove unneeded code
* switch to model4.pt, update scores
* s/python/bash/
* missing init (but doesn't impact the finetuned model)
* cleanup
* major refactor (reuse-bart)
* new model, new expected weights
* cleanup
* cleanup
* full link
* fix model type
* merge porting notes
* style
* cleanup
* have to create a DecoderConfig object to handle vocab_size properly
* doc fix
* add note (not a public class)
* parametrize
* - add bleu scores integration tests
* skip test if sacrebleu is not installed
* cache heavy models/tokenizers
* some tweaks
* remove tokens that aren't used
* more purging
* simplify code
* switch to using decoder_start_token_id
* add doc
* Revert "major refactor (reuse-bart)"
This reverts commit 226dad15ca6a9ef4e26178526e878e8fc5c85874.
* decouple from bart
* remove unused code #1
* remove unused code #2
* remove unused code #3
* update instructions
* clean up
* move bleu eval to examples
* check import only once
* move data+gen script into files
* reuse via import
* take less space
* add prepare_seq2seq_batch (auto-tested)
* cleanup
* recode test to use json instead of yaml
* ignore keys not needed
* use the new -y in transformers-cli upload -y
* [xlm tok] config dict: fix str into int to match definition (#7034)
* [s2s] --eval_max_generate_length (#7018)
* Fix CI with change of name of nlp (#7054)
* nlp -> datasets
* More nlp -> datasets
* Woopsie
* More nlp -> datasets
* One last
* extending to support allen_nlp wmt models
- allow a specific checkpoint file to be passed
- more arg settings
- scripts for allen_nlp models
* sync with changes
* s/fsmt-wmt/wmt/ in model names
* s/fsmt-wmt/wmt/ in model names (p2)
* s/fsmt-wmt/wmt/ in model names (p3)
* switch to a better checkpoint
* typo
* make non-optional args such - adjust tests where possible or skip when there is no other choice
* consistency
* style
* adjust header
* cards moved (model rename)
* use best custom hparams
* update info
* remove old cards
* cleanup
* s/stas/facebook/
* update scores
* s/allen_nlp/allenai/
* url maps aren't needed
* typo
* move all the doc / build /eval generators to their own scripts
* cleanup
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* fix indent
* duplicated line
* style
* use the correct add_start_docstrings
* oops
* resizing can't be done with the core approach, due to 2 dicts
* check that the arg is a list
* style
* style
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Removed 'tgt_len' and 'ext_len' from Transfomer-XL
* Some changes are still to be done
* Removed 'tgt_len' and 'ext_len' from Transfomer-XL (2)
* Removed comments
* Fixed quality
* Changed warning to info
* added multilabel classification using distilbert notebook to community notebooks
* added multilabel classification using distilbert notebook to community notebooks
```
/home/circleci/.local/lib/python3.6/site-packages/isort/main.py:915: UserWarning: W0501: The following deprecated CLI flags were used and ignored: --recursive!
"W0501: The following deprecated CLI flags were used and ignored: "
```
* fix ZeroDivisionError and epoch counting
* Add test for num_train_epochs calculation in trainer.py
* Remove @require_non_multigpu for test_num_train_epochs_in_training
* create branch for issue #6968
* First attempt to fix incorrect tf trainer loss calculation
* Fix training loss in metric
* fix tf trainer evaluation loss
* apply count_instances_in_batch() for eval and test datasets
* prototype of using a new argument in trainer_tf.py to fix loss issue
* some renaming and fix, in particular for evaluation methods
* fix bugs to have a running version
* change to @staticmethod
* apply style
* Add Tuna Mirror for Downloads from China
* format fix
* Use preset instead of hardcoding URL
* Fix
* make style
* update the mirror option doc
* update the mirror
* adding demo
* Update examples/lxmert/requirements.txt
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Update examples/lxmert/checkpoint.sh
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* added user input for .py demo
* updated model loading, data extrtaction, checkpoints, and lots of other automation
* adding normalizing for bounding boxes
* Update requirements.txt
* some optimizations for extracting data
* added data extracting file
* added data extraction file
* minor fixes to reqs and readme
* Style
* remove options
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* Add tests and fix various bugs in ModelOutput
* Update tests/test_model_output.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fix to ensure that returned tensors after the tokenization is Long
* fix to ensure that returned tensors after the tokenization is Long
Co-authored-by: Ashwin Geet Dsa <adsa@grvingt-6.nancy.grid5000.fr>
* add dataset for albert pretrain
* datacollator for albert pretrain
* naming, comprehension, file reading change
* data cleaning is no needed after this modification
* delete prints
* fix a bug
* file structure change
* add tests for albert datacollator
* remove random seed
* add back len and get item function
* sample file for testing and test code added
* format change for black
* more format change
* Style
* var assignment issue resolve
* add back wrongly deleted DataCollatorWithPadding in init file
* Style
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Currently beam search returns inconsistent outputs - if hypos have different lengths we get eos, if they are the same - we don't.
This PR makes the output consistent.
Also why not also replace:
```
if sent_lengths[i] < max_length:
decoded[i, sent_lengths[i]] = eos_token_id
```
with:
```
decoded[i, sent_lengths[i]] = eos_token_id
```
Shouldn't eos always be there? If the data gets truncated, the caller needs to user a larger `max_length`.
Please correct me if my logic is flawed.
* Should check if `torch` is available
* fixed samples_count error, distributed_concat arguments
* style
* Import torch at beginning of file
Co-authored-by: TevenLeScao <teven.lescao@gmail.com>
* Initial model
* Fix upsampling
* Add special cls token id and test
* Formatting
* Test and fist FunnelTokenizerFast
* Common tests
* Fix the check_repo script and document Funnel
* Doc fixes
* Add all models
* Write doc
* Fix test
* Initial model
* Fix upsampling
* Add special cls token id and test
* Formatting
* Test and fist FunnelTokenizerFast
* Common tests
* Fix the check_repo script and document Funnel
* Doc fixes
* Add all models
* Write doc
* Fix test
* Fix copyright
* Forgot some layers can be repeated
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/modeling_funnel.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Address review comments
* Update src/transformers/modeling_funnel.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Address review comments
* Update src/transformers/modeling_funnel.py
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
* Slow integration test
* Make small integration test
* Formatting
* Add checkpoint and separate classification head
* Formatting
* Expand list, fix link and add in pretrained models
* Styling
* Add the model in all summaries
* Typo fixes
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
* fixed trainer tr_loss memory leak
* detached returned training loss from computation graph in the Trainer class' training_step() method
* Revert "fixed trainer tr_loss memory leak"
This reverts commit 47226e4e
ParsBERT v2.0 is a fine-tuned and vocab-reconstructed version of ParsBERT, and it's able to be used in other scopes!
It includes these features:
- We added some unused-vocab for use in summarization and other scopes.
- We fine-tuned the model on vast styles of writing in the Persian language.
my flake8 wasn't up-to-date enough `make quality` wasn't reporting the same things CI did - this PR adds the actual required version.
Thinking more about some of these minimal versions - CI will always install afresh and thus will always run the latest version. Is there a way to tell pip to always install the latest versions of certain dependencies on `pip install -i ".[dev]"`, rather than hardcoding the minimals which quickly become outdated?
* [gen utils] missing else case
1. `else` is missing - I hit that case while porting a model. Probably needs to assert there?
2. also the comment on top seems to be outdated (just vocab_size is being set there)
* typo
unittest doesn't support pytest's super-handy `@pytest.mark.parametrize`, I researched and there are many proposed workarounds, most tedious at best. If we include https://pypi.org/project/parameterized/ in dev dependencies - it will provide a very easy to write parameterization in tests. Same as pytest's fixture, plus quite a few other ways.
Example:
```
from parameterized import parameterized
@parameterized([
(2, 2, 4),
(2, 3, 8),
(1, 9, 1),
(0, 9, 0),
])
def test_pow(base, exponent, expected):
assert_equal(math.pow(base, exponent), expected)
```
(extra `self`var if inside a test class)
To remind the pytest style is slightly different:
```
@pytest.mark.parametrize("test_input,expected", [("3+5", 8), ("2+4", 6), ("6*9", 42)])
def test_eval(test_input, expected):
```
More examples here: https://pypi.org/project/parameterized
May I suggest that it will make it much easier to write some types of tests?
* Create Readme.MD for KanBERTo
KanBERTo language model readme for Kannada language.
* Update model_cards/Naveen-k/KanBERTo/README.md
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* Remove hard-coded uses of float32 to fix mixed precision use in TF Distilbert
* fix style
* fix gelu dtype issue in TF Distilbert
* fix numeric overflow while using half precision
Since `generate()` does:
```
num_beams = num_beams if num_beams is not None else self.config.num_beams
```
This test fails if `model.config.num_beams > 1` (which is the case in the model I'm porting).
This fix makes the test setup unambiguous by passing an explicit `num_beams=1` to `generate()`.
Thanks.
* Add cache_dir to save features TextDataset
This is in case the dataset is in a RO filesystem, for which is the case
in tests (GKE TPU tests).
* style
* Introduce HPO checkpointing for PBT
* Moved checkpoint saving
* Fixed checkpoint subdir pass
* Fixed style
* Enable/disable checkpointing, check conditions for various tune schedulers incl. PBT
* Adjust number of GPUs to number of jobs
* Avoid mode pickling in ray
* Move hp search to integrations
* Only access loss tensor every logging_steps
* tensor.item() was being called every step. This must not be done
for XLA:TPU tensors as it's terrible for performance causing TPU<>CPU
communication at each step. On RoBERTa MLM for example, it reduces step
time by 30%, should be larger for smaller step time models/tasks.
* Train batch size was not correct in case a user uses the
`per_gpu_train_batch_size` flag
* Avg reduce loss accross eval shards
* Fix style (#6803)
* t5 model should make decoder_attention_mask (#6800)
* [s2s] Test hub configs in self-scheduled CI (#6809)
* [s2s] round runtime in run_eval (#6798)
* Pegasus finetune script: add --adafactor (#6811)
* [bart] rename self-attention -> attention (#6708)
* [tests] fix typos in inputs (#6818)
* Fixed open in colab link (#6825)
* Add model card for singbert lite. Update widget for singbert and singbert-large. (#6827)
* BR_BERTo model card (#6793)
* clearly indicate shuffle=False (#6312)
* Clarify shuffle
* clarify shuffle
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
* [s2s README] Add more dataset download instructions (#6737)
* Style
* Patch logging issue
* Set default logging level to `WARNING` instead of `INFO`
* TF Flaubert w/ pre-norm (#6841)
* Dataset and DataCollator for BERT Next Sentence Prediction (NSP) task (#6644)
* add datacollator and dataset for next sentence prediction task
* bug fix (numbers of special tokens & truncate sequences)
* bug fix (+ dict inputs support for data collator)
* add padding for nsp data collator; renamed cached files to avoid conflict.
* add test for nsp data collator
* Style
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* Fix in Adafactor docstrings (#6845)
* Fix resuming training for Windows (#6847)
* Only access loss tensor every logging_steps
* tensor.item() was being called every step. This must not be done
for XLA:TPU tensors as it's terrible for performance causing TPU<>CPU
communication at each step. On RoBERTa MLM for example, it reduces step
time by 30%, should be larger for smaller step time models/tasks.
* Train batch size was not correct in case a user uses the
`per_gpu_train_batch_size` flag
* Avg reduce loss accross eval shards
* comments
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Thomas Ashish Cherian <6967017+PandaWhoCodes@users.noreply.github.com>
Co-authored-by: Zane Lim <zyuanlim@gmail.com>
Co-authored-by: Rodolfo De Nadai <rdenadai@gmail.com>
Co-authored-by: xujiaze13 <37360975+xujiaze13@users.noreply.github.com>
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Huang Lianzhe <hlz@pku.edu.cn>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* add datacollator and dataset for next sentence prediction task
* bug fix (numbers of special tokens & truncate sequences)
* bug fix (+ dict inputs support for data collator)
* add padding for nsp data collator; renamed cached files to avoid conflict.
* add test for nsp data collator
* Style
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* Improved tokenization with sacremoses
* The TransfoXLTokenizer is now using sacremoses for tokenization
* Added tokenization of comma-separated and floating point numbers.
* Removed prepare_for_tokenization() from tokenization_transfo_xl.py because punctuation is handled by sacremoses
* Added corresponding tests
* Removed test comapring TransfoXLTokenizer and TransfoXLTokenizerFast
* Added deprecation warning to TransfoXLTokenizerFast
* isort change
Co-authored-by: Teven <teven.lescao@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* AdaFactor optimizer ported from fairseq. Tested for T5 finetuning and MLM -- reduced memory consumption compared to ADAM.
* update PR fixes, add basic test
* bug -- incorrect params in test
* bugfix -- import Adafactor into test
* bugfix -- removed accidental T5 include
* resetting T5 to master
* bugfix -- include Adafactor in __init__
* longer loop for adafactor test
* remove double error class declare
* lint
* black
* isort
* Update src/transformers/optimization.py
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
* single docstring
* Cleanup docstring
Co-authored-by: Nikolai Y <nikolai.yakovenko@point72.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
* add tf graph compile tests
* fix conflict
* remove more tf transpose statements
* fix conflicts
* fix comment typos
* move function to class function
* fix black
* fix black
* make style
* add tie_word_embeddings
* correct word embeddings in modeling utils
* make style
* make config param only relevant for torch
* make style
* correct typo
* delete deprecated arg in transo-xl
* Allow tests in examples to use cuda or fp16,if they are available
The tests in examples didn't use the cuda or fp16 even if they where available.
- The text classification example (`run_glue.py`) didn't use the fp16 even if it was available but
the device was take based on the availablity(cuda/cpu).
- The language-modeling example (`run_language_modeling.py`) was having `--no_cuda` argument
which made the test to work without cuda. This example is having issue when running with fp16
thus it not enabled (got an assertion error for perplexity due to it higher value).
- The cuda and fp16 is not enabled for question-answering example (`run_squad.py`) as it is having a
difference in the f1 score.
- The text-generation example (`run_generation.py`) will take the cuda or fp16 whenever it is available.
Resolves some of: #5057
* Unwanted import of is_apex_available was removed
* Made changes to test examples file to have the pass --fp16 only if cuda and apex is avaliable
- run_glue.py: Removed the check for cuda and fp16.
- run_generation.py: Removed the check for cuda and fp16 also removed unwanted flag creation.
* Incorrectly sorted imports fixed
* The model needs to be converted to half precision
* Formatted single line if condition statement to multiline
* The torch_device also needed to be checked before running the test on examples
- The tests in examples which uses cuda should also depend from the USE_CUDA flag,
similarly to the rest of the test suite. Even if we decide to set USE_CUDA to
True by default, setting USE_CUDA to False should result in the examples not using CUDA
* Format some of the code in test_examples file
* The improper import of is_apex_available was sorted
* Formatted the code to keep the style standards
* The comma at the end of list giving a flake8 issue was fixed
* Import sort was fixed
* Removed the clean_test_dir function as its not used right now
* Add model card for singbert.
Adding a model card for singbert- bert for singlish and manglish.
* Update README.md
Add additional tags and model name.
* Update README.md
Fix tag for malay.
* Update model_cards/zanelim/singbert/README.md
Fix language
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
* Add examples and custom widget input.
Add examples and custom widget input.
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
* Create PULL_REQUEST_TEMPLATE.md
Proposing to copy this neat feature from pytorch. This is a small template that let's a PR submitter tell which issue that PR closes.
* Update .github/PULL_REQUEST_TEMPLATE.md
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
* Add optuna hyperparameter search to Trainer
* @julien-c suggestions
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* Make compute_objective an arg function
* Formatting
* Rework to make it easier to add ray
* Formatting
* Initial support for Ray
* Formatting
* Polish and finalize
* Add trial id to checkpoint with Ray
* Smaller default
* Use GPU in ray if available
* Formatting
* Fix test
* Update install instruction
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
* Address review comments
* Formatting post-merge
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Tested in a local build of the docs.
e.g. Just above https://huggingface.co/transformers/task_summary.html#causal-language-modeling
Copy will copy the full code, e.g.
for token in top_5_tokens:
print(sequence.replace(tokenizer.mask_token, tokenizer.decode([token])))
Instead of currently only:
for token in top_5_tokens:
>>> for token in top_5_tokens:
... print(sequence.replace(tokenizer.mask_token, tokenizer.decode([token])))
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help reduce our carbon footprint.
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help increase our carbon footprint.
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help decrease our carbon footprint.
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help offset our carbon footprint.
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help improve our carbon footprint.
Docs for the option fix:
https://sphinx-copybutton.readthedocs.io/en/latest/
* Feed forward chunking for Distilbert & Albert
* Added ff chunking for many other models
* Change model signature
* Added chunking for XLM
* Cleaned up by removing some variables.
* remove test_chunking flag
Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
As discussed at https://github.com/huggingface/transformers/issues/6317 codecov currently sends an invalid report when it fails to find a code coverage report for the base it checks against, so this gets fixed by:
- require_base: yes # don't report if there is no base coverage report
let's add this for clarity, this supposedly is already the default.
- require_head: yes # don't report if there is no head coverage report
and perhaps no point reporting on doc changes as they don't make any difference and it just generates noise:
- require_changes: true # only comment if there was change in coverage
* [doc] multiple corrections to "Summary of the tasks"
* fix indentation
* correction
* fix links, add links to examples/seq2seq/README.md instead of non-existing script
* [testing] switch to a new TestCasePlus + get_auto_remove_tmp_dir() for auto-removal of tmp dirs
* respect after=True for tempfile, simplify code
* comments
* comment fix
* put `before` last in args, so can make debug even faster
Currently with the bug introduced we're taking two optimizer steps per
batch: one global one, where `xm.optimizer_step` injects a CRS between
all cores in training, and one without. This has been affecting training
accuracy (for example, XLNet GLUE on MNLI is not converging, etc.).
* Generation doc
* MBartForConditionalGeneration (#6441)
* add MBartForConditionalGeneration
* style
* rebase and fixes
* add mbart test in TEST_FILES_WITH_NO_COMMON_TESTS
* fix docs
* don't ignore mbart
* doc
* fix mbart fairseq link
* put mbart before bart
* apply doc suggestions
* Use hash to clean the test dirs (#6475)
* Use hash to clean the test dirs
* Use hash to clean the test dirs
* Use hash to clean the test dirs
* fix
* [EncoderDecoder] Add Cross Attention for GPT2 (#6415)
* add cross attention layers for gpt2
* make gpt2 cross attention work
* finish bert2gpt2
* add explicit comments
* remove attention mask since not yet supported
* revert attn mask in pipeline
* Update src/transformers/modeling_gpt2.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/modeling_encoder_decoder.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Sort unique_no_split_tokens to make it deterministic (#6461)
* change unique_no_split_tokens's type to set
* use sorted list instead of set
* style
* Import accuracy_score (#6480)
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Address comments
* Styling
* Generation doc
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Address comments
* Styling
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
Co-authored-by: gijswijnholds <gijswijnholds@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Add more token classification examples
* POS tagging example
* Phrase chunking example
* PR review fixes
* Add conllu to third party list (used in token classification examples)
* cleanup torch unittests: part 2
* remove trailing comma added by isort, and which breaks flake
* one more comma
* revert odd balls
* part 3: odd cases
* more ["key"] -> .key refactoring
* .numpy() is not needed
* more unncessary .numpy() removed
* more simplification
* Data collator with padding
* Add type annotation
* Support tensors as well
* Add comment
* Fix for labels wrong shape
* Data collator with padding
* Add type annotation
* Support tensors as well
* Add comment
* Fix for labels wrong shape
* Remove changes rendered unnecessary
* allow using tokenizer.pad as a collate_fn in pytorch
* allow using tokenizer.pad as a collate_fn in pytorch
* Add documentation and tests
* Make attention mask the right shape
* Better test
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
* replace capsys with the more refined CaptureStderr/CaptureStdout
* Update examples/seq2seq/test_seq2seq_examples.py
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
* [wip] add get_polynomial_decay_schedule_with_warmup
* style
* add assert
* change lr_end to a much smaller default number
* check for exact equality
* [model_cards] electra-base-turkish-cased-ner (#6350)
* for electra-base-turkish-cased-ner
* Add metadata
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* Temporarily de-activate TPU CI
* Update modeling_tf_utils.py (#6372)
fix typo: ckeckpoint->checkpoint
* the test now works again (#6371)
* correct pl link in readme (#6364)
* refactor almost identical tests (#6339)
* refactor almost identical tests
* important to add a clear assert error message
* make the assert error even more descriptive than the original bt
* Small docfile fixes (#6328)
* Patch models (#6326)
* TFAlbertFor{TokenClassification, MultipleChoice}
* Patch models
* BERT and TF BERT info
s
* Update check_repo
* Ci GitHub caching (#6382)
* Cache Github Actions CI
* Remove useless file
* Colab button (#6389)
* Add colab button
* Add colab link for tutorials
* Fix links for open in colab (#6391)
* Update src/transformers/optimization.py
consistently use lr_end=1e-7 default
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* [wip] add get_polynomial_decay_schedule_with_warmup
* style
* add assert
* change lr_end to a much smaller default number
* check for exact equality
* Update src/transformers/optimization.py
consistently use lr_end=1e-7 default
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* remove dup (leftover from merge)
* convert the test into the new refactored format
* stick to using the current_step as is, without ++
Co-authored-by: M. Yusuf Sarıgöz <yusufsarigoz@gmail.com>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Alexander Measure <ameasure@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Optimized banned token masking
* Avoid duplicate EOS masking if in bad_words_id
* Updated mask generation to handle empty banned token list
* Addition of unit tests for the updated bad_words_ids masking
* Updated timeout handling in `test_postprocess_next_token_scores_large_bad_words_list` unit test
* Updated timeout handling in `test_postprocess_next_token_scores_large_bad_words_list` unit test (timeout does not work on Windows)
* Moving Marian import to the test context to allow TF only environments to run
* Moving imports to torch_available test
* Updated operations device and test
* Updated operations device and test
* Added docstring and comment for in-place scores modification
* Moving test to own test_generation_utils, use of lighter models for testing
* removed unneded imports in test_modeling_common
* revert formatting change for ModelTesterMixin
* Updated caching, simplified eos token id test, removed unnecessary @require_torch
* formatting compliance
* Warn if debug requested without TPU fixes (#6308)
Check whether a PyTorch compatible TPU is available before attempting to print TPU metrics after training has completed. This way, users who apply `--debug` without reading the documentation aren't suprised by a stacktrace.
* Style
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* add pl_glue example test
* for now just test that it runs, next validate results of eval or predict?
* complete the run_pl_glue test to validate the actual outcome
* worked on my machine, CI gets less accuracy - trying higher epochs
* match run_pl.sh hparms
* more epochs?
* trying higher lr
* for now just test that the script runs to a completion
* correct the comment
* if cuda is available, add --fp16 --gpus=1 to cover more bases
* style
* Chunked feed forward for Bert
This is an initial implementation to test applying feed forward chunking for BERT.
Will need additional modifications based on output and benchmark results.
* Black and cleanup
* Feed forward chunking in BertLayer class.
* Isort
* add chunking for all models
* fix docs
* Fix typo
Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
* improve names and tests longformer
* more and better tests for longformer
* add first tf test
* finalize tf basic op functions
* fix merge
* tf shape test passes
* narrow down discrepancies
* make longformer local attn tf work
* correct tf longformer
* add first global attn function
* add more global longformer func
* advance tf longformer
* finish global attn
* upload big model
* finish all tests
* correct false any statement
* fix common tests
* make all tests pass except keras save load
* fix some tests
* fix torch test import
* finish tests
* fix test
* fix torch tf tests
* add docs
* finish docs
* Update src/transformers/modeling_longformer.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Update src/transformers/modeling_tf_longformer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* apply Lysandres suggestions
* reverse to assert statement because function will fail otherwise
* applying sylvains recommendations
* Update src/transformers/modeling_longformer.py
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
* Update src/transformers/modeling_tf_longformer.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
as discussed with @sshleifer, removing this TODO to switch to a tiny model, since it won't be able to test the results of the evaluation (i.e. the results are meaningless).
* Add a script to check all models are tested and documented
* Apply suggestions from code review
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
* Address comments
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
* Single workflow cache test
Remove cache dir, re-trigger cache
Only pip archives
Not sudo when pip
* All workflow cache
Remove no-cache-dir instruction
Remove last sudo occurrences
v0.3
* Support for Comet.ml
* Need to import comet first
* Log this model, not the one in the backprop step
* Log args as hyperparameters; use framework to allow fine control
* Log hyperparameters with context
* Apply black formatting
* isort fix integrations
* isort fix __init__
* Update src/transformers/trainer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/trainer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/trainer_tf.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Address review comments
* Style + Quality, remove Tensorboard import test
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* Add strip_accents to basic tokenizer
* Add tests for strip_accents.
* fix style with black
* Fix strip_accents test
* empty commit to trigger CI
* Improved strip_accents check
* Add code quality with is not False
* TF outputs and test on BERT
* Albert to DistilBert
* All remaining TF models except T5
* Documentation
* One file forgotten
* TF outputs and test on BERT
* Albert to DistilBert
* All remaining TF models except T5
* Documentation
* One file forgotten
* Add new models and fix issues
* Quality improvements
* Add T5
* A bit of cleanup
* Fix for slow tests
* Style
* Add SequenceClassification and MultipleChoice TF models to Electra
* Apply style
* Add summary_proj_to_labels to Electra config
* Finally mirroring the PT version of these models
* Apply style
* Fix Electra test
* support --lr_scheduler with multiple possibilities
* correct the error message
* add a note about supported schedulers
* cleanup
* cleanup2
* needs the argument default
* style
* add another assert in the test
* implement requested changes
* cleanups
* fix relative import
* cleanup
* Update to match renamed attributes in fairseq master
RobertaModel no longer have model.encoder and args.num_classes attributes as of 5/28/20.
* Quality
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* Adding docs for how to load encoder_decoder pretrained model with individual config objects
* Adding docs for loading encoder_decoder config from pretrained folder
* Fixing W293 blank line contains whitespace
* Update src/transformers/modeling_encoder_decoder.py
* Update src/transformers/modeling_encoder_decoder.py
* Update src/transformers/modeling_encoder_decoder.py
* Apply suggestions from code review
model file should only show examples for how to load save model
* Update src/transformers/configuration_encoder_decoder.py
* Update src/transformers/configuration_encoder_decoder.py
* fix space
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* improve unit tests
this is a sample of one test according to the request in https://github.com/huggingface/transformers/issues/5973
before I apply it to the rest
* batch 1
* batch 2
* batch 3
* batch 4
* batch 5
* style
* non-tf template
* last deletion of check_loss_output
* Fix TF Serving when output_hidden_states and output_attentions are True
* Add tests for saved model creation + bug fix for multiple choices models
* remove unused import
* Fix the input for several layers
* Fix test
* Fix conflict printing
* Apply style
* Fix XLM and Flaubert for TensorFlow
* Apply style
* Fix TF check version
* Apply style
* Trigger CI
* Add script to convert tf2.x checkpoint to pytorch
The script converts the newer TF2.x checkpoints (as published on their official GitHub: https://github.com/tensorflow/models/tree/master/official/nlp/bert) to Pytorch.
* rename file in order to stay consistent with naming convention
* Replace mecab-python3 with fugashi
This replaces mecab-python3 with fugashi for Japanese tokenization. I am
the maintainer of both projects.
Both projects are MeCab wrappers, so the underlying C++ code is the
same. fugashi is the newer wrapper and doesn't use SWIG, so for basic
use of the MeCab API it's easier to use.
This code insures the use of a version of ipadic installed via pip,
which should make versioning and tracking down issues easier.
fugashi has wheels for Windows, OSX, and Linux, which will help with
issues with installing old versions of mecab-python3 on Windows.
Compared to mecab-python3, because fugashi doesn't use SWIG, it doesn't
require a C++ runtime to be installed on Windows.
In adding this change I removed some code dealing with `cursor`,
`token_start`, and `token_end` variables. These variables didn't seem to
be used for anything, it is unclear to me why they were there.
I ran the tests and they passed, though I couldn't figure out how to run
the slow tests (`--runslow` gave an error) and didn't try testing with
Tensorflow.
* Style fix
* Remove unused variable
Forgot to delete this...
* Adapt doc with install instructions
* Fix typo
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* enable easy checkout switch
allow having multiple repository checkouts and not needing to remember to rerun 'pip install -e .[dev]' when switching between checkouts and running tests.
* make isort happy
* examples needs one too
* fixed type; add Pytorch Native CUDA AMP support
* reverted commit on modeling_utils
* confirming to HF black formatting rule
* changed bool value of _use_apex
* scaler support for gradient clipping
* fix inplace operation of clip_grad_norm
* removed not while version comparison
* Add onnxruntime transformers optimization support
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Added Optimization section in ONNX/ONNXRuntime documentation.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Improve note reference
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Fixing imports order.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Add warning about different level of optimization between torch and tf export.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Address @LysandreJik wording suggestion
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Address @LysandreJik wording suggestion
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Always optimize model before quantization for maximum performances.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Address comments on the documentation.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Improve TensorFlow optimization message as suggested by @yufenglee
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Removed --optimize parameter
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Warn the user about current quantization limitation when model is larger than 2GB.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Trigger CI for last check
* Small change in print for the optimization section.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* initial commit for pipeline implementation
Addition of input processing and history concatenation
* Conversation pipeline tested and working for single & multiple conversation inputs
* Added docstrings for dialogue pipeline
* Addition of dialogue pipeline integration tests
* Delete test_t5.py
* Fixed max code length
* Updated styling
* Fixed test broken by formatting tools
* Removed unused import
* Added unit test for DialoguePipeline
* Fixed Tensorflow compatibility
* Fixed multi-framework support using framework flag
* - Fixed docstring
- Added `min_length_for_response` as an initialization parameter
- Renamed `*args` to `conversations`, `conversations` being a `Conversation` or a `List[Conversation]`
- Updated truncation to truncate entire segments of conversations, instead of cutting in the middle of a user/bot input
* - renamed pipeline name from dialogue to conversational
- removed hardcoded default value of 1000 and use config.max_length instead
- added `append_response` and `set_history` method to the Conversation class to avoid direct fields mutation
- fixed bug in history truncation method
* - Updated ConversationalPipeline to accept only active conversations (otherwise a ValueError is raised)
* - Simplified input tensor conversion
* - Updated attention_mask value for Tensorflow compatibility
* - Updated last dialogue reference to conversational & fixed integration tests
* Fixed conflict with master
* Updates following review comments
* Updated formatting
* Added Conversation and ConversationalPipeline to the library __init__, addition of docstrings for Conversation, added both to the docs
* Update src/transformers/pipelines.py
Updated docsting following review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Switch from return_tuple to return_dict
* Fix test
* [WIP] Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleC… (#5614)
* Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleChoice} models and tests
* AutoModels
Tiny tweaks
* Style
* Final changes before merge
* Re-order for simpler review
* Final fixes
* Addressing @sgugger's comments
* Test MultipleChoice
* Rework TF trainer (#6038)
* Fully rework training/prediction loops
* fix method name
* Fix variable name
* Fix property name
* Fix scope
* Fix method name
* Fix tuple index
* Fix tuple index
* Fix indentation
* Fix variable name
* fix eval before log
* Add drop remainder for test dataset
* Fix step number + fix logging datetime
* fix eval loss value
* use global step instead of step + fix logging at step 0
* Fix logging datetime
* Fix global_step usage
* Fix breaking loop + logging datetime
* Fix step in prediction loop
* Fix step breaking
* Fix train/test loops
* Force TF at least 2.2 for the trainer
* Use assert_cardinality to facilitate the dataset size computation
* Log steps per epoch
* Make tfds compliant with TPU
* Make tfds compliant with TPU
* Use TF dataset enumerate instead of the Python one
* revert previous commit
* Fix data_dir
* Apply style
* rebase on master
* Address Sylvain's comments
* Address Sylvain's and Lysandre comments
* Trigger CI
* Remove unused import
* Switch from return_tuple to return_dict
* Fix test
* Add recent model
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Julien Plu <plu.julien@gmail.com>
* Fully rework training/prediction loops
* fix method name
* Fix variable name
* Fix property name
* Fix scope
* Fix method name
* Fix tuple index
* Fix tuple index
* Fix indentation
* Fix variable name
* fix eval before log
* Add drop remainder for test dataset
* Fix step number + fix logging datetime
* fix eval loss value
* use global step instead of step + fix logging at step 0
* Fix logging datetime
* Fix global_step usage
* Fix breaking loop + logging datetime
* Fix step in prediction loop
* Fix step breaking
* Fix train/test loops
* Force TF at least 2.2 for the trainer
* Use assert_cardinality to facilitate the dataset size computation
* Log steps per epoch
* Make tfds compliant with TPU
* Make tfds compliant with TPU
* Use TF dataset enumerate instead of the Python one
* revert previous commit
* Fix data_dir
* Apply style
* rebase on master
* Address Sylvain's comments
* Address Sylvain's and Lysandre comments
* Trigger CI
* Remove unused import
* Added capability to quantize a model while exporting through ONNX.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
We do not support multiple extensions
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Reformat files
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* More quality
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Ensure test_generate_identified_name compares the same object types
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Added documentation everywhere on ONNX exporter
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Use pathlib.Path instead of plain-old string
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Use f-string everywhere
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Use the correct parameters for black formatting
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Use Python 3 super() style.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Use packaging.version to ensure installed onnxruntime version match requirements
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Fixing imports sorting order.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Missing raise(s)
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Added quantization documentation
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Fix some spelling.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Fix bad list header format
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Move torchscript and add ONNX documentation under modle_export
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Let's follow guidelines by the gurus: Renamed torchscript.rst to serialization.rst
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Remove previously introduced tree element
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* WIP doc
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* ONNX documentation
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Fix invalid link
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Improve spelling
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Final wording pass
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Moving rom transformers statements to relative imports in some files under src/
* Import order
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Currently, it's hard to derive which example tests were run on CI, and which weren't. Adding `-rA` flag to `pytest`, will now include a summary like:
```
==================================================================== short test summary info =====================================================================
PASSED examples/test_examples.py::ExamplesTests::test_generation
PASSED examples/test_examples.py::ExamplesTests::test_run_glue
PASSED examples/test_examples.py::ExamplesTests::test_run_language_modeling
PASSED examples/test_examples.py::ExamplesTests::test_run_squad
FAILED examples/test_examples.py::ExamplesTests::test_run_pl_glue - AttributeError: 'Namespace' object has no attribute 'gpus'
============================================================ 1 failed, 4 passed, 8 warnings in 42.96s ============================================================
```
which makes it easier to validate whether some example is being covered by CI or not.
* Ensure OpenAI GPT position_ids is correctly initialized and registered as buffer at init.
This will make it compatible with TorchScript export.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Fix missing slice operator on the tensor data accessor.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Style.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Fixed BertEmbedding position_ids buffer created at forward.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Fixed MobileBertEmbedding position_ids buffer created at forward.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Fixed XLM position_ids buffer created at forward.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Describe usage of sentence model
* fix typo usage
* add use and description to readme
* fix typo in readme
* readme formatting
* add training procedure to readme
* description name and company
* readme formatting
* dataset training readme
* typo
* readme
* minor doc fixes
correct superclass name and small grammar fixes
* correct the instance name in the error message
It appears to be `BaseTokenizer` from looking at:
`from tokenizers.implementations import BaseTokenizer as BaseTokenizerFast`
and not `Tokenizer` as it currently says.
* Attempt to fix the way squad_convert_examples_to_features pad the elements for the QA pipeline.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Quality
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Make the code easier to read and avoid testing multiple test the same thing.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* missing enum value on truncation_strategy.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Rethinking for the easiest fix: expose the padding strategy on squad_convert_examples_to_features.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Remove unused imports.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Created model card for my extreme summarization model
* Update model_cards/yuvraj/xSumm/README.md
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* Created model card for my summarization model
* Update model_cards/yuvraj/summarizer-cnndm/README.md
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* DataParallel fixes:
1. switched to a more precise check
- if self.args.n_gpu > 1:
+ if isinstance(model, nn.DataParallel):
2. fix tests - require the same fixup under DataParallel as the training module
* another fix
* Don't pass sampler for iterable dataset
* Added check for test and eval dataloaders.
* Formatting
* Don't pass sampler for iterable dataset
* Added check for test and eval dataloaders.
* Formatting
* Cleaner if nesting.
* Added test for trainer and iterable dataset
* Formatting for test
* Fixed import when torch is available only.
* Added require torch decorator to helper class
* Moved dataset class inside unittest
* Removed nested if and changed model in test
* Checking torch availability for IterableDataset
Slightly breaking change, changes functionality for `use_cache` in XLNet: if use_cache is True and mem_len is 0 or None (which is the case in the base model config), the model behaves like GPT-2 and returns mems to be used as past in generation. At training time `use_cache` is overriden and always True.
Slightly breaking change, changes functionality for `use_cache` in XLNet: if use_cache is True and mem_len is 0 or None (which is the case in the base model config), the model behaves like GPT-2 and returns mems to be used as past in generation. At training time `use_cache` is overriden and always True.
Slightly breaking change, changes functionality for `use_cache` in XLNet: if use_cache is True and mem_len is 0 or None (which is the case in the base model config), the model behaves like GPT-2 and returns mems to be used as past in generation. At training time `use_cache` is overriden and always True.
* fix merge rebase
* add intermediate reformer code
* save intermediate caching results
* save intermediate
* save intermediate results
* save intermediate
* upload next step
* fix generate tests
* make tests work
* add named tuple output
* Apply suggestions from code review
* fix use_cache for False case
* fix tensor to gpu
* fix tensor to gpu
* refactor
* refactor and make style
* language tag addition on albert-mongolian
* Update model_cards/bayartsogt/albert-mongolian/README.md
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* Reformer model head classification implementation for text classification
* Reformat the reformer model classification code
* PR review comments, and test case implementation for reformer for classification head changes
* CI/CD reformer for classification head test import error fix
* CI/CD test case implementation added ReformerForSequenceClassification to all_model_classes
* Code formatting- fixed
* Normal test cases added for reformer classification head
* Fix test cases implementation for the reformer classification head
* removed token_type_id parameter from the reformer classification head
* fixed the test case for reformer classification head
* merge conflict with master fixed
* merge conflict, changed reformer classification to accept the choice_label parameter added in latest code
* refactored the the reformer classification head test code
* reformer classification head, common transform test cases fixed
* final set of the review comment, rearranging the reformer classes and docstring add to classification forward method
* fixed the compilation error and text case fix for reformer classification head
* Apply suggestions from code review
Remove unnecessary dup
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fix longformer global attention output
* fix multi gpu problem
* replace -10000 with 0
* better comment
* make attention output equal local and global
* Update src/transformers/modeling_longformer.py
* Add model type check for pipelines
* Add model type check for pipelines
* rename func
* Fix the init parameters
* Fix format
* rollback unnecessary refactor
* Pytorch gpu => cpu proper device
* Memoryless XLNet warning + fixed memories during generation
* Revert "Memoryless XLNet warning + fixed memories during generation"
This reverts commit 3d3251ff
* Took the operations on the generated_sequence out of the ensure_device scope
* Ensure padding and question cannot have higher probs than context.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Add bart the the list of tokenizers adding two <sep> tokens for squad_convert_example_to_feature
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Format.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Addressing @patrickvonplaten comments.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Addressing @patrickvonplaten comments about masking non-context element when generating the answer.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Addressing @sshleifer comments.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Make sure we mask CLS after handling impossible answers
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Mask in the correct vectors ...
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Add B I handling to grouping
* Add fix to include separate entity as last token
* move last_idx definition outside loop
* Use first entity in entity group as reference for entity type
* Add test cases
* Take out extra class accidentally added
* Return tf ner grouped test to original
* Take out redundant last entity
* Get last_idx safely
Co-authored-by: ColleterVi <36503688+ColleterVi@users.noreply.github.com>
* Fix first entity comment
* Create separate functions for group_sub_entities and group_entities (splitting call method to testable functions)
* Take out unnecessary last_idx
* Remove additional forward pass test
* Move token classification basic tests to separate class
* Move token classification basic tests back to monocolumninputtestcase
* Move base ner tests to nerpipelinetests
* Take out unused kwargs
* Add back mandatory_keys argument
* Add unitary tests for group_entities in _test_ner_pipeline
* Fix last entity handling
* Fix grouping fucntion used
* Add typing to group_sub_entities and group_entities
Co-authored-by: ColleterVi <36503688+ColleterVi@users.noreply.github.com>
* Add deebert code
* Add readme of deebert
* Add test for deebert
Update test for Deebert
* Update DeeBert (README, class names, function refactoring); remove requirements.txt
* Format update
* Update test
* Update readme and model init methods
* Default decoder inputs to encoder ones for T5 if neither are specified.
* Fixing typo, now all tests are passing.
* Changing einsum to operations supported by onnx
* Adding a test to ensure T5 can be exported to onnx op>9
* Modified test for onnx export to make it faster
* Styling changes.
* Styling changes.
* Changing notation for matrix multiplication
Co-authored-by: Abel Riboulot <tkai@protomail.com>
* Added data collator for XLNet language modeling and related calls
Added DataCollatorForXLNetLanguageModeling in data/data_collator.py
to generate necessary inputs for language modeling training with
XLNetLMHeadModel. Also added related arguments, logic and calls in
examples/language-modeling/run_language_modeling.py.
Resolves: #4739, #2008 (partially)
* Changed name to `DataCollatorForPermutationLanguageModeling`
Changed the name of `DataCollatorForXLNetLanguageModeling` to the more general `DataCollatorForPermutationLanguageModelling`.
Removed the `--mlm` flag requirement for the new collator and defined a separate `--plm_probability` flag for its use.
CTRL uses a CLM loss just like GPT and GPT-2, so should work out of the box with this script (provided `past` is taken care of
similar to `mems` for XLNet).
Changed calls and imports appropriately.
* Added detailed comments, changed variable names
Added more detailed comments to `DataCollatorForPermutationLanguageModeling` in `data/data_collator.py` to explain working. Also cleaned up variable names and made them more informative.
* Added tests for new data collator
Added tests in `tests/test_trainer.py` for DataCollatorForPermutationLanguageModeling based on those in DataCollatorForLanguageModeling. A specific test has been added to check for odd-length sequences.
* Fixed styling issues
* Exposing prepare_for_model for both slow & fast tokenizers
* Update method signature
* The traditional style commit
* Hide the warnings behind the verbose flag
* update default truncation strategy and prepare_for_model
* fix tests and prepare_for_models methods
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
* Make QA pipeline supports models with more than 2 outputs such as BART assuming start/end are the two first outputs.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* When using the new padding/truncation paradigm setting padding="max_length" + max_length=X actually pads the input up to max_length.
This result in every sample going through QA pipelines to be of size 384 whatever the actual input size is making the overall pipeline very slow.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Mask padding & question before applying softmax. Softmax has been refactored to operate in log space for speed and stability.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Format.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Use PaddingStrategy.LONGEST instead of DO_NOT_PAD
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Revert "When using the new padding/truncation paradigm setting padding="max_length" + max_length=X actually pads the input up to max_length."
This reverts commit 1b00a9a2
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Trigger CI after unattended failure
* Trigger CI
* Work on tokenizer summary
* Finish tutorial
* Link to it
* Apply suggestions from code review
Co-authored-by: Anthony MOI <xn1t0x@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Add vocab definition
Co-authored-by: Anthony MOI <xn1t0x@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Added PipelineException
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* fill-mask pipeline raises exception when more than one mask_token detected.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Put everything in a function.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Added tests on pipeline fill-mask when input has != 1 mask_token
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Fix numel() computation for TF
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Addressing PR comments.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Remove function typing to avoid import on specific framework.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Quality.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Retry typing with @julien-c tip.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Quality².
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Simplify fill-mask mask_token checking.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Trigger CI
* Add support for past states
* Style and forgotten self
* You mean, documenting is not enough? I have to actually add it too?
* Add memory support during evaluation
* Fix tests in eval and add TF support
* No need to change this line anymore
Otherwise, if label is not specified, the following error occurs:
Traceback (most recent call last):
File "run_ner.py", line 303, in <module>
main()
File "run_ner.py", line 101, in main
model_args, data_args, training_args = parser.parse_json_file(json_file=os.path.abspath(sys.argv[1]))
File "/home/user/anaconda3/envs/bert/lib/python3.7/site-packages/transformers/hf_argparser.py", line 159, in parse_json_file
obj = dtype(**inputs)
TypeError: __init__() missing 1 required positional argument: 'labels'
* Fix the bug 'Attempted relative import with no known parent package' when using the bertabs example. Also change the used model from bertabs-finetuned-cnndm, since it seems not be accessible anymore
* Update run_summarization.py
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
* remove references to old API in docstring - update data processors
* style
* fix tests - better type checking error messages
* better type checking
* include awesome fix by @LysandreJik for #5310
* updated doc and examples
* Add new parameter `pad_to_multiple_of` on tokenizers.
* unittest for pad_to_multiple_of
* Add .name when logging enum.
* Fix missing .items() on dict in tests.
* Add special check + warning if the tokenizer doesn't have proper pad_token.
* Use the correct logger format specifier.
* Ensure tokenizer with no pad_token do not modify the underlying padding strategy.
* Skip test if tokenizer doesn't have pad_token
* Fix RobertaTokenizer on empty input
* Format.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* fix and updating to simpler API
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
* avoid recursion in id checks for fast tokenizers
* better typings and fix#5232
* align slow and fast tokenizers behaviors for Roberta and GPT2
* style and quality
* fix tests - improve typings
* fix-5181
Padding to max sequence length while truncation to another length was wrong on slow tokenizers
* clean up and fix#5155
* fix XLM test
* Fix tests for Transfo-XL
* logging only above WARNING in tests
* switch slow tokenizers tests in @slow
* fix Marian truncation tokenization test
* style and quality
* make the test a lot faster by limiting the sequence length used in tests
* Add return lengths
* make pad a bit more flexible so it can be used as collate_fn
* check all kwargs sent to encoding method are known
* fixing kwargs in encodings
* New AddedToken class in python
This class let you specify specifique tokenization behaviors for some special tokens. Used in particular for GPT2 and Roberta, to control how white spaces are stripped around special tokens.
* style and quality
* switched to hugginface tokenizers library for AddedTokens
* up to tokenizer 0.8.0-rc3 - update API to use AddedToken state
* style and quality
* do not raise an error on additional or unused kwargs for tokenize() but only a warning
* transfo-xl pretrained model requires torch
* Update src/transformers/tokenization_utils.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Cleaner warning when loading pretrained models
This make more explicit logging messages when using the various `from_pretrained` methods. It also make these messages as `logging.warning` because it's a common source of silent mistakes.
* Update src/transformers/modeling_utils.py
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* Update src/transformers/modeling_utils.py
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* style and quality
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* fix#5081 and improve backward compatibility (slightly)
* add nlp to setup.cfg - style and quality
* align default to previous default
* remove test that doesn't generalize
* add support for gradient checkpointing in BERT
* fix unit tests
* isort
* black
* workaround for `torch.utils.checkpoint.checkpoint` not accepting bool
* Revert "workaround for `torch.utils.checkpoint.checkpoint` not accepting bool"
This reverts commit 5eb68bb804f5ffbfc7ba13c45a47717f72d04574.
* workaround for `torch.utils.checkpoint.checkpoint` not accepting bool
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Configure all models to use output_hidden_states as argument passed to foward()
* Pass all tests
* Remove cast_bool_to_primitive in TF Flaubert model
* correct tf xlnet
* add pytorch test
* add tf test
* Fix broken tests
* Configure all models to use output_hidden_states as argument passed to foward()
* Pass all tests
* Remove cast_bool_to_primitive in TF Flaubert model
* correct tf xlnet
* add pytorch test
* add tf test
* Fix broken tests
* Refactor output_hidden_states for mobilebert
* Reset and remerge to master
Co-authored-by: Joseph Liu <joseph.liu@coinflex.com>
Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
* Fixed resize_token_embeddings for transfo_xl model
* Fixed resize_token_embeddings for transfo_xl.
Added custom methods to TransfoXLPreTrainedModel for resizing layers of
the AdaptiveEmbedding.
* Updated docstring
* Fixed resizinhg cutoffs; added check for new size of embedding layer.
* Added test for resize_token_embeddings
* Fixed code quality
* Fixed unchanged cutoffs in model.config
* Added feature to move added tokens in tokenizer.
* Fixed code quality
* Added feature to move added tokens in tokenizer.
* Fixed code quality
* Fixed docstring, renamed sym to oken.
Co-authored-by: Rafael Weingartner <rweingartner.its-b2015@fh-salzburg.ac.at>
* Add BERT Loses Patience (Patience-based Early Exit)
* update model archive
* update format
* sort import
* flake8
* Add results
* full results
* align the table
* refactor to inherit
* default per gpu eval = 1
* Formatting
* Formatting
* isort
* modify readme
* Add check
* Fix format
* Fix format
* Doc strings
* ALBERT & BERT for sequence classification don't inherit from the original anymore
* Remove incorrect comments
* Remove incorrect comments
* Remove incorrect comments
* Sync up with new code
* Sync up with new code
* Add a test
* Add a test
* Add a test
* Add a test
* Add a test
* Add a test
* Finishing up!
* add ElectraForMultipleChoice
* add test_for_multiple_choice
* add ElectraForMultipleChoice in auto model
* add ElectraForMultipleChoice in all_model_classes
* add SequenceSummary related parameters
* get rid pooler, use SequenceSummary instead
* add electra multiple choice test
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Added is_fast property on BatchEncoding to indicate if the object comes from a Fast Tokenizer.
* Added __get_state__() & __set_state__() to be pickable.
* Correct tokens() return type from List[int] to List[str]
* Added unittest for BatchEncoding pickle/unpickle
* Added unittest for BatchEncoding is_fast
* More careful checking on BatchEncoding unpickle tests.
* Formatting.
* is_fast should assertTrue on Rust tokenizers.
* Ensure tensorflow has correct way of checking array_equal
* More formatting.
* Update hans data to be able to use Trainer
* Fixes
* Deal with tokenizer that don't have token_ids
* Clean up things
* Simplify data use
* Fix the input dict
* Formatting + proper path in README
* Fixed resize_token_embeddings for transfo_xl model
* Fixed resize_token_embeddings for transfo_xl.
Added custom methods to TransfoXLPreTrainedModel for resizing layers of
the AdaptiveEmbedding.
* Updated docstring
* Fixed resizinhg cutoffs; added check for new size of embedding layer.
* Added test for resize_token_embeddings
* Fixed code quality
* Fixed unchanged cutoffs in model.config
Co-authored-by: Rafael Weingartner <rweingartner.its-b2015@fh-salzburg.ac.at>
* check type before logging to ensure it's a scalar
* log when Trainer attempts to add a non-scalar value using TensorboardX's writer.add_scalar so we know what kinds of fixes are appropriate
* black it
* rephrase log message to clarify attribute was dropped
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* ElectraForQuestionAnswering
* udate __init__
* add test for electra qa model
* add ElectraForQuestionAnswering in auto models
* add ElectraForQuestionAnswering in all_model_classes
* fix outputs, input_ids defaults to None
* add ElectraForQuestionAnswering in docs
* remove commented line
* DOC: Replace instances of ``config.output_attentions`` with function argument ``output_attentions``
* DOC: Apply Black Formatting
* Fix errors where output_attentions was undefined
* Remove output_attentions in classes per review
* Fix regressions on tests having `output_attention`
* Fix further regressions in tests relating to `output_attentions`
Ensure proper propagation of `output_attentions` as a function parameter
to all model subclasses
* Fix more regressions in `test_output_attentions`
* Fix issues with BertEncoder
* Rename related variables to `output_attentions`
* fix pytorch tests
* fix bert and gpt2 tf
* Fix most TF tests for `test_output_attentions`
* Fix linter errors and more TF tests
* fix conflicts
* DOC: Apply Black Formatting
* Fix errors where output_attentions was undefined
* Remove output_attentions in classes per review
* Fix regressions on tests having `output_attention`
* fix conflicts
* fix conflicts
* fix conflicts
* fix conflicts
* fix pytorch tests
* fix conflicts
* fix conflicts
* Fix linter errors and more TF tests
* fix tf tests
* make style
* fix isort
* improve output_attentions
* improve tensorflow
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* add tpu and torchscipt for benchmark
* fix name in tests
* "fix email"
* make style
* better log message for tpu
* add more print and info for tpu
* allow possibility to print tpu metrics
* correct cpu usage
* fix test for non-install
* remove bugus file
* include psutil in testing
* run a couple of times before tracing in torchscript
* do not allow tpu memory tracing for now
* make style
* add torchscript to env
* better name for torch tpu
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
* Better None gradients handling
* Apply Style
* Apply Style
* Create a loss class per task to compute its respective loss
* Add loss classes to the ALBERT TF models
* Add loss classes to the BERT TF models
* Add question answering and multiple choice to TF Camembert
* Remove prints
* Add multiple choice model to TF DistilBERT + loss computation
* Add question answering model to TF Electra + loss computation
* Add token classification, question answering and multiple choice models to TF Flaubert
* Add multiple choice model to TF Roberta + loss computation
* Add multiple choice model to TF XLM + loss computation
* Add multiple choice and question answering models to TF XLM-Roberta
* Add multiple choice model to TF XLNet + loss computation
* Remove unused parameters
* Add task loss classes
* Reorder TF imports + add new model classes
* Add new model classes
* Bugfix in TF T5 model
* Bugfix for TF T5 tests
* Bugfix in TF T5 model
* Fix TF T5 model tests
* Fix T5 tests + some renaming
* Fix inheritance issue in the AutoX tests
* Add tests for TF Flaubert and TF XLM Roberta
* Add tests for TF Flaubert and TF XLM Roberta
* Remove unused piece of code in the TF trainer
* bugfix and remove unused code
* Bugfix for TF 2.2
* Apply Style
* Divide TFSequenceClassificationAndMultipleChoiceLoss into their two respective name
* Apply style
* Mirror the PT Trainer in the TF one: fp16, optimizers and tb_writer as class parameter and better dataset handling
* Fix TF optimizations tests and apply style
* Remove useless parameter
* Bugfix and apply style
* Fix TF Trainer prediction
* Now the TF models return the loss such as their PyTorch couterparts
* Apply Style
* Ignore some tests output
* Take into account the SQuAD cls_index, p_mask and is_impossible parameters for the QuestionAnswering task models.
* Fix names for SQuAD data
* Apply Style
* Fix conflicts with 2.11 release
* Fix conflicts with 2.11
* Fix wrongname
* Add better documentation on the new create_optimizer function
* Fix isort
* logging_dir: use same default as PyTorch
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* ner: add preprocessing script for examples that splits longer sentences
* ner: example shell scripts use local preprocessing now
* ner: add new example section for WNUT’17 NER task. Remove old English CoNLL-03 results
* ner: satisfy black and isort
* Refactor tensor creation in tokenizers.
* Make sure to convert string to TensorType
* Refactor convert_to_tensors_
* Introduce numpy tensor creation
* Format
* Add unittest for TensorType creation from str
* sorting imports
* Added unittests for numpy tensor conversion.
* Do not use in-place version for squeeze as numpy doesn't provide such feature.
* Added extra parameter prepend_batch_axis: bool on prepare_for_model.
* Ensure test_np_encode_plus_sent_to_model is not executed if encoder/decoder model.
* style.
* numpy tests require_torch for now while flax not merged.
* Hopefully will make flake8 happy.
* One more time 🎶
* Ensure tokens in never_split are not splitted when using basic tokenizer before wordpiece.
* never_split only use membership attempt to use a set() which is 10x faster for this operation.
* Use union to concatenate two sets.
* Updated docstring for never_split parameter.
* Avoid set.union() if never_split is None
* Added comments.
* Correct docstring format.
* Added links to more community notebooks
Added links to 3 more community notebooks from the git repo: https://github.com/abhimishra91/transformers-tutorials
Different Transformers models are fine tuned on Dataset using PyTorch
* Update README.md
* Update README.md
* Update README.md
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* [hf_api] Attach all unknown attributes for future-proof compatibility
* [Pipeline] NerPipeline is really a TokenClassificationPipeline
* modelcard.py: I don't think we need to force the download
* Remove config, tokenizer from SUPPORTED_TASKS as we're moving to one model = one weight + one tokenizer
* FillMaskPipeline: also output token in string form
* TextClassificationPipeline: option to return all scores, not just the argmax
* Update docs/source/main_classes/pipelines.rst
* Glue task cleaup
* Enable writing cache to cache_dir in case dataset lives in readOnly
filesystem.
* Differentiate match vs mismatch for MNLI metrics.
* Style
* Fix pytype
* Fix type
* Use cache_dir in mnli mismatch eval dataset
* Small Tweaks
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* Kill model archive maps
* Fixup
* Also kill model_archive_map for MaskedBertPreTrainedModel
* Unhook config_archive_map
* Tokenizers: align with model id changes
* make style && make quality
* Fix CI
* pass on tokenizer to pipeline
* order input names when convert to onnx
* update style
* remove unused imports
* make ordered inputs list needs to be mutable
* add test custom bert model
* remove unused imports
* better api
* improve automatic setting of global attention mask
* fix longformer bug
* fix global attention mask in test
* fix global attn mask flatten
* fix slow tests
* update docstring
* update docs and make more robust
* improve attention mask
* add multiple choice for longformer
* add models to docs
* adapt docstring
* add test to longformer
* add longformer for mc in init and modeling auto
* fix tests
The option `--do_lower_case` is currently required by the uncased models (i.e., bert-base-uncased, bert-large-uncased).
Results:
BERT-BASE without --do_lower_case: 'exact': 73.83, 'f1': 82.22
BERT-BASE with --do_lower_case: 'exact': 81.02, 'f1': 88.34
* make transformers-cli cross-platform
Using "scripts" is a useful option in setup.py particularly when you want to get access to non-python scripts. However, in this case we want to have an entry point into some of our own Python scripts. To do this in a concise, cross-platfom way, we can use entry_points.console_scripts. This change is necessary to provide the CLI on different platforms, which "scripts" does not ensure. Usage remains the same, but the "transformers-cli" script has to be moved (be part of the library) and renamed (underscore + extension)
* make style & quality
* added LongformerForQuestionAnswering
* add LongformerForQuestionAnswering
* fix import for LongformerForMaskedLM
* add LongformerForQuestionAnswering
* hardcoded sep_token_id
* compute attention_mask if not provided
* combine global_attention_mask with attention_mask when provided
* update example in docstring
* add assert error messages, better attention combine
* add test for longformerForQuestionAnswering
* typo
* cast gloabl_attention_mask to long
* make style
* Update src/transformers/configuration_longformer.py
* Update src/transformers/configuration_longformer.py
* fix the code quality
* Merge branch 'longformer-for-question-answering' of https://github.com/patil-suraj/transformers into longformer-for-question-answering
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Add Type Hints to modeling_utils.py Closes#3911
Add Type Hints to methods in `modeling_utils.py`
Note: The coverage isn't 100%. Mostly skipped internal methods.
* Reformat according to `black` and `isort`
* Use typing.Iterable instead of Sequence
* Parameterize Iterable by its generic type
* Use typing.Optional when None is the default value
* Adhere to style guideline
* Update src/transformers/modeling_utils.py
* Update src/transformers/modeling_utils.py
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* Warn the user about max_len being on the path to be deprecated.
* Ensure better backward compatibility when max_len is provided to a tokenizer.
* Make sure to override the parameter and not the actual instance value.
* Format & quality
* Adds predict stage for glue tasks, and generate result files which could be submitted to gluebenchmark.com website.
* Use Split enum + always output the label name
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* first commit
* bug fixes
* better examples
* undo padding
* remove wrong VOCAB_FILES_NAMES
* License
* make style
* make isort happy
* unit tests
* integration test
* make `black` happy by undoing `isort` changes!!
* lint
* no need for the padding value
* batch_size not bsz
* remove unused type casting
* seqlen not seq_len
* staticmethod
* `bert` selfattention instead of `n2`
* uint8 instead of bool + lints
* pad inputs_embeds using embeddings not a constant
* black
* unit test with padding
* fix unit tests
* remove redundant unit test
* upload model weights
* resolve todo
* simpler _mask_invalid_locations without lru_cache + backward compatible masked_fill_
* increase unittest coverage
* Distributed eval: SequentialDistributedSampler + gather all results
* For consistency only write to disk from world_master
Close https://github.com/huggingface/transformers/issues/4272
* Working distributed eval
* Hook into scripts
* Fix#3721 again
* TPU.mesh_reduce: stay in tensor space
Thanks @jysohn23
* Just a small comment
* whitespace
* torch.hub: pip install packaging
* Add test scenarii
* makes fetching last learning late in trainer backward compatible
* split comment to multiple lines
* fixes black styling issue
* uses version to create a more explicit logic
- add a citation.
- modify the table of the BLUE benchmark.
The table of the first version was not displayed correctly on https://huggingface.co/seiya/oubiobert-base-uncased.
Could you please confirm that this fix will allow you to display it correctly?
* Adding optimizations block from ONNXRuntime.
* Turn off external data format by default for PyTorch export.
* Correct the way use_external_format is passed through the cmdline args.
* Add index to be returned by NerPipeline to allow for the creation of
* Add entity groups
* Convert entity list to dict
* Add entity to entity_group_disagg atfter updating entity gorups
* Change 'group' parameter to 'grouped_entities'
* Add unit tests for grouped NER pipeline case
* Correct variable name typo for NER_FINETUNED_MODELS
* Sync grouped tests to recent test updates
* Added generic ONNX conversion script for PyTorch model.
* WIP initial TF support.
* TensorFlow/Keras ONNX export working.
* Print framework version info
* Add possibility to check the model is correctly loading on ONNX runtime.
* Remove quantization option.
* Specify ONNX opset version when exporting.
* Formatting.
* Remove unused imports.
* Make functions more generally reusable from other part of the code.
* isort happy.
* flake happy
* Export only feature-extraction for now
* Correctly check inputs order / filter before export.
* Removed task variable
* Fix invalid args call in load_graph_from_args.
* Fix invalid args call in convert.
* Fix invalid args call in infer_shapes.
* Raise exception and catch in caller function instead of exit.
* Add 04-onnx-export.ipynb notebook
* More WIP on the notebook
* Remove unused imports
* Simplify & remove unused constants.
* Export with constant_folding in PyTorch
* Let's try to put function args in the right order this time ...
* Disable external_data_format temporary
* ONNX notebook draft ready.
* Updated notebooks charts + wording
* Correct error while exporting last chart in notebook.
* Adressing @LysandreJik comment.
* Set ONNX opset to 11 as default value.
* Set opset param mandatory
* Added ONNX export unittests
* Quality.
* flake8 happy
* Add keras2onnx dependency on extras["tf"]
* Pin keras2onnx on github master to v1.6.5
* Second attempt.
* Third attempt.
* Use the right repo URL this time ...
* Do the same for onnxconverter-common
* Added keras2onnx and onnxconveter-common to 1.7.0 to supports TF2.2
* Correct commit hash.
* Addressing PR review: Optimization are enabled by default.
* Addressing PR review: small changes in the notebook
* setup.py comment about keras2onnx versioning.
* Add QA trainer example for TF
* Make data_dir optional
* Fix parameter logic
* Fix feature convert
* Update the READMEs to add the question-answering task
* Apply style
* Change 'sequence-classification' to 'text-classification' and prefix with 'eval' all the metric names
* Apply style
* Apply style
* Improvements to the wandb integration
* small reorg + no global necessary
* feat(trainer): log epoch and final metrics
* Simplify logging a bit
* Fixup
* Fix crash when just running eval
Co-authored-by: Chris Van Pelt <vanpelt@gmail.com>
Co-authored-by: Boris Dayma <boris.dayma@gmail.com>
* Allow BatchEncoding to be initialized empty.
This is required by recent changes introduced in TF 2.2.
* Attempt to unpin Tensorflow to 2.2 with the previous commit.
* catch gpu len 1 set to gpu0
* Add mpc to trainer
* Add MPC for TF
* fix TF automodel for MPC and add Albert
* Apply style
* Fix import
* Note to self: double check
* Make shape None, None for datasetgenerator output shapes
* Add from_pt bool which doesnt seem to work
* Original checkpoint dir
* Fix docstrings for automodel
* Update readme and apply style
* Colab should probably not be from users
* Colabs should probably not be from users
* Add colab
* Update README.md
* Update README.md
* Cleanup __intit__
* Cleanup flake8 trailing comma
* Update src/transformers/training_args_tf.py
* Update src/transformers/modeling_tf_auto.py
Co-authored-by: Viktor Alm <viktoralm@pop-os.localdomain>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* simplify cache vars and allow for TRANSFORMERS_CACHE env
As it currently stands, "TRANSFORMERS_CACHE" is not an accepted variable. It seems that the these variables were not updated when moving from version pytorch_transformers to transformers. In addition, the fallback procedure could be improved. and simplified. Pathlib seems redundant here.
* Update file_utils.py
"Migrating from pytorch-transformers to transformers" is missing in the main document. It is available in the main `readme` thought. Just move it to the document.
* Fix the issue to properly run the accumulator with TF 2.2
* Apply style
* Fix training_args_tf for TF 2.2
* Fix the TF training args when only one GPU is available
* Remove the fixed version of TF in setup.py
* example updated to use generation pipeline
* Update model_cards/LorenzoDeMattei/GePpeTto/README.md
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* Created using Colaboratory
* [examples] reorganize files
* remove run_tpu_glue.py as superseded by TPU support in Trainer
* Bugfix: int, not tuple
* move files around
* Rewritten batch support in pipelines.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Fix imports sorting 🔧
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Set pad_to_max_length=True by default on Pipeline.
* Set pad_to_max_length=False for generation pipelines.
Most of generation models doesn't have padding token.
* Address @joeddav review comment: Uniformized *args.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Address @joeddav review comment: Uniformized *args (second).
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* first copy & past commit from Bert and morgans LSH code
* add easy way to compare to trax original code
* translate most of function
* make trax lsh self attention deterministic with numpy seed + copy paste code
* add same config
* add same config
* make layer init work
* implemented hash_vectors function for lsh attention
* continue reformer translation
* hf LSHSelfAttentionLayer gives same output as trax layer
* refactor code
* refactor code
* refactor code
* refactor
* refactor + add reformer config
* delete bogus file
* split reformer attention layer into two layers
* save intermediate step
* save intermediate step
* make test work
* add complete reformer block layer
* finish reformer layer
* implement causal and self mask
* clean reformer test and refactor code
* fix merge conflicts
* fix merge conflicts
* update init
* fix device for GPU
* fix chunk length init for tests
* include morgans optimization
* improve memory a bit
* improve comment
* factorize num_buckets
* better testing parameters
* make whole model work
* make lm model work
* add t5 copy paste tokenizer
* add chunking feed forward
* clean config
* add improved assert statements
* make tokenizer work
* improve test
* correct typo
* extend config
* add complexer test
* add new axial position embeddings
* add local block attention layer
* clean tests
* refactor
* better testing
* save intermediate progress
* clean test file
* make shorter input length work for model
* allow variable input length
* refactor
* make forward pass for pretrained model work
* add generation possibility
* finish dropout and init
* make style
* refactor
* add first version of RevNet Layers
* make forward pass work and add convert file
* make uploaded model forward pass work
* make uploaded model forward pass work
* refactor code
* add namedtuples and cache buckets
* correct head masks
* refactor
* made reformer more flexible
* make style
* remove set max length
* add attention masks
* fix up tests
* fix lsh attention mask
* make random seed optional for the moment
* improve memory in reformer
* add tests
* make style
* make sure masks work correctly
* detach gradients
* save intermediate
* correct backprob through gather
* make style
* change back num hashes
* rename to labels
* fix rotation shape
* fix detach
* update
* fix trainer
* fix backward dropout
* make reformer more flexible
* fix conflict
* fix
* fix
* add tests for fixed seed in reformer layer
* fix trainer typo
* fix typo in activations
* add fp16 tests
* add fp16 training
* support fp16
* correct gradient bug in reformer
* add fast gelu
* re-add dropout for embedding dropout
* better naming
* better naming
* renaming
* finalize test branch
* finalize tests
* add more tests
* finish tests
* fix
* fix type trainer
* fix fp16 tests
* fix tests
* fix tests
* fix tests
* fix issue with dropout
* fix dropout seeds
* correct random seed on gpu
* finalize random seed for dropout
* finalize random seed for dropout
* remove duplicate line
* correct half precision bug
* make style
* refactor
* refactor
* docstring
* remove sinusoidal position encodings for reformer
* move chunking to modeling_utils
* make style
* clean config
* make style
* fix tests
* fix auto tests
* pretrained models
* fix docstring
* update conversion file
* Update pretrained_models.rst
* fix rst
* fix rst
* update copyright
* fix test path
* fix test path
* fix small issue in test
* include reformer in generation tests
* add docs for axial position encoding
* finish docs
* Update convert_reformer_trax_checkpoint_to_pytorch.py
* remove isort
* include sams comments
* remove wrong comment in utils
* correct typos
* fix typo
* Update reformer.rst
* applied morgans optimization
* make style
* make gpu compatible
* remove bogus file
* big test refactor
* add example for chunking
* fix typo
* add to README
* First commit to add a TF version of the trainer.
* Make the TF trainer closer to what looks the PT trainer
* Refactoring common code between the PT and TF trainer into an util file.
* Some bugfix + better similarity with the PT trainer
* Add missing class in transformers init
* Bugfix over prediction + use classification report instead of simple metrics
* Fix name error
* Fix optimization tests + style
* Apply style
* Several bugfix for multi-gpu training
* Apply style
* Apply style
* Add glue example for the TF trainer
* Several bugix + address the reviews
* Fix on the TF training args file
* Add a debug mode
* Bugfix in utils_ner.py when segment_ids is None
* Apply style
* Apply style
* Add TPU strategy
* Fix selection strategy
There's an inconsistency right now where:
- we load some models into CACHE_DIR
- and some models in the default cache
- and often, in both for the same models
When running the RUN_SLOW tests, this takes a lot of disk space, time, and bandwidth.
I'd rather always use the default cache
* Update sqrt computation so it can survive a torch.jit.trace
* Update modeling_gpt2.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Add GenerationPipeline
* Fix parameter names
* Correct parameter __call__ parameters
* Add model type attribute and correct function calls for prepare_input
* Take out trailing commas from init attributes
* Remove unnecessary tokenization line
* Implement support for multiple text inputs
* Apply generation support for multiple input text prompts
* Take out tensor coersion
* Take out batch index
* Add text prompt to return sequence
* Squeeze token tensore before decoding
* Return only a single list of sequences if only one prompt was used
* Correct results variable name
* Add GenerationPipeline to SUPPORTED_TASKS with the alias , initalized w GPT2
* Registedred AutoModelWithLMHead for both pt and t
* Update docstring for GenerationPipeline
* Add kwargs parameter to mode.generate
* Take out kwargs parameter after all
* Add generation pipeline example in pipeline docstring
* Fix max length by squeezing tokens tensor
* Apply ensure_tensor_on_device to pytorch tensor
* Include generation step in torch.no_grad
* Take out input from prepare_xlm_input and set 'en' as default xlm_language
* Apply framework specific encoding during prepare_input
* Format w make style
* Move GenerationPipeline import to follow proper import sorting
* Take out training comma from generation dict
* Apply requested changes
* Change name to TextGenerationPipeline
* Apply TextGenerationPipeline rename to __init___
* Changing alias to
* Set input mapping as input to ensure_tensor_on_device
* Fix assertion placement
* Add test_text_generation
* Add TextGenerationPipeline to PipelineCommonTests
* Take out whitespace
* Format __init__ w black
* Fix __init__ style
* Forman __init___
* Add line to end of __init__
* Correct model tokenizer set for test_text_generation
* Ensure to return list of list, not list of string (to pass test)
* Limit test models to only 3 to limit runtime to address circleCI timeout error
* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
* Update tests/test_pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
* Remove argument docstring, __init__, add additional __call__ arguments, and reformat results to list of dict
* Fix blank result list
* Add TextGenerationPipeline to pipelines.rst
* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
* Fix typos from adding PADDING_TEXT_TOKEN_LENGTH
* Fix incorrectly moved result list
* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines.py
* Update src/transformers/pipelines.py
* Update src/transformers/pipelines.py
* Update src/transformers/pipelines.py
* Update src/transformers/pipelines.py
* Update src/transformers/pipelines.py
* Update src/transformers/pipelines.py
* Update src/transformers/pipelines.py
* Update src/transformers/pipelines.py
* Update src/transformers/pipelines.py
* Update src/transformers/pipelines.py
* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
* Add back generation line and make style
* Take out blank whitespace
* Apply new alis, text-generation, to test_pipelines
* Fix text generation alias in test
* Update src/transformers/pipelines.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* doc
* [tests] Add sample files for a regression task
* [HUGE] Trainer
* Feedback from @sshleifer
* Feedback from @thomwolf + logging tweak
* [file_utils] when downloading concurrently, get_from_cache will use the cached file for subsequent processes
* [glue] Use default max_seq_length of 128 like before
* [glue] move DataTrainingArguments around
* [ner] Change interface of InputExample, and align run_{tf,pl}
* Re-align the pl scripts a little bit
* ner
* [ner] Add integration test
* Fix language_modeling with API tweak
* [ci] Tweak loss target
* Don't break console output
* amp.initialize: model must be on right device before
* [multiple-choice] update for Trainer
* Re-align to 827d6d6ef0
* First pass on utility classes and python tokenizers
* finishing cleanup pass
* style and quality
* Fix tests
* Updating following @mfuntowicz comment
* style and quality
* Fix Roberta
* fix batch_size/seq_length inBatchEncoding
* add alignement methods + tests
* Fix OpenAI and Transfo-XL tokenizers
* adding trim_offsets=True default for GPT2 et RoBERTa
* style and quality
* fix tests
* add_prefix_space in roberta
* bump up tokenizers to rc7
* style
* unfortunately tensorfow does like these - removing shape/seq_len for now
* Update src/transformers/tokenization_utils.py
Co-Authored-By: Stefan Schweter <stefan@schweter.it>
* Adding doc and docstrings
* making flake8 happy
Co-authored-by: Stefan Schweter <stefan@schweter.it>
token_type_id is converted into the segment embedding. For question answering,
this needs to highlight whether a token belongs to sequence 0 or 1.
encode_plus takes care of correctly setting this parameter automatically.
* Refactored use of newstest2013 to newstest2014. Fixed bug where argparse consumed first command line argument as model_size argument rather than using default model_size by forcing explicit --model_size flag inclusion
* More pythonic file handling through 'with' context
* COSMETIC - ran Black and isort
* Fixed reference to number of lines in newstest2014
* Fixed failing test. More pythonic file handling
* finish PR from tholiao
* remove outcommented lines
* make style
* make isort happy
Co-authored-by: Thomas Liao <tholiao@gmail.com>
* remove output_past from pt
* make style
* add optional input length for gpt2
* add use cache to prepare input
* save memory in gpt2
* correct gpt2 test inputs
* make past input optional for gpt2
* finish use_cache for all models
* make style
* delete modeling_gpt2 change in test file
* correct docstring
* correct is true statements for gpt2
* added model_cards for polish squad models
* corrected mistake in polish design cards
* updated model_cards for squad2_dutch model
* added links to benchmark models
Co-authored-by: Henryk Borzymowski <henryk.borzymowski@pwc.com>
* Initial commit to get BERT + run_glue.py on TPU
* Add README section for TPU and address comments.
* Cleanup TPU bits from run_glue.py (#3)
TPU runner is currently implemented in:
https://github.com/pytorch-tpu/transformers/blob/tpu/examples/run_glue_tpu.py.
We plan to upstream this directly into `huggingface/transformers`
(either `master` or `tpu`) branch once it's been more thoroughly tested.
* Cleanup TPU bits from run_glue.py
TPU runner is currently implemented in:
https://github.com/pytorch-tpu/transformers/blob/tpu/examples/run_glue_tpu.py.
We plan to upstream this directly into `huggingface/transformers`
(either `master` or `tpu`) branch once it's been more thoroughly tested.
* No need to call `xm.mark_step()` explicitly (#4)
Since for gradient accumulation we're accumulating on batches from
`ParallelLoader` instance which on next() marks the step itself.
* Resolve R/W conflicts from multiprocessing (#5)
* Add XLNet in list of models for `run_glue_tpu.py` (#6)
* Add RoBERTa to list of models in TPU GLUE (#7)
* Add RoBERTa and DistilBert to list of models in TPU GLUE (#8)
* Use barriers to reduce duplicate work/resources (#9)
* Shard eval dataset and aggregate eval metrics (#10)
* Shard eval dataset and aggregate eval metrics
Also, instead of calling `eval_loss.item()` every time do summation with
tensors on device.
* Change defaultdict to float
* Reduce the pred, label tensors instead of metrics
As brought up during review some metrics like f1 cannot be aggregated
via averaging. GLUE task metrics depends largely on the dataset, so
instead we sync the prediction and label tensors so that the metrics can
be computed accurately on those instead.
* Only use tb_writer from master (#11)
* Apply huggingface black code formatting
* Style
* Remove `--do_lower_case` as example uses cased
* Add option to specify tensorboard logdir
This is needed for our testing framework which checks regressions
against key metrics writtern by the summary writer.
* Using configuration for `xla_device`
* Prefix TPU specific comments.
* num_cores clarification and namespace eval metrics
* Cache features file under `args.cache_dir`
Instead of under `args.data_dir`. This is needed as our test infra uses
data_dir with a read-only filesystem.
* Rename `run_glue_tpu` to `run_tpu_glue`
Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
* [examples] Generate argparsers from type hints on dataclasses
* [HfArgumentParser] way simpler API
* Restore run_language_modeling.py for easier diff
* [HfArgumentParser] final tweaks from code review
* Big cleanup of `glue_convert_examples_to_features`
* Use batch_encode_plus
* Cleaner wrapping of glue_convert_examples_to_features for TF
@lysandrejik
* Cleanup syntax, thanks to @mfuntowicz
* Raise explicit error in case of user error
* Optimize causal mask using torch.where
Instead of multiplying by 1.0 float mask, use torch.where with a bool mask for increased performance.
* Maintain compatiblity with torch 1.0.0 - thanks for PR feedback
* Fix typo
* reformat line for CI
* Renamed num_added_tokens to num_special_tokens_to_add
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Cherry-Pick: Partially fix space only input without special tokens added to the output #3091
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Added property is_fast on PretrainedTokenizer and PretrainedTokenizerFast
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Make fast tokenizers unittests work on Windows.
* Entirely refactored unittest for tokenizers fast.
* Remove ABC class for CommonFastTokenizerTest
* Added embeded_special_tokens tests from allenai @dirkgr
* Make embeded_special_tokens tests from allenai more generic
* Uniformize vocab_size as a property for both Fast and normal tokenizers
* Move special tokens handling out of PretrainedTokenizer (SpecialTokensMixin)
* Ensure providing None input raise the same ValueError than Python tokenizer + tests.
* Fix invalid input for assert_padding when testing batch_encode_plus
* Move add_special_tokens from constructor to tokenize/encode/[batch_]encode_plus methods parameter.
* Ensure tokenize() correctly forward add_special_tokens to rust.
* Adding None checking on top on encode / encode_batch for TransfoXLTokenizerFast.
Avoid stripping on None values.
* unittests ensure tokenize() also throws a ValueError if provided None
* Added add_special_tokens unittest for all supported models.
* Style
* Make sure TransfoXL test run only if PyTorch is provided.
* Split up tokenizers tests for each model type.
* Fix invalid unittest with new tokenizers API.
* Filter out Roberta openai detector models from unittests.
* Introduce BatchEncoding on fast tokenizers path.
This new structure exposes all the mappings retrieved from Rust.
It also keeps the current behavior with model forward.
* Introduce BatchEncoding on slow tokenizers path.
Backward compatibility.
* Improve error message on BatchEncoding for slow path
* Make add_prefix_space True by default on Roberta fast to match Python in majority of cases.
* Style and format.
* Added typing on all methods for PretrainedTokenizerFast
* Style and format
* Added path for feeding pretokenized (List[str]) input to PretrainedTokenizerFast.
* Style and format
* encode_plus now supports pretokenized inputs.
* Remove user warning about add_special_tokens when working on pretokenized inputs.
* Always go through the post processor.
* Added support for pretokenized input pairs on encode_plus
* Added is_pretokenized flag on encode_plus for clarity and improved error message on input TypeError.
* Added pretokenized inputs support on batch_encode_plus
* Update BatchEncoding methods name to match Encoding.
* Bump setup.py tokenizers dependency to 0.7.0rc1
* Remove unused parameters in BertTokenizerFast
* Make sure Roberta returns token_type_ids for unittests.
* Added missing typings
* Update add_tokens prototype to match tokenizers side and allow AddedToken
* Bumping tokenizers to 0.7.0rc2
* Added documentation for BatchEncoding
* Added (unused) is_pretokenized parameter on PreTrainedTokenizer encode_plus/batch_encode_plus methods.
* Added higher-level typing for tokenize / encode_plus / batch_encode_plus.
* Fix unittests failing because add_special_tokens was defined as a constructor parameter on Rust Tokenizers.
* Fix text-classification pipeline using the wrong tokenizer
* Make pipelines works with BatchEncoding
* Turn off add_special_tokens on tokenize by default.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Remove add_prefix_space from tokenize call in unittest.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Style and quality
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Correct message for batch_encode_plus none input exception.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Fix invalid list comprehension for offset_mapping overriding content every iteration.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* TransfoXL uses Strip normalizer.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Bump tokenizers dependency to 0.7.0rc3
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Support AddedTokens for special_tokens and use left stripping on mask for Roberta.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* SpecilaTokenMixin can use slots to faster access to underlying attributes.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Remove update_special_tokens from fast tokenizers.
* Ensure TransfoXL unittests are run only when torch is available.
* Style.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Style
* Style 🙏🙏
* Remove slots on SpecialTokensMixin, need deep dive into pickle protocol.
* Remove Roberta warning on __init__.
* Move documentation to Google style.
Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
* Fix RoBERTa/XLNet Pad Token in run_multiple_choice.py
`convert_examples_to_fes atures` sets `pad_token=0` by default, which is correct for BERT but incorrect for RoBERTa (`pad_token=1`) and XLNet (`pad_token=5`). I think the other arguments to `convert_examples_to_features` are correct, but it might be helpful if someone checked who is more familiar with this part of the codebase.
* Simplifying change to match recent commits
# Check on the job periodically. Set the status code depending on what
# happened to the job in Kubernetes. If we try max_checks times and
# still the job hasn't finished, give up and return the starting
# non-zero status code.
while [ $i -lt $max_checks ]; do ((i++)); if kubectl get jobs $job_name -o jsonpath='Failed:{.status.failed}' | grep "Failed:1"; then status_code=1 && break; elif kubectl get jobs $job_name -o jsonpath='Succeeded:{.status.succeeded}' | grep "Succeeded:1" ; then status_code=0 && break; else echo "Job not finished yet"; fi; sleep 30; done && \
echo "Done waiting. Job status code: $status_code" && \
pod_name=$(kubectl get po -l controller-uid=`kubectl get job $job_name -o "jsonpath={.metadata.labels.controller-uid}"` | awk 'match($0,!/NAME/) {print $1}') && \
Congratulations! You've made it this far! You're not quite done yet though.
Once merged, your PR is going to appear in the release notes with the title you set, so make sure it's a great title that fully reflects the extent of your awesome contribution.
Then, please replace this with a description of the change and which issue is fixed (if applicable). Please also include relevant motivation and context. List any dependencies (if any) that are required for this change.
Once you're done, someone will review your PR shortly (see the section "Who can review?" below to tag some potential reviewers). They may suggest changes to make the code even better. If no one reviewed your PR after a week has passed, don't hesitate to post a new comment @-mentioning the same persons---sometimes notifications get lost.
-->
<!-- Remove if not applicable -->
Fixes # (issue)
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Did you read the [contributor guideline](https://github.com/huggingface/transformers/blob/master/CONTRIBUTING.md#start-contributing-pull-requests),
Pull Request section?
- [ ] Was this discussed/approved via a Github issue or the [forum](https://discuss.huggingface.co/)? Please add a link
to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the
[documentation guidelines](https://github.com/huggingface/transformers/tree/master/docs), and
[here are tips on formatting docstrings](https://github.com/huggingface/transformers/tree/master/docs#writing-source-documentation).
- [ ] Did you write any new necessary tests?
## Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors which may be interested in your PR.
<!-- Your PR will be replied to more quickly if you can figure out the right person to tag with @
If you know how to use git blame, that is the easiest way, otherwise, here is a rough guide of **who to tag**.
Awesome! Please provide the following information:
@@ -58,7 +68,8 @@ Awesome! Please provide the following information:
If you are willing to contribute the model yourself, let us know so we can best
guide you.
We have added a **detailed guide and templates** to guide you in the process of adding a new model. You can find them in the [`templates`](./templates) folder.
We have added a **detailed guide and templates** to guide you in the process of adding a new model. You can find them
in the [`templates`](https://github.com/huggingface/transformers/tree/master/templates) folder.
### Do you want a new feature (that is not a model)?
@@ -79,11 +90,13 @@ A world-class feature request addresses the following points:
If your issue is well written we're already 80% of the way there by the time you
post it.
We have added **templates** to guide you in the process of adding a new example script for training or testing the models in the library. You can find them in the [`templates`](./templates) folder.
We have added **templates** to guide you in the process of adding a new example script for training or testing the
models in the library. You can find them in the [`templates`](https://github.com/huggingface/transformers/tree/master/templates)
folder.
## Start contributing! (Pull Requests)
Before writing code, we strongly advise you to search through the exising PRs or
Before writing code, we strongly advise you to search through the existing PRs or
issues to make sure that nobody is already working on the same thing. If you are
unsure, it is always a good idea to open an issue to get some feedback.
@@ -124,13 +137,18 @@ Follow these steps to start contributing:
it with `pip uninstall transformers` before reinstalling it in editable
mode with the `-e` flag.)
Right now, we need an unreleased version of `isort` to avoid a
If you have already cloned that repo, you might need to `git pull` to get the most recent changes in the `datasets`
library.
5. Develop the features on your branch.
As you work on the features, you should make sure that the test suite
@@ -140,6 +158,14 @@ Follow these steps to start contributing:
$ make test
```
Note, that this command uses `-n auto` pytest flag, therefore, it will start as many parallel `pytest` processes as the number of your computer's CPU-cores, and if you have lots of those and a few GPUs and not a great amount of RAM, it's likely to overload your computer. Therefore, to run the test suite, you may want to consider using this command instead:
Adjust the value of `-n` to fit the load your hardware can support.
`transformers` relies on `black` and `isort` to format its source code
consistently. After you make changes, format them with:
@@ -147,12 +173,29 @@ Follow these steps to start contributing:
$ make style
```
`transformers` also uses `flake8` to check for coding mistakes. Quality
`transformers` also uses `flake8` and a few custom scripts to check for coding mistakes. Quality
control runs in CI, however you can also run the same checks with:
```bash
$ make quality
```
You can do the automatic style corrections and code verifications that can't be automated in one go:
```bash
$ make fixup
```
This target is also optimized to only work with files modified by the PR you're working on.
If you're modifying documents under `docs/source`, make sure to validate that
they can still be built. This check also runs in CI. To run a local check
make sure you have installed the documentation builder requirements, by
running `pip install .[tf,torch,docs]` once from the root of this repository
and then run:
```bash
$ make docs
```
Once you're happy with your changes, add changed files using `git add` and
make a commit with `git commit` to record your changes locally:
@@ -192,22 +235,29 @@ Follow these steps to start contributing:
### Checklist
1. The title of your pull request should be a summary of its contribution;
2. If your pull request adresses an issue, please mention the issue number in
2. If your pull request addresses an issue, please mention the issue number in
the pull request description to make sure they are linked (and people
consulting the issue know you are working on it);
3. To indicate a work in progress please prefix the title with `[WIP]`. These
are useful to avoid duplicated work, and to differentiate it from PRs ready
to be merged;
4. Make sure existing tests pass;
5. Add high-coverage tests. No quality test, no merge.
- If you are adding a new model, make sure that you use `ModelTester.all_model_classes = (MyModel, MyModelWithLMHead,...)`, which triggers the common tests.
- If you are adding new `@slow` tests, make sure they pass using `RUN_SLOW=1 python -m pytest tests/test_my_new_model.py`.
CircleCI does not run them.
6. All public methods must have informative docstrings;
5. Add high-coverage tests. No quality testing = no merge.
- If you are adding a new model, make sure that you use
`ModelTester.all_model_classes = (MyModel, MyModelWithLMHead,...)`, which triggers the common tests.
- If you are adding new `@slow` tests, make sure they pass using
#### This guide was heavily inspired by the awesome [scikit-learn guide to contributing](https://github.com/scikit-learn/scikit-learn/blob/master/CONTRIBUTING.md)
### Develop on Windows
One way one can run the make command on Window is to pass by MSYS2:
1. [Download MSYS2](https://www.msys2.org/), we assume to have it installed in C:\msys64
2. Open the command line C:\msys64\msys2.exe (it should be available from the start menu)
3. Run in the shell: `pacman -Syu` and install make with `pacman -S make`
<p>State-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch
<p>State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0
</h3>
🤗 Transformers (formerly known as `pytorch-transformers` and `pytorch-pretrained-bert`) provides state-of-the-art general-purpose architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet, CTRL...) for Natural Language Understanding (NLU) and Natural Language Generation (NLG) with over 32+ pretrained models in 100+ languages and deep interoperability between TensorFlow 2.0 and PyTorch.
🤗 Transformers provides thousands of pretrained models to perform tasks on texts such as classification, information extraction, question answering, summarization, translation, text generation, etc in 100+ languages. Its aim is to make cutting-edge NLP easier to use for everyone.
🤗 Transformers provides APIs to quickly download and use those pretrained models on a given text, fine-tune them on your own datasets then share them with the community on our [model hub](https://huggingface.co/models). At the same time, each python module defining an architecture can be used as a standalone and modified to enable quick research experiments.
🤗 Transformers is backed by the two most popular deep learning libraries, [PyTorch](https://pytorch.org/) and [TensorFlow](https://www.tensorflow.org/), with a seamless integration between them, allowing you to train your models with one then load it for inference with the other.
- Low barrier to entry for educators and practitioners
You can test most of our models directly on their pages from the [model hub](https://huggingface.co/models). We also offer an [inference API](https://huggingface.co/pricing) to use those models.
State-of-the-art NLP for everyone
-Deep learning researchers
-Hands-on practitioners
-AI/ML/NLP teachers and educators
Here are a few examples:
-[Masked word completion with BERT](https://huggingface.co/bert-base-uncased?text=Paris+is+the+%5BMASK%5D+of+France)
-[Name Entity Recognition with Electra](https://huggingface.co/dbmdz/electra-large-discriminator-finetuned-conll03-english?text=My+name+is+Sarah+and+I+live+in+London+city)
-[Text generation with GPT-2](https://huggingface.co/gpt2?text=A+long+time+ago%2C+)
- [Natural Langugage Inference with RoBERTa](https://huggingface.co/roberta-large-mnli?text=The+dog+was+lost.+Nobody+lost+any+animal)
- [Summarization with BART](https://huggingface.co/facebook/bart-large-cnn?text=The+tower+is+324+metres+%281%2C063+ft%29+tall%2C+about+the+same+height+as+an+81-storey+building%2C+and+the+tallest+structure+in+Paris.+Its+base+is+square%2C+measuring+125+metres+%28410+ft%29+on+each+side.+During+its+construction%2C+the+Eiffel+Tower+surpassed+the+Washington+Monument+to+become+the+tallest+man-made+structure+in+the+world%2C+a+title+it+held+for+41+years+until+the+Chrysler+Building+in+New+York+City+was+finished+in+1930.+It+was+the+first+structure+to+reach+a+height+of+300+metres.+Due+to+the+addition+of+a+broadcasting+aerial+at+the+top+of+the+tower+in+1957%2C+it+is+now+taller+than+the+Chrysler+Building+by+5.2+metres+%2817+ft%29.+Excluding+transmitters%2C+the+Eiffel+Tower+is+the+second+tallest+free-standing+structure+in+France+after+the+Millau+Viaduct)
- [Question answering with DistilBERT](https://huggingface.co/distilbert-base-uncased-distilled-squad?text=Which+name+is+also+used+to+describe+the+Amazon+rainforest+in+English%3F&context=The+Amazon+rainforest+%28Portuguese%3A+Floresta+Amaz%C3%B4nica+or+Amaz%C3%B4nia%3B+Spanish%3A+Selva+Amaz%C3%B3nica%2C+Amazon%C3%ADa+or+usually+Amazonia%3B+French%3A+For%C3%AAt+amazonienne%3B+Dutch%3A+Amazoneregenwoud%29%2C+also+known+in+English+as+Amazonia+or+the+Amazon+Jungle%2C+is+a+moist+broadleaf+forest+that+covers+most+of+the+Amazon+basin+of+South+America.+This+basin+encompasses+7%2C000%2C000+square+kilometres+%282%2C700%2C000+sq+mi%29%2C+of+which+5%2C500%2C000+square+kilometres+%282%2C100%2C000+sq+mi%29+are+covered+by+the+rainforest.+This+region+includes+territory+belonging+to+nine+nations.+The+majority+of+the+forest+is+contained+within+Brazil%2C+with+60%25+of+the+rainforest%2C+followed+by+Peru+with+13%25%2C+Colombia+with+10%25%2C+and+with+minor+amounts+in+Venezuela%2C+Ecuador%2C+Bolivia%2C+Guyana%2C+Suriname+and+French+Guiana.+States+or+departments+in+four+nations+contain+%22Amazonas%22+in+their+names.+The+Amazon+represents+over+half+of+the+planet%27s+remaining+rainforests%2C+and+comprises+the+largest+and+most+biodiverse+tract+of+tropical+rainforest+in+the+world%2C+with+an+estimated+390+billion+individual+trees+divided+into+16%2C000+species)
- [Translation with T5](https://huggingface.co/t5-base?text=My+name+is+Wolfgang+and+I+live+in+Berlin)
Lower compute costs, smaller carbon footprint
- Researchers can share trained models instead of always retraining
- Practitioners can reduce compute time and production costs
- 10 architectures with over 30 pretrained models, some in more than 100 languages
**[Write With Transformer](https://transformer.huggingface.co)**, built by the Hugging Face team, is the official demo of this repo’s text generation capabilities.
Choose the right framework for every part of a model's lifetime
- Train state-of-the-art models in 3 lines of code
- Deep interoperability between TensorFlow 2.0 and PyTorch models
- Move a single model between TF2.0/PyTorch frameworks at will
- Seamlessly pick the right framework for training, evaluation, production
## Quick tour
To immediately use a model on a given text, we provide the `pipeline` API. Pipelines group together a pretrained model with the preprocessing that was used during that model training. Here is how to quickly use a pipeline to classify positive versus negative texts
| Section | Description |
|-|-|
| [Installation](#installation) | How to install the package |
| [Quick tour: TF 2.0 and PyTorch ](#Quick-tour-TF-20-training-and-PyTorch-interoperability) | Train a TF 2.0 model in 10 lines of code, load it in PyTorch |
| [Quick tour: pipelines](#quick-tour-of-pipelines) | Using Pipelines: Wrapper around tokenizer and models to use finetuned models |
| [Quick tour: Fine-tuning/usage scripts](#quick-tour-of-the-fine-tuningusage-scripts) | Using provided scripts: GLUE, SQuAD and Text generation |
| [Quick tour: Share your models ](#Quick-tour-of-model-sharing) | Upload and share your fine-tuned models with the community |
| [Migrating from pytorch-transformers to transformers](#Migrating-from-pytorch-transformers-to-transformers) | Migrating your code from pytorch-transformers to transformers |
| [Migrating from pytorch-pretrained-bert to pytorch-transformers](#Migrating-from-pytorch-pretrained-bert-to-transformers) | Migrating your code from pytorch-pretrained-bert to transformers |
| [Documentation][(v2.5.0)](https://huggingface.co/transformers/v2.5.0)[(v2.4.0/v2.4.1)](https://huggingface.co/transformers/v2.4.0)[(v2.3.0)](https://huggingface.co/transformers/v2.3.0)[(v2.2.0/v2.2.1/v2.2.2)](https://huggingface.co/transformers/v2.2.0) [(v2.1.1)](https://huggingface.co/transformers/v2.1.1) [(v2.0.0)](https://huggingface.co/transformers/v2.0.0) [(v1.2.0)](https://huggingface.co/transformers/v1.2.0) [(v1.1.0)](https://huggingface.co/transformers/v1.1.0) [(v1.0.0)](https://huggingface.co/transformers/v1.0.0) [(master)](https://huggingface.co/transformers) | Full API documentation and more |
```python
>>>fromtransformersimportpipeline
# Allocate a pipeline for sentiment-analysis
>>>classifier=pipeline('sentiment-analysis')
>>>classifier('We are very happy to include pipeline into the transformers repository.')
[{'label':'POSITIVE','score':0.9978193640708923}]
```
The second line of code downloads and caches the pretrained model used by the pipeline, the third line evaluates it on the given text. Here the answer is "positive" with a confidence of 99.8%.
This is another example of pipeline used for that can extract question answers from some context:
On top of the answer, the pretrained model used here returned its confidence score, along with the start position and its end position in the tokenized sentence. You can learn more about the tasks supported by the `pipeline` API in [this tutorial](https://huggingface.co/transformers/task_summary.html).
To download and use any of the pretrained models on your given task, you just need to use those three lines of codes (PyTorch version):
```python
>>> from transformers import AutoTokenizer, AutoModel
The tokenizer is responsible for all the preprocessing the pretrained model expects, and can be called directly on one (or list) of texts (as we can see on the fourth line of both code examples). It will output a dictionary you can directly pass to your model (which is done on the fifth line).
The model itself is a regular [Pytorch `nn.Module`](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) or a [TensorFlow `tf.keras.Model`](https://www.tensorflow.org/api_docs/python/tf/keras/Model) (depending on your backend) which you can use normally. For instance, [this tutorial](https://huggingface.co/transformers/training.html) explains how to integrate such a model in classic PyTorch or TensorFlow training loop, or how to use our `Trainer` API to quickly fine-tune the on a new dataset.
## Why should I use transformers?
1. Easy-to-use state-of-the-art models:
- High performance on NLU and NLG tasks.
- Low barrier to entry for educators and practitioners.
- Few user-facing abstractions with just three classes to learn.
- A unified API for using all our pretrained models.
1. Lower compute costs, smaller carbon footprint:
- Researchers can share trained models instead of always retraining.
- Practitioners can reduce compute time and production costs.
- Dozens of architectures with over 2,000 pretrained models, some in more than 100 languages.
1. Choose the right framework for every part of a model's lifetime:
- Train state-of-the-art models in 3 lines of code.
- Move a single model between TF2.0/PyTorch frameworks at will.
- Seamlessly pick the right framework for training, evaluation, production.
1. Easily customize a model or an example to your needs:
- Examples for each architecture to reproduce the results by the official authors of said architecture.
- Expose the models internal as consistently as possible.
- Model files can be used independently of the library for quick experiments.
## Why shouldn't I use transformers?
- This library is not a modular toolbox of building blocks for neural nets. The code in the model files is not refactored with additional abstractions on purpose, so that researchers can quickly iterate on each of the models without diving in additional abstractions/files.
- The training API is not intended to work on any model but is optimized to work with the models provided by the library. For generic machine learning loops, you should use another library.
- While we strive to present as many use cases as possible, the scripts in our [examples folder](https://github.com/huggingface/transformers/tree/master/examples) are just that: examples. It is expected that they won't work out-of-the box on your specific problem and that you will be required to change a few lines of code to adapt them to your needs.
## Installation
This repo is tested on Python 3.6+, PyTorch 1.0.0+ and TensorFlow 2.0.0-rc1
This repository is tested on Python 3.6+, PyTorch 1.0.0+ (PyTorch 1.3.1+ for [examples](https://github.com/huggingface/transformers/tree/master/examples)) and TensorFlow 2.0.
You should install 🤗 Transformers in a [virtual environment](https://docs.python.org/3/library/venv.html). If you're unfamiliar with Python virtual environments, check out the [user guide](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/).
Create a virtual environment with the version of Python you're going to use and activate it.
First, create a virtual environment with the version of Python you're going to use and activate it.
Now, if you want to use 🤗 Transformers, you can install it with pip. If you'd like to play with the examples, you must install it from source.
### With pip
First you need to install one of, or both, TensorFlow 2.0 and PyTorch.
Then, you will need to install one of, or both, TensorFlow 2.0 and PyTorch.
Please refer to [TensorFlow installation page](https://www.tensorflow.org/install/pip#tensorflow-2.0-rc-is-available) and/or [PyTorch installation page](https://pytorch.org/get-started/locally/#start-locally) regarding the specific install command for your platform.
When TensorFlow 2.0 and/or PyTorch has been installed, 🤗 Transformers can be installed using pip as follows:
@@ -85,612 +152,75 @@ When TensorFlow 2.0 and/or PyTorch has been installed, 🤗 Transformers can be
pip install transformers
```
### From source
Here also, you first need to install one of, or both, TensorFlow 2.0 and PyTorch.
Please refer to [TensorFlow installation page](https://www.tensorflow.org/install/pip#tensorflow-2.0-rc-is-available) and/or [PyTorch installation page](https://pytorch.org/get-started/locally/#start-locally) regarding the specific install command for your platform.
When TensorFlow 2.0 and/or PyTorch has been installed, you can install from source by cloning the repository and running:
When you update the repository, you should upgrade the transformers installation and its dependencies as follows:
```bash
git pull
pip install --upgrade .
```
### Run the examples
Examples are included in the repository but are not shipped with the library.
Therefore, in order to run the latest versions of the examples, you need to install from source, as described above.
Look at the [README](https://github.com/huggingface/transformers/blob/master/examples/README.md) for how to run examples.
### Tests
A series of tests are included for the library and for some example scripts. Library tests can be found in the [tests folder](https://github.com/huggingface/transformers/tree/master/tests) and examples tests in the [examples folder](https://github.com/huggingface/transformers/tree/master/examples).
Depending on which framework is installed (TensorFlow 2.0 and/or PyTorch), the irrelevant tests will be skipped. Ensure that both frameworks are installed if you want to execute all tests.
Here's the easiest way to run tests for the library:
```bash
pip install -e ".[testing]"
make test
```
and for the examples:
```bash
pip install -e ".[testing]"
pip install -r examples/requirements.txt
make test-examples
```
For details, refer to the [contributing guide](https://github.com/huggingface/transformers/blob/master/CONTRIBUTING.md#tests).
### Do you want to run a Transformer model on a mobile device?
You should check out our [`swift-coreml-transformers`](https://github.com/huggingface/swift-coreml-transformers) repo.
It contains a set of tools to convert PyTorch or TensorFlow 2.0 trained Transformer models (currently contains `GPT-2`, `DistilGPT-2`, `BERT`, and `DistilBERT`) to CoreML models that run on iOS devices.
At some point in the future, you'll be able to seamlessly move from pre-training or fine-tuning models to productizing them in CoreML, or prototype a model or an app in CoreML then research its hyperparameters or architecture from TensorFlow 2.0 and/or PyTorch. Super exciting!
## Model architectures
🤗 Transformers currently provides the following NLU/NLG architectures:
1.**[BERT](https://github.com/google-research/bert)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
2.**[GPT](https://github.com/openai/finetune-transformer-lm)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
3.**[GPT-2](https://blog.openai.com/better-language-models/)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
4.**[Transformer-XL](https://github.com/kimiyoung/transformer-xl)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
5.**[XLNet](https://github.com/zihangdai/xlnet/)** (from Google/CMU) released with the paper [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
6.**[XLM](https://github.com/facebookresearch/XLM/)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau.
7.**[RoBERTa](https://github.com/pytorch/fairseq/tree/master/examples/roberta)** (from Facebook), released together with the paper a [Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
8.**[DistilBERT](https://github.com/huggingface/transformers/tree/master/examples/distillation)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into [DistilGPT2](https://github.com/huggingface/transformers/tree/master/examples/distillation), RoBERTa into [DistilRoBERTa](https://github.com/huggingface/transformers/tree/master/examples/distillation), Multilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/master/examples/distillation) and a German version of DistilBERT.
9.**[CTRL](https://github.com/salesforce/ctrl/)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
10.**[CamemBERT](https://camembert-model.fr)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
11.**[ALBERT](https://github.com/google-research/ALBERT)** (from Google Research and the Toyota Technological Institute at Chicago) released with the paper [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942), by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
12.**[T5](https://github.com/google-research/text-to-text-transfer-transformer)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
13.**[XLM-RoBERTa](https://github.com/pytorch/fairseq/tree/master/examples/xlmr)** (from Facebook AI), released together with the paper [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
14.**[MMBT](https://github.com/facebookresearch/mmbt/)** (from Facebook), released together with the paper a [Supervised Multimodal Bitransformers for Classifying Images and Text](https://arxiv.org/pdf/1909.02950.pdf) by Douwe Kiela, Suvrat Bhooshan, Hamed Firooz, Davide Testuggine.
15.**[FlauBERT](https://github.com/getalp/Flaubert)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
16.**[BART](https://github.com/pytorch/fairseq/tree/master/examples/bart)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/pdf/1910.13461.pdf) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
17.**[ELECTRA](https://github.com/google-research/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
18.**[Other community models](https://huggingface.co/models)**, contributed by the [community](https://huggingface.co/users).
19. Want to contribute a new model? We have added a **detailed guide and templates** to guide you in the process of adding a new model. You can find them in the [`templates`](./templates) folder of the repository. Be sure to check the [contributing guidelines](./CONTRIBUTING.md) and contact the maintainers or open an issue to collect feedbacks before starting your PR.
These implementations have been tested on several datasets (see the example scripts) and should match the performances of the original implementations (e.g. ~93 F1 on SQuAD for BERT Whole-Word-Masking, ~88 F1 on RocStories for OpenAI GPT, ~18.3 perplexity on WikiText 103 for Transformer-XL, ~0.916 Peason R coefficient on STS-B for XLNet). You can find more details on the performances in the Examples section of the [documentation](https://huggingface.co/transformers/examples.html).
## Online demo
**[Write With Transformer](https://transformer.huggingface.co)**, built by the Hugging Face team at transformer.huggingface.co, is the official demo of this repo’s text generation capabilities.
You can use it to experiment with completions generated by `GPT2Model`, `TransfoXLModel`, and `XLNetModel`.
> “🦄 Write with transformer is to writing what calculators are to calculus.”
Let's do a very quick overview of the model architectures in 🤗 Transformers. Detailed examples for each model architecture (Bert, GPT, GPT-2, Transformer-XL, XLNet and XLM) can be found in the [full documentation](https://huggingface.co/transformers/).
```python
importtorch
fromtransformersimport*
# Transformers has a unified API
# for 10 transformer architectures and 30 pretrained weights.
# To use TensorFlow 2.0 versions of the models, simply prefix the class names with 'TF', e.g. `TFRobertaModel` is the TF 2.0 counterpart of the PyTorch model `RobertaModel`
# Let's encode some text in a sequence of hidden-states using each model:
input_ids=torch.tensor([tokenizer.encode("Here is some text to encode",add_special_tokens=True)])# Add special tokens takes care of adding [CLS], [SEP], <s>... tokens in the right way for each model.
withtorch.no_grad():
last_hidden_states=model(input_ids)[0]# Models outputs are now tuples
# Each architecture is provided with several class for fine-tuning on down-stream tasks, e.g.
# SOTA examples for GLUE, SQUAD, text generation...
```
## Quick tour TF 2.0 training and PyTorch interoperability
Let's do a quick example of how a TensorFlow 2.0 model can be trained in 12 lines of code with 🤗 Transformers and then loaded in PyTorch for fast inspection/tests.
```python
importtensorflowastf
importtensorflow_datasets
fromtransformersimport*
# Load dataset, tokenizer, model from pretrained model/vocabulary
print("sentence_1 is","a paraphrase"ifpred_1else"not a paraphrase","of sentence_0")
print("sentence_2 is","a paraphrase"ifpred_2else"not a paraphrase","of sentence_0")
```
## Quick tour of the fine-tuning/usage scripts
**Important**
Before running the fine-tuning scripts, please read the
[instructions](#run-the-examples) on how to
setup your environment to run the examples.
The library comprises several example scripts with SOTA performances for NLU and NLG tasks:
-`run_glue.py`: an example fine-tuning Bert, XLNet and XLM on nine different GLUE tasks (*sequence-level classification*)
-`run_squad.py`: an example fine-tuning Bert, XLNet and XLM on the question answering dataset SQuAD 2.0 (*token-level classification*)
-`run_generation.py`: an example using GPT, GPT-2, CTRL, Transformer-XL and XLNet for conditional language generation
- other model-specific examples (see the documentation).
Here are three quick usage examples for these scripts:
### `run_glue.py`: Fine-tuning on GLUE tasks for sequence classification
The [General Language Understanding Evaluation (GLUE) benchmark](https://gluebenchmark.com/) is a collection of nine sentence- or sentence-pair language understanding tasks for evaluating and analyzing natural language understanding systems.
Before running anyone of these GLUE tasks you should download the
[GLUE data](https://gluebenchmark.com/tasks) by running
You should also install the additional packages required by the examples:
```shell
pip install -r ./examples/requirements.txt
```
```shell
exportGLUE_DIR=/path/to/glue
exportTASK_NAME=MRPC
python ./examples/run_glue.py \
--model_type bert \
--model_name_or_path bert-base-uncased \
--task_name $TASK_NAME\
--do_train \
--do_eval \
--do_lower_case \
--data_dir $GLUE_DIR/$TASK_NAME\
--max_seq_length 128\
--per_gpu_eval_batch_size=8\
--per_gpu_train_batch_size=8\
--learning_rate 2e-5 \
--num_train_epochs 3.0 \
--output_dir /tmp/$TASK_NAME/
```
where task name can be one of CoLA, SST-2, MRPC, STS-B, QQP, MNLI, QNLI, RTE, WNLI.
The dev set results will be present within the text file 'eval_results.txt' in the specified output_dir. In case of MNLI, since there are two separate dev sets, matched and mismatched, there will be a separate output folder called '/tmp/MNLI-MM/' in addition to '/tmp/MNLI/'.
#### Fine-tuning XLNet model on the STS-B regression task
This example code fine-tunes XLNet on the STS-B corpus using parallel training on a server with 4 V100 GPUs.
Parallel training is a simple way to use several GPUs (but is slower and less flexible than distributed training, see below).
```shell
exportGLUE_DIR=/path/to/glue
python ./examples/run_glue.py \
--model_type xlnet \
--model_name_or_path xlnet-large-cased \
--do_train \
--do_eval \
--task_name=sts-b \
--data_dir=${GLUE_DIR}/STS-B \
--output_dir=./proc_data/sts-b-110 \
--max_seq_length=128\
--per_gpu_eval_batch_size=8\
--per_gpu_train_batch_size=8\
--gradient_accumulation_steps=1\
--max_steps=1200\
--model_name=xlnet-large-cased \
--overwrite_output_dir \
--overwrite_cache \
--warmup_steps=120
```
On this machine we thus have a batch size of 32, please increase `gradient_accumulation_steps` to reach the same batch size if you have a smaller machine. These hyper-parameters should result in a Pearson correlation coefficient of `+0.917` on the development set.
#### Fine-tuning Bert model on the MRPC classification task
This example code fine-tunes the Bert Whole Word Masking model on the Microsoft Research Paraphrase Corpus (MRPC) corpus using distributed training on 8 V100 GPUs to reach a F1 > 92.
Training with these hyper-parameters gave us the following results:
```bash
acc= 0.8823529411764706
acc_and_f1= 0.901702786377709
eval_loss= 0.3418912578906332
f1= 0.9210526315789473
global_step=174
loss= 0.07231863956341798
```
### `run_squad.py`: Fine-tuning on SQuAD for question-answering
This example code fine-tunes BERT on the SQuAD dataset using distributed training on 8 V100 GPUs and Bert Whole Word Masking uncased model to reach a F1 > 93 on SQuAD:
This is the model provided as `bert-large-uncased-whole-word-masking-finetuned-squad`.
### `run_generation.py`: Text generation with GPT, GPT-2, CTRL, Transformer-XL and XLNet
A conditional generation script is also included to generate text from a prompt.
The generation script includes the [tricks](https://github.com/rusiaaman/XLNet-gen#methodology) proposed by Aman Rusia to get high-quality generation with memory models like Transformer-XL and XLNet (include a predefined text to make short inputs longer).
Here is how to run the script with the small version of OpenAI GPT-2 model:
```shell
python ./examples/run_generation.py \
--model_type=gpt2 \
--length=20\
--model_name_or_path=gpt2 \
```
and from the Salesforce CTRL model:
```shell
python ./examples/run_generation.py \
--model_type=ctrl \
--length=20\
--model_name_or_path=ctrl \
--temperature=0\
--repetition_penalty=1.2 \
```
## Quick tour of model sharing
Starting with `v2.2.2`, you can now upload and share your fine-tuned models with the community, using the <abbr title="Command-line interface">CLI</abbr> that's built-in to the library.
**First, create an account on [https://huggingface.co/join](https://huggingface.co/join)**. Optionally, join an existing organization or create a new one. Then:
```shell
transformers-cli login
# log in using the same credentials as on huggingface.co
# (you can optionally override its filename, which can be nested inside a folder)
```
If you want your model to be namespaced by your organization name rather than your username, add the following flag to any command:
```shell
--organization organization_name
```
Your model will then be accessible through its identifier, a concatenation of your username (or organization name) and the folder name above:
```python
"username/pretrained_model"
# or if an org:
"organization_name/pretrained_model"
```
**Please add a README.md model card** to the repo under `model_cards/` with: model description, training params (dataset, preprocessing, hardware used, hyperparameters), evaluation results, intended uses & limitations, etc.
Your model now has a page on huggingface.co/models 🔥
New in version `v2.3`: `Pipeline` are high-level objects which automatically handle tokenization, running your data through a transformers model
and outputting the result in a structured object.
You can create `Pipeline` objects for the following down-stream tasks:
-`feature-extraction`: Generates a tensor representation for the input sequence
-`ner`: Generates named entity mapping for each word in the input sequence.
-`sentiment-analysis`: Gives the polarity (positive / negative) of the whole input sequence.
-`text-classification`: Initialize a `TextClassificationPipeline` directly, or see `sentiment-analysis` for an example.
-`question-answering`: Provided some context and a question refering to the context, it will extract the answer to the question in the context.
-`fill-mask`: Takes an input sequence containing a masked token (e.g. `<mask>`) and return list of most probable filled sequences, with their probabilities.
```python
fromtransformersimportpipeline
# Allocate a pipeline for sentiment-analysis
nlp=pipeline('sentiment-analysis')
nlp('We are very happy to include pipeline into the transformers repository.')
>>>{'label':'POSITIVE','score':0.99893874}
# Allocate a pipeline for question-answering
nlp=pipeline('question-answering')
nlp({
'question':'What is the name of the repository ?',
'context':'Pipeline have been included in the huggingface/transformers repository'
## Migrating from pytorch-transformers to transformers
Here is a quick summary of what you should take care of when migrating from `pytorch-transformers` to `transformers`.
### Positional order of some models' keywords inputs (`attention_mask`, `token_type_ids`...) changed
To be able to use Torchscript (see #1010, #1204 and #1195) the specific order of some models **keywords inputs** (`attention_mask`, `token_type_ids`...) has been changed.
If you used to call the models with keyword names for keyword arguments, e.g. `model(inputs_ids, attention_mask=attention_mask, token_type_ids=token_type_ids)`, this should not cause any change.
If you used to call the models with positional inputs for keyword arguments, e.g. `model(inputs_ids, attention_mask, token_type_ids)`, you may have to double check the exact order of input arguments.
## Migrating from pytorch-pretrained-bert to transformers
Here is a quick summary of what you should take care of when migrating from `pytorch-pretrained-bert` to `transformers`.
### Models always output `tuples`
The main breaking change when migrating from `pytorch-pretrained-bert` to `transformers` is that every model's forward method always outputs a `tuple` with various elements depending on the model and the configuration parameters.
The exact content of the tuples for each model is detailed in the models' docstrings and the [documentation](https://huggingface.co/transformers/).
In pretty much every case, you will be fine by taking the first element of the output as the output you previously used in `pytorch-pretrained-bert`.
Here is a `pytorch-pretrained-bert` to `transformers` conversion example for a `BertForSequenceClassification` classification model:
By enabling the configuration option `output_hidden_states`, it was possible to retrieve the last hidden states of the encoder. In `pytorch-transformers` as well as `transformers` the return value has changed slightly: `all_hidden_states` now also includes the hidden state of the embeddings in addition to those of the encoding layers. This allows users to easily access the embeddings final state.
### Serialization
Breaking change in the `from_pretrained()` method:
1. Models are now set in evaluation mode by default when instantiated with the `from_pretrained()` method. To train them, don't forget to set them back in training mode (`model.train()`) to activate the dropout modules.
2. The additional `*input` and `**kwargs` arguments supplied to the `from_pretrained()` method used to be directly passed to the underlying model's class `__init__()` method. They are now used to update the model configuration attribute instead, which can break derived model classes built based on the previous `BertForSequenceClassification` examples. We are working on a way to mitigate this breaking change in [#866](https://github.com/huggingface/transformers/pull/866) by forwarding the the model's `__init__()` method (i) the provided positional arguments and (ii) the keyword arguments which do not match any configuration class attributes.
Also, while not a breaking change, the serialization methods have been standardized and you probably should switch to the new method `save_pretrained(save_directory)` if you were using any other serialization method before.
### Optimizers: BertAdam & OpenAIAdam are now AdamW, schedules are standard PyTorch schedules
The two optimizers previously included, `BertAdam` and `OpenAIAdam`, have been replaced by a single `AdamW` optimizer which has a few differences:
- it only implements weights decay correction,
- schedules are now externals (see below),
- gradient clipping is now also external (see below).
The new optimizer `AdamW` matches PyTorch `Adam` optimizer API and let you use standard PyTorch or apex methods for the schedule and clipping.
The schedules are now standard [PyTorch learning rate schedulers](https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate) and not part of the optimizer anymore.
Here is a conversion examples from `BertAdam` with a linear warmup and decay schedule to `AdamW` and the same schedule:
torch.nn.utils.clip_grad_norm_(model.parameters(),max_grad_norm)# Gradient clipping is not in AdamW anymore (so you can use amp without issue)
optimizer.step()
scheduler.step()
optimizer.zero_grad()
```
If you'd like to play with the examples, you must [install the library from source](https://huggingface.co/transformers/installation.html#installing-from-source).
## Models architectures
🤗 Transformers currently provides the following architectures (see [here](https://huggingface.co/transformers/model_summary.html) for a high-level summary of each them):
1. **[ALBERT](https://huggingface.co/transformers/model_doc/albert.html)** (from Google Research and the Toyota Technological Institute at Chicago) released with the paper [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942), by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
1. **[BART](https://huggingface.co/transformers/model_doc/bart.html)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/pdf/1910.13461.pdf) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
1. **[BERT](https://huggingface.co/transformers/model_doc/bert.html)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
1. **[BERT For Sequence Generation](https://huggingface.co/transformers/model_doc/bertgeneration.html)** (from Google) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
1. **[Blenderbot](https://huggingface.co/transformers/model_doc/blenderbot.html)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[CamemBERT](https://huggingface.co/transformers/model_doc/camembert.html)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
1. **[CTRL](https://huggingface.co/transformers/model_doc/ctrl.html)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
1. **[DeBERTa](https://huggingface.co/transformers/model_doc/deberta.html)** (from Microsoft Research) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[DialoGPT](https://huggingface.co/transformers/model_doc/dialogpt.html)** (from Microsoft Research) released with the paper [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
1. **[DistilBERT](https://huggingface.co/transformers/model_doc/distilbert.html)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into [DistilGPT2](https://github.com/huggingface/transformers/tree/master/examples/distillation), RoBERTa into [DistilRoBERTa](https://github.com/huggingface/transformers/tree/master/examples/distillation), Multilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/master/examples/distillation) and a German version of DistilBERT.
1. **[DPR](https://huggingface.co/transformers/model_doc/dpr.html)** (from Facebook) released with the paper [Dense Passage Retrieval
for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) by Vladimir Karpukhin, Barlas Oğuz, Sewon
Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
1. **[ELECTRA](https://huggingface.co/transformers/model_doc/electra.html)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
1. **[FlauBERT](https://huggingface.co/transformers/model_doc/flaubert.html)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
1. **[Funnel Transformer](https://huggingface.co/transformers/model_doc/funnel.html)** (from CMU/Google Brain) released with the paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
1. **[GPT](https://huggingface.co/transformers/model_doc/gpt.html)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
1. **[GPT-2](https://huggingface.co/transformers/model_doc/gpt2.html)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
1. **[LayoutLM](https://huggingface.co/transformers/model_doc/layoutlm.html)** (from Microsoft Research Asia) released with the paper [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
1. **[Longformer](https://huggingface.co/transformers/model_doc/longformer.html)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
1. **[LXMERT](https://huggingface.co/transformers/model_doc/lxmert.html)** (from UNC Chapel Hill) released with the paper [LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering](https://arxiv.org/abs/1908.07490) by Hao Tan and Mohit Bansal.
1. **[MarianMT](https://huggingface.co/transformers/model_doc/marian.html)** Machine translation models trained using [OPUS](http://opus.nlpl.eu/) data by Jörg Tiedemann. The [Marian Framework](https://marian-nmt.github.io/) is being developed by the Microsoft Translator Team.
1. **[MBart](https://huggingface.co/transformers/model_doc/mbart.html)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
1. **[MT5](https://huggingface.co/transformers/model_doc/mt5.html)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
1. **[Pegasus](https://huggingface.co/transformers/model_doc/pegasus.html)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777)> by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
1. **[ProphetNet](https://huggingface.co/transformers/model_doc/prophetnet.html)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
1. **[Reformer](https://huggingface.co/transformers/model_doc/reformer.html)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
1. **[RoBERTa](https://huggingface.co/transformers/model_doc/roberta.html)** (from Facebook), released together with the paper a [Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
ultilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/master/examples/distillation) and a German version of DistilBERT.
1. **[SqueezeBert](https://huggingface.co/transformers/model_doc/squeezebert.html)** released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
1. **[T5](https://huggingface.co/transformers/model_doc/t5.html)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
1. **[Transformer-XL](https://huggingface.co/transformers/model_doc/transformerxl.html)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
1. **[XLM](https://huggingface.co/transformers/model_doc/xlm.html)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau.
1. **[XLM-ProphetNet](https://huggingface.co/transformers/model_doc/xlmprophetnet.html)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
1. **[XLM-RoBERTa](https://huggingface.co/transformers/model_doc/xlmroberta.html)** (from Facebook AI), released together with the paper [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
1. **[XLNet](https://huggingface.co/transformers/model_doc/xlnet.html)** (from Google/CMU) released with the paper [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
1. **[Other community models](https://huggingface.co/models)**, contributed by the [community](https://huggingface.co/users).
1. Want to contribute a new model? We have added a **detailed guide and templates** to guide you in the process of adding a new model. You can find them in the [`templates`](./templates) folder of the repository. Be sure to check the [contributing guidelines](./CONTRIBUTING.md) and contact the maintainers or open an issue to collect feedbacks before starting your PR.
To cehck if each model has an implementation in PyTorch/TensorFlow/Flax or has an associated tokenizer backed by the 🤗 Tokenizers library, refer to [this table](https://huggingface.co/transformers/index.html#bigtable)
These implementations have been tested on several datasets (see the example scripts) and should match the performances of the original implementations. You can find more details on the performances in the Examples section of the [documentation](https://huggingface.co/transformers/examples.html).
## Learn more
| Section | Description |
|-|-|
| [Documentation](https://huggingface.co/transformers/) | Full API documentation and tutorials |
| [Task summary](https://huggingface.co/transformers/task_summary.html) | Tasks supported by 🤗 Transformers |
| [Preprocessing tutorial](https://huggingface.co/transformers/preprocessing.html) | Using the `Tokenizer` class to prepare data for the models |
| [Training and fine-tuning](https://huggingface.co/transformers/training.html) | Using the models provided by 🤗 Transformers in a PyTorch/TensorFlow training loop and the `Trainer` API |
| [Quick tour: Fine-tuning/usage scripts](https://github.com/huggingface/transformers/tree/master/examples) | Example scripts for fine-tuning models on a wide range of tasks |
| [Model sharing and uploading](https://huggingface.co/transformers/model_sharing.html) | Upload and share your fine-tuned models with the community |
| [Migration](https://huggingface.co/transformers/migration.html) | Migrate to 🤗 Transformers from `pytorch-transformers` or `pytorch-pretrained-bert` |
## Citation
We now have a paper you can cite for the 🤗 Transformers library:
We now have a [paper](https://arxiv.org/abs/1910.03771) you can cite for the 🤗 Transformers library:
```bibtex
@article{Wolf2019HuggingFacesTS,
title={HuggingFace's Transformers: State-of-the-art Natural Language Processing},
author={Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and R'emi Louf and Morgan Funtowicz and Jamie Brew},
author={Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and Rémi Louf and Morgan Funtowicz and Joe Davison and Sam Shleifer and Patrick von Platen and Clara Ma and Yacine Jernite and Julien Plu and Canwen Xu and Teven Le Scao and Sylvain Gugger and Mariama Drame and Quentin Lhoest and Alexander M. Rush},
This section is dedicated to the Benchmarks done by the library, both by maintainers, contributors and users. These
benchmark will help keep track of the preformance improvements that are brought to our models across versions.
## Benchmarking all models for inference
As of version 2.1 we have benchmarked all models for inference, across many different settings: using PyTorch, with
and without TorchScript, using TensorFlow, with and without XLA. All of those tests were done across CPUs (except for
TensorFlow XLA) and GPUs.
The approach is detailed in the [following blogpost](https://medium.com/huggingface/benchmarking-transformers-pytorch-and-tensorflow-e2917fb891c2)
The results are available [here](https://docs.google.com/spreadsheets/d/1sryqufw2D0XlUH4sq3e9Wnxu5EAQkaohzrJbd5HdQ_w/edit?usp=sharing).
## TF2 with mixed precision, XLA, Distribution (@tlkh)
This work was done by [Timothy Liu](https://github.com/tlkh).
There are very positive results to be gained from the various TensorFlow 2.0 features:
- Automatic Mixed Precision (AMP)
- XLA compiler
- Distribution strategies (multi-GPU)
The benefits are listed here (tested on CoLA, MRPC, SST-2):
- AMP: Between 1.4x to 1.6x decrease in overall time without change in batch size
- AMP+XLA: Up to 2.5x decrease in overall time on SST-2 (larger dataset)
- Distribution: Between 1.4x to 3.4x decrease in overall time on 4xV100
- Combined: Up to 5.7x decrease in overall training time, or 9.1x training throughput
The model quality (measured by the validation accuracy) fluctuates slightly. Taking an average of 4 training runs
on a single GPU gives the following results:
- CoLA: AMP results in slighter lower acc (0.820 vs 0.824)
- MRPC: AMP results in lower acc (0.823 vs 0.835)
- SST-2: AMP results in slighter lower acc (0.918 vs 0.922)
However, in a distributed setting with 4xV100 (4x batch size), AMP can yield in better results:
CoLA: AMP results in higher acc (0.828 vs 0.812)
MRPC: AMP results in lower acc (0.817 vs 0.827)
SST-2: AMP results in slightly lower acc (0.926 vs 0.929)
The benchmark script is available [here](https://github.com/NVAITC/benchmarking/blob/master/tf2/bert_dist.py).
Note: on some tasks (e.g. MRPC), the dataset is too small. The overhead due to the model compilation with XLA as well
as the distribution strategy setup does not speed things up. The XLA compile time is also the reason why although throughput
can increase a lot (e.g. 2.7x for single GPU), overall (end-to-end) training speed-up is not as fast (as low as 1.4x)
The benefits as seen on SST-2 (larger dataset) is much clear.
All results can be seen on this [Google Sheet](https://docs.google.com/spreadsheets/d/1538MN224EzjbRL239sqSiUy6YY-rAjHyXhTzz_Zptls/edit#gid=960868445).
There is a growing field of study concerned with investigating the inner working of large-scale transformers like BERT (that some call "BERTology"). Some good examples of this field are:
There is a growing field of study concerned with investigating the inner working of large-scale transformers like BERT
(that some call "BERTology"). Some good examples of this field are:
* BERT Rediscovers the Classical NLP Pipeline by Ian Tenney, Dipanjan Das, Ellie Pavlick: https://arxiv.org/abs/1905.05950
* BERT Rediscovers the Classical NLP Pipeline by Ian Tenney, Dipanjan Das, Ellie Pavlick:
https://arxiv.org/abs/1905.05950
* Are Sixteen Heads Really Better than One? by Paul Michel, Omer Levy, Graham Neubig: https://arxiv.org/abs/1905.10650
* What Does BERT Look At? An Analysis of BERT's Attention by Kevin Clark, Urvashi Khandelwal, Omer Levy, Christopher D. Manning: https://arxiv.org/abs/1906.04341
* What Does BERT Look At? An Analysis of BERT's Attention by Kevin Clark, Urvashi Khandelwal, Omer Levy, Christopher D.
Manning: https://arxiv.org/abs/1906.04341
In order to help this new field develop, we have included a few additional features in the BERT/GPT/GPT-2 models to help people access the inner representations, mainly adapted from the great work of Paul Michel (https://arxiv.org/abs/1905.10650):
In order to help this new field develop, we have included a few additional features in the BERT/GPT/GPT-2 models to
help people access the inner representations, mainly adapted from the great work of Paul Michel
(https://arxiv.org/abs/1905.10650):
* accessing all the hidden-states of BERT/GPT/GPT-2,
* accessing all the attention weights for each head of BERT/GPT/GPT-2,
* retrieving heads output values and gradients to be able to compute head importance score and prune head as explained in https://arxiv.org/abs/1905.10650.
* retrieving heads output values and gradients to be able to compute head importance score and prune head as explained
in https://arxiv.org/abs/1905.10650.
To help you understand and use these features, we have added a specific example script: `bertology.py<https://github.com/huggingface/transformers/blob/master/examples/run_bertology.py>`_ while extract information and prune a model pre-trained on GLUE.
To help you understand and use these features, we have added a specific example script: `bertology.py
<https://github.com/huggingface/transformers/blob/master/examples/bertology/run_bertology.py>`_ while extract
information and prune a model pre-trained on GLUE.
A command-line interface is provided to convert original Bert/GPT/GPT-2/Transformer-XL/XLNet/XLM checkpoints in models than be loaded using the ``from_pretrained`` methods of the library.
A command-line interface is provided to convert original Bert/GPT/GPT-2/Transformer-XL/XLNet/XLM checkpoints in models
than be loaded using the ``from_pretrained`` methods of the library.
..note::
Since 2.3.0 the conversion script is now part of the transformers CLI (**transformers-cli**)
available in any transformers >= 2.3.0 installation.
Since 2.3.0 the conversion script is now part of the transformers CLI (**transformers-cli**) available in any
transformers >= 2.3.0 installation.
The documentation below reflects the **transformers-cli convert** command format.
You can convert any TensorFlow checkpoint for BERT (in particular `the pre-trained models released by Google<https://github.com/google-research/bert#pre-trained-models>`_\ ) in a PyTorch save file by using the `convert_tf_checkpoint_to_pytorch.py <https://github.com/huggingface/transformers/blob/master/transformers/convert_tf_checkpoint_to_pytorch.py>`_ script.
You can convert any TensorFlow checkpoint for BERT (in particular `the pre-trained models released by Google
<https://github.com/google-research/bert#pre-trained-models>`_\ ) in a PyTorch save file by using the
This CLI takes as input a TensorFlow checkpoint (three files starting with ``bert_model.ckpt``\ ) and the associated configuration file (\ ``bert_config.json``\ ), and creates a PyTorch model for this configuration, loads the weights from the TensorFlow checkpoint in the PyTorch model and saves the resulting model in a standard PyTorch save file that can be imported using ``torch.load()`` (see examples in `run_bert_extract_features.py <https://github.com/huggingface/pytorch-pretrained-BERT/tree/master/examples/run_bert_extract_features.py>`_\ , `run_bert_classifier.py <https://github.com/huggingface/pytorch-pretrained-BERT/tree/master/examples/run_bert_classifier.py>`_ and `run_bert_squad.py <https://github.com/huggingface/pytorch-pretrained-BERT/tree/master/examples/run_bert_squad.py>`_\ ).
This CLI takes as input a TensorFlow checkpoint (three files starting with ``bert_model.ckpt``\ ) and the associated
configuration file (\ ``bert_config.json``\ ), and creates a PyTorch model for this configuration, loads the weights
from the TensorFlow checkpoint in the PyTorch model and saves the resulting model in a standard PyTorch save file that
can be imported using ``torch.load()`` (see examples in `run_bert_extract_features.py
You only need to run this conversion script **once** to get a PyTorch model. You can then disregard the TensorFlow checkpoint (the three files starting with ``bert_model.ckpt``\ ) but be sure to keep the configuration file (\ ``bert_config.json``\ ) and the vocabulary file (\ ``vocab.txt``\ ) as these are needed for the PyTorch model too.
You only need to run this conversion script **once** to get a PyTorch model. You can then disregard the TensorFlow
checkpoint (the three files starting with ``bert_model.ckpt``\ ) but be sure to keep the configuration file (\
``bert_config.json``\ ) and the vocabulary file (\ ``vocab.txt``\ ) as these are needed for the PyTorch model too.
To run this specific conversion script you will need to have TensorFlow and PyTorch installed (\ ``pip install tensorflow``\ ). The rest of the repository only requires PyTorch.
To run this specific conversion script you will need to have TensorFlow and PyTorch installed (\ ``pip install
tensorflow``\ ). The rest of the repository only requires PyTorch.
Here is an example of the conversion process for a pre-trained ``BERT-Base Uncased`` model:
@@ -31,12 +47,41 @@ Here is an example of the conversion process for a pre-trained ``BERT-Base Uncas
Here is an example of the conversion process for a pre-trained OpenAI GPT model, assuming that your NumPy checkpoint save as the same format than OpenAI pretrained model (see `here <https://github.com/openai/finetune-transformer-lm>`__\ )
Here is an example of the conversion process for a pre-trained OpenAI GPT model, assuming that your NumPy checkpoint
save as the same format than OpenAI pretrained model (see `here <https://github.com/openai/finetune-transformer-lm>`__\
)
..code-block::shell
@@ -50,9 +95,10 @@ Here is an example of the conversion process for a pre-trained OpenAI GPT model,
Here is an example of the conversion process for a pre-trained Transformer-XL model (see `here<https://github.com/kimiyoung/transformer-xl/tree/master/tf#obtain-and-evaluate-pretrained-sota-models>`__\ )
Here is an example of the conversion process for a pre-trained Transformer-XL model (see `here
The library currently contains PyTorch and Tensorflow implementations, pre-trained model weights, usage scripts and conversion utilities for the following models:
The documentation is organized in five parts:
-**GET STARTED** contains a quick tour, the installation instructions and some useful information about our philosophy
and a glossary.
-**USING 🤗 TRANSFORMERS** contains general tutorials on how to use the library.
-**ADVANCED GUIDES** contains more advanced guides that are more specific to a given script or part of the library.
-**RESEARCH** focuses on tutorials that have less to do with how to use the library but more about general research in
transformers model
- The three last section contain the documentation of each public class and function, grouped in:
-**MAIN CLASSES** for the main classes exposing the important APIs of the library.
-**MODELS** for the classes and functions related to each model implemented in the library.
-**INTERNAL HELPERS** for the classes and functions we use internally.
The library currently contains PyTorch, Tensorflow and Flax implementations, pretrained model weights, usage scripts
and conversion utilities for the following models:
..
This list is updated automatically from the README with `make fix-copies`. Do not update manually!
1.:doc:`ALBERT <model_doc/albert>` (from Google Research and the Toyota Technological Institute at Chicago) released
with the paper `ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
<https://arxiv.org/abs/1909.11942>`__, by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush
Sharma, Radu Soricut.
2.:doc:`BART <model_doc/bart>` (from Facebook) released with the paper `BART: Denoising Sequence-to-Sequence
Pre-training for Natural Language Generation, Translation, and Comprehension
<https://arxiv.org/pdf/1910.13461.pdf>`__ by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman
Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
3.:doc:`BERT <model_doc/bert>` (from Google) released with the paper `BERT: Pre-training of Deep Bidirectional
Transformers for Language Understanding <https://arxiv.org/abs/1810.04805>`__ by Jacob Devlin, Ming-Wei Chang,
Kenton Lee and Kristina Toutanova.
4.:doc:`BERT For Sequence Generation <model_doc/bertgeneration>` (from Google) released with the paper `Leveraging
Pre-trained Checkpoints for Sequence Generation Tasks <https://arxiv.org/abs/1907.12461>`__ by Sascha Rothe, Shashi
Narayan, Aliaksei Severyn.
5.:doc:`Blenderbot <model_doc/blenderbot>` (from Facebook) released with the paper `Recipes for building an
open-domain chatbot <https://arxiv.org/abs/2004.13637>`__ by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary
Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
6.:doc:`CamemBERT <model_doc/camembert>` (from Inria/Facebook/Sorbonne) released with the paper `CamemBERT: a Tasty
French Language Model <https://arxiv.org/abs/1911.03894>`__ by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz
Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
7.:doc:`CTRL <model_doc/ctrl>` (from Salesforce) released with the paper `CTRL: A Conditional Transformer Language
Model for Controllable Generation <https://arxiv.org/abs/1909.05858>`__ by Nitish Shirish Keskar*, Bryan McCann*,
Lav R. Varshney, Caiming Xiong and Richard Socher.
8.:doc:`DeBERTa <model_doc/deberta>` (from Microsoft Research) released with the paper `DeBERTa: Decoding-enhanced
BERT with Disentangled Attention <https://arxiv.org/abs/2006.03654>`__ by Pengcheng He, Xiaodong Liu, Jianfeng Gao,
Weizhu Chen.
9.:doc:`DialoGPT <model_doc/dialogpt>` (from Microsoft Research) released with the paper `DialoGPT: Large-Scale
Generative Pre-training for Conversational Response Generation <https://arxiv.org/abs/1911.00536>`__ by Yizhe Zhang,
Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
10.:doc:`DistilBERT <model_doc/distilbert>` (from HuggingFace), released together with the paper `DistilBERT, a
distilled version of BERT: smaller, faster, cheaper and lighter <https://arxiv.org/abs/1910.01108>`__ by Victor
Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into `DistilGPT2
<https://github.com/huggingface/transformers/tree/master/examples/distillation>`__, RoBERTa into `DistilRoBERTa
<https://github.com/huggingface/transformers/tree/master/examples/distillation>`__, Multilingual BERT into
`DistilmBERT <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__ and a German
version of DistilBERT.
11.:doc:`DPR <model_doc/dpr>` (from Facebook) released with the paper `Dense Passage Retrieval for Open-Domain
Question Answering <https://arxiv.org/abs/2004.04906>`__ by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick
Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
12.:doc:`ELECTRA <model_doc/electra>` (from Google Research/Stanford University) released with the paper `ELECTRA:
Pre-training text encoders as discriminators rather than generators <https://arxiv.org/abs/2003.10555>`__ by Kevin
Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
13.:doc:`FlauBERT <model_doc/flaubert>` (from CNRS) released with the paper `FlauBERT: Unsupervised Language Model
Pre-training for French <https://arxiv.org/abs/1912.05372>`__ by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne,
1.`BERT <https://github.com/google-research/bert>`_ (from Google) released with the paper `BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding <https://arxiv.org/abs/1810.04805>`_ by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
2.`GPT <https://github.com/openai/finetune-transformer-lm>`_ (from OpenAI) released with the paper `Improving Language Understanding by Generative Pre-Training <https://blog.openai.com/language-unsupervised>`_ by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
3.`GPT-2 <https://blog.openai.com/better-language-models>`_ (from OpenAI) released with the paper `Language Models are Unsupervised Multitask Learners <https://blog.openai.com/better-language-models>`_ by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
4.`Transformer-XL <https://github.com/kimiyoung/transformer-xl>`_ (from Google/CMU) released with the paper `Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context <https://arxiv.org/abs/1901.02860>`_ by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
5.`XLNet <https://github.com/zihangdai/xlnet>`_ (from Google/CMU) released with the paper `XLNet: Generalized Autoregressive Pretraining for Language Understanding <https://arxiv.org/abs/1906.08237>`_ by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
6.`XLM <https://github.com/facebookresearch/XLM>`_ (from Facebook) released together with the paper `Cross-lingual Language Model Pretraining <https://arxiv.org/abs/1901.07291>`_ by Guillaume Lample and Alexis Conneau.
7.`RoBERTa <https://github.com/pytorch/fairseq/tree/master/examples/roberta>`_ (from Facebook), released together with the paper a `Robustly Optimized BERT Pretraining Approach <https://arxiv.org/abs/1907.11692>`_ by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
8.`DistilBERT <https://huggingface.co/transformers/model_doc/distilbert.html>`_ (from HuggingFace) released together with the paper `DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter <https://arxiv.org/abs/1910.01108>`_ by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into `DistilGPT2 <https://github.com/huggingface/transformers/tree/master/examples/distillation>`_.
9.`CTRL <https://github.com/pytorch/fairseq/tree/master/examples/ctrl>`_ (from Salesforce), released together with the paper `CTRL: A Conditional Transformer Language Model for Controllable Generation <https://www.github.com/salesforce/ctrl>`_ by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
10.`CamemBERT <https://huggingface.co/transformers/model_doc/camembert.html>`_ (from FAIR, Inria, Sorbonne Université) released together with the paper `CamemBERT: a Tasty French Language Model <https://arxiv.org/abs/1911.03894>`_ by Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suarez, Yoann Dupont, Laurent Romary, Eric Villemonte de la Clergerie, Djame Seddah, and Benoît Sagot.
11.`ALBERT <https://github.com/google-research/ALBERT>`_ (from Google Research), released together with the paper a `ALBERT: A Lite BERT for Self-supervised Learning of Language Representations <https://arxiv.org/abs/1909.11942>`_ by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
12.`XLM-RoBERTa <https://github.com/pytorch/fairseq/tree/master/examples/xlmr>`_ (from Facebook AI), released together with the paper `Unsupervised Cross-lingual Representation Learning at Scale <https://arxiv.org/abs/1911.02116>`_ by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
13.`FlauBERT <https://github.com/getalp/Flaubert>`_ (from CNRS) released with the paper `FlauBERT: Unsupervised Language Model Pre-training for French <https://arxiv.org/abs/1912.05372>`_ by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
Transformers is tested on Python 3.6+ and PyTorch 1.1.0
🤗 Transformers is tested on Python 3.6+, and PyTorch 1.1.0+ or TensorFlow 2.0+.
## With pip
You should install 🤗 Transformers in a [virtual environment](https://docs.python.org/3/library/venv.html). If you're
unfamiliar with Python virtual environments, check out the [user guide](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/). Create a virtual environment with the version of Python you're going
to use and activate it.
PyTorch Transformers can be installed using pip as follows:
Now, if you want to use 🤗 Transformers, you can install it with pip. If you'd like to play with the examples, you
must install it from source.
``` bash
## Installation with pip
First you need to install one of, or both, TensorFlow 2.0 and PyTorch.
Please refer to [TensorFlow installation page](https://www.tensorflow.org/install/pip#tensorflow-2.0-rc-is-available)
and/or [PyTorch installation page](https://pytorch.org/get-started/locally/#start-locally) regarding the specific
install command for your platform.
When TensorFlow 2.0 and/or PyTorch has been installed, 🤗 Transformers can be installed using pip as follows:
```bash
pip install transformers
```
## From source
Alternatively, for CPU-support only, you can install 🤗 Transformers and PyTorch in one line with:
To install from source, clone the repository and install with:
```bash
pip install transformers[torch]
```
or 🤗 Transformers and TensorFlow 2.0 in one line with:
```bash
pip install transformers[tf-cpu]
```
To check 🤗 Transformers is properly installed, run the following command:
```bash
python -c "from transformers import pipeline; print(pipeline('sentiment-analysis')('we love you'))"
```
It should download a pretrained model then print something like
An extensive test suite is included to test the library behavior and several examples. Library tests can be found in the [tests folder](https://github.com/huggingface/transformers/tree/master/tests) and examples tests in the [examples folder](https://github.com/huggingface/transformers/tree/master/examples).
Refer to the [contributing guide](https://github.com/huggingface/transformers/blob/master/CONTRIBUTING.md#tests) for details about running tests.
## OpenAI GPT original tokenization workflow
If you want to reproduce the original tokenization process of the `OpenAI GPT` paper, you will need to install `ftfy` and `SpaCy`:
``` bash
pip install spacy ftfy==4.4.3
python -m spacy download en
```bash
python -c "from transformers import pipeline; print(pipeline('sentiment-analysis')('I hate you'))"
```
If you don't install `ftfy` and `SpaCy`, the `OpenAI GPT` tokenizer will default to tokenize using BERT's `BasicTokenizer` followed by Byte-Pair Encoding (which should be fine for most usage, don't worry).
to check 🤗 Transformers is properly installed.
## Note on model downloads (Continuous Integration or large-scale deployments)
## Caching models
If you expect to be downloading large volumes of models (more than 1,000) from our hosted bucket (for instance through your CI setup, or a large-scale production deployment), please cache the model files on your end. It will be way faster, and cheaper. Feel free to contact us privately if you need any help.
This library provides pretrained models that will be downloaded and cached locally. Unlessyou specify a location with
`cache_dir=...` when you use methods like `from_pretrained`, these models will automatically be downloaded in the
folder given by the shell environment variable ``TRANSFORMERS_CACHE``. The default value for it will be the Hugging
Face cache home followed by ``/transformers/``. This is (by order of priority):
So if you don't have any specific environment variable set, the cache directory will be at
``~/.cache/huggingface/transformers/``.
**Note:** If you have set a shell environment variable for one of the predecessors of this library
(``PYTORCH_TRANSFORMERS_CACHE`` or ``PYTORCH_PRETRAINED_BERT_CACHE``), those will be used if there is no shell
environment variable for ``TRANSFORMERS_CACHE``.
### Note on model downloads (Continuous Integration or large-scale deployments)
If you expect to be downloading large volumes of models (more than 1,000) from our hosted bucket (for instance through
your CI setup, or a large-scale production deployment), please cache the model files on your end. It will be way
faster, and cheaper. Feel free to contact us privately if you need any help.
## Do you want to run a Transformer model on a mobile device?
You should check out our [swift-coreml-transformers](https://github.com/huggingface/swift-coreml-transformers) repo.
It contains a set of tools to convert PyTorch or TensorFlow 2.0 trained Transformer models (currently contains `GPT-2`, `DistilGPT-2`, `BERT`, and `DistilBERT`) to CoreML models that run on iOS devices.
It contains a set of tools to convert PyTorch or TensorFlow 2.0 trained Transformer models (currently contains `GPT-2`,
`DistilGPT-2`, `BERT`, and `DistilBERT`) to CoreML models that run on iOS devices.
At some point in the future, you'll be able to seamlessly move from pre-training or fine-tuning models in PyTorch to productizing them in CoreML,
or prototype a model or an app in CoreML then research its hyperparameters or architecture from PyTorch. Super exciting!
At some point in the future, you'll be able to seamlessly move from pretraining or fine-tuning models in PyTorch or
TensorFlow 2.0 to productizing them in CoreML, or prototype a model or an app in CoreML then research its
hyperparameters or architecture from PyTorch or TensorFlow 2.0. Super exciting!
The base class ``PretrainedConfig`` implements the common methods for loading/saving a configuration either from a local file or directory, or from a pretrained model configuration provided by the library (downloaded from HuggingFace's AWS S3 repository).
The base class :class:`~transformers.PretrainedConfig` implements the common methods for loading/saving a configuration
either from a local file or directory, or from a pretrained model configuration provided by the library (downloaded
The base class``PreTrainedModel`` implements the common methods for loading/saving a model either from a local file or directory, or from a pretrained model configuration provided by the library (downloaded from HuggingFace's AWS S3 repository).
The base classes :class:`~transformers.PreTrainedModel` and :class:`~transformers.TFPreTrainedModel` implement the
common methods for loading/saving a model either from a local file or directory, or from a pretrained model
configuration provided by the library (downloaded from HuggingFace's AWS S3 repository).
``PreTrainedModel`` also implements a few methods which are common among all the models to:
:class:`~transformers.PreTrainedModel` and :class:`~transformers.TFPreTrainedModel` also implement a few methods which
are common among all the models to:
- resize the input token embeddings when new tokens are added to the vocabulary
- prune the attention heads of the model.
``PreTrainedModel``
~~~~~~~~~~~~~~~~~~~~~
The other methods that are common to each model are defined in :class:`~transformers.modeling_utils.ModuleUtilsMixin`
(for the PyTorch models) and :class:`~transformers.modeling_tf_utils.TFModuleUtilsMixin` (for the TensorFlow models) or
for text generation, :class:`~transformers.generation_utils.GenerationMixin` (for the PyTorch models) and
:class:`~transformers.generation_tf_utils.TFGenerationMixin` (for the TensorFlow models)
An example using these processors is given in the `run_glue.py<https://github.com/huggingface/pytorch-transformers/blob/master/examples/run_glue.py>`__ script.
An example using these processors is given in the `run_glue.py
`The Stanford Question Answering Dataset (SQuAD) <https://rajpurkar.github.io/SQuAD-explorer//>`__ is a benchmark that evaluates
the performance of models on question answering. Two versions are available, v1.1 and v2.0. The first version (v1.1) was released together with the paper
`SQuAD: 100,000+ Questions for Machine Comprehension of Text<https://arxiv.org/abs/1606.05250>`__. The second version (v2.0) was released alongside
the paper `Know What You Don't Know: Unanswerable Questions for SQuAD <https://arxiv.org/abs/1806.03822>`__.
`The Stanford Question Answering Dataset (SQuAD) <https://rajpurkar.github.io/SQuAD-explorer//>`__ is a benchmark that
evaluates the performance of models on question answering. Two versions are available, v1.1 and v2.0. The first version
(v1.1) was released together with the paper `SQuAD: 100,000+ Questions for Machine Comprehension of Text
<https://arxiv.org/abs/1606.05250>`__. The second version (v2.0) was released alongside the paper `Know What You Don't
Know: Unanswerable Questions for SQuAD <https://arxiv.org/abs/1806.03822>`__.
This library hosts a processor for each of the two versions:
The base class ``PreTrainedTokenizer`` implements the common methods for loading/saving a tokenizer either from a local file or directory, or from a pretrained tokenizer provided by the library (downloaded from HuggingFace's AWS S3 repository).
A tokenizer is in charge of preparing the inputs for a model. The library contains tokenizers for all the models. Most
of the tokenizers are available in two flavors: a full python implementation and a "Fast" implementation based on the
Rust library `tokenizers <https://github.com/huggingface/tokenizers>`__. The "Fast" implementations allows:
``PreTrainedTokenizer`` is the main entry point into tokenizers as it also implements the main methods for using all the tokenizers:
1. a significant speed-up in particular when doing batched tokenization and
2. additional methods to map between the original string (character and words) and the token space (e.g. getting the
index of the token comprising a given character or the span of characters corresponding to a given token). Currently
no "Fast" implementation is available for the SentencePiece-based tokenizers (for T5, ALBERT, CamemBERT, XLMRoBERTa
and XLNet models).
- tokenizing, converting tokens to ids and back and encoding/decoding,
- adding new tokens to the vocabulary in a way that is independant of the underlying structure (BPE, SentencePiece...),
- managing special tokens (adding them, assigning them to roles, making sure they are not split during tokenization)
The base classes :class:`~transformers.PreTrainedTokenizer` and :class:`~transformers.PreTrainedTokenizerFast`
implement the common methods for encoding string inputs in model inputs (see below) and instantiating/saving python and
"Fast" tokenizers either from a local file or directory or from a pretrained tokenizer provided by the library
(downloaded from HuggingFace's AWS S3 repository). They both rely on
:class:`~transformers.tokenization_utils_base.PreTrainedTokenizerBase` that contains the common methods, and
A couple of changes were introduced when the switch from version 3 to version 4 was done. Below is a summary of the
expected changes:
#### 1. AutoTokenizers and pipelines now use fast (rust) tokenizers by default.
The python and rust tokenizers have roughly the same API, but the rust tokenizers have a more complete feature set.
This introduces two breaking changes:
- The handling of overflowing tokens between the python and rust tokenizers is different.
- The rust tokenizers do not accept integers in the encoding methods.
##### How to obtain the same behavior as v3.x in v4.x
- The pipelines now contain additional features out of the box. See the [token-classification pipeline with the `grouped_entities` flag](https://huggingface.co/transformers/main_classes/pipelines.html?highlight=textclassification#tokenclassificationpipeline).
- The auto-tokenizers now return rust tokenizers. In order to obtain the python tokenizers instead, the user may use the `use_fast` flag by setting it to `False`:
#### 2. SentencePiece is removed from the required dependencies
The requirement on the SentencePiece dependency has been lifted from the `setup.py`. This is done so that we may have a channel on anaconda cloud without relying on `conda-forge`. This means that the tokenizers that depend on the SentencePiece library will not be available with a standard `transformers` installation.
This includes the **slow** versions of:
-`XLNetTokenizer`
-`AlbertTokenizer`
-`CamembertTokenizer`
-`MBartTokenizer`
-`PegasusTokenizer`
-`T5Tokenizer`
-`ReformerTokenizer`
-`XLMRobertaTokenizer`
##### How to obtain the same behavior as v3.x in v4.x
In order to obtain the same behavior as version `v3.x`, you should install `sentencepiece` additionally:
In version `v3.x`:
```bash
pip install transformers
```
to obtain the same in version `v4.x`:
```bash
pip install transformers[sentencepiece]
```
or
```bash
pip install transformers sentencepiece
```
#### 3. The architecture of the repo has been updated so that each model resides in its folder
The past and foreseeable addition of new models means that the number of files in the directory `src/transformers` keeps growing and becomes harder to navigate and understand. We made the choice to put each model and the files accompanying it in their own sub-directories.
This is a breaking change as importing intermediary layers using a model's module directly needs to be done via a different path.
##### How to obtain the same behavior as v3.x in v4.x
In order to obtain the same behavior as version `v3.x`, you should update the path used to access the layers.
In version `v3.x`:
```bash
from transformers.modeling_bert import BertLayer
```
to obtain the same in version `v4.x`:
```bash
from transformers.models.bert.modeling_bert import BertLayer
```
#### 4. Switching the `return_dict` argument to `True` by default
The [`return_dict` argument](https://huggingface.co/transformers/main_classes/output.html) enables the return of dict-like python objects containing the model outputs, instead of the standard tuples. This object is self-documented as keys can be used to retrieve values, while also behaving as a tuple as users may retrieve objects by index or by slice.
This is a breaking change as the limitation of that tuple is that it cannot be unpacked: `value0, value1 = outputs` will not work.
##### How to obtain the same behavior as v3.x in v4.x
In order to obtain the same behavior as version `v3.x`, you should specify the `return_dict` argument to `False`, either in the model configuration or during the forward pass.
Attributes that were deprecated have been removed if they had been deprecated for at least a month. The full list of deprecated attributes can be found in [#8604](https://github.com/huggingface/transformers/pull/8604).
Here is a list of these attributes/methods/arguments and what their replacements should be:
In several models, the labels become consistent with the other models:
-`masked_lm_labels` becomes `labels` in `AlbertForMaskedLM` and `AlbertForPreTraining`.
-`masked_lm_labels` becomes `labels` in `BertForMaskedLM` and `BertForPreTraining`.
-`masked_lm_labels` becomes `labels` in `DistilBertForMaskedLM`.
-`masked_lm_labels` becomes `labels` in `ElectraForMaskedLM`.
-`masked_lm_labels` becomes `labels` in `LongformerForMaskedLM`.
-`masked_lm_labels` becomes `labels` in `MobileBertForMaskedLM`.
-`masked_lm_labels` becomes `labels` in `RobertaForMaskedLM`.
-`lm_labels` becomes `labels` in `BartForConditionalGeneration`.
-`lm_labels` becomes `labels` in `GPT2DoubleHeadsModel`.
-`lm_labels` becomes `labels` in `OpenAIGPTDoubleHeadsModel`.
-`lm_labels` becomes `labels` in `T5ForConditionalGeneration`.
In several models, the caching mechanism becomes consistent with the other models:
-`decoder_cached_states` becomes `past_key_values` in all BART-like, FSMT and T5 models.
-`decoder_past_key_values` becomes `past_key_values` in all BART-like, FSMT and T5 models.
-`past` becomes `past_key_values` in all CTRL models.
-`past` becomes `past_key_values` in all GPT-2 models.
Regarding the tokenizer classes:
- The tokenizer attribute `max_len` becomes `model_max_length`.
- The tokenizer attribute `return_lengths` becomes `return_length`.
- The tokenizer encoding argument `is_pretokenized` becomes `is_split_into_words`.
Regarding the `Trainer` class:
- The `Trainer` argument `tb_writer` is removed in favor of the callback `TensorBoardCallback(tb_writer=...)`.
- The `Trainer` argument `prediction_loss_only` is removed in favor of the class argument `args.prediction_loss_only`.
- The `Trainer` attribute `data_collator` should be a callable.
- The `Trainer` method `_log` is deprecated in favor of `log`.
- The `Trainer` method `_training_step` is deprecated in favor of `training_step`.
- The `Trainer` method `_prediction_loop` is deprecated in favor of `prediction_loop`.
- The `Trainer` method `is_local_master` is deprecated in favor of `is_local_process_zero`.
- The `Trainer` method `is_world_master` is deprecated in favor of `is_world_process_zero`.
Regarding the `TFTrainer` class:
- The `TFTrainer` argument `prediction_loss_only` is removed in favor of the class argument `args.prediction_loss_only`.
- The `Trainer` method `_log` is deprecated in favor of `log`.
- The `TFTrainer` method `_prediction_loop` is deprecated in favor of `prediction_loop`.
- The `TFTrainer` method `_setup_wandb` is deprecated in favor of `setup_wandb`.
- The `TFTrainer` method `_run_model` is deprecated in favor of `run_model`.
Regarding the `TrainerArgument` class:
- The `TrainerArgument` argument `evaluate_during_training` is deprecated in favor of `evaluation_strategy`.
Regarding the Transfo-XL model:
- The Transfo-XL configuration attribute `tie_weight` becomes `tie_words_embeddings`.
- The Transfo-XL modeling method `reset_length` becomes `reset_memory_length`.
Regarding pipelines:
- The `FillMaskPipeline` argument `topk` becomes `top_k`.
Here is a quick summary of what you should take care of when migrating from `pytorch-pretrained-bert` to `transformers`
## Migrating from pytorch-transformers to 🤗 Transformers
Here is a quick summary of what you should take care of when migrating from `pytorch-transformers` to 🤗 Transformers.
### Positional order of some models' keywords inputs (`attention_mask`, `token_type_ids`...) changed
To be able to use Torchscript (see #1010, #1204 and #1195) the specific order of some models **keywords inputs** (`attention_mask`, `token_type_ids`...) has been changed.
If you used to call the models with keyword names for keyword arguments, e.g. `model(inputs_ids, attention_mask=attention_mask, token_type_ids=token_type_ids)`, this should not cause any change.
If you used to call the models with positional inputs for keyword arguments, e.g. `model(inputs_ids, attention_mask, token_type_ids)`, you may have to double check the exact order of input arguments.
## Migrating from pytorch-pretrained-bert
Here is a quick summary of what you should take care of when migrating from `pytorch-pretrained-bert` to 🤗 Transformers
### Models always output `tuples`
The main breaking change when migrating from `pytorch-pretrained-bert` to `transformers` is that the models forward method always outputs a `tuple` with various elements depending on the model and the configuration parameters.
The main breaking change when migrating from `pytorch-pretrained-bert` to 🤗 Transformers is that the models forward method always outputs a `tuple` with various elements depending on the model and the configuration parameters.
The exact content of the tuples for each model are detailled in the models' docstrings and the [documentation](https://huggingface.co/transformers/).
The exact content of the tuples for each model are detailed in the models' docstrings and the [documentation](https://huggingface.co/transformers/).
In pretty much every case, you will be fine by taking the first element of the output as the output you previously used in `pytorch-pretrained-bert`.
Here is a `pytorch-pretrained-bert` to `transformers` conversion example for a `BertForSequenceClassification` classification model:
Here is a `pytorch-pretrained-bert` to 🤗 Transformers conversion example for a `BertForSequenceClassification` classification model:
```python
# Let's load our model
@@ -20,11 +198,11 @@ model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
# If you used to have this line in pytorch-pretrained-bert:
loss=model(input_ids,labels=labels)
# Now just use this line in transformers to extract the loss from the output tuple:
# Now just use this line in 🤗 Transformers to extract the loss from the output tuple:
outputs=model(input_ids,labels=labels)
loss=outputs[0]
# In transformers you can also have access to the logits:
# In 🤗 Transformers you can also have access to the logits:
loss,logits=outputs[:2]
# And even the attention weights if you configure the model to output them (and other outputs too, see the docstrings and documentation)
@@ -96,7 +274,7 @@ for batch in train_data:
loss.backward()
optimizer.step()
### In Transformers, optimizer and schedules are splitted and instantiated like this:
### In 🤗 Transformers, optimizer and schedules are split and instantiated like this:
optimizer=AdamW(model.parameters(),lr=lr,correct_bias=False)# To reproduce BertAdam specific behavior set correct_bias=False
In many cases, the architecture you want to use can be guessed from the name or the path of the pretrained model you are supplying to the ``from_pretrained`` method.
In many cases, the architecture you want to use can be guessed from the name or the path of the pretrained model you
are supplying to the :obj:`from_pretrained()` method. AutoClasses are here to do this job for you so that you
automatically retrieve the relevant model given the name/path to the pretrained weights/config/vocabulary.
AutoClasses are here to do this job for you so that you automatically retrieve the relevant model given the name/path to the pretrained weights/config/vocabulary:
Instantiating one of ``AutoModel``, ``AutoConfig`` and ``AutoTokenizer`` will directly create a class of the relevant architecture (ex: ``model = AutoModel.from_pretrained('bert-base-cased')`` will create a instance of ``BertModel``).
Instantiating one of :class:`~transformers.AutoConfig`, :class:`~transformers.AutoModel`, and
:class:`~transformers.AutoTokenizer` will directly create a class of the relevant architecture. For instance
The Bart model was proposed in `BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation,
Translation, and Comprehension <https://arxiv.org/abs/1910.13461>`__ by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan
Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer on 29 Oct, 2019.
Paper
~~~~~
The Bart model was `proposed <https://arxiv.org/abs/1910.13461>`_ by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer on 29 Oct, 2019.
According to the abstract,
- Bart uses a standard seq2seq/machine translation architecture with a bidirectional encoder (like BERT) and a left-to-right decoder (like GPT).
- The pretraining task involves randomly shuffling the order of the original sentences and a novel in-filling scheme, where spans of text are replaced with a single mask token.
-BART is particularly effective when fine tuned for text generation but also works well for comprehension tasks. It matches the performance of RoBERTa with comparable training resources on GLUE and SQuAD, achieves new state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks, with gains of up to 6 ROUGE.
- Bart uses a standard seq2seq/machine translation architecture with a bidirectional encoder (like BERT) and a
left-to-right decoder (like GPT).
-The pretraining task involves randomly shuffling the order of the original sentences and a novel in-filling scheme,
where spans of text are replaced with a single mask token.
- BART is particularly effective when fine tuned for text generation but also works well for comprehension tasks. It
matches the performance of RoBERTa with comparable training resources on GLUE and SQuAD, achieves new
state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks, with gains
of up to 6 ROUGE.
The Authors' code can be found `here <https://github.com/pytorch/fairseq/tree/master/examples/bart>`_
The Authors' code can be found `here <https://github.com/pytorch/fairseq/tree/master/examples/bart>`__.
-`Distilled checkpoints <https://huggingface.co/models?search=distilbart>`__ are described in this `paper
<https://arxiv.org/abs/2010.13002>`__.
Implementation Notes
~~~~~~~~~~~~~~~~~~~~
- Bart doesn't use :obj:`token_type_ids` for sequence classification. Use BartTokenizer.encode to get the proper splitting.
-The forward pass of ``BartModel`` will create decoder inputs (using the helper function ``transformers.modeling_bart._prepare_bart_decoder_inputs``) if they are not passed. This is different than some other modeling APIs.
- Model predictions are intended to be identical to the original implementation. This only works, however, if the string you pass to ``fairseq.encode`` starts with a space.
-``BartForConditionalGeneration.generate`` should be used for conditional generation tasks like summarization, see the example in that docstrings
- Models that load the ``"bart-large-cnn"`` weights will not have a ``mask_token_id``, or be able to perform mask filling tasks.
The DistilBERT model was proposed in the blog post `Smaller, faster, cheaper, lighter: Introducing DistilBERT, a
distilled version of BERT <https://medium.com/huggingface/distilbert-8cf3380435b5>`__, and the paper `DistilBERT, a
distilled version of BERT: smaller, faster, cheaper and lighter <https://arxiv.org/abs/1910.01108>`__. DistilBERT is a
small, fast, cheap and light Transformer model trained by distilling BERT base. It has 40% less parameters than
`bert-base-uncased`, runs 60% faster while preserving over 95% of BERT's performances as measured on the GLUE language
understanding benchmark.
The abstract from the paper is the following:
@@ -14,84 +17,126 @@ The abstract from the paper is the following:
operating these large models in on-the-edge and/or under constrained computational training or inference budgets
remains challenging. In this work, we propose a method to pre-train a smaller general-purpose language representation
model, called DistilBERT, which can then be fine-tuned with good performances on a wide range of tasks like its larger
counterparts. While most prior work investigated the use of distillation for building task-specific models, we
leverage knowledge distillation during the pre-training phase and show that it is possible to reduce the size of a
BERT model by 40%, while retaining 97% of its language understanding capabilities and being 60% faster. To leverage
the inductive biases learned by larger models during pre-training, we introduce a triple loss combining language
modeling, distillation and cosine-distance losses. Our smaller, faster and lighter model is cheaper to pre-train
and we demonstrate its capabilities for on-device computations in a proof-of-concept experiment and a comparative
on-device study.*
counterparts. While most prior work investigated the use of distillation for building task-specific models, we leverage
knowledge distillation during the pretraining phase and show that it is possible to reduce the size of a BERT model by
40%, while retaining 97% of its language understanding capabilities and being 60% faster. To leverage the inductive
biases learned by larger models during pretraining, we introduce a triple loss combining language modeling,
distillation and cosine-distance losses. Our smaller, faster and lighter model is cheaper to pre-train and we
demonstrate its capabilities for on-device computations in a proof-of-concept experiment and a comparative on-device
study.*
Tips:
- DistilBert doesn't have `token_type_ids`, you don't need to indicate which token belongs to which segment. Just separate your segments with the separation token `tokenizer.sep_token` (or `[SEP]`)
- DistilBert doesn't have options to select the input positions (`position_ids` input). This could be added if necessary though, just let's us know if you need this option.
- DistilBERT doesn't have :obj:`token_type_ids`, you don't need to indicate which token belongs to which segment. Just
separate your segments with the separation token :obj:`tokenizer.sep_token` (or :obj:`[SEP]`).
- DistilBERT doesn't have options to select the input positions (:obj:`position_ids` input). This could be added if
necessary though, just let us know if you need this option.
OpenAI GPT model was proposed in `Improving Language Understanding by Generative Pre-Training<https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf>`__
by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever. It's a causal (unidirectional)
transformer pre-trained using language modeling on a large corpus will long range dependencies, the Toronto Book Corpus.
OpenAI GPT model was proposed in `Improving Language Understanding by Generative Pre-Training
""" PG&E stated it scheduled the blackouts in response to forecasts for high winds amid dry conditions. The aim is to reduce the risk of wildfires. Nearly 800 thousand customers were scheduled to be affected by the shutoffs which were expected to last through at least midday tomorrow."""
It builds on BERT and modifies key hyperparameters, removing the next-sentence pretraining
objective and training with much larger mini-batches and learning rates.
The RoBERTa model was proposed in `RoBERTa: A Robustly Optimized BERT Pretraining Approach
<https://arxiv.org/abs/1907.11692>`_ by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer
Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov. It is based on Google's BERT model released in 2018.
It builds on BERT and modifies key hyperparameters, removing the next-sentence pretraining objective and training with
much larger mini-batches and learning rates.
The abstract from the paper is the following:
@@ -14,85 +17,139 @@ The abstract from the paper is the following:
approaches is challenging. Training is computationally expensive, often done on private datasets of different sizes,
and, as we will show, hyperparameter choices have significant impact on the final results. We present a replication
study of BERT pretraining (Devlin et al., 2019) that carefully measures the impact of many key hyperparameters and
training data size. We find that BERT was significantly undertrained, and can match or exceed the performance of
every model published after it. Our best model achieves state-of-the-art results on GLUE, RACE and SQuAD. These
results highlight the importance of previously overlooked design choices, and raise questions about the source
of recently reported improvements. We release our models and code.*
training data size. We find that BERT was significantly undertrained, and can match or exceed the performance of every
model published after it. Our best model achieves state-of-the-art results on GLUE, RACE and SQuAD. These results
highlight the importance of previously overlooked design choices, and raise questions about the source of recently
reported improvements. We release our models and code.*
Tips:
- This implementation is the same as :class:`~transformers.BertModel` with a tiny embeddings tweak as well as a
setup for Roberta pretrained models.
- This implementation is the same as :class:`~transformers.BertModel` with a tiny embeddings tweak as well as a setup
for Roberta pretrained models.
- RoBERTa has the same architecture as BERT, but uses a byte-level BPE as a tokenizer (same as GPT-2) and uses a
different pre-training scheme.
- RoBERTa doesn't have `token_type_ids`, you don't need to indicate which token belongs to which segment. Just separate your segments with the separation token `tokenizer.sep_token` (or `</s>`)
-`Camembert <./camembert.html>`__ is a wrapper around RoBERTa. Refer to this page for usage examples.
different pretraining scheme.
- RoBERTa doesn't have :obj:`token_type_ids`, you don't need to indicate which token belongs to which segment. Just
separate your segments with the separation token :obj:`tokenizer.sep_token` (or :obj:`</s>`)
-:doc:`CamemBERT <camembert>` is a wrapper around RoBERTa. Refer to this page for usage examples.
The original code can be found `here <https://github.com/pytorch/fairseq/tree/master/examples/roberta>`_.
The T5 model was presented in `Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer <https://arxiv.org/pdf/1910.10683.pdf>`_ by Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu in
*Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice.
In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format.
Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks.
By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.
To facilitate future work on transfer learning for NLP, we release our dataset, pre-trained models, and code.*
The T5 model was presented in `Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
<https://arxiv.org/pdf/1910.10683.pdf>`_ by Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang,
Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu.
The Authors' code can be found `here <https://github.com/google-research/text-to-text-transfer-transformer>`_ .
The abstract from the paper is the following:
*Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream
task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning
has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of
transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a
text-to-text format. Our systematic study compares pretraining objectives, architectures, unlabeled datasets, transfer
approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration
with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering
summarization, question answering, text classification, and more. To facilitate future work on transfer learning for
NLP, we release our dataset, pre-trained models, and code.*
Tips:
- T5 is an encoder-decoder model pre-trained on a multi-task mixture of unsupervised and supervised tasks and for which
each task is converted into a text-to-text format. T5 works well on a variety of tasks out-of-the-box by prepending a
different prefix to the input corresponding to each task, e.g., for translation: *translate English to German: ...*,
for summarization: *summarize: ...*.
For more information about which prefix to use, it is easiest to look into Appendix D of the `paper
<https://arxiv.org/pdf/1910.10683.pdf>`__. - For sequence-to-sequence generation, it is recommended to use
:obj:`T5ForConditionalGeneration.generate()``. This method takes care of feeding the encoded input via
cross-attention layers to the decoder and auto-regressively generates the decoder output. - T5 uses relative scalar
embeddings. Encoder input padding can be done on the left and on the right.
The original code can be found `here <https://github.com/google-research/text-to-text-transfer-transformer>`__.
Training
~~~~~~~~~~~~~~~~~~~~
T5 is an encoder-decoder model and converts all NLP problems into a text-to-text format. It is trained using teacher forcing.
This means that for training we always need an input sequence and a target sequence.
The input sequence is fed to the model using ``input_ids``. The target sequence is shifted to the right, *i.e.* perprended by a start-sequence token and fed to the decoder using the `decoder_input_ids`. In teacher-forcing style, the target sequence is then appended by the EOS token and corresponds to the ``lm_labels``. The PAD token is hereby used as the start-sequence token.
T5 can be trained / fine-tuned both in a supervised and unsupervised fashion.
T5 is an encoder-decoder model and converts all NLP problems into a text-to-text format. It is trained using teacher
forcing. This means that for training we always need an input sequence and a target sequence. The input sequence is fed
to the model using :obj:`input_ids``. The target sequence is shifted to the right, i.e., prepended by a start-sequence
token and fed to the decoder using the :obj:`decoder_input_ids`. In teacher-forcing style, the target sequence is then
appended by the EOS token and corresponds to the :obj:`labels`. The PAD token is hereby used as the start-sequence
token. T5 can be trained / fine-tuned both in a supervised and unsupervised fashion.
- Unsupervised denoising training
In this setup spans of the input sequence are masked by so-called sentinel tokens (*a.k.a* unique mask tokens)
and the output sequence is formed as a concatenation of the same sentinel tokens and the *real* masked tokens.
Each sentinel tokens represents a unique mask token for this sentence and should start with ``<extra_id_1>``, ``<extrac_id_2>``, ... up to ``<extra_id_100>``. As a default 100 sentinel tokens are available in ``T5Tokenizer``.
*E.g.* the sentence "The cute dog walks in the park" with the masks put on "cute dog" and "the" should be processed as follows:
::
In this setup spans of the input sequence are masked by so-called sentinel tokens (*a.k.a* unique mask tokens) and
the output sequence is formed as a concatenation of the same sentinel tokens and the *real* masked tokens. Each
sentinel token represents a unique mask token for this sentence and should start with :obj:`<extra_id_0>`,
:obj:`<extra_id_1>`, ... up to :obj:`<extra_id_99>`. As a default, 100 sentinel tokens are available in
:class:`~transformers.T5Tokenizer`.
input_ids = tokenizer.encode('The <extra_id_1> walks in <extra_id_2> park', return_tensors='pt')
lm_labels = tokenizer.encode('<extra_id_1> cute dog <extra_id_2> the <extra_id_3> </s>', return_tensors='pt')
For instance, the sentence "The cute dog walks in the park" with the masks put on "cute dog" and "the" should be
processed as follows:
..code-block::
input_ids = tokenizer('The <extra_id_0> walks in <extra_id_1> park', return_tensors='pt').input_ids
labels = tokenizer('<extra_id_0> cute dog <extra_id_1> the <extra_id_2>', return_tensors='pt').input_ids
# the forward function automatically creates the correct decoder_input_ids
model(input_ids=input_ids, lm_labels=lm_labels)
loss = model(input_ids=input_ids, labels=labels).loss
- Supervised training
In this setup the input sequence and output sequence are standard sequence to sequence input output mapping.
In translation, *e.g.* the input sequence "The house is wonderful." and output sequence "Das Haus ist wunderbar." should
be processed as follows:
::
input_ids = tokenizer.encode('translate English to German: The house is wonderful. </s>', return_tensors='pt')
lm_labels = tokenizer.encode('Das Haus ist wunderbar. </s>', return_tensors='pt')
In this setup the input sequence and output sequence are standard sequence-to-sequence input output mapping. In
translation, for instance with the input sequence "The house is wonderful." and output sequence "Das Haus ist
wunderbar.", the sentences should be processed as follows:
..code-block::
input_ids = tokenizer('translate English to German: The house is wonderful.', return_tensors='pt').input_ids
labels = tokenizer('Das Haus ist wunderbar.', return_tensors='pt').input_ids
# the forward function automatically creates the correct decoder_input_ids
model(input_ids=input_ids, lm_labels=lm_labels)
Tips
~~~~~~~~~~~~~~~~~~~~
- T5 is an encoder-decoder model pre-trained on a multi-task mixture of unsupervised
and supervised tasks and for which each task is converted into a text-to-text format.
T5 works well on a variety of tasks out-of-the-box by prepending a different prefix to the input corresponding to each task, e.g.: for translation: *translate English to German: ..., summarize: ...*.
For more information about which prefix to use, it is easiest to look into Appendix D of the `paper <https://arxiv.org/pdf/1910.10683.pdf>`_ .
- For sequence to sequence generation, it is recommended to use ``T5ForConditionalGeneration.generate()``. The method takes care of feeding the encoded input via cross-attention layers to the decoder and auto-regressively generates the decoder output.
- T5 uses relative scalar embeddings. Encoder input padding can be done on the left and on the right.
loss = model(input_ids=input_ids, labels=labels).loss
Starting with `v2.2.2`, you can now upload and share your fine-tuned models with the community, using the <abbr title="Command-line interface">CLI</abbr> that's built-in to the library.
**First, create an account on [https://huggingface.co/join](https://huggingface.co/join)**. Optionally, join an existing organization or create a new one. Then:
```shell
transformers-cli login
# log in using the same credentials as on huggingface.co
# (you can optionally override its filename, which can be nested inside a folder)
```
If you want your model to be namespaced by your organization name rather than your username, add the following flag to any command:
```shell
--organization organization_name
```
Your model will then be accessible through its identifier, a concatenation of your username (or organization name) and the folder name above:
```python
"username/pretrained_model"
# or if an org:
"organization_name/pretrained_model"
```
**Please add a README.md model card** to the repo under `model_cards/` with: model description, training params (dataset, preprocessing, hardware used, hyperparameters), evaluation results, intended uses & limitations, etc.
Your model now has a page on huggingface.co/models 🔥
XLM-RoBERTa was trained on 2.5TB of newly created clean CommonCrawl data in 100 languages. It provides strong gains
over previously released multi-lingual models like mBERT or XLM on downstream tasks like classification, sequence
labeling and question answering.
Two XLM-RoBERTa checkpoints can be used for multi-lingual tasks:
-``xlm-roberta-base`` (Masked language modeling, 100 languages)
-``xlm-roberta-large`` (Masked language modeling, 100 languages)
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.