Compare commits

...

1629 Commits

Author SHA1 Message Date
Lysandre
eb0e0ce2ad Release: v3.4.0 2020-10-20 16:22:26 +02:00
Patrick von Platen
0264048660 Update README.md 2020-10-20 16:13:49 +02:00
Patrick von Platen
ffd675b42c add summary (#7927) 2020-10-20 10:11:02 -04:00
Lysandre Debut
5547b40b13 labels and decoder_input_ids to Glossary (#7906)
* labels and decoder_input_ids to Glossary

* Formatting fixes

* Update docs/source/glossary.rst

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* sam's comments

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-10-20 09:50:47 -04:00
Patrick von Platen
f3312515b7 Add note for WikiSplit 2020-10-20 15:42:29 +02:00
Patrick von Platen
0724c0f3a2 Fix EncoderDecoder WikiSplit Example 2020-10-20 15:13:22 +02:00
Stas Bekman
ca37db0559 [flax] fix repo_check (#7914)
* [flax] fix repo_check

Unless, this is actually a problem, this adds `modeling_flax_utils` to ignore list. otherwise currently it expects to have a 'tests/test_modeling_flax_utils.py' for it.
for context please see: https://github.com/huggingface/transformers/pull/3722#issuecomment-712360415

* fix 2 more issues

* merge https://github.com/huggingface/transformers/pull/7919/
2020-10-20 07:55:40 -04:00
Shai Erera
048dd6cf10 Fix bug in _sorted_checkpoints (#7880)
I'm using transformers 3.3.1 and run a training script with `--save_total_limit 3`. I hit the exception below, and after debugging the code found that it wrongly tries to index into the `best_model_checkpoint`'s *str* rather than the `sorted_checkpoints` array. When running without the fix I got this exception:

```
Traceback (most recent call last):
  File "/<HOME>/.conda/envs/transformers/lib/python3.7/site-packages/transformers/trainer.py", line 921, in _save_training
    self._rotate_checkpoints(use_mtime=True)
  File "/<HOME>/.conda/envs/transformers/lib/python3.7/site-packages/transformers/trainer.py", line 1283, in _rotate_checkpoints
    checkpoints_sorted = self._sorted_checkpoints(use_mtime=use_mtime)
  File "/<HOME>/.conda/envs/transformers/lib/python3.7/site-packages/transformers/trainer.py", line 1274, in _sorted_checkpoints
    checkpoints_sorted[best_model_index],
TypeError: 'str' object does not support item assignment
```
2020-10-20 07:50:47 -04:00
Sylvain Gugger
6d4f8bd02a Add Flax dummy objects (#7918) 2020-10-20 07:45:48 -04:00
Stas Bekman
3e31e7f956 [testing] rename skip targets + docs (#7863)
* rename skip targets + docs

* fix quotes

* style

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* small improvements

* fix

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-10-20 04:39:13 -04:00
Patrick von Platen
c912ba5f69 [EncoderDecoder] Fix Typo (#7915)
* fix encoder decoder models

* add .gitignore
2020-10-19 22:02:42 +02:00
Bram Vanroy
55bcd0cb59 Raise error when using AMP on non-CUDA device (#7869)
* Raise error when using AMP on non-CUDA device

* make style

* make style
2020-10-19 15:59:30 -04:00
Patrick von Platen
e3d2bee8d0 fix t5 training docstring (#7911) 2020-10-19 21:49:47 +02:00
Ayub Subhaniya
df1ddcedf2 decoder_config used before intialisation (#7903)
Seeing error when sending `decoder_config` as a parameter while initializing a encoder-decoder model from pretrained. 
fixed "UnboundLocalError: local variable 'decoder_config' referenced before assignment"
2020-10-19 19:48:49 +02:00
Quentin Lhoest
033f29c625 Allow Custom Dataset in RAG Retriever (#7763)
* add CustomHFIndex

* typo in config

* update tests

* add custom dataset example

* clean script

* update test data

* minor in test

* docs

* docs

* style

* fix imports

* allow to pass the indexed dataset directly

* update tests

* use multiset DPR

* address thom and patrick's comments

* style

* update dpr tokenizer

* add output_dir flag in use_own_knowledge_dataset.py

* allow custom datasets in examples/rag/finetune.py

* add test for custom dataset in distributed rag retriever
2020-10-19 19:42:45 +02:00
Julien Rossi
a09fe140c1 Trainer with Iterable Dataset (#7858)
* fix 5990

* accomodate iterable dataset without predefined length
* set it as 1 use case: provide max_steps, and NO num_epochs
* Is a merge of master and PR 5995

* fix trainer test under TF

* fix only for torch
* TF trainer untouched
* trainer tests are skipped when no torch

* address comments

* fix quality checks

* remove torch.dataset from test_trainer

* unnecessary inheritance
* RegressionDataset implements all needed methods __len__ and __getitem__

* fix quality checks

* restore RegressionDataset

* was wrongly under is_torch_available()
2020-10-19 11:57:39 -04:00
Weizhen
2422cda01b ProphetNet (#7157)
* add new model prophetnet

prophetnet modified

modify codes as suggested v1

add prophetnet test files

* still bugs, because of changed output formats of encoder and decoder

* move prophetnet into the latest version

* clean integration tests

* clean tokenizers

* add xlm config to init

* correct typo in init

* further refactoring

* continue refactor

* save parallel

* add decoder_attention_mask

* fix use_cache vs. past_key_values

* fix common tests

* change decoder output logits

* fix xlm tests

* make common tests pass

* change model architecture

* add tokenizer tests

* finalize model structure

* no weight mapping

* correct n-gram stream attention mask as discussed with qweizhen

* remove unused import

* fix index.rst

* fix tests

* delete unnecessary code

* add fast integration test

* rename weights

* final weight remapping

* save intermediate

* Descriptions for Prophetnet Config File

* finish all models

* finish new model outputs

* delete unnecessary files

* refactor encoder layer

* add dummy docs

* code quality

* fix tests

* add model pages to doctree

* further refactor

* more refactor, more tests

* finish code refactor and tests

* remove unnecessary files

* further clean up

* add docstring template

* finish tokenizer doc

* finish prophetnet

* fix copies

* fix typos

* fix tf tests

* fix fp16

* fix tf test 2nd try

* fix code quality

* add test for each model

* merge new tests to branch

* Update model_cards/microsoft/prophetnet-large-uncased-cnndm/README.md

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* Update model_cards/microsoft/prophetnet-large-uncased-cnndm/README.md

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* Update src/transformers/modeling_prophetnet.py

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* Update utils/check_repo.py

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* apply sams and sylvains comments

* make style

* remove unnecessary code

* Update README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/configuration_prophetnet.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* implement lysandres comments

* correct docs

* fix isort

* fix tokenizers

* fix copies

Co-authored-by: weizhen <weizhen@mail.ustc.edu.cn>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-10-19 17:36:09 +02:00
Funtowicz Morgan
8f8f8d99fc Integrate Bert-like model on Flax runtime. (#3722)
* WIP flax bert

* Initial commit Bert Jax/Flax implementation.

* Embeddings working and equivalent to PyTorch.

* Move embeddings in its own module BertEmbeddings

* Added jax.jit annotation on forward call

* BertEncoder on par with PyTorch ! :D

* Add BertPooler on par with PyTorch !!

* Working Jax+Flax implementation of BertModel with < 1e-5 differences on the last layer.

* Fix pooled output to take only the first token of the sequence.

* Refactoring to use BertConfig from transformers.

* Renamed FXBertModel to FlaxBertModel

* Model is now initialized in FlaxBertModel constructor and reused.

* WIP JaxPreTrainedModel

* Cleaning up the code of FlaxBertModel

* Added ability to load Flax model saved through save_pretrained()

* Added ability to convert Pytorch Bert model to FlaxBert

* FlaxBert can now load every Pytorch Bert model with on-the-fly conversion

* Fix hardcoded shape values in conversion scripts.

* Improve the way we handle LayerNorm conversion from PyTorch to Flax.

* Added positional embeddings as parameter of BertModel with default to np.arange.

* Let's roll FlaxRoberta !

* Fix missing position_ids parameters on predict for Bert

* Flax backend now supports batched inputs

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Make it possible to load msgpacked model on convert from pytorch in last resort.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Moved save_pretrained to Jax base class along with more constructor parameters.

* Use specialized, model dependent conversion functio.

* Expose `is_flax_available` in file_utils.

* Added unittest for Flax models.

* Added run_tests_flax to the CI.

* Introduce FlaxAutoModel

* Added more unittests

* Flax model reference the _MODEL_ARCHIVE_MAP from PyTorch model.

* Addressing review comments.

* Expose seed in both Bert and Roberta

* Fix typo suggested by @stefan-it

Co-Authored-By: Stefan Schweter <stefan@schweter.it>

* Attempt to make style

* Attempt to make style in tests too

* Added jax & jaxlib to the flax optional dependencies.

* Attempt to fix flake8 warnings ...

* Redo black again and again

* When black and flake8 fight each other for a space ... 💥 💥 💥

* Try removing trailing comma to make both black and flake happy!

* Fix invalid is_<framework>_available call, thanks @LysandreJik 🎉

* Fix another invalid import in flax_roberta test

* Bump and pin flax release to 0.1.0.

* Make flake8 happy, remove unused jax import

* Change the type of the catch for msgpack.

* Remove unused import.

* Put seed as optional constructor parameter.

* trigger ci again

* Fix too much parameters in BertAttention.

* Formatting.

* Simplify Flax unittests to avoid machine crashes.

* Fix invalid number of arguments when raising issue for an unknown model.

* Address @bastings comment in PR, moving jax.jit decorated outside of __call__

* Fix incorrect path to require_flax/require_pytorch functions.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Attempt to make style.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Correct rebasing of circle-ci dependencies

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Fix import sorting.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Fix unused imports.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Again import sorting...

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Installing missing nlp dependency for flax unittests.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Fix laoding of model for Flax implementations.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* jit the inner function call to make JAX-compatible

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Format !

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Flake one more time 🎶

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Rewrites BERT in Flax to the new Linen API (#7211)

* Rewrite Flax HuggingFace PR to Linen

* Some fixes

* Fix tests

* Fix CI with change of name of nlp (#7054)

* nlp -> datasets

* More nlp -> datasets

* Woopsie

* More nlp -> datasets

* One last

* Expose `is_flax_available` in file_utils.

* Added run_tests_flax to the CI.

* Attempt to make style

* trigger ci again

* Fix import sorting.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Revert "Rewrites BERT in Flax to the new Linen API (#7211)"

This reverts commit 23703a5eb3364e26a1cbc3ee34b4710d86a674b0.

* Remove jnp.lax references

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make style.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Reintroduce Linen changes ...

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make style.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Use jax native's gelu function.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Renaming BertModel to BertModule to highlight the fact this is the Flax Module object.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Rewrite FlaxAutoModel test to not rely on pretrained_model_archive_map

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Remove unused variable in BertModule.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Remove unused variable in BertModule again

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Attempt to have is_flax_available working again.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Introduce JAX TensorType

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Improve ImportError message when trying to convert to various TensorType format.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Makes Flax model jittable.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Ensure flax models are jittable in unittests.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Remove unused imports.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Ensure jax imports are guarded behind is_flax_available.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make style.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make style again

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make style again again

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make style again again again

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Update src/transformers/file_utils.py

Co-authored-by: Marc van Zee <marcvanzee@gmail.com>

* Bump flax to it's latest version

Co-authored-by: Marc van Zee <marcvanzee@gmail.com>

* Bump jax version to at least 0.2.0

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Style.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Update the unittest to use TensorType.JAX

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* isort import in tests.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Match new flax parameters name "params"

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Remove unused imports.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Add flax models to transformers __init__

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Attempt to address all CI related comments.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Correct circle.yml indent.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Correct circle.yml indent (2)

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Remove coverage from flax tests

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Addressing many naming suggestions from comments

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Simplify for loop logic to interate over layers in FlaxBertLayerCollection

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* use f-string syntax for formatting logs.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Use config property from FlaxPreTrainedModel.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* use "cls_token" instead of "first_token" variable name.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* use "hidden_state" instead of "h" variable name.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Correct class reference in docstring to link to Flax related modules.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Added HF + Google Flax team copyright.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make Roberta independent from Bert

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Move activation functions to flax_utils.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Move activation functions to flax_utils for bert.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Added docstring for BERT

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Update import for Bert and Roberta tokenizers

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make style.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* fix-copies

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Correct FlaxRobertaLayer to match PyTorch.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Use the same store_artifact for flax unittest

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Style.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make sure gradient are disabled only locally for flax unittest using torch equivalence.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Use relative imports

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

Co-authored-by: Stefan Schweter <stefan@schweter.it>
Co-authored-by: Marc van Zee <marcvanzee@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-10-19 09:55:41 -04:00
Lalit Pagaria
0193c8290d [RAG] Propagating of n_docs as parameter to all RagModel's related functions (#7891)
* Propagating n_docs as parameter to all RagModel's related functions that defaults to self.config.n_docs

* Making n_docs parameter's default value to None in marginalize function

* Fixing code quality issues

* Handle the special case when generator is of T5PreTrainedModel instance type. T5PreTrainedModel do not have n_docs as parameter

* T5PreTrainedModel do not have n_docs as parameter

* Addressing review comment

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Correcting comment by addressing review comment

* Adding assert statement verifying that n_docs is correctly set. n_docs should be the same for both retriever and generator.

* Fixing flake8 reported issue

* Correcting test datasets for rag

* Using doc_scores instead of context_input_ids to check assert as in RagSequenceForGeneration context_input_ids can be null

* doc_scores second dimension have number of retrieved docs

* Changing assert comment

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-10-19 15:15:52 +02:00
Terencio Agozzino
7e6b6fbec9 style: fix typo in the README (#7882) 2020-10-19 08:43:25 -04:00
Stas Bekman
805a202e1a [CIs] report slow tests add --durations=0 to some pytest jobs (#7884)
* add --durations=50 to some pytest runs

* report all tests
2020-10-19 08:23:14 -04:00
Stas Bekman
4eb61f8e88 remove USE_CUDA (#7861) 2020-10-19 07:08:34 -04:00
Jordi Mas
ea1507fb45 Julibert model card (#7868)
* Julibert model card

* Fix text
2020-10-19 06:50:52 -04:00
Terencio Agozzino
7c44c864a5 style: fix typo (#7883) 2020-10-19 06:14:53 -04:00
ayushtiku5
776e82d2be Add support to provide initial tokens to decoder of encoder-decoder type models (#7577)
* Add support to provide initial tokens for decoding

* Add docstring

* improve code quality

* code reformat

* code reformat

* minor change

* remove appending decoder start token

Co-authored-by: Ayush Jain <a.jain@sprinklr.com>
2020-10-19 08:56:08 +02:00
AndreaSottana
406a49dfe4 Fix small type hinting error (#7820)
* Fix small type hinting error

* Update tokenization_utils_base.py

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-10-19 08:14:29 +02:00
Sam Shleifer
b86a71ea38 [tests] fix slow bart cnn test, faster marian tests (#7888) 2020-10-18 20:18:08 -04:00
Thomas Wolf
ba8c4d0ac0 [Dependencies|tokenizers] Make both SentencePiece and Tokenizers optional dependencies (#7659)
* splitting fast and slow tokenizers [WIP]

* [WIP] splitting sentencepiece and tokenizers dependencies

* update dummy objects

* add name_or_path to models and tokenizers

* prefix added to file names

* prefix

* styling + quality

* spliting all the tokenizer files - sorting sentencepiece based ones

* update tokenizer version up to 0.9.0

* remove hard dependency on sentencepiece 🎉

* and removed hard dependency on tokenizers 🎉

* update conversion script

* update missing models

* fixing tests

* move test_tokenization_fast to main tokenization tests - fix bugs

* bump up tokenizers

* fix bert_generation

* update ad fix several tokenizers

* keep sentencepiece in deps for now

* fix funnel and deberta tests

* fix fsmt

* fix marian tests

* fix layoutlm

* fix squeezebert and gpt2

* fix T5 tokenization

* fix xlnet tests

* style

* fix mbart

* bump up tokenizers to 0.9.2

* fix model tests

* fix tf models

* fix seq2seq examples

* fix tests without sentencepiece

* fix slow => fast  conversion without sentencepiece

* update auto and bert generation tests

* fix mbart tests

* fix auto and common test without tokenizers

* fix tests without tokenizers

* clean up tests lighten up when tokenizers + sentencepiece are both off

* style quality and tests fixing

* add sentencepiece to doc/examples reqs

* leave sentencepiece on for now

* style quality split hebert and fix pegasus

* WIP Herbert fast

* add sample_text_no_unicode and fix hebert tokenization

* skip FSMT example test for now

* fix style

* fix fsmt in example tests

* update following Lysandre and Sylvain's comments

* Update src/transformers/testing_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/testing_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-10-18 20:51:24 +02:00
Raza Habib
c65863ce53 Remove duplicated mish activation function (#7856)
* Remove duplicated mish activation function

* Update activations.py
2020-10-17 17:31:53 -04:00
Patrick von Platen
f5c45a19e6 Fix Rag example docstring (#7872)
* fix rag examples

* fix token generate example
2020-10-17 22:46:47 +02:00
Stas Bekman
9f7b2b2432 [s2s testing] turn all to unittests, use auto-delete temp dirs (#7859) 2020-10-17 14:33:21 -04:00
Patrick von Platen
dc552b9b70 Fix typo in sequence model card 2020-10-16 16:05:06 +02:00
Stas Bekman
1652ddad35 [seq2seq testing] improve readability (#7845) 2020-10-16 09:05:29 -04:00
Quentin Lhoest
466115b279 Fix missing reference titles in retrieval evaluation of RAG (#7817) 2020-10-16 10:15:49 +02:00
Stas Bekman
464b53f5e4 [testing] disable FutureWarning in examples tests (#7842)
* [testing] disable FutureWarning in examples tests

same as tests/conftest.py, we can't resolve those warning, so turn the noise off.

* fix
2020-10-16 03:35:39 -04:00
Sylvain Gugger
eb186bc14e Small fixes to HP search (#7839) 2020-10-16 03:23:44 -04:00
Stas Bekman
d8ca57d2ce fix/hide warnings (#7837)
s
2020-10-16 03:19:51 -04:00
vblagoje
c6e865ac2b Remove masked_lm_labels from returned dictionary (#7818) 2020-10-16 03:12:10 -04:00
Sam Shleifer
96e47d9229 [cleanup] assign todos, faster bart-cnn test (#7835)
* 2 beam output

* unassign/remove TODOs

* remove one more
2020-10-16 03:11:18 -04:00
rmroczkowski
7b13bd01df Herbert polish model (#7798)
* HerBERT transformer model for Polish language understanding.

* HerbertTokenizerFast generated with HerbertConverter

* Herbert base and large model cards

* Herbert model cards with tags

* Herbert tensorflow models

* Herbert model tests based on Bert test suit

* src/transformers/tokenization_herbert.py edited online with Bitbucket

* src/transformers/tokenization_herbert.py edited online with Bitbucket

* docs/source/model_doc/herbert.rst edited online with Bitbucket

* Herbert tokenizer tests and bug fixes

* src/transformers/configuration_herbert.py edited online with Bitbucket

* Copyrights and tests for TFHerbertModel

* model_cards/allegro/herbert-base-cased/README.md edited online with Bitbucket

* model_cards/allegro/herbert-large-cased/README.md edited online with Bitbucket

* Bug fixes after testing

* Reformat modified_only_fixup

* Proper order of configuration

* Herbert proper documentation formatting

* Formatting with make modified_only_fixup

* Dummies fixed

* Adding missing models to documentation

* Removing HerBERT model as it is a simple extension of BERT

* Update model_cards/allegro/herbert-base-cased/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Update model_cards/allegro/herbert-large-cased/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* HerbertTokenizer deprecated configuration removed

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-10-16 03:06:51 -04:00
Julien Chaumond
99898dcd27 [Pipelines] Fix links to model lists (#7826) 2020-10-16 02:57:02 -04:00
Lysandre Debut
52c9e84285 Fix DeBERTa integration tests (#7729) 2020-10-16 02:49:13 -04:00
Stas Bekman
2255c2c7a0 [seq2seq] get_git_info fails gracefully (#7843)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-10-16 00:22:43 -04:00
Katarina Slama
dfa4c26bc0 Typo and fix the input of labels to cross_entropy (#7841)
The current version caused some errors. The changes fixed it for me. Hope this is helpful!
2020-10-15 19:36:31 -04:00
Stas Bekman
a5a8eeb772 fix DeprecationWarning (#7834)
in `tests/test_utils_check_copies.py` I was getting intermittently:
```
utils/check_copies.py:52
  /mnt/nvme1/code/transformers-comet/utils/check_copies.py:52: DeprecationWarning: invalid escape sequence \s
    while line_index < len(lines) and re.search(f"^{indent}(class|def)\s+{name}", lines[line_index]) is None:
```
So this should fix it.
2020-10-15 16:21:09 -04:00
David S. Lim
9c71cca316 model card for bert-base-NER (#7799)
* model card for bert-base-NER

* add meta data up top

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-10-15 21:55:00 +02:00
Stas Bekman
4dbca50022 fix wandb/comet problems (#7830)
* fix wandb/comet problems

* simplify

* Update src/transformers/integrations.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-10-15 15:23:24 -04:00
Julien Chaumond
e7aa64838c [model_cards] facebook/bart-large-mnli: register ZSC for the inference API
cc @Narsil @mfuntowicz @joeddav
2020-10-15 19:02:10 +02:00
Sylvain Gugger
2ce3ddab2d Small fixes to NotebookProgressCallback (#7813) 2020-10-15 10:30:34 -04:00
Julien Chaumond
6f45dd2fac [model_cards] Fix yaml for Facebook/wmt19-*
see d99ed7ad61
2020-10-15 16:14:08 +02:00
Julien Chaumond
d99ed7ad61 [model_cards] Facebook: add thumbnail 2020-10-15 12:53:29 +02:00
Lysandre
2485b8b0ac Set XLA example time to 500s 2020-10-15 12:34:29 +02:00
Lysandre
2dba7d5702 Notebook catch all errors 2020-10-15 12:21:32 +02:00
Nicolas Patry
9ade8e7499 Upgrading TFAutoModelWithLMHead to (#7730)
- TFAutoModelForCausalLM
- TFAutoModelForMaskedLM
- TFAutoModelForSeq2SeqLM

as per deprecation warning. No tests as it simply removes current
warnings from tests.
2020-10-15 05:26:08 -04:00
Sylvain Gugger
62b5622e6b Add specific notebook ProgressCalback (#7793) 2020-10-15 05:05:08 -04:00
Nicolas Patry
0911b6bd86 Improving Pipelines by defaulting to framework='tf' when pytorch seems unavailable. (#7728)
* Improving Pipelines by defaulting to framework='tf' when

pytorch seems unavailable.

* Actually changing the default resolution order to account for model
defaults

Adding a new tests for each pipeline to check that pipeline(task) works
too without manually adding the framework too.
2020-10-15 09:42:07 +02:00
Julien Plu
3a134f7c67 Fix TF savedmodel in Roberta (#7795)
* Remove wrong parameter.

* Same in Longformer
2020-10-14 23:48:50 +02:00
Nils Reimers
3032de9369 Model Card (#7752)
* Create README.md

* Update model_cards/sentence-transformers/LaBSE/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-10-14 13:30:58 -04:00
sarahlintang
3fdbeba83c [model_cards] sarahlintang/IndoBERT (#7748)
* Create README.md

* Update model_cards/sarahlintang/IndoBERT/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-10-14 13:10:31 -04:00
Julien Chaumond
ba654270b3 [model_cards] rename to correct model name 2020-10-14 19:02:48 +02:00
Zhuosheng Zhang
08978487e7 Create README.md (#7722) 2020-10-14 12:56:12 -04:00
Sagor Sarker
3557509127 added evaluation results for classification task (#7790) 2020-10-14 12:50:43 -04:00
Sylvain Gugger
bb9559a7f9 Don't use store_xxx on optional bools (#7786)
* Don't use `store_xxx` on optional bools

* Refine test

* Refine test
2020-10-14 12:05:02 -04:00
Sylvain Gugger
a1d1b332d0 Add predict step accumulation (#7767)
* Add eval_accumulation_step and clean distributed eval

* Add TPU test

* Add TPU stuff

* Fix arg name

* Fix Seq2SeqTrainer

* Fix total_size

* Update src/transformers/trainer_pt_utils.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Doc and add test to TPU

* Add unit test

* Adapt name

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-10-14 11:41:45 -04:00
Sam Shleifer
8feb0cc967 fix examples/rag imports, tests (#7712) 2020-10-14 11:35:00 -04:00
XiaoqiJiao
890e790e16 [model_cards] TinyBERT (HUAWEI Noah's Ark Lab) (#7775) 2020-10-14 09:31:01 -04:00
Jonathan Chang
121dd4332b Add batch inferencing support for GPT2LMHeadModel (#7552)
* Add support for gpt2 batch inferencing

* add test

* remove typo

Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
2020-10-14 13:40:24 +02:00
Quentin Lhoest
0c64b18840 Fix bert position ids in DPR convert script (#7776)
* fix bert position ids in DPR convert script

* style
2020-10-14 05:30:02 -04:00
Sylvain Gugger
7968051aba Fix typo 2020-10-13 17:30:46 -04:00
Sam Shleifer
2977bd528f Faster pegasus tokenization test with reduced data size (#7762) 2020-10-13 16:22:29 -04:00
François Lagunas
2d6e2ad4fa Adding optional trial argument to model_init (#7759)
* Adding optional trial argument to model_init

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-10-13 17:07:02 +02:00
Tiger
7e73c12805 fixed lots of typos. (#7758) 2020-10-13 10:00:20 -04:00
Noam Wies
8cb4ecca25 Avoid unnecessary DDP synchronization when gradient_accumulation_steps > 1 (#7742)
* use DDP no_sync when possible

* fix is_nlp_available addition mistake

* reformat trainer.py

* reformat trainer.py

* drop support for pytorch < 1.2

* return support for pytorch < 1.2
2020-10-13 09:46:44 -04:00
Lysandre Debut
52f7d74398 Do not softmax when num_labels==1 (#7726)
* Do not softmax when num_labels==1

* Update src/transformers/pipelines.py

Co-authored-by: Funtowicz Morgan <mfuntowicz@users.noreply.github.com>

Co-authored-by: Funtowicz Morgan <mfuntowicz@users.noreply.github.com>
2020-10-13 09:42:27 -04:00
Patrick von Platen
82b09a8481 [Rag] Fix loading of pretrained Rag Tokenizer (#7756)
* fix rag

* Update tokenizer save_pretrained

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
2020-10-13 14:34:22 +02:00
Patrick von Platen
2d4e928d97 Update PULL_REQUEST_TEMPLATE.md
Putting my name on a couple more issues to directly redirect them to me
2020-10-13 12:18:31 +02:00
Felipe Curti
dcba9ee03b Gpt1 for sequence classification (#7683)
* Add Documentation for GPT-1 Classification

* Add GPT-1 with Classification head

* Add tests for GPT-1 Classification

* Add GPT-1 For Classification to auto models

* Remove authorized missing keys, change checkpoint to openai-gpt
2020-10-13 05:06:15 -04:00
Lysandre Debut
f34b4cd1bd ElectraTokenizerFast (#7754) 2020-10-13 04:50:41 -04:00
Sam Shleifer
9c2b2db2cd [marian] Automate Tatoeba-Challenge conversion (#7709) 2020-10-12 12:24:25 -04:00
Alex Combessie
aacac8f708 Add license info to nlptown/bert-base-multilingual-uncased-sentiment (#7738) 2020-10-12 11:56:10 -04:00
Lysandre Debut
1f1d950b28 Fix #7331 (#7732) 2020-10-12 09:10:52 -04:00
Julien Plu
d9ffb87efb Fix tf text class (#7724)
* Fix test

* fix generic text classification

* fix test

* Fix tests
2020-10-12 08:45:15 -04:00
sgugger
d6175a4268 Fix code quality 2020-10-12 08:22:27 -04:00
Jonathan Chang
1d5ea34f6a Fix trainer callback (#7720)
Fix a bug that happends when subclassing Trainer and
overwriting evaluate() without calling prediciton_loop()
2020-10-12 07:45:12 -04:00
Kelvin
f176e70723 The input training data files (multiple files in glob format). (#7717)
Very often splitting large files to smaller files can prevent tokenizer going out of memory in environment like Colab that does not have swap memory
2020-10-12 07:44:02 -04:00
AndreaSottana
34fcfb44e3 Update tokenization_utils_base.py (#7696)
Minor spelling corrections in docstrings. "information" is uncountable in English and has no plural.
2020-10-12 06:09:20 -04:00
fteufel
2f34bcf3e7 check for tpu availability in save_pretrained (#7699)
Added is_torch_tpu_available() to the condition
for saving a model as xla model. "xla_device"
property of config can also be True on a non-xla
device, when loading a checkpointthat was trained
on xla before.

Resolves #7695
2020-10-12 04:10:17 -04:00
Sylvain Gugger
13c1857718 Fix typo in all model docs (#7714) 2020-10-12 04:06:59 -04:00
Berowne
83086858f8 fixed typo in warning line 207. (#7718)
replace 'men_len' with 'mem_len' to match parameter name
2020-10-12 03:58:58 -04:00
Miguel Victor
03ec02a667 Corrected typo: maked → masked (#7703) 2020-10-11 16:45:00 -04:00
Sam Shleifer
827c519494 [examples] bump pl=0.9.0 (#7053) 2020-10-11 16:39:38 -04:00
Alexandr Maslov
ba4bbd92bc Fix docstring in AutoModel class (#7694) 2020-10-10 21:08:08 -04:00
Andrew Kane
26d5475d4b Added license information for default and distilbert models (#7688) 2020-10-10 03:55:11 -04:00
Sylvain Gugger
c6e18de9f8 Fix flaky test in test_trainer (#7689) 2020-10-09 20:01:15 -04:00
Sylvain Gugger
2c9e83f7b8 Fix title level in Blenderbot doc (#7687) 2020-10-09 19:24:10 -04:00
Doug Blank
9618cd6964 Import integration libraries first (#7650)
* Import intergration libraries first

* isort and black happiness

* flake8 happiness

* Add a test

* Black reformat

* Ignore import order in tests

* A heavy-handed method of disabling comet for tests

* Remove comet_ml tests

* Run black on setup.py
2020-10-09 12:13:22 -04:00
sgugger
4dcc424de3 Complete release instruction 2020-10-09 12:12:03 -04:00
Sylvain Gugger
a3cea6a8cc Better links for models in READMED and doc index (#7680) 2020-10-09 11:17:16 -04:00
Sam Shleifer
0af53b1ef9 Delete extra test file (#7681) 2020-10-09 11:16:35 -04:00
Stas Bekman
b0f05e0c4c [pegasus] Faster tokenizer tests (#7672) 2020-10-09 11:10:32 -04:00
sgugger
bc00b37a0d Revert "Better model links in the README and index"
This reverts commit 76e05518bb.
2020-10-09 10:56:13 -04:00
sgugger
76e05518bb Better model links in the README and index 2020-10-09 10:54:40 -04:00
Julien Plu
9ad830596d Fix dataset cardinality (#7678)
* Fix test

* Fix cardinality issue

* Fix test
2020-10-09 10:38:25 -04:00
Joe Davison
a1ac082879 add license to xlm-roberta-large-xnli card 2020-10-09 09:16:06 -04:00
Funtowicz Morgan
21ed3a6b99 Reintroduce clean_text on BertTokenizer call which was removed by mistake in #4723 (#5749)
* Reintroduce clean_text call which was removed by mistake in #4723

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Added unittest for clean_text parameter on Bert tokenizer.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Better unittest name.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Adapt unittest to use untrained tokenizer.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Code quality + update test

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-10-09 08:07:28 -04:00
Noah Trenaman
5668fdb09e Update XLM-RoBERTa details (#7669) 2020-10-09 05:16:58 -04:00
guhur
0578a91300 fix nn.DataParallel compatibility with PyTorch 1.5 (#7671)
The same type of errors as in https://github.com/huggingface/transformers/pull/4300
2020-10-09 05:15:08 -04:00
Sam Shleifer
297233fa92 [s2s] Switch README urls to cdn (#7670) 2020-10-08 21:22:22 -04:00
Sam Shleifer
a1ecc90d6b [pseudo] Switch URLS to CDN (#7661) 2020-10-08 14:12:39 -04:00
Suraj Patil
06a973fd2a [s2s] configure lr_scheduler from command line (#7641) 2020-10-08 13:06:35 -04:00
Lysandre Debut
4a00613c24 Fix RobertaForCausalLM docs (#7642)
* Fix RobertaForCausalLM docs

* Apply review suggestion

Co-authored-by: sgugger <sylvain.gugger@gmail,com>

Co-authored-by: sgugger <sylvain.gugger@gmail,com>
2020-10-08 08:36:00 -04:00
Thomas Wolf
55cb2ee62e Green tests: update torch-hub test dependencies (add protobuf and pin tokenizer 0.9.0-RC2) (#7658)
* pin torch-hub test

* add protobuf dep
2020-10-08 13:21:15 +02:00
Thomas Wolf
9aeacb58ba Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove Transfo-XL fast tokenizer (#7141)
* [WIP] SP tokenizers

* fixing tests for T5

* WIP tokenizers

* serialization

* update T5

* WIP T5 tokenization

* slow to fast conversion script

* Refactoring to move tokenzier implementations inside transformers

* Adding gpt - refactoring - quality

* WIP adding several tokenizers to the fast world

* WIP Roberta - moving implementations

* update to dev4 switch file loading to in-memory loading

* Updating and fixing

* advancing on the tokenizers - updating do_lower_case

* style and quality

* moving forward with tokenizers conversion and tests

* MBart, T5

* dumping the fast version of transformer XL

* Adding to autotokenizers + style/quality

* update init and space_between_special_tokens

* style and quality

* bump up tokenizers version

* add protobuf

* fix pickle Bert JP with Mecab

* fix newly added tokenizers

* style and quality

* fix bert japanese

* fix funnel

* limite tokenizer warning to one occurence

* clean up file

* fix new tokenizers

* fast tokenizers deep tests

* WIP adding all the special fast tests on the new fast tokenizers

* quick fix

* adding more fast tokenizers in the fast tests

* all tokenizers in fast version tested

* Adding BertGenerationFast

* bump up setup.py for CI

* remove BertGenerationFast (too early)

* bump up tokenizers version

* Clean old docstrings

* Typo

* Update following Lysandre comments

Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
2020-10-08 11:32:16 +02:00
Piero Molino
4d04120c6d Replaced torch.load for loading the pretrained vocab of TransformerXL tokenizer to pickle.load (#6935)
* Replaced torch.load for loading the pretrained vocab of TransformerXL to pickle.load

* Replaced torch.save with pickle.dump when saving the vocabulary

* updating transformer-xl

* uploaded on S3 - compatibility

* fix tests

* style

* Address review comments

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-10-08 10:16:10 +02:00
Sam Shleifer
aba4e22944 [pseudolabels] cleanup markdown table (#7653) 2020-10-07 23:04:18 -04:00
Sam Shleifer
e3e6517355 Fix 3 failing slow bart/blender tests (#7652) 2020-10-07 22:05:03 -04:00
Sam Shleifer
960faaaf28 Blenderbot (#7418)
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-10-07 19:09:23 -04:00
Blaise Cruz
aee7967fc4 Added model cards for Tagalog BERT models (#7603) 2020-10-07 16:49:20 -04:00
Bobby Donchev
b1c06140f4 Create README.md for IsRoBERTa language model (#7640)
* Create README.md

* Update README.md

* Apply suggestions from code review

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-10-07 16:46:03 -04:00
Keshan
e10d389561 [Model card] SinhalaBERTo model. (#7558)
* [Model card] SinhalaBERTo model.

This is the model card for keshan/SinhalaBERTo model.

* Update model_cards/keshan/SinhalaBERTo/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-10-07 16:40:52 -04:00
Amine Abdaoui
167bce56f2 [model_card] bert-base-5lang-cased (#7573)
Co-authored-by: Amin <amin.geotrend@gmail.com>
2020-10-07 16:38:14 -04:00
Abed khooli
923dd4e5ef Create README.md (#7581) 2020-10-07 16:37:40 -04:00
dartrevan
85ead0fec4 Update README.md (#7590) 2020-10-07 16:37:10 -04:00
Ilias Chalkidis
c6b9c72eac Update README.md (#7629)
Minor changes: Add arxiv link + Layout improvement + fix typos
2020-10-07 16:36:08 -04:00
Abhilash Majumder
048b4bd2c6 Create Model Card For "abhilash1910/french-roberta" Model (#7544) 2020-10-07 16:35:28 -04:00
Julien Chaumond
c2e0d8ac52 [model_card] nikokons/gpt2-greek
by @nikkon3
2020-10-07 16:28:47 -04:00
Sam Shleifer
e2bb9abb6a [s2s] release pseudolabel links and instructions (#7639) 2020-10-07 11:20:44 -04:00
Sylvain Gugger
08ba4b4902 Trainer callbacks (#7596)
* Initial callback proposal

* Finish various callbacks

* Post-rebase conflicts

* Fix tests

* Don't use something that's not set

* Documentation

* Remove unwanted print.

* Document all models can work

* Add tests + small fixes

* Update docs/source/internal/trainer_utils.rst

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address review comments

* Fix TF tests

* Real fix this time

* This one should work

* Fix typo

* Really fix typo

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-10-07 10:50:21 -04:00
Lysandre Debut
8fa0c956b3 Add GPT2 to sequence classification auto model (#7630) 2020-10-07 05:20:05 -04:00
Gabriele Picco
e084089eb9 Fix tokenizer UnboundLocalError when padding is set to PaddingStrategy.MAX_LENGTH (#7610)
* Fix UnboundLocalError when PaddingStrategy is MAX_LENGTH

* Fix UnboundLocalError for TruncationStrategy
2020-10-06 18:16:00 -04:00
Philipp
adfe6ace88 Fix wrong reference name/filename in docstring (#7616)
Resolves: #7613
2020-10-06 18:02:29 -04:00
Lysandre
f0d20ad328 Fix-copies 2020-10-06 23:44:03 +02:00
Lysandre Debut
5982431814 Add GPT2ForSequenceClassification based on DialogRPT (#7501)
* Add GPT2ForSequenceClassification based on DialogRPT

* Better documentation

* Code quality
2020-10-06 17:31:21 -04:00
Sam Shleifer
500be01c5d [s2s] save first batch to json for debugging purposes (#6810) 2020-10-06 16:11:56 -04:00
Sam Shleifer
2b574e7c60 [bart] fix config.classif_dropout (#7593) 2020-10-06 11:33:51 -04:00
Ahmed Elnaggar
aa6c3c14b4 typo fix (#7611)
It should be T5-3B not T5-3M.
2020-10-06 15:32:52 +02:00
Adrien David-Sivelle
98fb718577 Docker GPU Images: Add NVIDIA/apex to the cuda images with pytorch (#7598)
- Use cuda:10.2 image instead of 10.1 (to address version mismatch
  warning with pytorch)
- Use devel version that is built on the runtime and includes headers
  and development tools (was otherwise failing to build apex)
2020-10-06 15:23:32 +02:00
George Mihaila
4d541f516f fix return dicitonary labels from masked_lm_labels to labels (#7595) 2020-10-06 09:12:04 -04:00
cedspam
8d2c248df7 Update README.md (#7612) 2020-10-06 08:46:55 -04:00
Ilias Chalkidis
1c80b2c604 Create README.md (LEGAL-BERT Model card) (#7607)
* Create README.md

Model description for all LEGAL-BERT models, published as part of  "LEGAL-BERT: The Muppets straight out of Law School". Chalkidis et al., 2018, In Findings of EMNLP 2020

* Update model_cards/nlpaueb/legal-bert-base-uncased/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-10-06 08:46:17 -04:00
Siddharth Jain
eda27f4494 [TF generation] Fix typo (#7582)
* Fixing top_k and min_length assertions, and a typo fix

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-10-06 12:47:16 +02:00
Lysandre Debut
0257992e4a Fix squeezebert docs (#7587)
* Configuration

* Modeling

* Tokenization

* Obliterate the trailing spaces

* From underlines to long underlines
2020-10-06 06:22:04 -04:00
Ahmed Elnaggar
66c72082d0 Add ProtT5-XL-BFD model card (#7606)
* Add ProtT5-XL-BFD model card

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-10-06 12:19:21 +02:00
Stas Bekman
b21a30bdd8 [makefile] check only .py files (#7588)
* check only .py files

* better choice of words
2020-10-06 05:25:21 -04:00
Sam Shleifer
d5d2744aa7 Support T5 Distillation w/hidden state supervision (#7599) 2020-10-05 21:31:48 -04:00
Lysandre Debut
818c294fdd The toggle actually sticks (#7586) 2020-10-05 11:23:57 -04:00
Sylvain Gugger
03835af700 Documentation fixes (#7585) 2020-10-05 11:01:03 -04:00
Julien Plu
9cf7b23b9b Custom TF weights loading (#7422)
* First try

* Fix TF utils

* Handle authorized unexpected keys when loading weights

* Add several more authorized unexpected keys

* Apply style

* Fix test

* Address Patrick's comments.

* Update src/transformers/modeling_tf_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Apply style

* Make return_dict the default behavior and display a warning message

* Revert

* Replace wrong keyword

* Revert code

* Add forgot key

* Fix bug in loading PT models from a TF one.

* Fix sort

* Add a test for custom load weights in BERT

* Apply style

* Remove unused import

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-10-05 09:58:45 -04:00
Sylvain Gugger
d3adb985d1 Expand test to locate flakiness (#7580) 2020-10-05 09:45:47 -04:00
Sylvain Gugger
b2b7fc7814 Check and update model list in index.rst automatically (#7527)
* Check and update model list in index.rst automatically

* Check and update model list in index.rst automatically

* Adapt template
2020-10-05 09:40:45 -04:00
Sylvain Gugger
ca05c2a47d Fix post_init of some TrainingArguments (#7525) 2020-10-05 09:19:16 -04:00
Sylvain Gugger
3bd3d8b549 Add new dummy PT objects 2020-10-05 09:13:47 -04:00
Sylvain Gugger
28d183c90c Allow soft dependencies in the namespace with ImportErrors at use (#7537)
* PoC on RAG

* Format class name/obj name

* Better name in message

* PoC on one TF model

* Add PyTorch and TF dummy objects + script

* Treat scikit-learn

* Bad copy pastes

* Typo
2020-10-05 09:12:04 -04:00
Joshua H
1a00f46c74 Update Code example according to deprecation of AutoModeWithLMHead (#7555)
'The class `AutoModelWithLMHead` is deprecated and will be removed in a future version. Please use `AutoModelForCausalLM` for causal language models, `AutoModelForMaskedLM` for masked language models and `AutoModelForSeq2SeqLM` for encoder-decoder models.'
I dont know how to change the 'How to use this model directly from the 🤗/transformers library:' part since it is not part of the model-paper
2020-10-05 08:21:21 -04:00
Amine Abdaoui
0d79de7322 docs(pretrained_models): fix num parameters (#7575)
* docs(pretrained_models): fix num parameters

* fix(pretrained_models): correct typo

Co-authored-by: Amin <amin.geotrend@gmail.com>
2020-10-05 07:50:56 -04:00
Malte Pietsch
ba5ea66e30 Fix tokenization in SQuAD for RoBERTa, Longformer, BART (#7387)
* fix squad tokenization for roberta & co

* change to pure type based check

* sort imports
2020-10-05 06:34:13 -04:00
Sylvain Gugger
0270256b27 Allow nested tensors in predicted logits (#7542) 2020-10-05 06:33:15 -04:00
Cola
60de910e60 Add power argument for TF PolynomialDecay (#5732)
* 🚩 Add `power` argument for TF PolynomialDecay

* 🚩 Create default optimizer with power

* 🚩 Add argument to training args

* 🚨 Clean code format

* 🚨 Fix black warning

* 🚨 Fix code format
2020-10-05 05:16:29 -04:00
Lysandre Debut
41c3a3b98e Add Electra unexpected keys (#7569) 2020-10-05 04:49:39 -04:00
Nathan Cooper
071970feb8 [Model card] Java Code Summarizer model (#7568)
* Create README.md

* Update model_cards/ncoop57/bart-base-code-summarizer-java-v0/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-10-05 04:49:17 -04:00
Forrest Iandola
02ef825be2 SqueezeBERT architecture (#7083)
* configuration_squeezebert.py

thin wrapper around bert tokenizer

fix typos

wip sb model code

wip modeling_squeezebert.py. Next step is to get the multi-layer-output interface working

set up squeezebert to use BertModelOutput when returning results.

squeezebert documentation

formatting

allow head mask that is an array of [None, ..., None]

docs

docs cont'd

path to vocab

docs and pointers to cloud files (WIP)

line length and indentation

squeezebert model cards

formatting of model cards

untrack modeling_squeezebert_scratchpad.py

update aws paths to vocab and config files

get rid of stub of NSP code, and advise users to pretrain with mlm only

fix rebase issues

redo rebase of modeling_auto.py

fix issues with code formatting

more code format auto-fixes

move squeezebert before bert in tokenization_auto.py and modeling_auto.py because squeezebert inherits from bert

tests for squeezebert modeling and tokenization

fix typo

move squeezebert before bert in modeling_auto.py to fix inheritance problem

disable test_head_masking, since squeezebert doesn't yet implement head masking

fix issues exposed by the test_modeling_squeezebert.py

fix an issue exposed by test_tokenization_squeezebert.py

fix issue exposed by test_modeling_squeezebert.py

auto generated code style improvement

issue that we inherited from modeling_xxx.py: SqueezeBertForMaskedLM.forward() calls self.cls(), but there is no self.cls, and I think the goal was actually to call self.lm_head()

update copyright

resolve failing 'test_hidden_states_output' and remove unused encoder_hidden_states and encoder_attention_mask

docs

add integration test. rename squeezebert-mnli --> squeezebert/squeezebert-mnli

autogenerated formatting tweaks

integrate feedback from patrickvonplaten and sgugger to programming style and documentation strings

* tiny change to order of imports
2020-10-05 04:25:43 -04:00
Sylvain Gugger
e2c935f561 Cleanup documentation for BART, Marian, MBART and Pegasus (#7523)
* Cleanup documentation for BART, Marian, MBART and Pegasus

* Cleanup documentation for BART, Marian, MBART and Pegasus
2020-10-05 04:22:12 -04:00
Alexandr
5e941bece2 LayoutLM: add exception handling for bbox values (#7452)
* LayoutLM: add exception handling for bbox values

To replicate unhandled error:

- In `test_modelling_layoutlm.py` set `range_bbox=1025`, i.e. greater 1024
- Run `pytest tests/test_modeling_layoutlm.py`

Requirement for bbox values to be within the range 0-1000 is documented
but if it is violated then it isa not clear what is the issue from error
message.

* Update src/transformers/modeling_layoutlm.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-10-05 04:17:14 -04:00
Dhaval Taunk
2ca0fae9a6 added script for fine-tuning roberta for sentiment analysis task (#7505) 2020-10-05 03:57:15 -04:00
Sylvain Gugger
95f792afb0 Remove labels from the RagModel example (#7560) 2020-10-04 17:39:23 -04:00
Suraj Patil
99cb924bfb [s2s] add config params like Dropout in Seq2SeqTrainingArguments (#7532) 2020-10-04 12:42:30 -04:00
Sam Shleifer
9bdce3a4f9 [s2s] fix lockfile and peg distillation constants (#7545) 2020-10-02 15:58:14 -04:00
Sam Shleifer
de4d7b004a [s2s] Adafactor support for builtin trainer (#7522) 2020-10-01 17:27:45 -04:00
Sam Shleifer
d3a9601a11 [s2s] trainer scripts: Remove --run_name, thanks sylvain! (#7521) 2020-10-01 17:18:47 -04:00
Sylvain Gugger
bdcc4b78a2 Fix seq2seq example test (#7518)
* Fix seq2seq example test

* Fix bad copy-paste

* Also save the state
2020-10-01 14:13:29 -04:00
Sylvain Gugger
29baa8fabe Clean the Trainer state (#7490)
* Trainer should not modify its TrainingArguments

* Trainer should not modify its TrainingArguments

* Trainer should not modify its TrainingArguments

* Add test of resumed training

* Fixes

* Non multiGPU test

* Clean Trainer state

* Add more to the state

* Documentation

* One last test

* Make resume training test more complete

* Unwanted changes
2020-10-01 13:07:04 -04:00
Sam Shleifer
2a358f45ef [s2s] fix nltk pytest race condition with FileLock (#7515) 2020-10-01 12:51:09 -04:00
Suraj Patil
72d363d979 [examples/s2s] clean up finetune_trainer (#7509) 2020-10-01 12:19:29 -04:00
Patrick von Platen
bd2621583b fix data type (#7513) 2020-10-01 18:15:41 +02:00
Patrick von Platen
62f5ae68ec [Seq2Seq] Fix a couple of bugs and clean examples (#7474)
* clean T5

* fix t5 tests

* fix index typo

* fix tf common test

* fix examples

* change positional ordering for Bart and FSTM

* add signature test

* clean docs and add tests

* add docs to encoder decoder

* clean docs

* correct two doc strings

* remove sig test for TF Elektra & Funnel

* fix tf t5 slow tests

* fix input_ids to inputs in tf

* Update src/transformers/modeling_bart.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_bart.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* implement lysandre results

* make style

* fix encoder decoder typo

* fix tf slow tests

* fix slow tests

* renaming

* remove unused input

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-10-01 17:38:50 +02:00
Muhammad Harris
a42f62d34f Train T5 in Tensoflow 2 Community Notebook (#7428)
* t5 t5 community notebook added

* author link updated

* t5 t5 community notebook added

* author link updated

* new colab link updated

Co-authored-by: harris <muhammad.harris@visionx.io>
2020-10-01 16:54:29 +02:00
Kai Fricke
5fc3b5cba4 Fix Tune progress_reporter kwarg (#7508) 2020-10-01 10:34:31 -04:00
Kai Fricke
dabc85d1ba Report Tune metrics in final evaluation (#7507) 2020-10-01 09:52:36 -04:00
Alexandr
9a92afb6d0 Update LayoutLM doc (#7388)
Co-authored-by: Alexandr Maslov <avmaslov3@gmail.com>
2020-10-01 09:11:42 -04:00
Julien Chaumond
e32390931d [model_card] distilbert-base-german-cased 2020-10-01 09:08:49 -04:00
Julien Chaumond
9a4e163b58 [model_card] Fix metadata, adalbertojunior/PTT5-SMALL-SUM 2020-10-01 08:54:06 -04:00
Adalberto
8435e10e24 Create README.md (#7299)
* Create README.md

* language metadata

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-10-01 08:52:28 -04:00
Martin Müller
d727432072 Update README.md (#7459) 2020-10-01 08:51:26 -04:00
allenyummy
664da5b077 Create README.md (#7468) 2020-10-01 08:50:26 -04:00
ahotrod
f745f61c99 Update README.md (#7491)
Model now fine-tuned on Transformers 3.1.0, previous out-of-date model was fine-tuned on Transformers 2.3.0.
2020-10-01 08:50:07 -04:00
Abed khooli
6ef7658c0a Create README.md (#7349)
Model card for akhooli/personachat-arabic
2020-10-01 08:48:51 -04:00
Bayartsogt Yadamsuren
15ab3f049b Creating readme for bert-base-mongolian-cased (#7439)
* Creating readme for bert-base-mongolian-cased

* Update model_cards/bayartsogt/bert-base-mongolian-cased/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-10-01 08:46:27 -04:00
Bayartsogt Yadamsuren
0c2b9fa831 creating readme for bert-base-mongolian-uncased (#7440) 2020-10-01 08:45:22 -04:00
Akshay Gupta
381443c096 Update README.md (#7498)
Making transformers readme more robust.
2020-10-01 07:42:07 -04:00
Lysandre Debut
85d2d8c920 Fix local_files_only for TF (#6091) 2020-10-01 05:06:02 -04:00
Sam Shleifer
9e80f972fb Enable pegasus fp16 by clamping large activations (#7243)
* Clean clamp

* boom boom

* Take some other changes

* boom boom

* boom boom

* boom boom

* one chg

* fix test

* Use finfo

* style
2020-10-01 04:48:37 -04:00
Sylvain Gugger
be51c1039d Add forgotten return_dict argument in the docs (#7483) 2020-10-01 04:41:29 -04:00
Sam Shleifer
48f23f92a8 [s2sTrainer] test + code cleanup (#7467) 2020-10-01 00:33:01 -04:00
Sam Shleifer
097049b81b Distributed Trainer: 2 little fixes (#7461)
* reset model.config

* Update src/transformers/trainer.py

* use lower case tensor

* Just tensor change
2020-09-30 22:14:14 -04:00
Julien Chaumond
0acd1ffa09 [doc] rm Azure buttons as not implemented yet 2020-09-30 17:31:08 -04:00
Sam Shleifer
03e46c1de3 [s2s] fix kwargs style (#7488) 2020-09-30 17:00:06 -04:00
Sam Shleifer
6fe8a693eb [s2s] Fix t5 warning for distributed eval (#7487) 2020-09-30 16:58:03 -04:00
Sylvain Gugger
4c6728460a Bump isort version. (#7484) 2020-09-30 13:44:58 -04:00
Amanpreet Singh
c031d01023 Seq2SeqDataset: avoid passing src_lang everywhere (#7470)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-09-30 13:27:48 -04:00
Suraj Patil
08939cfdf7 [s2strainer] fix eval dataset loading (#7477) 2020-09-30 12:39:13 -04:00
Sylvain Gugger
a97a73e0ee Small QOL improvements to TrainingArguments (#7475)
* Small QOL improvements to TrainingArguments

* With the self.
2020-09-30 12:12:03 -04:00
Sylvain Gugger
dc7d2daa4c Alphabetize model lists (#7478) 2020-09-30 10:43:58 -04:00
Sylvain Gugger
fdccf82e28 Remove config assumption in Trainer (#7464)
* Remove config assumption in Trainer

* Initialize for eval
2020-09-30 09:03:25 -04:00
François REMY
cc4eff8087 Make transformers install check positive (#7473)
When transformers is correctly installed, you should get a positive message ^_^
2020-09-30 07:44:40 -04:00
Pengcheng He
7a0cf0ec93 Add DeBERTa model (#5929)
* Add DeBERTa model

* Remove dependency of deberta

* Address comments

* Patch DeBERTa
Documentation
Style

* Add final tests

* Style

* Enable tests + nitpicks

* position IDs

* BERT -> DeBERTa

* Quality

* Style

* Tokenization

* Last updates.

* @patrickvonplaten's comments

* Not everything can be a copy

* Apply most of @sgugger's review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Last reviews

* DeBERTa -> Deberta

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-09-30 07:07:30 -04:00
Lysandre Debut
44a93c981f Number of GPUs for multi-gpu (#7472) 2020-09-30 06:53:20 -04:00
Lysandre Debut
886ef35ce6 Fix LXMERT with DataParallel (#7471) 2020-09-30 06:41:24 -04:00
Lysandre
35e94c68df Number of GPUs 2020-09-30 12:29:26 +02:00
Lysandre Debut
056723ad1d Multi-GPU setup (#7453) 2020-09-30 05:53:34 -04:00
Sylvain Gugger
4ba248748f Get a better error when check_copies fails (#7457)
* Get a better error when check_copies fails

* Fix tests
2020-09-30 10:05:14 +02:00
Sam Shleifer
bef0175168 remove codecov PR comments (#7400) 2020-09-29 15:16:43 -04:00
Sylvain Gugger
a1c2ef7bd0 Add documentation for v3.3.1 2020-09-29 14:31:43 -04:00
Sylvain Gugger
1ba08dc221 Release: v3.3.1 2020-09-29 14:17:34 -04:00
Sylvain Gugger
8546dc55c2 Fix Trainer tests in a multiGPU env (#7458) 2020-09-29 14:06:41 -04:00
Sylvain Gugger
d0fd7154c5 Catch import datasets common errors (#7456) 2020-09-29 13:42:09 -04:00
Sylvain Gugger
f1220c5fe2 Add a code of conduct (#7433) 2020-09-29 13:38:47 -04:00
Teven
9e9a1fb8c7 Adding gradient checkpointing to GPT2 (#7446)
* GPT2 gradient checkpointing

* find_unused_parameters removed if checkpointing

* find_unused_parameters removed if checkpointing

* Update src/transformers/configuration_gpt2.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Added a test for generation with checkpointing

* Update src/transformers/configuration_gpt2.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-09-29 12:26:26 -04:00
Sylvain Gugger
52e8392b7e Add automatic best model loading to Trainer (#7431)
* Add automatic best model loading to Trainer

* Some small fixes

* Formatting
2020-09-29 10:41:18 -04:00
Sylvain Gugger
1fc4de69ed Document new features of make fixup (#7434) 2020-09-29 03:56:57 -04:00
GmailB
205bf0b7ea Update README.md (#7444)
Hi, just corrected the example code, add 2 links and fixed some typos
2020-09-29 03:18:01 -04:00
Sam Shleifer
74d8d69bd4 [s2s] consistent output format across eval scripts (#7435) 2020-09-28 23:20:03 -04:00
Typicasoft
671b278e25 Create README.md (#7436)
* Create README.md

MagBERT-NER : Added widget (Text)

* Rename model_cards/README.md to model_cards/TypicaAI/magbert-ner/README.md
2020-09-28 18:25:25 -04:00
Manuel Romero
a1a8ffa512 Update README.md (#7429)
Add links to models fine-tuned on a downstream task
2020-09-28 13:40:09 -04:00
Stas Bekman
f62f2ffdcc [makefile] 10x speed up checking/fixing (#7403)
* [makefile] check/fix only modified since branching files

* fix phonies

* parametrize dirs

* have only one source for dirs to check

* look ma, no autoformatters here
2020-09-28 10:45:42 -04:00
Lysandre
16c213820e Update docs to version v3.3.0 2020-09-28 16:32:00 +02:00
Lysandre
0613f05226 Release: v3.3.0 2020-09-28 16:24:43 +02:00
Sylvain Gugger
ca3fc36de3 Reorganize documentation navbar (#7423)
* Reorganize documentation navbar

* Update css to have clear sections
2020-09-28 16:22:58 +02:00
Lysandre Debut
7f4115c099 Pull request template (#7392)
co-authored-by: sgugger <sylvain.gugger@gmail.com>

Co-authored-by: sgugger <sylvain.gugger@gmail.com>
2020-09-28 09:51:49 -04:00
Sylvain Gugger
0611eab5e3 Document RAG again (#7377)
Do not merge before Monday
2020-09-28 08:31:46 -04:00
Sylvain Gugger
7563d5a3cf Catch PyTorch warning when saving/loading scheduler (#7401) 2020-09-28 08:20:10 -04:00
Boris Dayma
1749ca317e docs: fix model sharing file names (#5855)
* docs: fix model sharing file names

* Update docs/source/model_sharing.rst

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* docs(model_sharing.rst): fix new line

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-09-28 08:17:30 -04:00
Patrick von Platen
8279471506 correct RAG model cards (#7420) 2020-09-28 11:08:39 +02:00
Marcin Zabłocki
4083a55ab0 Flos fix (#7384) 2020-09-28 04:09:26 -04:00
Ola Piktus
ae3e84f3ba [RAG] Clean Rag readme in examples (#7413)
* Improve README + consolidation script

* Reformat README

* Reformat README

Co-authored-by: Your Name <you@example.com>
2020-09-28 10:06:39 +02:00
Sam Shleifer
748425d47d [T5] allow config.decoder_layers to control decoder size (#7409)
* Working assymmetrical T5

* rename decoder_layers -> num_decoder_layers

* Fix docstring

* Allow creation of asymmetric t5 students
2020-09-28 03:08:04 -04:00
Sam Shleifer
7296fea1d6 [s2s] rougeLSum expects \n between sentences (#7410)
Co-authored-by: Swetha Mandava <smandava@nvidia.com>
2020-09-27 16:27:19 -04:00
Suraj Patil
eab5f59682 [s2s] add create student script (#7290)
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-09-27 15:10:46 -04:00
Patrick von Platen
e50a931c11 [Longformer, Bert, Roberta, ...] Fix multi gpu training (#7272)
* fix multi-gpu

* fix longformer

* force to delete unnecessary layers

* fix notifications

* fix warning

* fix roberta

* fix tests

* remove hasattr

* fix tests

* fix roberta

* merge and clean authorized keys
2020-09-25 20:33:21 +02:00
Patrick von Platen
2c8ecdf8a8 fix rag retriever save pretrained (#7399) 2020-09-25 19:47:12 +02:00
Patrick von Platen
1a14687e6f Update README.md 2020-09-25 19:43:48 +02:00
Patrick von Platen
3327c2b0f6 Update README.md 2020-09-25 19:43:36 +02:00
Ola Piktus
fe326bd5cf Remove dependency on examples/seq2seq from rag (#7395)
Co-authored-by: Your Name <you@example.com>
2020-09-25 18:20:49 +02:00
Sylvain Gugger
ad39271ae8 Fix FP16 and attention masks in FunnelTransformer (#7374)
* Fix #7371

* Fix training

* Fix test values

* Apply the fix to TF as well
2020-09-25 12:20:39 -04:00
Patrick von Platen
4e5b036bdd Update README.md 2020-09-25 18:16:46 +02:00
Patrick von Platen
55eccfbb49 Update README.md 2020-09-25 18:16:44 +02:00
Sylvain Gugger
e2e77f02c2 Fix BartModel output documentation (#7390) 2020-09-25 11:48:13 -04:00
Sylvain Gugger
bbb07830ff Speedup check_copies script (#7394) 2020-09-25 11:47:22 -04:00
Stas Bekman
8859c4f841 [code quality] new make target that combines style and quality targets (#7310)
* [code quality] merge style and quality targets

Any reason why we don't run `flake8` in `make style`? I find myself needing to run `make style` and `make quality` all the time, but I need the latter just for the last 2 checks. Since we have no control over the source code why bother with separating checking and fixing - let's just have one target that fixes and then performs the remaining checks, as we know the first two have been done already.

This PR suggests to merge the 2 targets into one efficient target.

I will edit the docs if this change resonates with the team.

* move checks into style, re-use target

* better name

* add fixup target

* document new target
2020-09-25 11:37:40 -04:00
Sam Shleifer
38a1b03f4d Remove unhelpful bart warning (#7391) 2020-09-25 11:01:07 -04:00
Patrick von Platen
5ff0d6d7d0 Update README.md 2020-09-25 16:58:29 +02:00
Quentin Lhoest
cf1c88e092 [RAG] Fix retrieval offset in RAG's HfIndex and better integration tests (#7372)
* Fix retrieval offset in RAG's HfIndex

* update slow tests

* style

* fix new test

* style

* add better tests

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-09-25 16:12:46 +02:00
Patrick von Platen
571c7a11c1 [Rag] Fix wrong usage of num_beams and bos_token_id in Rag Sequence generation (#7386)
* fix_rag_sequence

* add second bug fix
2020-09-25 14:35:49 +02:00
Suraj Patil
415071b4c2 doc changes (#7385) 2020-09-25 08:00:36 -04:00
Patrick von Platen
2dd652d757 [RAG] Add missing doc and attention_mask to rag (#7382)
* add docs

* add missing docs and attention_mask in fine-tune
2020-09-25 11:23:55 +02:00
Lysandre Debut
7cdd9da5bf Check config type using type instead of isinstance (#7363)
* Check config type instead of instance


Bad merge

* Remove for loops

* Style
2020-09-25 05:09:09 -04:00
Sam Shleifer
3c6bf8998f modeling_bart: 3 small cleanups that dont change outputs (#7381)
* Mbart passing

* boom boom

* cleaner assert

* add assert

* Fix tests
2020-09-25 04:24:14 -04:00
Suraj Patil
9e68d075a4 Seq2SeqTrainer (#6769)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-09-24 18:46:58 -04:00
Sam Shleifer
d9d0f1140b [s2s] distributed eval allows num_return_sequences > 1 (#7254) 2020-09-24 17:30:09 -04:00
Patrick von Platen
0804d077c6 correct attention mask (#7373) 2020-09-24 23:22:04 +02:00
Stas Bekman
a8cbc4269c [fsmt] build/test scripts (#7257)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-09-24 17:10:26 -04:00
Sylvain Gugger
a8e7982f84 Remove mentions of RAG from the docs (#7376)
* Remove mentions of  RAG from the docs

* Deactivate check
2020-09-24 17:07:14 -04:00
Stas Bekman
eadd870b2f [seq2seq] make it easier to run the scripts (#7274) 2020-09-24 15:23:48 -04:00
Lysandre Debut
8d3bb781ee Formatter (#7368)
* Formatter

* Docs
2020-09-24 10:59:21 -04:00
Teven
7dfdf793bb Fixing case in which Trainer hung while saving model in distributed training (#7365)
* remote debugging

* remote debugging

* moved _store_flos call

* moved _store_flos call

* moved _store_flos call

* removed debugging artefacts
2020-09-24 09:56:40 -04:00
Sylvain Gugger
0ccb6f5c6d Clean RAG docs and template docs (#7348)
* Clean RAG docs and template docs

* Fix typo

* Better doc
2020-09-24 09:24:41 -04:00
Sylvain Gugger
27174bd4fe Make PyTorch model files independent from each other (#7352) 2020-09-24 08:53:54 -04:00
Julien Plu
d161ed1682 Update the TF models to remove their interdependencies (#7238)
* Refacto the models to remove their interdependencies

* Fix Flaubert model

* Fix Flaubert

* Fix XLM

* Fix Albert

* Fix Roberta

* Fix Albert

* Fix Flaubert

* Apply style + remove unused imports

* Fix Distilbert

* remove unused import

* fix Distilbert

* Fix Flaubert

* Apply style

* Fix Flaubert

* Add the copy comments for the check_copies script

* Fix MobileBert model name

* Address Morgan's comments

* Fix typo

* Oops typo
2020-09-24 08:30:59 -04:00
Jabin Huang
0cffa424f8 Updata tokenization_auto.py (#6870)
Updata tokenization_auto.py to handle Inherited tokenizer
2020-09-24 06:52:10 -04:00
Daquan Lin
03fb8e79c6 Update modeling_tf_longformer.py (#7359)
correct a very small mistake
2020-09-24 11:37:29 +02:00
Sylvain Gugger
1ff5bd38a3 Check decorator order (#7326)
* Check decorator order

* Adapt for parametrized decorators

* Fix typos
2020-09-24 04:54:37 -04:00
Sylvain Gugger
0be5f4a00c Expand a bit the documentation doc (#7350) 2020-09-24 04:34:18 -04:00
Sam Shleifer
38f1703795 wip: Code to add lang tags to marian model cards (#6586) 2020-09-23 18:11:06 -04:00
Theo Linnemann
129fdae040 Remove reference to args in XLA check (#7344)
Previously, the TFTrainingArguments object did a check to see if XLA was enabled, but did this by referencing `self.args.xla`, when it should be `self.xla`, because it is the args object. This can be verified a few lines above, where the XLA field is set.
2020-09-23 13:56:21 -04:00
Felipe Curti
d266613635 [Benchmarks] Change all args to from no_... to their positive form (#7075)
* Changed name to all no_... arguments and all references to them, inverting the boolean condition

* Change benchmark tests to use new Benchmark Args

* Update src/transformers/benchmark/benchmark_args_utils.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/benchmark/benchmark.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Fix Style. Add --no options in help

* fix some part of tests

* Update src/transformers/benchmark/benchmark_args_utils.py

* Update src/transformers/benchmark/benchmark_args_utils.py

* Update src/transformers/benchmark/benchmark_args_utils.py

* fix all tests

* make style

* add backwards compability

* make backwards compatible

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: fmcurti <fcurti@DESKTOP-RRQURBM.localdomain>
2020-09-23 13:25:24 -04:00
Doug Blank
8c697d58ef Ensure that integrations are imported before transformers or ml libs (#7330)
* Ensure that intergrations are imported before transformers or ml libs

* Black reformatter wanted a newline

* isort requests

* black requests

* flake8 requests
2020-09-23 13:23:45 -04:00
Sylvain Gugger
3323146e90 Models doc (#7345)
* Clean up model documentation

* Formatting

* Preparation work

* Long lines

* Main work on rst files

* Cleanup all config files

* Syntax fix

* Clean all tokenizers

* Work on first models

* Models beginning

* FaluBERT

* All PyTorch models

* All models

* Long lines again

* Fixes

* More fixes

* Update docs/source/model_doc/bert.rst

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update docs/source/model_doc/electra.rst

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Last fixes

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-09-23 13:20:45 -04:00
Wissam Antoun
58405a527b Fixed evaluation_strategy on epoch end bug (#7340)
* Fixed evaluation_strategy on epoch end bug

move the evaluation script outside the the iteration loop

* black formatting
2020-09-23 13:17:00 -04:00
Stas Bekman
28cf873036 [testing] skip decorators: docs, tests, bugs (#7334)
* skip decorators: docs, tests, bugs

* another important note

* style

* bloody style

* add @pytest.mark.parametrize

* add note

* no idea what it wants :(
2020-09-23 05:16:19 -04:00
Stas Bekman
df53643807 [code quality] fix confused flake8 (#7309)
* fix confused flake

We run `black  --target-version py35 ...` but flake8 doesn't know that, so currently with py38 flake8 fails suggesting that black should have reformatted 63 files. Indeed if I run:

```
black --line-length 119 --target-version py38 examples templates tests src utils
```
it indeed reformats 63 files.

The only solution I found is to create a black config file as explained at https://github.com/psf/black#configuration-format, which is what this PR adds.

Now flake8 knows that py35 is the standard and no longer gets confused regardless of the user's python version.

* adjust the other files that will now rely on black's config file
2020-09-22 22:12:36 -04:00
Sam Shleifer
78387cc63e [s2s] only save metrics.json from rank zero (#7331) 2020-09-22 18:27:28 -04:00
Sam Shleifer
e53138a1b9 [s2s] add src_lang kwarg for distributed eval (#7300) 2020-09-22 18:26:37 -04:00
blinovpd
a9c7849cfa [model_cards] blinoff/roberta-base-russian-v0 (#7317) 2020-09-22 18:26:13 -04:00
Sylvain Gugger
f5518e5631 Formatting 2020-09-22 14:55:12 -04:00
Chady Kamar
17099ebd58 Add num workers cli arg (#7322)
* Add dataloader_num_workers to TrainingArguments

This argument is meant to be used to set the
number of workers for the PyTorch DataLoader.

* Pass num_workers argument on DataLoader init
2020-09-22 14:44:42 -04:00
Sam Shleifer
25b0463d0b [s2s] add supported architecures to MD (#7252) 2020-09-22 13:09:35 -04:00
Pavel Soriano
d6bc72c469 Fixed results of SQuAD-FR evaluation (#7313)
The score for the F1 metric was reported as the Exact Match and vice-versa.
2020-09-22 12:39:07 -04:00
Huang Lianzhe
6303b5a718 [Bug Fix] The actual batch_size is inconsistent with the settings. (#7235)
* [bug fix] fixed the bug that the actual batch_size is inconsistent with the parameter settings

* reformat

* reformat

* reformat

* add support for dict and BatchEncoding

* add support for dict and BatchEncoding

* add documentation for DataCollatorForNextSentencePrediction

* Some more nits for the docstring

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Some more nits for the docstring

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Some more nits for the docstring

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Some more nits for the docstring

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Some more nits for the docstring

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* rename variables

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-09-22 12:31:21 -04:00
Ola Piktus
c754c41c61 RAG (#6813)
* added rag WIP

* path fix

* Formatting / renaming prior to actual work

* added rag WIP

* path fix

* Formatting / renaming prior to actual work

* added rag WIP

* path fix

* Formatting / renaming prior to actual work

* added rag WIP

* Formatting / renaming prior to actual work

* First commit

* improve comments

* Retrieval evaluation scripts

* refactor to include modeling outputs + MPI retriever

* Fix rag-token model + refactor

* Various fixes + finetuning logic

* use_bos fix

* Retrieval refactor

* Finetuning refactoring and cleanup

* Add documentation and cleanup

* Remove set_up_rag_env.sh file

* Fix retrieval wit HF index

* Fix import errors

* Fix quality errors

* Refactor as per suggestions in https://github.com/huggingface/transformers/pull/6813#issuecomment-687208867

* fix quality

* Fix RAG Sequence generation

* minor cleanup plus initial tests

* fix test

* fix tests 2

* Comments fix

* post-merge fixes

* Improve readme + post-rebase refactor

* Extra dependencied for tests

* Fix tests

* Fix tests 2

* Refactor test requirements

* Fix tests 3

* Post-rebase refactor

* rename nlp->datasets

* RAG integration tests

* add tokenizer to slow integration test and allow retriever to run on cpu

* add tests; fix position ids warning

* change structure

* change structure

* add from encoder generator

* save working solution

* make all integration tests pass

* add RagTokenizer.save/from_pretrained and RagRetriever.save/from_pretrained

* don't save paths

* delete unnecessary imports

* pass config to AutoTokenizer.from_pretrained for Rag tokenizers

* init wiki_dpr only once

* hardcode legacy index and passages paths (todo: add the right urls)

* finalize config

* finalize retriver api and config api

* LegacyIndex index download refactor

* add dpr to autotokenizer

* make from pretrained more flexible

* fix ragfortokengeneration

* small name changes in tokenizer

* add labels to models

* change default index name

* add retrieval tests

* finish token generate

* align test with previous version and make all tests pass

* add tests

* finalize tests

* implement thoms suggestions

* add first version of test

* make first tests work

* make retriever platform agnostic

* naming

* style

* add legacy index URL

* docstrings + simple retrieval test for distributed

* clean model api

* add doc_ids to retriever's outputs

* fix retrieval tests

* finish model outputs

* finalize model api

* fix generate problem for rag

* fix generate for other modles

* fix some tests

* save intermediate

* set generate to default

* big refactor generate

* delete rag_api

* correct pip faiss install

* fix auto tokenization test

* fix faiss install

* fix test

* move the distributed logic to examples

* model page

* docs

* finish tests

* fix dependencies

* fix import in __init__

* Refactor eval_rag and finetune scripts

* start docstring

* add psutil to test

* fix tf test

* move require torch to top

* fix retrieval test

* align naming

* finish automodel

* fix repo consistency

* test ragtokenizer save/load

* add rag model output docs

* fix ragtokenizer save/load from pretrained

* fix tokenizer dir

* remove torch in retrieval

* fix docs

* fixe finetune scripts

* finish model docs

* finish docs

* remove auto model for now

* add require torch

* remove solved todos

* integrate sylvains suggestions

* sams comments

* correct mistake on purpose

* improve README

* Add generation test cases

* fix rag token

* clean token generate

* fix test

* add note to test

* fix attention mask

* add t5 test for rag

* Fix handling prefix in finetune.py

* don't overwrite index_name

Co-authored-by: Patrick Lewis <plewis@fb.com>
Co-authored-by: Aleksandra Piktus <piktus@devfair0141.h2.fair>
Co-authored-by: Aleksandra Piktus <piktus@learnfair5102.h2.fair>
Co-authored-by: Aleksandra Piktus <piktus@learnfair5067.h2.fair>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Quentin Lhoest <lhoest.q@gmail.com>
2020-09-22 18:29:58 +02:00
Sylvain Gugger
1ee2194fb6 Mark big downloads slow (#7325)
* Make big downloads as slow

* Add import

* Right order for slow decorator

* More slow tests
2020-09-22 12:21:52 -04:00
Julien Plu
585217c87f Add generic text classification example in TF (#5716)
* Add new example with nlp

* Update README

* replace nlp by datasets

* Update examples/text-classification/README.md

Add Lysandre's suggestion.

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-09-22 12:05:05 -04:00
Lysandre
6e21f24220 Documentation version 2020-09-22 18:04:39 +02:00
Lysandre
3ebb1b3a2b Release: v3.2.0 2020-09-22 17:36:51 +02:00
Sylvain Gugger
01f0fd0bab Fixes for LayoutLM (#7318) 2020-09-22 10:37:11 -04:00
Julien Plu
702a76ff92 Create an XLA parameter and fix the mixed precision (#7311)
* Create an XLA parameter and fix mixed precision creation

* Fix issue brought by intellisense

* Complete docstring
2020-09-22 10:19:34 -04:00
Sylvain Gugger
596342c2b9 Support for Windows in check_copies (#7316) 2020-09-22 10:17:48 -04:00
Sylvain Gugger
89edf504bf Add possibility to evaluate every epoch (#7302)
* Add possibility to evaluate every epoch

* Remove multitype arg

* Remove needless import

* Use a proper enum

* Apply suggestions from @LysandreJik

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* One else and formatting

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-09-22 09:52:29 -04:00
Sylvain Gugger
21ca148090 is_pretokenized -> is_split_into_words (#7236)
* is_pretokenized -> is_split_into_words

* Fix tests
2020-09-22 09:34:35 -04:00
Julien Plu
324f361e91 Fix saving TF custom models (#7291)
* Fix #7277

* Apply style

* Add a full training pipeline test

* Apply style
2020-09-22 09:31:13 -04:00
Minghao Li
cd9a0585ea Add LayoutLM Model (#7064)
* first version

* finish test docs readme model/config/tokenization class

* apply make style and make quality

* fix layoutlm GitHub link

* fix conflict in index.rst and add layoutlm to pretrained_models.rst

* fix bug in test_parents_and_children_in_mappings

* reformat modeling_auto.py and tokenization_auto.py

* fix bug in test_modeling_layoutlm.py

* Update docs/source/model_doc/layoutlm.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/model_doc/layoutlm.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* remove inh, add tokenizer fast, and update some doc

* copy and rename necessary class from modeling_bert to modeling_layoutlm

* Update src/transformers/configuration_layoutlm.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/configuration_layoutlm.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/configuration_layoutlm.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/configuration_layoutlm.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/modeling_layoutlm.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/modeling_layoutlm.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/modeling_layoutlm.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* add mish to activations.py, import ACT2FN and import logging from utils

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-09-22 09:28:02 -04:00
Sylvain Gugger
244e1b5ba3 Fix #7304 (#7305) 2020-09-22 09:20:03 -04:00
Lysandre Debut
e46108817e Adds FSMT to LM head AutoModel (#7312) 2020-09-22 06:35:51 -04:00
Stas Bekman
e2964b8a19 [fsmt] no need to pass device (#7292) 2020-09-22 05:39:06 -04:00
Sylvain Gugger
e4b94d8e58 Copy code from Bert to Roberta and add safeguard script (#7219)
* Copy code from Bert to Roberta and add safeguard script

* Fix docstring

* Comment code

* Formatting

* Update src/transformers/modeling_roberta.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Add test and fix bugs

* Fix style and make new comand

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-09-22 05:02:27 -04:00
Sam Shleifer
656c27c3a3 [s2s] save hostname with repo info (#7301)
* save hostname
2020-09-21 17:26:24 -04:00
Thomas Winters
34a1b75f01 Added RobBERT-v2 model card (#7286)
* Added RobBERT-v2 model card

* minor Tweaks

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-09-21 16:17:28 -04:00
jjacampos
6513d16a48 IXAmBERT model card (#7283)
This PR includes the model card for the IXAmBERT model which has been recently uploaded to the huggingface repository.
2020-09-21 16:15:31 -04:00
Stas Bekman
af4b98ed97 [s2s] adjust finetune + test to work with fsmt (#7263) 2020-09-21 15:13:19 -04:00
Stas Bekman
8d562a2d1a [s2s] s/alpha_loss_encoder/alpha_encoder_loss/ (#7298)
fix to match `distillation.py:        self.alpha_encoder_loss`
2020-09-21 14:14:26 -04:00
Stas Bekman
cbb2f75a16 [s2s tests] fix test_run_eval_search (#7297) 2020-09-21 14:00:40 -04:00
Suraj Patil
7a88ed6c2a [model card] distlbart-mnli model cards (#7278) 2020-09-21 12:26:18 -04:00
Sylvain Gugger
63276b76d4 Fix #7284 (#7289) 2020-09-21 10:31:26 -04:00
Raphaël Bournhonesque
8d464374ba Disable missing weight warning (#7282) 2020-09-21 09:14:48 -04:00
Stas Bekman
8ff88d25e9 [fsmt] rewrite SinusoidalPositionalEmbedding + USE_CUDA test fixes + new TranslationPipeline test (#7224)
* fix USE_CUDA, add pipeline

* USE_CUDA fix

* recode SinusoidalPositionalEmbedding into nn.Embedding subclass

was needed for torchscript to work - this is now part of the state_dict, so will have to remove these keys during save_pretrained

* back out (ci debug)

* restore

* slow last?

* facilitate not saving certain keys and test

* remove no longer used keys

* style

* fix logging import

* cleanup

* Update src/transformers/modeling_utils.py

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* fix bug in max_positional_embeddings

* rename keys to keys_to_never_save per suggestion, improve the setup

* Update src/transformers/modeling_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-09-21 09:13:35 -04:00
Dat Quoc Nguyen
67c4b0c517 Add model cards for new pre-trained BERTweet-COVID19 models (#7269)
Two new pre-trained models "vinai/bertweet-covid19-base-cased" and "vinai/bertweet-covid19-base-uncased" are resulted by further pre-training the pre-trained model "vinai/bertweet-base" on a  corpus of 23M COVID-19 English Tweets for 40 epochs.
2020-09-21 06:12:51 -04:00
Patrick von Platen
0cbe1139b1 Update README.md 2020-09-21 11:53:08 +02:00
Lysandre
aae4edb5f0 Addressing review comment 2020-09-21 11:37:00 +02:00
Suraj Patil
43b9d93875 [example/glue] fix compute_metrics_fn for bart like models (#7248)
* fix compute_metrics_fn

* p.predictions -> preds

* apply suggestions
2020-09-21 05:34:20 -04:00
guillaume-be
39062d05f0 Fixed target_mapping preparation for XLNet when batch size > 1 (incl. beam search) (#7267) 2020-09-21 04:53:52 -04:00
Nadir El Manouzi
4b3e55bdcc Add "Fine-tune ALBERT for sentence-pair classification" notebook to the community notebooks (#7255) 2020-09-21 04:25:22 -04:00
Stas Bekman
7cbf0f722d examples/seq2seq/__init__.py mutates sys.path (#7194) 2020-09-20 16:54:42 -04:00
Manuel Romero
a4faeceaed Fix typo in model name (#7268) 2020-09-20 19:12:30 +02:00
Stas Bekman
47ab3e8262 @slow has to be last (#7251)
Found an issue when `@slow` isn't the last decorator (gets ignored!), so documenting this significance.
2020-09-20 09:17:29 -04:00
Stas Bekman
4f6e525742 model card improvements (#7221) 2020-09-19 17:02:05 -04:00
Stas Bekman
eb074af75e fsmt tiny model card + script (#7244) 2020-09-19 14:37:12 -04:00
Manuel Romero
1d90d0f386 Add title to model card (#7240) 2020-09-19 02:10:45 -04:00
Manuel Romero
c9b7ef042f Create README.md (#7239) 2020-09-19 02:09:29 -04:00
Sam Shleifer
83dba10b8f [s2s] distributed_eval.py saves better speed info (#7242) 2020-09-18 15:46:01 -04:00
Dat Quoc Nguyen
af2322c7a0 Add new pre-trained models BERTweet and PhoBERT (#6129)
* Add BERTweet and PhoBERT models

* Update modeling_auto.py

Re-add `bart` to LM_MAPPING

* Update tokenization_auto.py

Re-add `from .configuration_mobilebert import MobileBertConfig`
not sure why it's replaced by `from transformers.configuration_mobilebert import MobileBertConfig`

* Add BERTweet and PhoBERT to pretrained_models.rst

* Update tokenization_auto.py

Remove BertweetTokenizer and PhobertTokenizer out of tokenization_auto.py (they are currently not supported by AutoTokenizer.

* Update BertweetTokenizer - without nltk

* Update model card for BERTweet

* PhoBERT - with Auto mode - without import fastBPE

* PhoBERT - with Auto mode - without import fastBPE

* BERTweet - with Auto mode - without import fastBPE

* Add PhoBERT and BERTweet to TF modeling auto

* Improve Docstrings for PhobertTokenizer and BertweetTokenizer

* Update PhoBERT and BERTweet model cards

* Fixed a merge conflict in tokenization_auto

* Used black to reformat BERTweet- and PhoBERT-related files

* Used isort to reformat BERTweet- and PhoBERT-related files

* Reformatted BERTweet- and PhoBERT-related files based on flake8

* Updated test files

* Updated test files

* Updated tf test files

* Updated tf test files

* Updated tf test files

* Updated tf test files

* Update commits from huggingface

* Delete unnecessary files

* Add tokenizers to auto and init files

* Add test files for tokenizers

* Revised model cards

* Update save_vocabulary function in BertweetTokenizer and PhobertTokenizer and test files

* Revised test files

* Update orders of Phobert and Bertweet tokenizers in auto tokenization file
2020-09-18 13:16:43 -04:00
Patrick von Platen
9397436ea5 Create README.md 2020-09-18 16:52:00 +02:00
Patrick von Platen
7eeca4d399 Create README.md 2020-09-18 16:44:02 +02:00
Patrick von Platen
31516c776a Update README.md 2020-09-18 16:37:14 +02:00
Patrick von Platen
4c14669a78 Update README.md 2020-09-18 16:35:11 +02:00
Yih-Dar
3a03bab9db Fix a few countings (steps / epochs) in trainer_tf.py (#7175) 2020-09-18 09:28:56 -04:00
Stefan Schweter
ee9eae4e06 token-classification: update url of GermEval 2014 dataset (#6571) 2020-09-18 06:18:06 -04:00
Julien Chaumond
eef8d94d19 [model_cards]
We use ISO 639-1 cc @gentaiscool
2020-09-18 12:09:24 +02:00
Patrick von Platen
afd6a9f827 Create README.md 2020-09-18 11:41:12 +02:00
Patrick von Platen
9f1544b9e0 Create README.md 2020-09-18 11:37:20 +02:00
Sameer Zahid
5c1d5ea667 Fixed typo in README (#7233) 2020-09-18 04:52:43 -04:00
Yuta Hayashibe
7719ecd19f Fix a typo (#7225) 2020-09-18 04:23:33 -04:00
Manuel Romero
4a26e8ac5f Create README.md (#7205) 2020-09-18 03:24:30 -04:00
Manuel Romero
94320c5b81 Add customized text to widget (#7204) 2020-09-18 03:24:23 -04:00
Manuel Romero
3aefb24b20 Create README.md (#7209) 2020-09-18 03:24:10 -04:00
Manuel Romero
a22e7a8dd4 Create README.md (#7210) 2020-09-18 03:23:58 -04:00
Manuel Romero
c028b26481 Create README.md (#7212) 2020-09-18 03:23:49 -04:00
Genta Indra Winata
c7cdd7b4fd Create README.md for indobert-lite-base-p1 (#7182) 2020-09-18 03:22:32 -04:00
Genta Indra Winata
bfb9150b8f Create README.md for indobert-lite-large-p1 (#7184)
* Create README.md

* Update README.md
2020-09-18 03:22:11 -04:00
Genta Indra Winata
d193593403 Create README.md (#7183) 2020-09-18 03:21:54 -04:00
Genta Indra Winata
e65d846674 Create README.md (#7185) 2020-09-18 03:21:39 -04:00
Genta Indra Winata
e27d86d48d Create README.md for indobert-large-p2 model card (#7181) 2020-09-18 03:21:28 -04:00
Genta Indra Winata
881c0783e9 Create README.md for indobert-large-p1 model card (#7180) 2020-09-18 03:21:16 -04:00
Genta Indra Winata
e0d58a5c87 Create README.md (#7179) 2020-09-18 03:20:59 -04:00
Genta Indra Winata
1313a1d2a8 Create README.md for indobert-base-p2 (#7178) 2020-09-18 03:20:29 -04:00
tuner007
cf24f43e76 Create README.md (#7095)
Create model card for Pegasus QA
2020-09-18 03:19:45 -04:00
Sam Shleifer
67d9fc50d9 [s2s] remove double assert (#7223) 2020-09-17 18:32:31 -04:00
Stas Bekman
edbaad2c5c [model cards] fix metadata - 3rd attempt (#7218) 2020-09-17 16:57:06 -04:00
Stas Bekman
999a1c957a skip failing FSMT CUDA tests until investigated (#7220) 2020-09-17 16:53:14 -04:00
Stas Bekman
51c4adf54c [model cards] fix dataset yaml (#7216) 2020-09-17 15:29:39 -04:00
Sam Shleifer
a5638b2b3a [s2s] dynamic batch size with --max_tokens_per_batch (#7030) 2020-09-17 15:19:34 -04:00
Stas Bekman
efeab6a3f1 [s2s] run_eval/run_eval_search tweaks (#7192)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-09-17 14:26:38 -04:00
Stas Bekman
9c5bcab5b0 [model cards] fix yaml in cards (#7207) 2020-09-17 14:11:17 -04:00
Sohee Yang
e643a29722 Change to use relative imports in some files & Add python prompt symbols to example codes (#7202)
* Move 'from transformers' statements to relative imports in some files

* Add python prompt symbols in front of the example codes

* Reformat the code

* Add one missing space

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-09-17 12:30:45 -04:00
Stas Bekman
0fe6e435b6 [model cards] ported allenai Deep Encoder, Shallow Decoder models (#7153)
* [model cards] ported allenai Deep Encoder, Shallow Decoder models

* typo

* fix references

* add allenai/wmt19-de-en-6-6 model cards

* fill-in the missing info for the build script as provided by the searcher.
2020-09-17 17:58:49 +02:00
Stas Bekman
1eeb206bef [ported model] FSMT (FairSeq MachineTranslation) (#6940)
* ready for PR

* cleanup

* correct FSMT_PRETRAINED_MODEL_ARCHIVE_LIST

* fix

* perfectionism

* revert change from another PR

* odd, already committed this one

* non-interactive upload workaround

* backup the failed experiment

* store langs in config

* workaround for localizing model path

* doc clean up as in https://github.com/huggingface/transformers/pull/6956

* style

* back out debug mode

* document: run_eval.py --num_beams 10

* remove unneeded constant

* typo

* re-use bart's Attention

* re-use EncoderLayer, DecoderLayer from bart

* refactor

* send to cuda and fp16

* cleanup

* revert (moved to another PR)

* better error message

* document run_eval --num_beams

* solve the problem of tokenizer finding the right files when model is local

* polish, remove hardcoded config

* add a note that the file is autogenerated to avoid losing changes

* prep for org change, remove unneeded code

* switch to model4.pt, update scores

* s/python/bash/

* missing init (but doesn't impact the finetuned model)

* cleanup

* major refactor (reuse-bart)

* new model, new expected weights

* cleanup

* cleanup

* full link

* fix model type

* merge porting notes

* style

* cleanup

* have to create a DecoderConfig object to handle vocab_size properly

* doc fix

* add note (not a public class)

* parametrize

* - add bleu scores integration tests

* skip test if sacrebleu is not installed

* cache heavy models/tokenizers

* some tweaks

* remove tokens that aren't used

* more purging

* simplify code

* switch to using decoder_start_token_id

* add doc

* Revert "major refactor (reuse-bart)"

This reverts commit 226dad15ca6a9ef4e26178526e878e8fc5c85874.

* decouple from bart

* remove unused code #1

* remove unused code #2

* remove unused code #3

* update instructions

* clean up

* move bleu eval to examples

* check import only once

* move data+gen script into files

* reuse via import

* take less space

* add prepare_seq2seq_batch (auto-tested)

* cleanup

* recode test to use json instead of yaml

* ignore keys not needed

* use the new -y in transformers-cli upload -y

* [xlm tok] config dict: fix str into int to match definition (#7034)

* [s2s] --eval_max_generate_length (#7018)

* Fix CI with change of name of nlp (#7054)

* nlp -> datasets

* More nlp -> datasets

* Woopsie

* More nlp -> datasets

* One last

* extending to support allen_nlp wmt models

- allow a specific checkpoint file to be passed
- more arg settings
- scripts for allen_nlp models

* sync with changes

* s/fsmt-wmt/wmt/ in model names

* s/fsmt-wmt/wmt/ in model names (p2)

* s/fsmt-wmt/wmt/ in model names (p3)

* switch to a better checkpoint

* typo

* make non-optional args such - adjust tests where possible or skip when there is no other choice

* consistency

* style

* adjust header

* cards moved (model rename)

* use best custom hparams

* update info

* remove old cards

* cleanup

* s/stas/facebook/

* update scores

* s/allen_nlp/allenai/

* url maps aren't needed

* typo

* move all the doc / build /eval generators to their own scripts

* cleanup

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* fix indent

* duplicated line

* style

* use the correct add_start_docstrings

* oops

* resizing can't be done with the core approach, due to 2 dicts

* check that the arg is a list

* style

* style

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-09-17 11:31:29 -04:00
Sylvain Gugger
492bb6aa48 Trainer multi label (#7191)
* Trainer accep multiple labels

* Missing import

* Fix dosctrings
2020-09-17 08:15:37 -04:00
RafaelWO
709745927b Transformer-XL: Remove unused parameters (#7087)
* Removed 'tgt_len' and 'ext_len' from Transfomer-XL

 * Some changes are still to be done

* Removed 'tgt_len' and 'ext_len' from Transfomer-XL (2)

 * Removed comments
 * Fixed quality

* Changed warning to info
2020-09-17 06:10:34 -04:00
Dhaval Taunk
c183d81e27 added multilabel text classification notebook using distilbert to community notebooks (#7201)
* added multilabel classification using distilbert notebook to community notebooks

* added multilabel classification using distilbert notebook to community notebooks
2020-09-17 05:58:57 -04:00
Stas Bekman
79111b77d2 remove deprecated flag (#7171)
```
/home/circleci/.local/lib/python3.6/site-packages/isort/main.py:915: UserWarning: W0501: The following deprecated CLI flags were used and ignored: --recursive!
  "W0501: The following deprecated CLI flags were used and ignored: "
```
2020-09-17 05:52:12 -04:00
Stas Bekman
0cdafbf7ec remove duplicated code (#7173) 2020-09-17 05:51:40 -04:00
Sam Shleifer
45b0b1ff2f [s2s] fix kwarg typo (#7196) 2020-09-16 21:58:57 -04:00
Sam Shleifer
0203ad43bc [s2s] distributed eval cleanup (#7186) 2020-09-16 15:38:37 -04:00
sgugger
3babef815c Formatting 2020-09-16 14:57:09 -04:00
Stas Bekman
42049b8e12 use the correct add_start_docstrings (#7174) 2020-09-16 14:40:35 -04:00
Stas Bekman
fdaf8ab349 [s2s run_eval] new features (#7109)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-09-16 13:59:57 -04:00
Antoine Louis
df165065c3 [model_cards] antoiloui/belgpt2 🇧🇪 (#7166)
* Create README.md

* Update README.md
2020-09-16 12:16:01 -04:00
Sylvain Gugger
108c9aefcc Update README (#7133)
* Rewrite and update README

* Typo and migration guide

* Apply suggestions from code review

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>

* Address Clem's comments

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
2020-09-16 12:12:12 -04:00
Donna Choi
9e376e156a Add condition (#7161) 2020-09-16 09:15:10 -04:00
Stas Bekman
f8590c56e6 [doc] improve/expand the Parametrization section (#7156) 2020-09-16 08:45:50 -04:00
Stas Bekman
d3391c87fe build/eval/gen-card scripts for fsmt (#7155)
* build/eval/gen-card scripts for fsmt

* adjust for model renames
2020-09-16 08:41:26 -04:00
Xi Ye
08bfc1718a fix the warning message of overflowed sequence (#7151) 2020-09-16 07:40:57 -04:00
Julien Plu
af8425b749 Refactoring the TF activations functions (#7150)
* Refactoring the activations functions into a common file

* Apply style

* remove unused import

* fix tests

* Fix tests.
2020-09-16 07:03:47 -04:00
Stas Bekman
b00cafbde5 [docs] add testing documentation (#7101)
* [docs] add testing documentation

* Update docs/source/testing.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* tweaks as suggested

* Update docs/source/testing.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/testing.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/testing.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/testing.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/testing.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/testing.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/testing.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/testing.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/testing.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/testing.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/testing.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* tweaks

* Update docs/source/testing.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/testing.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* more tweaks

* suggestions from @LysandreJik

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-09-15 19:25:25 -04:00
Patrick von Platen
85ffda96fc fix encoder decoder kwargs (#7131) 2020-09-15 21:10:07 +02:00
Yih-Dar
4c62c6021a fix ZeroDivisionError and epoch counting (#7125)
* fix ZeroDivisionError and epoch counting

* Add test for num_train_epochs calculation in trainer.py

* Remove @require_non_multigpu for test_num_train_epochs_in_training
2020-09-15 11:51:50 -04:00
Patrick von Platen
7af2791d77 Create README.md 2020-09-15 16:47:36 +02:00
Sylvain Gugger
153ec2f154 Funnel model cards (#7147) 2020-09-15 10:40:57 -04:00
Sylvain Gugger
7186ca6240 Multi predictions trainer (#7126)
* Allow multiple outputs

* Formatting

* Move the unwrapping before metrics

* Fix typo

* Add test for non-supported config options
2020-09-15 10:27:24 -04:00
Pedro Lima
52d250f6aa [model_cards] pvl/labse_bert model card
From **Language-Agnostic BERT Sentence Embedding**

https://ai.googleblog.com/2020/08/language-agnostic-bert-sentence.html
2020-09-15 08:54:12 -04:00
tuner007
84d64805b0 Create README.md (#7097)
Model card for PEGASUS finetuned for paraphrasing task
2020-09-15 08:48:25 -04:00
Philip May
52bb7ccce5 German electra model card v3 update (#7089)
* changed eval table model order

* Update install

* update mc
2020-09-15 08:48:13 -04:00
Siddharth Jain
1a85299a5e Tiny typo fix (#7143) 2020-09-15 08:18:42 -04:00
Paul O'Leary McCann
e29c3f1b11 Add quotes to paths in MeCab arguments (#7142)
Without quotes directories with spaces in them will fail to be processed
correctly.
2020-09-15 19:04:50 +08:00
Yih-Dar
cb061e78e1 Fix TF Trainer loss calculation (#6998)
* create branch for issue #6968

* First attempt to fix incorrect tf trainer loss calculation

* Fix training loss in metric

* fix tf trainer evaluation loss

* apply count_instances_in_batch() for eval and test datasets

* prototype of using a new argument in trainer_tf.py to fix loss issue

* some renaming and fix, in particular for evaluation methods

* fix bugs to have a running version

* change to @staticmethod

* apply style
2020-09-15 05:41:00 -04:00
Stas Bekman
b0cbcdb05b [logging] remove no longer needed verbosity override (#7100) 2020-09-15 04:01:14 -04:00
Sylvain Gugger
2bf70e2150 Fix reproducible tests in Trainer (#7119)
* Fix reproducible tests in Trainer

* Deal with multiple GPUs
2020-09-15 03:32:44 -04:00
Sam Shleifer
9e89390ce1 [QOL] add signature for prepare_seq2seq_batch (#7108) 2020-09-14 20:33:08 -04:00
Sam Shleifer
33d479d2b2 [s2s] distributed eval in one command (#7124) 2020-09-14 15:57:56 -04:00
sgugger
206b78d485 Pin version of TF and torch 2020-09-14 14:08:51 -04:00
Kevin Canwen Xu
90cde2e938 Add Mirror Option for Downloads (#6679)
* Add Tuna Mirror for Downloads from China

* format fix

* Use preset instead of hardcoding URL

* Fix

* make style

* update the mirror option doc

* update the mirror
2020-09-14 23:50:22 +08:00
Antonio V Mendoza
e0e0675ac7 Demoing LXMERT with raw images by incorporating the FRCNN model for roi-pooled extraction and bounding-box predction on the GQA answer set. (#6986)
* adding demo

* Update examples/lxmert/requirements.txt

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update examples/lxmert/checkpoint.sh

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* added user input for .py demo

* updated model loading, data extrtaction, checkpoints, and lots of other automation

* adding normalizing for bounding boxes

* Update requirements.txt

* some optimizations for extracting data

* added data extracting file

* added data extraction file

* minor fixes to reqs and readme

* Style

* remove options

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-09-14 10:07:04 -04:00
sgugger
5636cbb25d Extra ) 2020-09-14 09:37:55 -04:00
Sylvain Gugger
ccc8e30c8a Clean up autoclass doc (#7081) 2020-09-14 09:26:41 -04:00
Stas Bekman
3ca1874ca4 [examples testing] restore code (#7099)
For some reason https://github.com/huggingface/transformers/pull/5512 re-added temp dir creation code that was removed by
https://github.com/huggingface/transformers/pull/6494 defeating the purpose of that PR for those tests.
2020-09-14 08:54:23 -04:00
Stas Bekman
4d39148419 fix deprecation warnings (#7033)
* fix deprecation warnings

* remove tests/test_tokenization_common.py's test_padding_to_max_length

* revert test_padding_to_max_length
2020-09-14 07:51:19 -04:00
Stas Bekman
576eec98e0 ignore FutureWarning in tests (#7079) 2020-09-14 07:50:51 -04:00
Bartosz Telenczuk
15d18e0307 fix link to paper (#7116) 2020-09-14 07:43:40 -04:00
Lysandre Debut
bb3106f741 Temporarily skip failing tests due to dependency change (#7118)
* Temporarily skip failing tests due to dependency change

* Remove trace
2020-09-14 07:42:13 -04:00
Sam Shleifer
0fab39695a [s2s distill] allow pegasus-12-12 (#7104) 2020-09-14 00:03:59 -04:00
Sam Shleifer
de9e297964 [s2s] distributed eval cleanup (#7110) 2020-09-13 23:40:38 -04:00
Sam Shleifer
54395d87a6 Update xsum length penalty to better values (#7107) 2020-09-13 20:48:47 -04:00
Sam Shleifer
e7f8d2ab64 [s2s] two stage run_distributed_eval.py (#7105) 2020-09-13 17:28:18 -04:00
Sam Shleifer
0ec63afec2 fix bug in pegasus converter (#7094) 2020-09-13 15:11:47 -04:00
Sam Shleifer
b76cb1c3df [s2s] run_eval supports --prefix clarg. (#6953) 2020-09-12 01:08:21 -04:00
李明浩
563ffb3dc3 Create README.md (#7066) 2020-09-11 15:21:05 -04:00
李明浩
1ad49cde3a Create README.md (#7067) 2020-09-11 15:20:54 -04:00
Sagor Sarker
4753816e39 added bangla-bert-base model card and also modified other model cards (#7071)
* added bangla-bert-base

* Apply suggestions from code review

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-09-11 15:17:25 -04:00
Suraj Patil
0a8c17d53c [T5Tokenizer] remove prefix_tokens (#7078) 2020-09-11 14:18:45 -04:00
Sylvain Gugger
4cbd50e611 Compute loss method (#7074) 2020-09-11 12:06:31 -04:00
Sylvain Gugger
ae736163d0 Add tests and fix various bugs in ModelOutput (#7073)
* Add tests and fix various bugs in ModelOutput

* Update tests/test_model_output.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-09-11 12:01:33 -04:00
Sylvain Gugger
e841b75dec Automate the lists in auto-xxx docs (#7061)
* More readable dict

* More nlp -> datasets

* Revert "More nlp -> datasets"

This reverts commit 3cd1883d226c63c4a686fc1fed35f2cd586ebe45.

* Automate the lists in auto-xxx docs

* More readable dict

* Revert "More nlp -> datasets"

This reverts commit 3cd1883d226c63c4a686fc1fed35f2cd586ebe45.

* Automate the lists in auto-xxx docs

* nlp -> datasets

* Fix new key
2020-09-11 10:42:09 -04:00
Sylvain Gugger
0054a48cdd Add dep on datasets (#7058) 2020-09-11 04:43:19 -04:00
Patrick von Platen
221d4c63a3 clean naming (#7068) 2020-09-11 09:57:53 +02:00
Stas Bekman
8fcbe486e1 these tests require non-multigpu env (#7059)
* these tests require non-multigpu env

* cleanup

* clarify
2020-09-10 18:52:55 -04:00
Sam Shleifer
77950c485a [wip/s2s] DistributedSortishSampler (#7056) 2020-09-10 15:23:44 -04:00
Sylvain Gugger
514486739c Fix CI with change of name of nlp (#7054)
* nlp -> datasets

* More nlp -> datasets

* Woopsie

* More nlp -> datasets

* One last
2020-09-10 14:51:08 -04:00
Sam Shleifer
e9a2f772bc [s2s] --eval_max_generate_length (#7018) 2020-09-10 14:11:34 -04:00
Stas Bekman
df4594a9da [xlm tok] config dict: fix str into int to match definition (#7034) 2020-09-10 19:31:01 +02:00
Julien Chaumond
d6c08b07a0 [AutoTokenizer] Correct error message 2020-09-10 17:19:01 +02:00
Patrick von Platen
db38f7ce29 [BertGeneration, Docs] Fix another old name in docs (#7050)
* correct docs for bert generation

* upload
2020-09-10 17:12:33 +02:00
Patrick von Platen
3bd95b0faf correct docs for bert generation (#7048) 2020-09-10 17:08:40 +02:00
Patrick von Platen
eb2feb5d90 Create README.md 2020-09-10 17:05:50 +02:00
Ashwin Geet Dsa
66a5a6fda8 fix to ensure that returned tensors after the tokenization is Long (#7039)
* fix to ensure that returned tensors after the tokenization is Long

* fix to ensure that returned tensors after the tokenization is Long

Co-authored-by: Ashwin Geet Dsa <adsa@grvingt-6.nancy.grid5000.fr>
2020-09-10 11:04:03 -04:00
Patrick von Platen
9ccdb1d517 Update README.md 2020-09-10 17:01:19 +02:00
Patrick von Platen
60698936fc Create README.md 2020-09-10 17:00:10 +02:00
Patrick von Platen
e0c3bc8ee0 Create README.md 2020-09-10 16:51:15 +02:00
Patrick von Platen
c356b9878d Create README.md 2020-09-10 16:45:44 +02:00
Patrick von Platen
5afd3f6196 Create README.md 2020-09-10 16:44:47 +02:00
Sylvain Gugger
15a189049e Add TF Funnel Transformer (#7029)
* Add TF Funnel Transformer

* Proper dummy input

* Formatting

* Update src/transformers/modeling_tf_funnel.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address review comments

* One review comment forgotten

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-09-10 10:41:56 -04:00
Patrick von Platen
7fd1febf38 Add "Leveraging Pretrained Checkpoints for Generation" Seq2Seq models. (#6594)
* add conversion script

* improve conversion script

* make style

* add tryout files

* fix

* update

* add causal bert

* better names

* add tokenizer file as well

* finish causal_bert

* fix small bugs

* improve generate

* change naming

* renaming

* renaming

* renaming

* remove leftover files

* clean files

* add fix tokenizer

* finalize

* correct slow test

* update docs

* small fixes

* fix link

* adapt check repo

* apply sams and sylvains recommendations

* fix import

* implement Lysandres recommendations

* fix logger warn
2020-09-10 16:40:51 +02:00
Sylvain Gugger
d1691d90e5 Samell fixed in tf template (#7044) 2020-09-10 10:36:02 -04:00
Patrick von Platen
63e539459d Update README.md 2020-09-10 16:34:28 +02:00
Patrick von Platen
054db06b1b Create README.md 2020-09-10 16:30:46 +02:00
Lysandre Debut
b482ad474a Fix template (#7040) 2020-09-10 08:45:52 -04:00
Yu Liu
762cba3bda Albert pretrain datasets/ datacollator (#6168)
* add dataset for albert pretrain

* datacollator for albert pretrain

* naming, comprehension, file reading change

* data cleaning is no needed after this modification

* delete prints

* fix a bug

* file structure change

* add tests for albert datacollator

* remove random seed

* add back len and get item function

* sample file for testing and test code added

* format change for black

* more format change

* Style

* var assignment issue resolve

* add back wrongly deleted DataCollatorWithPadding in init file

* Style

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-09-10 07:56:29 -04:00
Johann C. Rocholl
49e9be0639 Fix confusing warnings during TF2 import from PyTorch (#6623)
1. Swapped missing_keys and unexpected_keys.

2. Copy&paste error caused these warnings to say "from TF 2.0" when it's actually "from PyTorch".
2020-09-10 05:31:59 -04:00
Stas Bekman
4ee1053dcf add -y to bypass prompt for transformers-cli upload (#7035) 2020-09-10 04:58:29 -04:00
Patrick von Platen
76818cc4c6 Create README.md 2020-09-09 16:26:35 +02:00
Lysandre Debut
15478c1287 Batch encore plus and overflowing tokens fails when non existing overflowing tokens for a sequence (#6677)
* Patch and test

* Fix tests
2020-09-09 06:55:17 -04:00
Henry Dashwood
9fd11bf1a8 replace torch.triu with onnx compatible code (#6929) 2020-09-09 04:56:40 -04:00
Julien Chaumond
ed71c21d6a [from_pretrained] Allow tokenizer_type ≠ model_type (#6995) 2020-09-09 04:22:59 -04:00
Stas Bekman
03e363f9ae [generation] consistently add eos tokens (#6982)
Currently beam search returns inconsistent outputs - if hypos have different lengths we get eos, if they are the same - we don't.

This PR makes the output consistent.

Also why not also replace:

```
            if sent_lengths[i] < max_length:
                decoded[i, sent_lengths[i]] = eos_token_id
```
with:
```
            decoded[i, sent_lengths[i]] = eos_token_id
```
Shouldn't eos always be there? If the data gets truncated, the caller needs to user a larger `max_length`.

Please correct me if my logic is flawed.
2020-09-09 04:08:36 -04:00
Stas Bekman
d0963486c1 adding TRANSFORMERS_VERBOSITY env var (#6961)
* introduce TRANSFORMERS_VERBOSITY env var + test + test helpers

* cleanup

* remove helper function
2020-09-09 04:08:01 -04:00
Sam Shleifer
f0fc0aea6b pegasus.rst: fix expected output (#7017) 2020-09-08 13:29:16 -04:00
Patrick von Platen
120176ea29 [Longformer] Fix longformer documentation (#7016)
* fix longformer

* allow position ids to not be initialized
2020-09-08 18:51:28 +02:00
Lysandre Debut
5c4eb4b1ac Fixing FLOPS merge by checking if torch is available (#7013)
* Should check if `torch` is available

* fixed samples_count error, distributed_concat arguments

* style

* Import torch at beginning of file

Co-authored-by: TevenLeScao <teven.lescao@gmail.com>
2020-09-08 10:51:58 -04:00
Teven
01d340adfa Floating-point operations logging in trainer (#6768)
* neFLOs calculation, logging, and reloading (#1)

* testing distributed consecutive batches

* fixed AttributeError from DataParallel

* removed verbosity

* rotate with use_mtime=True

* removed print

* fixed interaction with gradient accumulation

* indent formatting

* distributed neflo counting

* fixed typo

* fixed typo

* mean distributed losses

* exporting log history

* moved a few functions

* floating_point_ops clarification for transformers with parameter-reuse

* code quality

* double import

* made flo estimation more task-agnostic

* only logging flos if computed

* code quality

* unused import

* Update src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Sylvain review

* Update src/transformers/modeling_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* black

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-09-08 10:00:56 -04:00
Sylvain Gugger
d155b38d6e Funnel transformer (#6908)
* Initial model

* Fix upsampling

* Add special cls token id and test

* Formatting

* Test and fist FunnelTokenizerFast

* Common tests

* Fix the check_repo script and document Funnel

* Doc fixes

* Add all models

* Write doc

* Fix test

* Initial model

* Fix upsampling

* Add special cls token id and test

* Formatting

* Test and fist FunnelTokenizerFast

* Common tests

* Fix the check_repo script and document Funnel

* Doc fixes

* Add all models

* Write doc

* Fix test

* Fix copyright

* Forgot some layers can be repeated

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/modeling_funnel.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address review comments

* Update src/transformers/modeling_funnel.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Address review comments

* Update src/transformers/modeling_funnel.py

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* Slow integration test

* Make small integration test

* Formatting

* Add checkpoint and separate classification head

* Formatting

* Expand list, fix link and add in pretrained models

* Styling

* Add the model in all summaries

* Typo fixes

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-09-08 08:08:08 -04:00
Stuart Mesham
25afb4ea50 fixed trainer tr_loss memory leak (#6999)
* fixed trainer tr_loss memory leak

* detached returned training loss from computation graph in the Trainer class' training_step() method

* Revert "fixed trainer tr_loss memory leak"

This reverts commit 47226e4e
2020-09-08 08:07:33 -04:00
Manuel Romero
1b76936d1a Fix typo (#6994) 2020-09-08 04:22:57 -04:00
Philipp Schmid
8235426ee8 New Community NB "Fine tune GPT-2 with Trainer class" (#7005) 2020-09-08 03:42:20 -04:00
Stas Bekman
c18f5916a0 typo (#7001)
apologies for the tiny PRs, just sending those as I find them.
2020-09-08 01:22:20 -04:00
Mehrdad Farahani
60fc03290b README for HooshvareLab/bert-fa-base-uncased (#6990)
ParsBERT v2.0 is a fine-tuned and vocab-reconstructed version of ParsBERT, and it's able to be used in other scopes!

It includes these features:
- We added some unused-vocab for use in summarization and other scopes.
- We fine-tuned the model on vast styles of writing in the Persian language.
2020-09-07 16:43:50 -04:00
Jangwon Park
90ec78b514 Add missing arguments for BertWordPieceTokenizer (#5810) 2020-09-07 08:35:41 -04:00
Lysandre Debut
77cd0e13d2 Conversion scripts shouldn't have relative imports (#6991) 2020-09-07 08:31:06 -04:00
Lysandre
1650130b0f Remove misleading docstring 2020-09-07 14:16:59 +02:00
Stas Bekman
159ef07e4c match CI's version of flake8 (#6941)
my flake8 wasn't up-to-date enough `make quality` wasn't reporting the same things CI did - this PR adds the actual required version.

Thinking more about some of these minimal versions - CI will always install afresh and thus will always run the latest version. Is there a way to tell pip to always install the latest versions of certain dependencies on `pip install -i ".[dev]"`, rather than hardcoding the minimals which quickly become outdated?
2020-09-07 08:12:25 -04:00
Abed khooli
e9d0d4c75c Create README.md (#6974) 2020-09-07 07:31:22 -04:00
Stas Bekman
848fbe1e35 [gen utils] missing else case (#6980)
* [gen utils] missing else case

1. `else` is missing - I hit that case while porting a model. Probably needs to assert there?
2. also the comment on top seems to be outdated (just vocab_size is being set there)

* typo
2020-09-07 07:28:06 -04:00
tznurmin
f7e80721eb Fixed the default number of attention heads in Reformer Configuration (#6973) 2020-09-07 12:12:22 +02:00
Richard Bownes
e20d8895bd Create README.md model card (#6964)
* Create README.md

* Add some custom prompts

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-09-07 06:01:40 -04:00
Stas Bekman
b4a9c95f1b [testing] add dependency: parametrize (#6958)
unittest doesn't support pytest's super-handy `@pytest.mark.parametrize`, I researched and there are many proposed workarounds, most tedious at best. If we include https://pypi.org/project/parameterized/ in dev dependencies - it will provide a very easy to write parameterization in tests. Same as pytest's fixture, plus quite a few other ways. 

Example:
```
from parameterized import parameterized
@parameterized([
    (2, 2, 4),
    (2, 3, 8),
    (1, 9, 1),
    (0, 9, 0),
])
def test_pow(base, exponent, expected):
   assert_equal(math.pow(base, exponent), expected)
```
(extra `self`var if inside a test class)

To remind the pytest style is slightly different:
```
    @pytest.mark.parametrize("test_input,expected", [("3+5", 8), ("2+4", 6), ("6*9", 42)])
    def test_eval(test_input, expected):
```
More examples here: https://pypi.org/project/parameterized

May I suggest that it will make it much easier to write some types of tests?
2020-09-07 05:50:18 -04:00
Stas Bekman
acfaad74ab [docstring] missing arg (#6933)
* [docstring] missing arg

add the missing `tie_word_embeddings` entry

* cleanup

* Update src/transformers/configuration_reformer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-09-07 05:36:16 -04:00
Stas Bekman
c3317e1f80 typo (#6959)
there is no var `decoder_input_ids`, but there is `input_ids` for decoder :)
2020-09-07 05:16:24 -04:00
Julien Chaumond
10c6f94adc [model_card] register jplu/tf-xlm-r-ner-40-lang as multilingual 2020-09-07 05:03:40 -04:00
Lysandre Debut
9ef9c39728 Cannot index None (#6984) 2020-09-07 04:56:08 -04:00
Sylvain Gugger
08de989a0a Trainer with grad accum (#6930)
* Add warning for gradient accumulation

* Formatting
2020-09-07 04:54:00 -04:00
Julien Chaumond
d4aa7284c8 [model_card] jplu/tf-xlm-r-ner-40-lang: Fix link
cc @jplu
2020-09-07 04:33:15 -04:00
Boris Dayma
995a958dd1 feat: allow prefix for any generative model (#5885)
* feat: allow padding_text for any generative model

* docs(pipelines.py): correct typo

* Update src/transformers/pipelines.py

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* feat: rename padding_text to prefix

* fix: cannot tokenize empty text

* fix: pass prefix arg to pipeline

* test: add prefix to text-generetation pipeline

* style: fix style

* style: clean code and variable name more explicit

* set arg docstring to optional

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-09-07 03:03:45 -04:00
Sam Shleifer
ce37be9d94 [s2s] warn if --fp16 for torch 1.6 (#6977) 2020-09-06 20:41:29 -04:00
Patrick von Platen
f72fe1f31a Correct wrong spacing in README 2020-09-06 13:26:56 +02:00
Steven Liu
d31031f603 create model card for astroGPT (#6960)
* create model card for astroGPT

* Hotlink to actual image file

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-09-05 12:50:19 -04:00
Naveenkhasyap
56742e9f61 Create Readme.MD for KanBERTo (#6942)
* Create Readme.MD for KanBERTo

KanBERTo language model readme for Kannada language.

* Update model_cards/Naveen-k/KanBERTo/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-09-04 18:24:32 -04:00
Stas Bekman
48ff6d5109 [doc] remove the implied defaults to :obj:None, s/True/ :obj:`True/, etc. (#6956)
* remove the implied defaults to :obj:`None`

* fix bug in the original

* replace to :obj:`True`, :obj:`False`
2020-09-04 18:22:25 -04:00
Stas Bekman
eff274d629 typo (#6952) 2020-09-04 16:14:37 -04:00
Sam Shleifer
a4fc0c80b1 [s2s] run_eval.py parses generate_kwargs (#6948) 2020-09-04 14:19:31 -04:00
Sam Shleifer
6078b12098 [s2s] distill: --normalize_hidden --supervise_forward (#6834) 2020-09-04 14:05:56 -04:00
Stas Bekman
c5d43a872f [docstring] misc arg doc corrections (#6932)
* correct bool types

fix docstring s/int/bool/

* fix description

* fix num_labels to match reality
2020-09-04 10:09:42 -04:00
Patrick von Platen
e3990d137a fix (#6946) 2020-09-04 16:08:54 +02:00
Yih-Dar
a75e319819 Fix mixed precision issue in TF DistilBert (#6915)
* Remove hard-coded uses of float32 to fix mixed precision use in TF Distilbert

* fix style

* fix gelu dtype issue in TF Distilbert

* fix numeric overflow while using half precision
2020-09-04 14:29:57 +02:00
Sam Shleifer
e95d262f25 [s2s] support early stopping based on loss, rather than rouge (#6927) 2020-09-03 17:31:35 -04:00
Sam Shleifer
207ed8cb78 [s2s] use --eval_beams command line arg (#6926) 2020-09-03 12:42:09 -04:00
krfricke
0f360d3d1c move wandb/comet logger init to train() to allow parallel logging (#6850)
* move wandb/comet logger init to train() to allow parallel logging

* Setup wandb/comet loggers on first call to log()
2020-09-03 11:49:14 -04:00
Sam Shleifer
39ed68d597 [s2s] allow task_specific_params=summarization_xsum (#6923) 2020-09-03 11:11:40 -04:00
Sam Shleifer
5a318f075a [s2s]: script to convert pl checkpoints to hf checkpoints (#6911)
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-09-03 09:47:00 -04:00
brett koonce
b8e4906c97 tweak tar command in readme (#6919) 2020-09-03 09:29:01 -04:00
Stefan Engl
a66db7d828 Corrected link to paper (#6905) 2020-09-03 09:23:42 -04:00
David Mark Nemeskey
55d61ce8d6 Added a link to the thesis. (#6906) 2020-09-03 09:20:03 -04:00
abdullaholuk-loodos
653a79ccad Loodos model cards had errors on "Usage" section. It is fixed. Also "electra-base-turkish-uncased" model removed from s3 and re-uploaded as "electra-base-turkish-uncased-discriminator". Its README added. (#6921)
Co-authored-by: Abdullah Oluk <abdullaholuk123@gmail.com>
2020-09-03 09:13:43 -04:00
Julien Chaumond
5a3aec90a9 [model_card] link to correctly cased piaf dataset
cc @psorianom @rachelker
2020-09-03 08:57:32 -04:00
Sylvain Gugger
722b5807d8 Template updates (#6914) 2020-09-03 04:14:58 -04:00
Antonio V Mendoza
ea2c6f1afc Adding the LXMERT pretraining model (MultiModal languageXvision) to HuggingFace's suite of models (#5793)
* added template files for LXMERT and competed the configuration_lxmert.py

* added modeling, tokization, testing, and finishing touched for lxmert [yet to be tested]

* added model card for lxmert

* cleaning up lxmert code

* Update src/transformers/modeling_lxmert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/modeling_tf_lxmert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/modeling_tf_lxmert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/modeling_lxmert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* tested torch lxmert, changed documtention, updated outputs, and other small fixes

* Update src/transformers/convert_pytorch_checkpoint_to_tf2.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/convert_pytorch_checkpoint_to_tf2.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/convert_pytorch_checkpoint_to_tf2.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* renaming, other small issues, did not change TF code in this commit

* added lxmert question answering model in pytorch

* added capability to edit number of qa labels for lxmert

* made answer optional for lxmert question answering

* add option to return hidden_states for lxmert

* changed default qa labels for lxmert

* changed config archive path

* squshing 3 commits: merged UI + testing improvments + more UI and testing

* changed some variable names for lxmert

* TF LXMERT

* Various fixes to LXMERT

* Final touches to LXMERT

* AutoTokenizer order

* Add LXMERT to index.rst and README.md

* Merge commit test fixes + Style update

* TensorFlow 2.3.0 sequential model changes variable names

Remove inherited test

* Update src/transformers/modeling_tf_pytorch_utils.py

* Update docs/source/model_doc/lxmert.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/model_doc/lxmert.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_lxmert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* added suggestions

* Fixes

* Final fixes for TF model

* Fix docs

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-09-03 04:02:25 -04:00
Puneetha Pai
4ebb52afdb test_tf_common: remove un_used mixin class parameters (#6866) 2020-09-02 10:54:40 -04:00
Stas Bekman
e71f32c0ef [testing] fix ambiguous test (#6898)
Since `generate()` does:
```
        num_beams = num_beams if num_beams is not None else self.config.num_beams
```
This test fails if `model.config.num_beams > 1` (which is the case in the model I'm porting).

This fix makes the test setup unambiguous by passing an explicit `num_beams=1` to `generate()`.

Thanks.
2020-09-02 16:18:17 +02:00
Sylvain Gugger
8f2723caf0 Output attention takes an s (#6903)
* Fix output_attention -> output_attentions

* Formatting

* One unsaved file
2020-09-02 08:11:45 -04:00
Yohei Tamura
485da7222f fix error class instantiation (#6634) 2020-09-02 07:36:32 -04:00
Suraj Patil
4230d30f77 [pipelines] Text2TextGenerationPipeline (#6744)
* add Text2TextGenerationPipeline

* remove max length warning

* remove comments

* remove input_length

* fix typo

* add tests

* use TFAutoModelForSeq2SeqLM

* doc

* typo

* add the doc below TextGenerationPipeline

* doc nit

* style

* delete comment
2020-09-02 07:34:35 -04:00
Prajjwal Bhargava
6b24281229 fix typo in comments (#6838) 2020-09-02 06:55:37 -04:00
Stas Bekman
7351ef83c1 [doc] typos (#6867)
* [doc] typos

fixed typos

* Update README.md
2020-09-02 06:51:51 -04:00
Harry Wang
ee1bff06f8 minor docs grammar fixes (#6889) 2020-09-02 06:45:19 -04:00
Patrick von Platen
8abd7f69fc fix warning for position ids (#6884) 2020-09-02 06:44:51 -04:00
Parthe Pandit
7cb0572c64 Update modeling_bert.py (#6897)
outptus -> outputs in example of BertForPreTraining
2020-09-02 06:39:01 -04:00
David Mark Nemeskey
e3c55ceb8d Model card for huBERT (#6893)
* Create README.md

Model card for huBERT.

* Update README.md

lowercase h

* Update model_cards/SZTAKI-HLT/hubert-base-cc/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-09-02 04:50:10 -04:00
Patrick von Platen
1889e96c8c fix QA example for PT (#6890) 2020-09-02 09:53:09 +02:00
Julien Chaumond
d822ab636b [model_cards] Fix file path for flexudy/t5-base-multi-sentence-doctor 2020-09-02 00:02:40 +02:00
Rohan Rajpal
ad5fb33c9a Create README.md (#6598) 2020-09-01 17:59:15 -04:00
Rohan Rajpal
f9dadcd85b Create README.md (#6602) 2020-09-01 17:58:43 -04:00
Igli Manaj
f5d69c75f7 Update multilingual passage rereanking model card (#6788)
Fix range of possible score, add inference .
2020-09-01 17:56:19 -04:00
Tom Grek
5d820f3ca6 Model card for primer/BART-Squad2 (#6801) 2020-09-01 17:52:32 -04:00
zolekode
8b884dadc6 added model card for flexudys t5 model (#6759)
Co-authored-by: zolekode <pascal.zoleko@fau.de>
2020-09-01 17:38:55 -04:00
hakan
bff6d517cd loodos turkish model cards added (#6840) 2020-09-01 17:35:24 -04:00
Manuel Romero
502d194b95 Create README.md (#6887)
Add language meta attribute
2020-09-01 17:09:10 -04:00
Manuel Romero
d082edf216 Create README.md (#6888)
Add language meta attribute
2020-09-01 17:09:02 -04:00
Abed khooli
dacbee9a50 Create README.md (#6886)
* Create README.md

model card for  akhooli/xlm-r-large-arabic-sent

* Update model_cards/akhooli/xlm-r-large-arabic-sent/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-09-01 17:06:15 -04:00
Abed khooli
e2971e61bd Create README.md (#6885) 2020-09-01 16:57:48 -04:00
Patrick von Platen
4d1a3ffde8 [EncoderDecoder] Add xlm-roberta to encoder decoder (#6878)
* finish xlm-roberta

* finish docs

* expose XLMRobertaForCausalLM
2020-09-01 21:56:39 +02:00
Patrick von Platen
311992630c Create README.md (#6883)
* Create README.md

* Update README.md
2020-09-01 19:24:45 +02:00
Jin Young (Daniel) Sohn
21d719238c Add cache_dir to save features TextDataset (#6879)
* Add cache_dir to save features TextDataset

This is in case the dataset is in a RO filesystem, for which is the case
in tests (GKE TPU tests).

* style
2020-09-01 11:42:17 -04:00
Lysandre Debut
1461aac8d7 Update docs stable version 2020-09-01 11:02:24 -04:00
Lysandre
3726754a6c v3.1.0 documentation 2020-09-01 14:39:07 +02:00
Lysandre
4b3ee9cbc5 Release: v3.1.0 2020-09-01 14:27:52 +02:00
Patrick von Platen
afc4ece462 [Generate] Facilitate PyTorch generate using ModelOutputs (#6735)
* fix generate for GPT2 Double Head

* fix gpt2 double head model

* fix  bart / t5

* also add for no beam search

* fix no beam search

* fix encoder decoder

* simplify t5

* simplify t5

* fix t5 tests

* fix BART

* fix transfo-xl

* fix conflict

* integrating sylvains and sams comments

* fix tf past_decoder_key_values

* fix enc dec test
2020-09-01 12:38:25 +02:00
Funtowicz Morgan
397f819615 Restore PaddingStrategy.MAX_LENGTH on QAPipeline while no v2. (#6875)
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
2020-09-01 05:35:35 -04:00
Sam Shleifer
a32d85f0d4 delete reinit (#6862) 2020-09-01 03:43:27 -04:00
Sylvain Gugger
d5f1ffa0d8 Logging doc (#6852)
* Add logging doc

* Foamtting

* Update docs/source/main_classes/logging.rst

* Update src/transformers/utils/logging.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-09-01 03:16:34 -04:00
Stas Bekman
59a6a32a61 add a final report to all pytest jobs (#6861)
we had it added for one job, please add it to all pytest jobs - we need the output of what tests were run to debug the codecov issue. thank you!
2020-08-31 22:47:23 -04:00
Sam Shleifer
431ab19d7a [fix] typo in available in helper function (#6859) 2020-08-31 17:59:34 -04:00
Sam Shleifer
367235ee52 Bart can make decoder_input_ids from labels (#6758) 2020-08-31 16:16:47 -04:00
Sam Shleifer
b9772897ec [s2s] command line args for faster val steps (#6833) 2020-08-31 16:16:10 -04:00
Sam Shleifer
8af1970e45 Fix marian slow test (#6854) 2020-08-31 16:10:43 -04:00
Funtowicz Morgan
bbdba0a76d Update ONNX notebook to include section on quantization. (#6831)
* Update ONNX notebook to include section on quantization.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Addressing ONNX team comments
2020-08-31 21:28:00 +02:00
Sylvain Gugger
a59bcefbb1 Split hp search methods (#6857)
* Split the run_hp_search by backend

* Unused import
2020-08-31 15:16:39 -04:00
krfricke
23f9611c16 Add checkpointing to Ray Tune HPO (#6747)
* Introduce HPO checkpointing for PBT

* Moved checkpoint saving

* Fixed checkpoint subdir pass

* Fixed style

* Enable/disable checkpointing, check conditions for various tune schedulers incl. PBT

* Adjust number of GPUs to number of jobs

* Avoid mode pickling in ray

* Move hp search to integrations
2020-08-31 14:38:46 -04:00
Sam Shleifer
61b7ba93f5 Marian distill scripts + integration test (#6799) 2020-08-31 13:48:26 -04:00
Jin Young (Daniel) Sohn
02d09c8fcc Only access loss tensor every logging_steps (#6802)
* Only access loss tensor every logging_steps

* tensor.item() was being called every step. This must not be done
for XLA:TPU tensors as it's terrible for performance causing TPU<>CPU
communication at each step. On RoBERTa MLM for example, it reduces step
time by 30%, should be larger for smaller step time models/tasks.
* Train batch size was not correct in case a user uses the
`per_gpu_train_batch_size` flag
* Avg reduce loss accross eval shards

* Fix style (#6803)

* t5 model should make decoder_attention_mask (#6800)

* [s2s] Test hub configs in self-scheduled CI (#6809)

* [s2s] round runtime in run_eval (#6798)

* Pegasus finetune script: add --adafactor (#6811)

* [bart] rename self-attention -> attention (#6708)

* [tests] fix typos in inputs (#6818)

* Fixed open in colab link (#6825)

* Add model card for singbert lite. Update widget for singbert and singbert-large. (#6827)

* BR_BERTo model card (#6793)

* clearly indicate shuffle=False (#6312)

* Clarify shuffle

* clarify shuffle

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>

* [s2s README] Add more dataset download instructions (#6737)

* Style

* Patch logging issue

* Set default logging level to `WARNING` instead of `INFO`

* TF Flaubert w/ pre-norm (#6841)

* Dataset and DataCollator for BERT Next Sentence Prediction (NSP) task (#6644)

* add datacollator and dataset for next sentence prediction task

* bug fix (numbers of special tokens & truncate sequences)

* bug fix (+ dict inputs support for data collator)

* add padding for nsp data collator; renamed cached files to avoid conflict.

* add test for nsp data collator

* Style

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>

* Fix in Adafactor docstrings (#6845)

* Fix resuming training for Windows (#6847)

* Only access loss tensor every logging_steps

* tensor.item() was being called every step. This must not be done
for XLA:TPU tensors as it's terrible for performance causing TPU<>CPU
communication at each step. On RoBERTa MLM for example, it reduces step
time by 30%, should be larger for smaller step time models/tasks.
* Train batch size was not correct in case a user uses the
`per_gpu_train_batch_size` flag
* Avg reduce loss accross eval shards

* comments

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Thomas Ashish Cherian <6967017+PandaWhoCodes@users.noreply.github.com>
Co-authored-by: Zane Lim <zyuanlim@gmail.com>
Co-authored-by: Rodolfo De Nadai <rdenadai@gmail.com>
Co-authored-by: xujiaze13 <37360975+xujiaze13@users.noreply.github.com>
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Huang Lianzhe <hlz@pku.edu.cn>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-08-31 11:35:51 -04:00
Sylvain Gugger
c48546c7f7 Fix resuming training for Windows (#6847) 2020-08-31 11:02:30 -04:00
Sylvain Gugger
d2f9cb838e Fix in Adafactor docstrings (#6845) 2020-08-31 10:52:47 -04:00
Huang Lianzhe
2de7ee0385 Dataset and DataCollator for BERT Next Sentence Prediction (NSP) task (#6644)
* add datacollator and dataset for next sentence prediction task

* bug fix (numbers of special tokens & truncate sequences)

* bug fix (+ dict inputs support for data collator)

* add padding for nsp data collator; renamed cached files to avoid conflict.

* add test for nsp data collator

* Style

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-08-31 08:25:00 -04:00
Lysandre Debut
895d394669 TF Flaubert w/ pre-norm (#6841) 2020-08-31 04:53:20 -04:00
Lysandre
4561f05c5f Set default logging level to WARNING instead of INFO 2020-08-31 09:56:25 +02:00
Lysandre
05c3214153 Patch logging issue 2020-08-31 09:37:08 +02:00
Sam Shleifer
dfa10a41ba [s2s README] Add more dataset download instructions (#6737) 2020-08-30 16:29:24 -04:00
xujiaze13
32fe44086c clearly indicate shuffle=False (#6312)
* Clarify shuffle

* clarify shuffle

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
2020-08-30 19:26:10 +08:00
Rodolfo De Nadai
0eecaceac7 BR_BERTo model card (#6793) 2020-08-30 19:02:46 +08:00
Zane Lim
d176aaad7f Add model card for singbert lite. Update widget for singbert and singbert-large. (#6827) 2020-08-30 18:21:49 +08:00
Thomas Ashish Cherian
a5847619e3 Fixed open in colab link (#6825) 2020-08-30 18:21:00 +08:00
Stas Bekman
563485bf95 [tests] fix typos in inputs (#6818) 2020-08-30 18:19:57 +08:00
Sam Shleifer
22933e661f [bart] rename self-attention -> attention (#6708) 2020-08-29 18:03:08 -04:00
Sam Shleifer
0f58903bb6 Pegasus finetune script: add --adafactor (#6811) 2020-08-29 17:43:32 -04:00
Sam Shleifer
ac47458a02 [s2s] round runtime in run_eval (#6798) 2020-08-29 17:36:31 -04:00
Sam Shleifer
5ab21b072f [s2s] Test hub configs in self-scheduled CI (#6809) 2020-08-28 17:05:52 -04:00
Sam Shleifer
3cac867fac t5 model should make decoder_attention_mask (#6800) 2020-08-28 15:22:33 -04:00
Sam Shleifer
20f7786453 Fix style (#6803) 2020-08-28 15:02:25 -04:00
Sam Shleifer
9336086ab5 prepare_seq2seq_batch makes labels/ decoder_input_ids made later. (#6654)
* broken test

* batch parity

* tests pass

* boom boom

* boom boom

* split out bart tokenizer tests

* fix tests

* boom boom

* Fixed dataset bug

* Fix marian

* Undo extra

* Get marian working

* Fix t5 tok tests

* Test passing

* Cleanup

* better assert msg

* require torch

* Fix mbart tests

* undo extra decoder_attn_mask change

* Fix import

* pegasus tokenizer can ignore src_lang kwargs

* unused kwarg test cov

* boom boom

* add todo for pegasus issue

* cover one word translation edge case

* Cleanup

* doc
2020-08-28 11:15:17 -04:00
RafaelWO
cb276b41de Transformer-XL: Improved tokenization with sacremoses (#6322)
* Improved tokenization with sacremoses

 * The TransfoXLTokenizer is now using sacremoses for tokenization
 * Added tokenization of comma-separated and floating point numbers.
 * Removed prepare_for_tokenization() from tokenization_transfo_xl.py because punctuation is handled by sacremoses
 * Added corresponding tests
 * Removed test comapring TransfoXLTokenizer and TransfoXLTokenizerFast
 * Added deprecation warning to TransfoXLTokenizerFast

* isort change

Co-authored-by: Teven <teven.lescao@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-08-28 09:56:17 -04:00
Ahmed Elnaggar
930153e7d2 Add ProtBert model card (#6764) 2020-08-28 12:12:28 +08:00
Stas Bekman
743d131d76 [style] set the minimal required version for black (#6784)
`make style` with `black` < 20.8b1 is a no go (in case some other package forced a lower version) - so make it explicit to avoid confusion
2020-08-28 11:38:09 +08:00
Sam Shleifer
fb78a90d6a PL: --adafactor option (#6776) 2020-08-27 22:19:46 -04:00
Stas Bekman
92ac2fa7d1 [transformers-cli] fix logger getter (#6777) 2020-08-27 20:01:17 -04:00
Lysandre
42fddacd1c Format 2020-08-27 18:31:51 +02:00
Stas Bekman
70fccc5cf3 new Makefile target: docs (#6510)
* [doc] multiple corrections to "Summary of the tasks"

* add a new "docs" target to validate docs and document it

* fix mixup
2020-08-27 12:25:16 -04:00
Stas Bekman
dbfe34f2f5 [test schedulers] adjust to test the first step's reading (#6429)
* [test schedulers] small improvement

* cleanup
2020-08-27 12:23:28 -04:00
Stas Bekman
e6b811f0a7 [testing] replace hardcoded paths to allow running tests from anywhere (#6523)
* [testing] replace hardcoded paths to allow running tests from anywhere

* fix the merge conflict
2020-08-27 12:22:18 -04:00
Sam Shleifer
9d1b4db2aa add nlp install (#6767) 2020-08-27 11:08:14 -04:00
Tom Grek
c225e872ed Fix it to work with BART (#6756) 2020-08-27 09:04:50 -04:00
Lysandre
0d2c111a0c Format 2020-08-27 14:56:47 +02:00
Julien Plu
6f289dc97a Fix the TF Trainer gradient accumulation and the TF NER example (#6713)
* Align TF NER example over the PT one

* Fix Dataset call

* Fix gradient accumulation training

* Apply style

* Address Sylvain's comments

* Address Sylvain's comments

* Apply style
2020-08-27 08:45:34 -04:00
Lysandre Debut
41aa2b4ef1 Adafactor docs (#6765) 2020-08-27 05:16:50 -04:00
Nikolai Yakovenko
971d1802d0 Add AdaFactor optimizer from fairseq (#6722)
* AdaFactor optimizer ported from fairseq. Tested for T5 finetuning and MLM -- reduced memory consumption compared to ADAM.

* update PR fixes, add basic test

* bug -- incorrect params in test

* bugfix -- import Adafactor into test

* bugfix -- removed accidental T5 include

* resetting T5 to master

* bugfix -- include Adafactor in __init__

* longer loop for adafactor test

* remove double error class declare

* lint

* black

* isort

* Update src/transformers/optimization.py

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* single docstring

* Cleanup docstring

Co-authored-by: Nikolai Y <nikolai.yakovenko@point72.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-08-27 04:58:13 -04:00
Sam Shleifer
4bd7be9a42 s2s distillation uses AutoModelForSeqToSeqLM (#6761) 2020-08-26 23:25:11 -04:00
Ahmed Elnaggar
05e7150a53 create ProtBert-BFD model card. (#6724) 2020-08-27 02:19:19 +02:00
Sam Shleifer
61518e2df3 [s2s] run_eval.py QOL improvements and cleanup(#6746) 2020-08-26 18:59:20 -04:00
Igli Manaj
434936f34a Model Card for Multilingual Passage Reranking BERT (#6755) 2020-08-26 18:00:27 -04:00
Joe Davison
10a34501f1 add __init__.py to utils (#6754) 2020-08-26 23:51:10 +02:00
Ali Safaya
61b9ed8074 Model card for kuisailab/albert-large-arabic (#6730)
* Create README.md

* Update README.md
2020-08-26 17:27:56 -04:00
Ali Safaya
8e0d51e4f2 Model card for kuisailab/albert-xlarge-arabic (#6731)
* Create README.md

* Update README.md
2020-08-26 17:27:42 -04:00
Ali Safaya
70c96a10e9 Model card for kuisailab/albert-base-arabic (#6729)
* Create README.md

* Update README.md
2020-08-26 17:27:34 -04:00
Sagor Sarker
cc4ba79f68 added model card for codeswitch-spaeng-sentiment-analysis-lince (#6727)
* added model card for codeswitch-spaeng-sentiment-analysis-lince model also update other model card

* fixed typo

* fixed typo

* fixed typo

* fixed typo

* fixed typo

* fixed typo

* fixed typo

* Update README.md
2020-08-26 17:26:32 -04:00
Tanmay Thakur
e10fb9cbe6 Create model card for lordtt13/COVID-SciBERT (#6718) 2020-08-26 17:22:25 -04:00
Adam Montgomerie
baeba53e88 Adding model cards for 5 models (#6703)
* Added model cards for 4 models

Added model cards for:
- roberta-base-bulgarian
- roberta-base-bulgarian-pos
- roberta-small-bulgarian
- roberta-small-bulgarian-pos

* fixed link text

* Update README.md

* Create README.md

* removed trailing bracket

* Add language metadata

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-08-26 17:20:55 -04:00
Julien Chaumond
3242e4d942 [model_cards] Fix tiny typos 2020-08-26 23:16:06 +02:00
Joe Davison
99407f9d1e add xlm-roberta-large-xnli model card (#6723)
* add xlm-roberta-large-xnli model card

* update pt example

* typo
2020-08-26 16:05:59 -04:00
Patrick von Platen
858b7d5873 [TF Longformer] Improve Speed for TF Longformer (#6447)
* add tf graph compile tests

* fix conflict

* remove more tf transpose statements

* fix conflicts

* fix comment typos

* move function to class function

* fix black

* fix black

* make style
2020-08-26 14:55:41 -04:00
Lysandre
a75c64d80c Black 20 release 2020-08-26 17:20:22 +02:00
Lysandre
e78c110338 isort 5 2020-08-26 17:13:49 +02:00
Julien Plu
02e8cd5584 Fix optimizer (#6717) 2020-08-26 11:12:44 -04:00
Lysandre Debut
77abd1e79f Centralize logging (#6434)
* Logging

* Style

* hf_logging > utils.logging

* Address @thomwolf's comments

* Update test

* Update src/transformers/benchmark/benchmark_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Revert bad change

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-08-26 11:10:36 -04:00
Jay Yip
461ae86812 Fix tf boolean mask in graph mode (#6741) 2020-08-26 05:15:35 -04:00
Patrick von Platen
925f34bbbd Add "tie_word_embeddings" config param (#6692)
* add tie_word_embeddings

* correct word embeddings in modeling utils

* make style

* make config param only relevant for torch

* make style

* correct typo

* delete deprecated arg in transo-xl
2020-08-26 04:58:21 -04:00
Patrick von Platen
fa8ee8e855 fix torchscript docs (#6740) 2020-08-26 04:51:56 -04:00
Sylvain Gugger
64c7c2bc15 Install nlp for github actions test (#6728) 2020-08-25 14:58:38 -04:00
Sam Shleifer
624495706c T5Tokenizer adds EOS token if not already added (#5866)
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-08-25 14:56:08 -04:00
Sam Shleifer
e11d923bfc Fix pegasus-xsum integration test (#6726) 2020-08-25 14:06:28 -04:00
Tomo Lazovich
7e6397a7d8 [squad] make examples and dataset accessible from SquadDataset object (#6710)
* [squad] make examples and dataset accessible from SquadDataset object

* [squad] add support for legacy cache files
2020-08-25 13:32:56 -04:00
Funtowicz Morgan
ac9702c284 Fix ONNX test_quantize unittest (#6716) 2020-08-25 13:24:40 -04:00
Zane Lim
074340339a Create README.md (#6721)
add model card for singbert large
2020-08-26 00:11:24 +08:00
Patrick von Platen
d17cce2270 add missing keys (#6719) 2020-08-25 11:38:51 -04:00
Arnav Sharma
a25c9fc8e1 Selected typo fix (#6687) 2020-08-25 15:39:02 +02:00
Funtowicz Morgan
625318f525 tensor.nonzero() is deprecated in PyTorch 1.6 (#6715)
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
2020-08-25 08:12:54 -04:00
Sylvain Gugger
124c3d6adc Add tokenizer to Trainer (#6689) 2020-08-25 07:47:09 -04:00
Sylvain Gugger
abc0202194 More tests to Trainer (#6699)
* More tests to Trainer

* Add warning in the doc
2020-08-25 07:07:36 -04:00
Sylvain Gugger
f5bad031bc Use generators tqdm progressbars (#6696) 2020-08-25 07:06:58 -04:00
Sam Shleifer
a99d09c6f9 add new line to make examples run (#6706) 2020-08-25 06:26:29 -04:00
Joel Hanson
4db2fa77d7 Allow tests in examples to use cuda or fp16,if they are available (#5512)
* Allow tests in examples to use cuda or fp16,if they are available

The tests in examples didn't use the cuda or fp16 even if they where available.
- The text classification example (`run_glue.py`) didn't use the fp16 even if it was available but
  the device was take based on the availablity(cuda/cpu).
- The language-modeling example (`run_language_modeling.py`) was having `--no_cuda` argument
  which made the test to work without cuda. This example is having issue when running with fp16
  thus it not enabled (got an assertion error for perplexity due to it higher value).
- The cuda and fp16 is not enabled for question-answering example (`run_squad.py`) as it is having a
  difference in the f1 score.
- The text-generation example (`run_generation.py`) will take the cuda or fp16 whenever it is available.

Resolves some of: #5057

* Unwanted import of is_apex_available was removed

* Made changes to test examples file to have the pass --fp16 only if cuda and apex is avaliable
- run_glue.py: Removed the check for cuda and fp16.
- run_generation.py: Removed the check for cuda and fp16 also removed unwanted flag creation.

* Incorrectly sorted imports fixed

* The model needs to be converted to half precision

* Formatted single line if condition statement to multiline

* The torch_device also needed to be checked before running the test on examples
- The tests in examples which uses cuda should also depend from the USE_CUDA flag,
  similarly to the rest of the test suite. Even if we decide to set USE_CUDA to
  True by default, setting USE_CUDA to False should result in the examples not using CUDA

* Format some of the code in test_examples file

* The improper import of is_apex_available was sorted

* Formatted the code to keep the style standards

* The comma at the end of list giving a flake8 issue was fixed

* Import sort was fixed

* Removed the clean_test_dir function as its not used right now
2020-08-25 06:02:07 -04:00
Yohei Tamura
841f071569 Add typing.overload for convert_ids_tokens (#6637)
* add overload for type checker

* black
2020-08-25 04:57:08 -04:00
Quentin Lhoest
0f16dd0ac2 Add DPR to models summary (#6690)
* add dpr to models summary

* minor

* minor

* Update docs/source/model_summary.rst

qa -> question answering

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/model_summary.rst

qa -> question ansering (cont'd)

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-08-25 09:57:28 +02:00
Jay
4fca874ea9 Remove hard-coded uses of float32 to fix mixed precision use (#6648) 2020-08-25 15:42:32 +08:00
Sam Shleifer
0344428f79 [s2s] round bleu, rouge to 4 digits (#6704) 2020-08-25 00:33:11 -04:00
Zane Lim
b6512d2357 Add model card for singbert. (#6674)
* Add model card for singbert.

Adding a model card for singbert- bert for singlish and manglish.

* Update README.md

Add additional tags and model name.

* Update README.md

Fix tag for malay.

* Update model_cards/zanelim/singbert/README.md

Fix language

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>

* Add examples and custom widget input.

Add examples and custom widget input.

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
2020-08-25 10:09:13 +08:00
Sylvain Gugger
d20cbb886b Fix hyperparameter_search doc (#6695) 2020-08-24 21:04:08 -04:00
Sam Shleifer
0ebc9699fa [fixdoc] Add import to pegasus usage doc (#6698) 2020-08-24 15:54:57 -04:00
Sylvain Gugger
6b4c617666 Move unused args to kwargs (#6694) 2020-08-24 13:20:03 -04:00
Stas Bekman
912a21ec78 remove BartForConditionalGeneration.generate (#6659)
As suggested here: https://github.com/huggingface/transformers/issues/6651#issuecomment-678594233
this removes generic `generate` doc with examples not-relevant to bart.
2020-08-25 00:42:34 +08:00
Stas Bekman
a8d6716ecb Create PULL_REQUEST_TEMPLATE.md (#6660)
* Create PULL_REQUEST_TEMPLATE.md

Proposing to copy this neat feature from pytorch. This is a small template that let's a PR submitter tell which issue that PR closes.

* Update .github/PULL_REQUEST_TEMPLATE.md

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
2020-08-25 00:30:38 +08:00
Sylvain Gugger
8f98faf934 Lat fix for Ray HP search (#6691) 2020-08-24 12:15:00 -04:00
Sylvain Gugger
3a7fdd3f52 Add hyperparameter search to Trainer (#6576)
* Add optuna hyperparameter search to Trainer

* @julien-c suggestions

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Make compute_objective an arg function

* Formatting

* Rework to make it easier to add ray

* Formatting

* Initial support for Ray

* Formatting

* Polish and finalize

* Add trial id to checkpoint with Ray

* Smaller default

* Use GPU in ray if available

* Formatting

* Fix test

* Update install instruction

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* Address review comments

* Formatting post-merge

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-08-24 11:48:45 -04:00
vblagoje
dd522da004 Fix PL token classification examples (#6682) 2020-08-24 11:30:06 -04:00
Sylvain Gugger
a573777901 Update repo to isort v5 (#6686)
* Run new isort

* More changes

* Update CI, CONTRIBUTING and benchmarks
2020-08-24 11:03:01 -04:00
Teven
d329c9b05d Fixed DataCollatorForLanguageModeling not accepting lists of lists (#6685)
* Fixed DataCollatorForLanguageModeling + PermutationLanguageModeling not accepting lists of lists

* Update data_collator.py

* black was grumpy
2020-08-24 15:31:44 +02:00
sgugger
0a850d210e Missing commit 2020-08-24 09:23:06 -04:00
Sylvain Gugger
b30879fe0c Don't reset the dataset type + plug for rm unused columns (#6683)
* Don't reset the type of the dataset

* Formatting

* Update trainer.py

Co-authored-by: Teven <teven.lescao@gmail.com>
2020-08-24 09:22:03 -04:00
Jared T Nielsen
1a779ad7ec Specify config filename (#6626) 2020-08-24 07:27:58 -04:00
Sagor Sarker
a622705ef3 added multiple model_cards for below models (#6666)
* Create README.md

* Update README.md

* Create README.md

* Update README.md

* added multiple codeswitch model
2020-08-24 05:08:32 -04:00
Patrick von Platen
16e38940bd Add Roberta2Roberta shared 2020-08-23 17:02:22 +02:00
Sam Shleifer
f230a64094 new paper bibtex (#6656) 2020-08-23 10:03:41 -04:00
Patrick von Platen
f235ee2164 Add Roberta2Roberta model card 2020-08-23 10:01:58 +02:00
Sagor Sarker
068df740bd added model_card for model codeswitch-hineng-lid-lince and codeswitch-spaeng-lid-lince (#6663)
* Create README.md

* Update README.md

* Create README.md

* Update README.md
2020-08-22 12:13:21 -04:00
Patrick von Platen
97bb2497ab Correct bug in bert2bert-cnn_dailymail
Model was trained with the wrong tokenizer. Retrained with correct tokenizer - thanks for spotting @lhoestq !
2020-08-22 13:44:20 +02:00
Manuel Romero
0f94151dc7 Add model card for electricidad-base-generator (#6650)
I works like a charm!
Look at the output of the example code!
2020-08-21 14:18:15 -04:00
Suraj Patil
cbda72932c [Doc model summary] add MBart model summary (#6649) 2020-08-21 13:42:59 -04:00
Patrick von Platen
9e8c494da7 Add T5-11B disclaimer
@julien-c
2020-08-21 18:11:18 +02:00
Patrick von Platen
a4db4e3032 [Docs model summaries] Add pegasus to docs (#6640)
* add pegasus to docs

* Update docs/source/model_summary.rst
2020-08-21 16:22:10 +02:00
Suraj Patil
d0e42a7bed CamembertForCausalLM (#6577)
* added CamembertForCausalLM

* add in __init__ and auto model

* style

* doc
2020-08-21 13:52:54 +02:00
josephrocca
bdf7e5de92 Remove accidental comment (#6629) 2020-08-21 05:07:32 -04:00
Manuel Romero
efc7460553 model card for Spanish electra base (#6633) 2020-08-21 05:04:29 -04:00
Morgan Funtowicz
b105f2c6b3 Update ONNX doc to match the removal of --optimize argument.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
2020-08-21 10:37:09 +02:00
Sylvain Gugger
e5f452275b Trainer automatically drops unused columns in nlp datasets (#6449)
* Add a classmethod to easily build a Trainer from nlp dataset and metric

* Fix docstrings

* Split train/eval

* Formatting

* Log dropped columns + docs

* Authorize callable activations

* Poc for auto activation

* Be framework-agnostic

* Formatting

* Remove class method

* Remove unnecessary code
2020-08-20 16:29:14 -04:00
Sam Shleifer
5bf4465e6c Regression test for pegasus bugfix (#6606) 2020-08-20 15:34:43 -04:00
sgugger
86c07e634f One last threshold to raise 2020-08-20 14:23:09 -04:00
Sylvain Gugger
e8af90c052 Move threshold up for flaky test with Electra (#6622)
* Move threshold up for flaky test with Electra

* Update above as well
2020-08-20 13:59:40 -04:00
Ivan Dolgov
953958372a XLNet Bug when training with apex 16-bit precision (#6567)
* xlnet fp16 bug fix

* comment cast added

* Update modeling_xlnet.py

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
2020-08-21 01:34:23 +08:00
Patrick von Platen
505f2d749e [Tests] fix attention masks in Tests (#6621)
* fix distilbert

* fix typo
2020-08-20 13:23:47 -04:00
Denisa Roberts
c9454507cf Add tests for Reformer tokenizer (#6485) 2020-08-20 18:58:44 +02:00
Joe Davison
f9d280a959 TFTrainer dataset doc & fix evaluation bug (#6618)
* TFTrainer dataset doc & fix evaluation bug

discussed in #6551

* add docstring to test/eval datasets
2020-08-20 12:11:36 -04:00
Sylvain Gugger
573bdb0a5d Add tests to Trainer (#6605)
* Add tests to Trainer

* Test if removing long breaks everything

* Remove ugly hack

* Fix distributed test

* Use float for number of epochs
2020-08-20 11:13:50 -04:00
Joe Davison
039d8d65fc add intro to nlp lib & dataset links to custom datasets tutorial (#6583)
* add intro to nlp lib + links

* unique links...
2020-08-20 10:32:51 -04:00
sgugger
b3e54698dd Fix CI 2020-08-20 08:34:02 -04:00
Prajjwal Bhargava
33bf426498 removed redundant arg in prepare_inputs (#6614)
* removed redundant arg in prepare_inputs

* made same change in prediction_loop
2020-08-20 08:23:35 -04:00
Romain Rigaux
cabfdfafc0 Docs copy button misses ... prefixed code (#6518)
Tested in a local build of the docs.

e.g. Just above https://huggingface.co/transformers/task_summary.html#causal-language-modeling

Copy will copy the full code, e.g.

for token in top_5_tokens:
     print(sequence.replace(tokenizer.mask_token, tokenizer.decode([token])))

Instead of currently only:

for token in top_5_tokens:


>>> for token in top_5_tokens:
...     print(sequence.replace(tokenizer.mask_token, tokenizer.decode([token])))
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help reduce our carbon footprint.
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help increase our carbon footprint.
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help decrease our carbon footprint.
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help offset our carbon footprint.
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help improve our carbon footprint.

Docs for the option fix:
https://sphinx-copybutton.readthedocs.io/en/latest/
2020-08-20 17:35:06 +08:00
Stas Bekman
61b5ee11e3 lighter 'make test' (#6512) 2020-08-20 17:24:25 +08:00
Siddharth Jain
3c3c46f563 Typo fix in 04-onnx-export (#6595) 2020-08-20 16:17:16 +08:00
Oren Amsalem
93c5c9a528 [cleanup] remove confusing newline (#6603) 2020-08-20 00:33:36 -04:00
Sylvain Gugger
18ca0e9140 Fix #6575 (#6596) 2020-08-19 13:04:33 -04:00
Suraj Patil
7581884dee [BartTokenizerFast] add prepare_seq2seq_batch (#6543) 2020-08-19 10:37:48 -04:00
Patrick von Platen
8bcceaceff fix model outputs test (#6593) 2020-08-19 16:18:51 +02:00
Sam Shleifer
9a86321b11 tf generation utils: remove unused kwargs (#6591) 2020-08-19 09:37:45 -04:00
Pradhy729
2a7402cbd3 Feed forward chunking others (#6365)
* Feed forward chunking for Distilbert & Albert

* Added ff chunking for many other models

* Change model signature

* Added chunking for XLM

* Cleaned up by removing some variables.

* remove test_chunking flag

Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
2020-08-19 14:31:10 +02:00
Patrick von Platen
fe0b85e77a [EncoderDecoder] Add functionality to tie encoder decoder weights (#6538)
* start adding tie encoder to decoder functionality

* finish model tying

* make style

* Apply suggestions from code review

* fix t5 list including cross attention

* apply sams suggestions

* Update src/transformers/modeling_encoder_decoder.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* add max depth break point

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-08-19 14:23:45 +02:00
Sam Shleifer
ab42d74850 Fix bart base test (#6587) 2020-08-18 21:28:10 -04:00
Sam Shleifer
1529bf9680 add BartConfig.force_bos_token_to_be_generated (#6526)
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-08-18 19:15:50 -04:00
Patrick von Platen
974bb4af26 [Model card] Bert2GPT2 EncoderDecoder model (#6569)
* Bert2GPT2 EncoderDecoder model

* Update README.md
2020-08-18 19:28:17 +02:00
Suraj Patil
6f972e1423 update xnli-mt url (#6580) 2020-08-18 13:10:47 -04:00
Suraj Patil
fb6844aff5 [Pegasus Doc] minor typo (#6579)
Minor typo correction
@sshleifer
2020-08-18 12:47:47 -04:00
Manuel Romero
aaab9ab187 Create README.md (#6556) 2020-08-18 12:43:20 -04:00
Manuel Romero
1dfce0f08a Create README.md (#6557) 2020-08-18 12:42:14 -04:00
Romain Rigaux
7516bcf273 [docs] Fix number of 'ug' occurrences in tokenizer_summary (#6574) 2020-08-18 10:23:25 -04:00
Romain Rigaux
5a5af22ed5 [docs] Fix wrong newline in the middle of a paragraph (#6573) 2020-08-18 10:22:43 -04:00
Stas Bekman
7659a8eb37 fix incorrect codecov reports (#6553)
As discussed at https://github.com/huggingface/transformers/issues/6317 codecov currently sends an invalid report when it fails to find a code coverage report for the base it checks against, so this gets fixed by:

-  require_base: yes        # don't report if there is no base coverage report

let's add this for clarity, this supposedly is already the default.

-  require_head: yes        # don't report if there is no head coverage report 

and perhaps no point reporting on doc changes as they don't make any difference and it just generates noise:

-  require_changes: true    # only comment if there was change in coverage
2020-08-18 10:21:13 -04:00
Stefan Schweter
cfa26d2b41 github: add @stefan-it to bug-report template for all token-classification related bugs (#6489) 2020-08-18 08:38:54 -04:00
Philip May
1fdf372f8c Small typo fixes for model card: electra-base-german-uncased (#6555)
* Update README.md

* Update model_cards/german-nlp-group/electra-base-german-uncased/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-08-18 08:21:52 -04:00
Ali Modarressi
5a81195ea9 Fixed label datatype for STS-B (#6492)
* fixed label datatype for sts-b

* naming update

* make style

* make style
2020-08-18 08:09:39 -04:00
Sam Shleifer
12d7624199 [marian] converter supports models from new Tatoeba project (#6342) 2020-08-17 23:55:42 -04:00
Jim Regan
fb7330b30e update with #s of sentences/tokens (#6546) 2020-08-17 16:48:05 -04:00
onepointconsulting
63144701ed Added first model card (#6530)
* Added first model card

* Add metadata

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-08-17 16:24:10 -04:00
Ikram Ali
98ee802023 [model_cards] Add model cards for Urduhack model (roberta-urdu-small) (#6536)
* [model_cards] roberta-urdu-small added.

* [model_cards] typo fixed.

* Tweak license format (yaml expects a simple string)

Co-authored-by: Ikram Ali <mrikram1989>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-08-17 16:04:29 -04:00
Jim Regan
3a302904cb [model_cards] Add a new model for Irish (#6544) 2020-08-17 15:56:56 -04:00
Julien Chaumond
07971d8b18 [model_cards] Fix yaml for cedpsam/chatbot_fr 2020-08-17 21:33:32 +02:00
Suraj Patil
407da12ef1 [T5Tokenizer] add prepare_seq2seq_batch method (#6122)
* tests
2020-08-17 13:57:19 -04:00
Suraj Patil
c9564f5343 [Doc] add more MBart and other doc (#6490)
* add mbart example

* add Pegasus and MBart in readme

* typo

* add MBart in Pretrained models

* add pre-proc doc

* add DPR in readme

* fix indent

* doc fix
2020-08-17 12:30:26 -04:00
Stas Bekman
f68c873100 replace _ with __ rst links (#6541) 2020-08-17 12:27:02 -04:00
sgugger
7ca6ab67fc Fix CI 2020-08-17 12:20:40 -04:00
Stas Bekman
b732e7e111 [doc] multiple corrections to "Summary of the tasks" (#6509)
* [doc] multiple corrections to "Summary of the tasks"

* fix indentation

* correction

* fix links, add links to examples/seq2seq/README.md instead of non-existing script
2020-08-17 11:49:16 -04:00
Suraj Patil
2a77813d53 [BartTokenizer] add prepare s2s batch (#6212)
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
2020-08-17 11:44:46 -04:00
Stas Bekman
84d33317ae [doc] make the text more readable, fix some typos, add some disambiguation (#6508)
* [doc] make the text more readable, fix some typos, add some disambiguation

* Update docs/source/glossary.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-08-17 11:07:58 -04:00
Joe Davison
d0c2389f48 add custom datasets tutorial (#6466)
* add custom datasets tutorial

* python -> bash code blocks

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* minor review feedback changes

* add working native QA snippet

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-08-17 09:15:34 -04:00
Sam Shleifer
d2da2cb232 allow spaces in bash args with "$@" (#6521) 2020-08-17 09:06:35 -04:00
Funtowicz Morgan
b41cc0b86a Fix flaky ONNX tests (#6531) 2020-08-17 09:04:35 -04:00
Stas Bekman
39c3b1d9de [sched] polynomial_decay_schedule use default power=1.0 (#6473) 2020-08-17 08:33:12 -04:00
Stas Bekman
9dbe4094f2 [testing] a new TestCasePlus subclass + get_auto_remove_tmp_dir() (#6494)
* [testing] switch to a new TestCasePlus + get_auto_remove_tmp_dir() for auto-removal of tmp dirs

* respect after=True for tempfile, simplify code

* comments

* comment fix

* put `before` last in args, so can make debug even faster
2020-08-17 08:12:19 -04:00
Patrick von Platen
36010cb1e2 fix pegasus doc (#6533) 2020-08-17 12:24:43 +02:00
Kevin Canwen Xu
37709b5909 Remove deprecated assertEquals (#6532)
`assertEquals` is deprecated: https://stackoverflow.com/questions/930995/assertequals-vs-assertequal-in-python/931011
This PR replaces these deprecated methods.
2020-08-17 17:13:58 +08:00
Stas Bekman
49d8076fa2 [doc] Summary of the models fixes (#6511)
* [doc] Summary of the models fixes

* correction
2020-08-17 16:04:53 +08:00
Cahya Wirawan
72911c893a Create model cards for indonesian models (#6522)
* added model cards for indonesian gpt2-small, bert-base and roberta-base models

* removed bibtex entries
2020-08-17 15:42:25 +08:00
Masatoshi Suzuki
48c6c6139f Support additional dictionaries for BERT Japanese tokenizers (#6515)
* Update BERT Japanese tokenizers

* Update CircleCI config to download unidic

* Specify to use the latest dictionary packages
2020-08-17 12:00:23 +08:00
Stas Bekman
423eb5b1d7 [doc] fix invalid env vars (#6504)
- remove invalid `ENV_` prefix.
- add a few ':' while at it
2020-08-17 11:11:40 +08:00
Philip May
3c72f5584b Add Model Card for electra-base-german-uncased (#6496)
* Add Model Card for electra-base-german-uncased

* Update README.md

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
2020-08-17 11:02:32 +08:00
Stas Bekman
df15c7c226 typos (#6505) 2020-08-17 10:57:36 +08:00
fabiocapsouza
6d38ab1cc3 Update bert-base-portuguese-cased and bert-large-portuguese-cased model cards (#6527)
Co-authored-by: Fabio Souza <fabiosouza@neuralmind.ai>
2020-08-17 10:49:49 +08:00
Sam Shleifer
84c265ffcc [lightning_base] fix s2s logging, only make train_loader once (#6404) 2020-08-16 22:49:41 -04:00
Sam Shleifer
72add6c98f [s2s] docs, document desired filenames nicely (#6525) 2020-08-16 20:31:22 -04:00
Kyle Piira
2060181126 Fixes paths with spaces in seq2seq example (#6493) 2020-08-16 13:36:38 -04:00
Kevin Canwen Xu
fe61c05b85 Add examples/bert-loses-patience who can help (#6499) 2020-08-16 16:30:16 +08:00
Jin Young (Daniel) Sohn
24107c2c83 Fix TPU Convergence bug introduced by PR#6151 (#6488)
Currently with the bug introduced we're taking two optimizer steps per
batch: one global one, where `xm.optimizer_step` injects a CRS between
all cores in training, and one without. This has been affecting training
accuracy (for example, XLNet GLUE on MNLI is not converging, etc.).
2020-08-14 12:47:37 -04:00
Sylvain Gugger
895ed8f451 Generation doc (#6470)
* Generation doc

* MBartForConditionalGeneration (#6441)

* add MBartForConditionalGeneration

* style

* rebase and fixes

* add mbart test in TEST_FILES_WITH_NO_COMMON_TESTS

* fix docs

* don't ignore mbart

* doc

* fix mbart fairseq link

* put mbart before bart

* apply doc suggestions

* Use hash to clean the test dirs (#6475)

* Use hash to clean the test dirs

* Use hash to clean the test dirs

* Use hash to clean the test dirs

* fix

* [EncoderDecoder] Add Cross Attention for GPT2 (#6415)

* add cross attention layers for gpt2

* make gpt2 cross attention work

* finish bert2gpt2

* add explicit comments

* remove attention mask since not yet supported

* revert attn mask in pipeline

* Update src/transformers/modeling_gpt2.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_encoder_decoder.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Sort unique_no_split_tokens to make it deterministic (#6461)

* change unique_no_split_tokens's type to set

* use sorted list instead of set

* style

* Import accuracy_score (#6480)

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address comments

* Styling

* Generation doc

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address comments

* Styling

Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
Co-authored-by: gijswijnholds <gijswijnholds@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-08-14 09:46:39 -04:00
gijswijnholds
b5ba758ba9 Import accuracy_score (#6480) 2020-08-14 08:16:16 -04:00
Quentin Lhoest
9a8c168f56 Sort unique_no_split_tokens to make it deterministic (#6461)
* change unique_no_split_tokens's type to set

* use sorted list instead of set

* style
2020-08-14 10:36:58 +02:00
Patrick von Platen
1d6e71e116 [EncoderDecoder] Add Cross Attention for GPT2 (#6415)
* add cross attention layers for gpt2

* make gpt2 cross attention work

* finish bert2gpt2

* add explicit comments

* remove attention mask since not yet supported

* revert attn mask in pipeline

* Update src/transformers/modeling_gpt2.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_encoder_decoder.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-08-14 09:43:29 +02:00
Kevin Canwen Xu
eb613b566a Use hash to clean the test dirs (#6475)
* Use hash to clean the test dirs

* Use hash to clean the test dirs

* Use hash to clean the test dirs

* fix
2020-08-14 15:34:39 +08:00
Suraj Patil
680f1337c3 MBartForConditionalGeneration (#6441)
* add MBartForConditionalGeneration

* style

* rebase and fixes

* add mbart test in TEST_FILES_WITH_NO_COMMON_TESTS

* fix docs

* don't ignore mbart

* doc

* fix mbart fairseq link

* put mbart before bart

* apply doc suggestions
2020-08-14 03:21:16 -04:00
Manuel Romero
05810cd80a Fix typo (#6469) 2020-08-13 15:01:08 -04:00
Kevin Canwen Xu
7bc00569df Clean directory after script testing (#6453)
* Clean Dir after testing

* remove pabee ignore
2020-08-14 00:34:03 +08:00
Sam Shleifer
e92efcf728 Mult rouge by 100: standard units (#6359) 2020-08-13 12:15:54 -04:00
vblagoje
eda07efaa5 Add POS tagging and Phrase chunking token classification examples (#6457)
* Add more token classification examples

* POS tagging example

* Phrase chunking example

* PR review fixes

* Add conllu to third party list (used in token classification examples)
2020-08-13 12:09:51 -04:00
Suraj Patil
f51161e230 add BartTokenizerFast in AutoTokenizer (#6464)
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-08-13 12:08:11 -04:00
Suraj Patil
a442f87adc add LongformerTokenizerFast in AutoTokenizer (#6463) 2020-08-13 12:06:43 -04:00
Lysandre Debut
f7cbc13db7 Test model outputs equivalence (#6445)
* Test model outputs equivalence

* Fix failing tests

* From dict to kwargs

* DistilBERT

* Addressing @sgugger and @patrickvonplaten's comments
2020-08-13 11:59:35 -04:00
Prajjwal Bhargava
54c687e97c typo fix (#6462) 2020-08-13 09:36:48 -04:00
Zhu Baohe
9d94aecd51 Fix docs and bad word tokens generation_utils.py (#6387)
* fix

* fix2

* fix3
2020-08-13 13:12:16 +02:00
cedspam
0ed7c00ba6 Update README.md (#6435)
* Update README.md

* Update README.md

* Update README.md
2020-08-13 11:01:17 +02:00
Stas Bekman
e983da0e7d cleanup tf unittests: part 2 (#6260)
* cleanup torch unittests: part 2

* remove trailing comma added by isort, and which breaks flake

* one more comma

* revert odd balls

* part 3: odd cases

* more ["key"] -> .key refactoring

* .numpy() is not needed

* more unncessary .numpy() removed

* more simplification
2020-08-13 04:29:06 -04:00
Joe Davison
bc820476a5 add targets arg to fill-mask pipeline (#6239)
* add targets arg to fill-mask pipeline

* add tests and more error handling

* quality

* update docstring
2020-08-12 12:48:29 -04:00
Patrick von Platen
0735def8e1 [EncoderDecoder] Add encoder-decoder for roberta/ vanilla longformer (#6411)
* add encoder-decoder for roberta

* fix headmask

* apply Sylvains suggestions

* fix typo

* Apply suggestions from code review
2020-08-12 18:23:30 +02:00
zcain117
fd3de2000f Get GKE logs via kubectl logs instead of gcloud logging read. (#6446) 2020-08-12 11:46:24 -04:00
Sam Shleifer
f94a52cd79 [s2s] add BartTranslationDistiller for distilling mBART (#6363) 2020-08-12 11:41:04 -04:00
Sylvain Gugger
d2370e1bd8 Adding PaddingDataCollator (#6442)
* Data collator with padding

* Add type annotation

* Support tensors as well

* Add comment

* Fix for labels wrong shape

* Data collator with padding

* Add type annotation

* Support tensors as well

* Add comment

* Fix for labels wrong shape

* Remove changes rendered unnecessary
2020-08-12 11:32:27 -04:00
Sylvain Gugger
96c3329f19 Fix #6428 (#6437) 2020-08-12 08:47:30 -04:00
Sylvain Gugger
a8db954cda Activate check on the CI (#6427)
* Activate check on the CI

* Fix repo inconsistencies

* Don't document too much
2020-08-12 08:42:14 -04:00
Sylvain Gugger
34fabe1697 Move prediction_loss_only to TrainingArguments (#6426) 2020-08-12 08:03:45 -04:00
Sylvain Gugger
e9c3031463 Fixes to make life easier with the nlp library (#6423)
* allow using tokenizer.pad as a collate_fn in pytorch

* allow using tokenizer.pad as a collate_fn in pytorch

* Add documentation and tests

* Make attention mask the right shape

* Better test

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
2020-08-12 08:00:56 -04:00
Stas Bekman
87b359439f [test] replace capsys with the more refined CaptureStderr/CaptureStdout (#6422)
* replace capsys with the more refined CaptureStderr/CaptureStdout

* Update examples/seq2seq/test_seq2seq_examples.py

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-08-12 07:54:28 -04:00
Jared T Nielsen
ac5bcf236e Fix FFN dropout in TFAlbertLayer, and split dropout in TFAlbertAttent… (#4323)
* Fix FFN dropout in TFAlbertLayer, and split dropout in TFAlbertAttention into two separate dropout layers.

* Same dropout fixes for PyTorch.
2020-08-12 07:52:42 -04:00
Lysandre Debut
4ffea5ce2f Disabled pabee test (#6431) 2020-08-12 02:52:50 -04:00
Rohan Rajpal
155288f04b [model_card] rohanrajpal/bert-base-codemixed-uncased-sentiment (#6324)
* Create README.md

* Update model_cards/rohanrajpal/bert-base-codemixed-uncased-sentiment/README.md

* Update model_cards/rohanrajpal/bert-base-codemixed-uncased-sentiment/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-08-11 18:38:18 -04:00
Manuel Romero
4e6245fc7e Create model card T5-base fine-tuned on event2Mind for Intent Prediction (#6412) 2020-08-11 18:35:27 -04:00
Manuel Romero
46e3a0a6ec Create README.md (#6381) 2020-08-11 18:34:11 -04:00
Manuel Romero
31dfde7429 Create README.md (#6378) 2020-08-11 18:32:37 -04:00
Manuel Romero
25e29150a2 Add metadata to be indexed properly (#6380) 2020-08-11 18:32:29 -04:00
Manuel Romero
471be5f279 Change metadata to be indexed correctly (#6379) 2020-08-11 18:32:18 -04:00
Rohan Rajpal
42ee0bc63d Create README.md (#6346)
* Create README.md

* add results on SAIL dataset

* Update model_cards/rohanrajpal/bert-base-multilingual-codemixed-cased-sentiment/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-08-11 18:31:34 -04:00
Sam Shleifer
3f071c4b6e [examples] add pytest dependency (#6425) 2020-08-11 17:58:09 -04:00
Stas Bekman
ece0903e11 lr_schedulers: add get_polynomial_decay_schedule_with_warmup (#6361)
* [wip] add get_polynomial_decay_schedule_with_warmup

* style

* add assert

* change lr_end to a much smaller default number

* check for exact equality

* [model_cards] electra-base-turkish-cased-ner (#6350)

* for electra-base-turkish-cased-ner

* Add metadata

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Temporarily de-activate TPU CI

* Update modeling_tf_utils.py (#6372)

fix typo: ckeckpoint->checkpoint

* the test now works again (#6371)

* correct pl link in readme (#6364)

* refactor almost identical tests (#6339)

* refactor almost identical tests

* important to add a clear assert error message

* make the assert error even more descriptive than the original bt

* Small docfile fixes (#6328)

* Patch models (#6326)

* TFAlbertFor{TokenClassification, MultipleChoice}

* Patch models

* BERT and TF BERT info


s

* Update check_repo

* Ci GitHub caching (#6382)

* Cache Github Actions CI

* Remove useless file

* Colab button (#6389)

* Add colab button

* Add colab link for tutorials

* Fix links for open in colab (#6391)

* Update src/transformers/optimization.py

consistently use lr_end=1e-7 default

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* [wip] add get_polynomial_decay_schedule_with_warmup

* style

* add assert

* change lr_end to a much smaller default number

* check for exact equality

* Update src/transformers/optimization.py

consistently use lr_end=1e-7 default

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* remove dup (leftover from merge)

* convert the test into the new refactored format

* stick to using the current_step as is, without ++

Co-authored-by: M. Yusuf Sarıgöz <yusufsarigoz@gmail.com>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Alexander Measure <ameasure@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-08-11 17:56:41 -04:00
cedspam
6c87b73d6b Create README.md (#6386)
* Create README.md

* Update README.md
2020-08-11 16:56:51 -04:00
Stas Bekman
0203d6517f [pl] restore lr logging behavior for glue, ner examples (#6314) 2020-08-11 16:27:11 -04:00
Sam Shleifer
be1520d3a3 rename prepare_translation_batch -> prepare_seq2seq_batch (#6103) 2020-08-11 15:57:07 -04:00
Sam Shleifer
66fa8ceaea PegasusForConditionalGeneration (torch version) (#6340)
Co-authored-by: Jingqing  Zhang <jingqing.zhang15@imperial.ac.uk>
2020-08-11 14:31:23 -04:00
Stas Bekman
f6cb0f806e [s2s] wmt download script use less ram (#6405) 2020-08-11 12:04:17 -04:00
Stas Bekman
7c6a085ebf pl version: examples/requirements.txt is single source of truth (#6309) 2020-08-11 10:58:54 -04:00
Pranav Vadrevu
1d1d5bec1b Create Model Card File (#6357) 2020-08-11 10:36:15 -04:00
Abed khooli
00ce881c07 Create README.md (#6413)
* Create README.md

Model card for https://huggingface.co/akhooli/gpt2-small-arabic

* Update model_cards/akhooli/gpt2-small-arabic/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-08-11 10:35:31 -04:00
Nick Doiron
3ae30787b5 switch Hindi-BERT to S3 README (#6396) 2020-08-11 10:34:22 -04:00
Abed khooli
824e651e17 Create README.md (#6397)
* Create README.md

* Update model_cards/akhooli/gpt2-small-arabic-poetry/README.md

* Update model_cards/akhooli/gpt2-small-arabic-poetry/README.md

* Update model_cards/akhooli/gpt2-small-arabic-poetry/README.md

* Update model_cards/akhooli/gpt2-small-arabic-poetry/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-08-11 09:03:23 -04:00
guillaume-be
404782912a [Performance improvement] "Bad tokens ids" optimization (#6064)
* Optimized banned token masking

* Avoid duplicate EOS masking if in bad_words_id

* Updated mask generation to handle empty banned token list

* Addition of unit tests for the updated bad_words_ids masking

* Updated timeout handling in `test_postprocess_next_token_scores_large_bad_words_list` unit test

* Updated timeout handling in `test_postprocess_next_token_scores_large_bad_words_list` unit test (timeout does not work on Windows)

* Moving Marian import to the test context to allow TF only environments to run

* Moving imports to torch_available test

* Updated operations device and test

* Updated operations device and test

* Added docstring and comment for in-place scores modification

* Moving test to own test_generation_utils, use of lighter models for testing

* removed unneded imports in test_modeling_common

* revert formatting change for ModelTesterMixin

* Updated caching, simplified eos token id test, removed unnecessary @require_torch

* formatting compliance
2020-08-11 05:56:40 -04:00
David LaPalomento
87e124c245 Warn if debug requested without TPU fixes (#6308) (#6390)
* Warn if debug requested without TPU fixes (#6308)
Check whether a PyTorch compatible TPU is available before attempting to print TPU metrics after training has completed. This way, users who apply `--debug` without reading the documentation aren't suprised by a stacktrace.

* Style

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-08-11 05:31:26 -04:00
Junyuan Zheng
cdf1f7edb2 Fix tokenizer saving and loading error (#6026)
* fix tokenizer saving and loading bugs when adding AddedToken to additional special tokens

* Add tokenizer test

* Style

* Style 2

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-08-11 04:49:16 -04:00
Stas Bekman
83984a61c6 testing utils: capturing std streams context manager (#6231)
* testing utils: capturing std streams context manager

* style

* missing import

* add the origin of this code
2020-08-11 03:56:47 -04:00
Stas Bekman
f6c0680d36 add pl_glue example test (#6034)
* add pl_glue example test

* for now just test that it runs, next validate results of eval or predict?

* complete the run_pl_glue test to validate the actual outcome

* worked on my machine, CI gets less accuracy - trying higher epochs

* match run_pl.sh hparms

* more epochs?

* trying higher lr

* for now just test that the script runs to a completion

* correct the comment

* if cuda is available, add --fp16 --gpus=1 to cover more bases

* style
2020-08-11 03:16:52 -04:00
Pradhy729
b25cec13c5 Feed forward chunking (#6024)
* Chunked feed forward for Bert

This is an initial implementation to test applying feed forward chunking for BERT.
Will need additional modifications based on output and benchmark results.

* Black and cleanup

* Feed forward chunking in BertLayer class.

* Isort

* add chunking for all models

* fix docs

* Fix typo

Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
2020-08-11 03:12:45 -04:00
Lysandre
8a3db6b303 Add TPU testing once again 2020-08-11 08:49:37 +02:00
zcain117
f65ac1faf2 Add missing docker arg for TPU CI. (#6393) 2020-08-11 02:48:49 -04:00
Sam Shleifer
b9ecd92ee4 [s2s] Script to save wmt data to disk (#6403) 2020-08-10 22:49:39 -04:00
Patrick von Platen
00bb0b25ed TF Longformer (#5764)
* improve names and tests longformer

* more and better tests for longformer

* add first tf test

* finalize tf basic op functions

* fix merge

* tf shape test passes

* narrow down discrepancies

* make longformer local attn tf work

* correct tf longformer

* add first global attn function

* add more global longformer func

* advance tf longformer

* finish global attn

* upload big model

* finish all tests

* correct false any statement

* fix common tests

* make all tests pass except keras save load

* fix some tests

* fix torch test import

* finish tests

* fix test

* fix torch tf tests

* add docs

* finish docs

* Update src/transformers/modeling_longformer.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/modeling_tf_longformer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* apply Lysandres suggestions

* reverse to assert statement because function will fail otherwise

* applying sylvains recommendations

* Update src/transformers/modeling_longformer.py

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* Update src/transformers/modeling_tf_longformer.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-08-10 23:25:06 +02:00
Patrick von Platen
3425936643 [EncoderDecoderModel] add a add_cross_attention boolean to config (#6377)
* correct encoder decoder model

* Apply suggestions from code review

* apply sylvains suggestions
2020-08-10 19:46:48 +02:00
Sylvain Gugger
06bc347c97 Fix links for open in colab (#6391) 2020-08-10 11:16:17 -04:00
Sylvain Gugger
3e0fe3cf5c Colab button (#6389)
* Add colab button

* Add colab link for tutorials
2020-08-10 11:12:29 -04:00
Lysandre Debut
79588e6fdb Ci GitHub caching (#6382)
* Cache Github Actions CI

* Remove useless file
2020-08-10 10:39:31 -04:00
Lysandre Debut
b99098abc7 Patch models (#6326)
* TFAlbertFor{TokenClassification, MultipleChoice}

* Patch models

* BERT and TF BERT info


s

* Update check_repo
2020-08-10 10:39:17 -04:00
Sylvain Gugger
6028ed92bd Small docfile fixes (#6328) 2020-08-10 05:37:12 -04:00
Stas Bekman
1429b920d4 refactor almost identical tests (#6339)
* refactor almost identical tests

* important to add a clear assert error message

* make the assert error even more descriptive than the original bt
2020-08-10 05:31:20 -04:00
Rohit Gupta
35eb96de4d correct pl link in readme (#6364) 2020-08-10 03:08:46 -04:00
Stas Bekman
0830e79512 the test now works again (#6371) 2020-08-10 02:55:52 -04:00
Alexander Measure
3a556b0fb7 Update modeling_tf_utils.py (#6372)
fix typo: ckeckpoint->checkpoint
2020-08-10 02:55:11 -04:00
Lysandre
1bbc54a87c Temporarily de-activate TPU CI 2020-08-10 08:11:40 +02:00
M. Yusuf Sarıgöz
6e8a38568e [model_cards] electra-base-turkish-cased-ner (#6350)
* for electra-base-turkish-cased-ner

* Add metadata

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-08-09 03:39:51 -04:00
Sam Shleifer
9a5ef83748 [s2s] fix --gpus clarg collision (#6358) 2020-08-08 21:51:37 -04:00
Patrick von Platen
1aec991643 [GPT2] Correct typo in docs (#6352) 2020-08-08 20:37:29 +02:00
elsanns
9f57e39f71 Add notebook on fine-tuning and interpreting Electra (#6321)
Co-authored-by: eliska <3648991+elisans@users.noreply.github.com>
2020-08-08 11:47:33 +02:00
Suraj Patil
9bed355449 [s2s] fix label_smoothed_nll_loss (#6344) 2020-08-08 04:21:12 -04:00
Sam Shleifer
99f73bcc71 [s2s] tiny QOL improvement: run_eval prints scores (#6341) 2020-08-08 02:45:55 -04:00
Stas Bekman
322dffc6c9 remove a TODO item to use a tiny model (#6338)
as discussed with @sshleifer, removing this TODO to switch to a tiny model, since it won't be able to test the results of the evaluation (i.e. the results are meaningless).
2020-08-07 21:30:39 -04:00
Sam Shleifer
1f8e826518 [CI] Self-scheduled runner also pins torch (#6332) 2020-08-07 18:40:21 -04:00
zcain117
1b8a7ffcfd Add setup for TPU CI to run every hour. (#6219)
* Add setup for TPU CI to run every hour.

* Re-organize config.yml

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-08-07 11:17:07 -04:00
Stas Bekman
6695450a23 [examples] consistently use --gpus, instead of --n_gpu (#6315) 2020-08-07 10:36:32 -04:00
Julien Plu
0e36e51515 Fix the tests for Electra (#6284)
* Fix the tests for Electra

* Apply style
2020-08-07 09:30:57 -04:00
Sylvain Gugger
6ba540b747 Add a script to check all models are tested and documented (#6298)
* Add a script to check all models are tested and documented

* Apply suggestions from code review

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>

* Address comments

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
2020-08-07 09:18:37 -04:00
Stas Bekman
e1638dce16 fix the slow tests doc (#6167)
remove unnecessary duplication wrt `RUN_SLOW=yes`
2020-08-07 09:17:32 -04:00
Binny Mathew
7e9861f7f4 dehate-bert Model Card (#6248)
Added citation and paper links.
2020-08-07 17:51:03 +08:00
Binny Mathew
f6df6d98dd dehate-bert Model Card (#6249)
Added citation and paper links.
2020-08-07 17:48:38 +08:00
Binny Mathew
26691ecba6 dehate-bert Model Card (#6250)
Added citation and paper links.
2020-08-07 17:48:09 +08:00
Binny Mathew
60657b295c dehate-bert Model Card (#6251)
Added citation and paper links.
2020-08-07 17:47:42 +08:00
Binny Mathew
7218261991 dehate-bert Model Card (#6252)
Added citation and paper links.
2020-08-07 17:47:26 +08:00
Binny Mathew
396d227cd4 dehate-bert Model Card (#6253)
Added citation and paper links.
2020-08-07 17:47:04 +08:00
Binny Mathew
8be260f18a dehate-bert Model Card (#6254)
Added citation and paper links.
2020-08-07 17:46:27 +08:00
Binny Mathew
dce7278cdf dehate-bert Model Card (#6255)
Added citation and paper links.
2020-08-07 17:45:52 +08:00
idoh
3be2d04884 fix consistency CrossEntropyLoss in modeling_bart (#6265) 2020-08-07 17:44:28 +08:00
Lysandre
c72f9c90a1 Remove --no-cache-dir from github CI 2020-08-07 09:07:22 +02:00
Lysandre Debut
0d9328f2ef Patch GPU failures (#6281)
* Pin to 1.5.0

* Patch XLM GPU test
2020-08-07 02:58:15 -04:00
Lysandre Debut
80a0676a51 CI dependency wheel caching (#6287)
* Single workflow cache test




Remove cache dir, re-trigger cache


Only pip archives


Not sudo when pip

* All workflow cache

Remove no-cache-dir instruction


Remove last sudo occurrences


v0.3
2020-08-07 02:48:59 -04:00
Stas Bekman
175cd45e13 fix the shuffle agrument usage and the default (#6307) 2020-08-06 20:32:28 -04:00
Bhashithe Abeysinghe
ffceef2042 [Fix] text-classification PL example (#6027)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-08-06 15:46:43 -04:00
xujiaze13
eb2bd8d6eb Remove redundant line in run_pl_glue.py (#6305) 2020-08-06 15:43:45 -04:00
Patrick von Platen
118ecfd427 fix for pytorch < 1.6 (#6300) 2020-08-06 21:14:46 +02:00
Sam Shleifer
2804fff839 [s2s]Use prepare_translation_batch for Marian finetuning (#6293)
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-08-06 14:58:38 -04:00
Teven
2f2aa0c89c added n_inner argument to gpt2 config (#6296) 2020-08-06 17:47:32 +02:00
Manuel Romero
0a0d53dcf8 Update model card (#6290)
Add links to RuPERTa models fine-tuned on Spanish SQUAD datasets
2020-08-06 11:42:43 -04:00
Doug Blank
b923871bb7 Adds comet_ml to the list of auto-experiment loggers (#6176)
* Support for Comet.ml

* Need to import comet first

* Log this model, not the one in the backprop step

* Log args as hyperparameters; use framework to allow fine control

* Log hyperparameters with context

* Apply black formatting

* isort fix integrations

* isort fix __init__

* Update src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/trainer_tf.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Address review comments

* Style + Quality, remove Tensorboard import test

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-08-06 11:31:30 -04:00
Philip May
d5bc32ce92 Add strip_accents to basic BertTokenizer. (#6280)
* Add strip_accents to basic tokenizer

* Add tests for strip_accents.

* fix style with black

* Fix strip_accents test

* empty commit to trigger CI

* Improved strip_accents check

* Add code quality with is not False
2020-08-06 18:52:28 +08:00
JME-P
31da35cc89 Create README.md (#6273)
I am adding a descriptive README.md file to my recently uploaded twitter classification model: shrugging-grace/tweetclassifier.
2020-08-05 12:36:24 -04:00
JME-P
a8bdba232f Create README.md for uploaded classifier (#6272)
I am adding a descriptive README.md file to my recently uploaded twitter classification model: shrugging-grace/tweetclassifier.
2020-08-05 12:27:46 -04:00
HUSEIN ZOLKEPLI
a23a535c10 added t5 bahasa summarization readme (#6269) 2020-08-05 12:27:27 -04:00
Sylvain Gugger
c67d1a0259 Tf model outputs (#6247)
* TF outputs and test on BERT

* Albert to DistilBert

* All remaining TF models except T5

* Documentation

* One file forgotten

* TF outputs and test on BERT

* Albert to DistilBert

* All remaining TF models except T5

* Documentation

* One file forgotten

* Add new models and fix issues

* Quality improvements

* Add T5

* A bit of cleanup

* Fix for slow tests

* Style
2020-08-05 11:34:39 -04:00
Teven
bd0eab351a Trainer + wandb quality of life logging tweaks (#6241)
* added `name` argument for wandb logging, also logging model config with trainer arguments

* Update src/transformers/training_args.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* added tf, post-review changes

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-08-05 09:05:52 -04:00
Julien Plu
33966811bd Add SequenceClassification and MultipleChoice TF models to Electra (#6227)
* Add SequenceClassification and MultipleChoice TF models to Electra

* Apply style

* Add summary_proj_to_labels to Electra config

* Finally mirroring the PT version of these models

* Apply style

* Fix Electra test
2020-08-05 09:04:27 -04:00
Stas Bekman
376c02e9a9 [WIP] lightning_base: support --lr_scheduler with multiple possibilities (#6232)
* support --lr_scheduler with multiple possibilities

* correct the error message

* add a note about supported schedulers

* cleanup

* cleanup2

* needs the argument default

* style

* add another assert in the test

* implement requested changes

* cleanups

* fix relative import

* cleanup
2020-08-05 09:01:17 -04:00
Zhu Baohe
d89acd07cc fix (#6257) 2020-08-05 07:37:57 -04:00
Ninnart Fuengfusin
24c5a6e351 Update optimization.py (#6261) 2020-08-05 07:34:57 -04:00
Lilian Bordeau
ed6b8f3128 Update to match renamed attributes in fairseq master (#5972)
* Update to match renamed attributes in fairseq master

RobertaModel no longer have model.encoder and args.num_classes attributes as of 5/28/20.

* Quality

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-08-05 07:23:55 -04:00
Ali Safaya
d9149f00d1 Update README.md (#6201) 2020-08-04 17:44:14 -04:00
Ali Safaya
ddfdbb86c1 Update README.md (#6200) 2020-08-04 17:44:05 -04:00
Ali Safaya
4f67955662 Update README.md (#6199) 2020-08-04 17:43:48 -04:00
Ali Safaya
869ec441c9 Update README.md (#6198) 2020-08-04 17:43:38 -04:00
Adam Montgomerie
5177dca634 Create README.md (#6123) 2020-08-04 17:42:53 -04:00
Manuel Romero
3f30ebe6ca Create README.md (#6075) 2020-08-04 17:41:23 -04:00
Binny Mathew
aa7c22a283 Update Model Card (#6246)
Added citation and paper links.
2020-08-04 17:40:47 -04:00
Joe Davison
972535ea74 fix zero shot pipeline docs (#6245) 2020-08-04 16:37:49 -04:00
Timo Moeller
5920a37a4c Add license info to German Bert models (#6242)
* Add xlm-r QA model card

* Add tags

* Add license info to german bert
2020-08-04 13:40:49 -04:00
Patrick von Platen
6c9ba1d8fc [Reformer] Make random seed generator available on random seed and not on model device (#6244)
* improve if else statement random seeds

* Apply suggestions from code review

* Update src/transformers/modeling_reformer.py
2020-08-04 13:22:43 -04:00
Sam Shleifer
d5b0a0e235 mBART Conversion script (#6230) 2020-08-04 09:53:51 -04:00
Stas Bekman
268bf34630 typo (#6225) 2020-08-04 09:31:49 -04:00
Patrick von Platen
7f65daa2e1 fix reformer fp16 (#6237) 2020-08-04 13:02:25 +02:00
Andrés Felipe Cruz
7ea9b2db37 Encoder decoder config docs (#6195)
* Adding docs for how to load encoder_decoder pretrained model with individual config objects

* Adding docs for loading encoder_decoder config from pretrained folder

* Fixing  W293 blank line contains whitespace

* Update src/transformers/modeling_encoder_decoder.py

* Update src/transformers/modeling_encoder_decoder.py

* Update src/transformers/modeling_encoder_decoder.py

* Apply suggestions from code review

model file should only show examples for how to load save model

* Update src/transformers/configuration_encoder_decoder.py

* Update src/transformers/configuration_encoder_decoder.py

* fix space

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-08-04 09:23:28 +02:00
Lysandre Debut
1d5c3a3d96 Test with --no-cache-dir (#6235) 2020-08-04 03:20:19 -04:00
Sam Shleifer
6730ecdd3c Remove redundant coverage (#6224) 2020-08-04 02:59:21 -04:00
Stas Bekman
5deed37f9f cleanup torch unittests (#6196)
* improve unit tests

this is a sample of one test according to the request in https://github.com/huggingface/transformers/issues/5973
before I apply it to the rest

* batch 1

* batch 2

* batch 3

* batch 4

* batch 5

* style

* non-tf template

* last deletion of check_loss_output
2020-08-04 02:42:56 -04:00
Gong Linyuan
b390a5672a Make the order of additional special tokens deterministic (#5704)
* Make the order of additional special tokens deterministic regardless of hash seeds

* Fix
2020-08-04 02:38:30 -04:00
Lysandre Debut
d740351f7d Upgrade pip when doing CI (#6234)
* Upgrade pip when doing CI

* Don't forget Github CI
2020-08-04 02:37:12 -04:00
Sam Shleifer
57eb1cb68d [s2s] Document better mbart finetuning command (#6229)
* Document better MT command

* improve multigpu command
2020-08-03 18:22:31 -04:00
Victor SANH
0513f8d275 correct label extraction + add note on discrepancies on trained MNLI model and HANS (#6221) 2020-08-03 15:02:51 -04:00
Kevin Canwen Xu
3c289fb38c Remove outdated BERT tips (#6217)
* Remove out-dated BERT tips

* Update modeling_outputs.py

* Update bert.rst

* Update bert.rst
2020-08-04 01:17:56 +08:00
Sylvain Gugger
e4920c92d6 Doc pipelines (#6175)
* Init work on pipelines doc

* Work in progress

* Work in progress

* Doc pipelines

* Rm unwanted default

* Apply suggestions from code review

Lysandre comments

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-08-03 11:44:46 -04:00
Sam Shleifer
b6b2f2270f s2s: fix LR logging, remove some dead code. (#6205) 2020-08-03 10:36:26 -04:00
Maurice Gonzenbach
06f1692b02 Fix _shift_right function in TFT5PreTrainedModel (#6214) 2020-08-03 16:21:23 +02:00
Suraj Patil
0b41867357 fix labels (#6213) 2020-08-03 10:19:35 -04:00
Jay Mody
cedc547e7e Adds train_batch_size, eval_batch_size, and n_gpu to to_sanitized_dict output for logging. (#5331)
* Adds train_batch_size, eval_batch_size, and n_gpu to to_sanitized_dict() output

* Update wandb config logging to use to_sanitized_dict

* removed n_gpu from sanitized dict

* fix quality check errors
2020-08-03 09:00:39 -04:00
Julien Plu
9996f697e3 Fix saved model creation (#5468)
* Fix TF Serving when output_hidden_states and output_attentions are True

* Add tests for saved model creation + bug fix for multiple choices models

* remove unused import

* Fix the input for several layers

* Fix test

* Fix conflict printing

* Apply style

* Fix XLM and Flaubert for TensorFlow

* Apply style

* Fix TF check version

* Apply style

* Trigger CI
2020-08-03 08:10:40 -04:00
Teven
5a0dac53bf Empty assert hunt (#6056)
* Fixed empty asserts

* black-reformatted stragglers in templates

* More code quality checks

* Update src/transformers/convert_marian_to_pytorch.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/convert_marian_to_pytorch.py

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* removed unused line as per @sshleifer

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-08-03 10:19:03 +02:00
Martin Müller
16c2240164 Add script to convert tf2.x checkpoint to PyTorch (#5791)
* Add script to convert tf2.x checkpoint to pytorch

The script converts the newer TF2.x checkpoints (as published on their official GitHub: https://github.com/tensorflow/models/tree/master/official/nlp/bert) to Pytorch.

* rename file in order to stay consistent with naming convention
2020-08-03 03:53:38 -04:00
Philip May
82a0e2b67e Fix docstring for BertTokenizerFast (#6185)
- remove duplicate doc-entry for tokenize_chinese_chars
- add doc for strip_accents and wordpieces_prefix
2020-08-02 15:58:26 +08:00
Stas Bekman
d8dbf3b75d [s2s] clean up + doc (#6184)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-08-01 14:51:07 -04:00
Faiaz Rahman
a39dfe4fb1 Fixed typo in Longformer (#6180) 2020-08-01 18:20:48 +08:00
Joe Davison
8edfaaa81b bart-large-mnli-yahoo-answers model card (#6133)
* Add bart-large-mnli-yahoo-answers model card

* Add examples

* Add widget example

* Rm bart tag

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-31 10:56:32 -04:00
Sylvain Gugger
d951c14ae4 Model output test (#6155)
* Use return_dict=True in all tests

* Formatting
2020-07-31 09:44:37 -04:00
Sylvain Gugger
86caab1e0b Harmonize both Trainers API (#6157)
* Harmonize both Trainers API

* Fix test

* main_prcess -> process_zero
2020-07-31 09:43:23 -04:00
Mehrdad Farahani
603cd81a01 readme m3hrdadfi/albert-fa-base-v2 (#6153)
* readme m3hrdadfi/albert-fa-base-v2

model_card readme for m3hrdadfi/albert-fa-base-v2

* Update model_cards/m3hrdadfi/albert-fa-base-v2/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-31 06:19:06 -04:00
Suraj Patil
838dc06ff5 parse arguments from dict (#4869)
* add parse_dict to parse arguments from dict

* add unit test for parse_dict
2020-07-31 04:44:23 -04:00
Paul O'Leary McCann
cf3cf304ca Replace mecab-python3 with fugashi for Japanese tokenization (#6086)
* Replace mecab-python3 with fugashi

This replaces mecab-python3 with fugashi for Japanese tokenization. I am
the maintainer of both projects.

Both projects are MeCab wrappers, so the underlying C++ code is the
same. fugashi is the newer wrapper and doesn't use SWIG, so for basic
use of the MeCab API it's easier to use.

This code insures the use of a version of ipadic installed via pip,
which should make versioning and tracking down issues easier.

fugashi has wheels for Windows, OSX, and Linux, which will help with
issues with installing old versions of mecab-python3 on Windows.
Compared to mecab-python3, because fugashi doesn't use SWIG, it doesn't
require a C++ runtime to be installed on Windows.

In adding this change I removed some code dealing with `cursor`,
`token_start`, and `token_end` variables. These variables didn't seem to
be used for anything, it is unclear to me why they were there.

I ran the tests and they passed, though I couldn't figure out how to run
the slow tests (`--runslow` gave an error) and didn't try testing with
Tensorflow.

* Style fix

* Remove unused variable

Forgot to delete this...

* Adapt doc with install instructions

* Fix typo

Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-07-31 04:41:14 -04:00
Stas Bekman
f250beb8aa enable easy checkout switch (#5645)
* enable easy checkout switch

allow having multiple repository checkouts and not needing to remember to rerun 'pip install -e .[dev]' when switching between checkouts and running tests.

* make isort happy

* examples needs one too
2020-07-31 04:34:46 -04:00
kolk
7d50af4b02 Create README.md (#6169) 2020-07-31 04:28:35 -04:00
Prajjwal Bhargava
0034a1d248 Add Pytorch Native AMP support in Trainer (#6151)
* fixed type; add Pytorch Native CUDA AMP support

* reverted commit on modeling_utils

* confirming to HF black formatting rule

* changed bool value of _use_apex

* scaler support for gradient clipping

* fix inplace operation of clip_grad_norm

* removed not while version comparison
2020-07-31 04:23:29 -04:00
Funtowicz Morgan
7231f7b503 Enable ONNX/ONNXRuntime optimizations through converter script (#6131)
* Add onnxruntime transformers optimization support

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added Optimization section in ONNX/ONNXRuntime documentation.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Improve note reference

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fixing imports order.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Add warning about different level of optimization between torch and tf export.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Address @LysandreJik wording suggestion

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address @LysandreJik wording suggestion

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Always optimize model before quantization for maximum performances.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Address comments on the documentation.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Improve TensorFlow optimization message as suggested by @yufenglee

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Removed --optimize parameter

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Warn the user about current quantization limitation when model is larger than 2GB.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Trigger CI for last check

* Small change in print for the optimization section.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-07-31 09:45:13 +02:00
Stas Bekman
c0b93a1c7a correct the correction (#6163) 2020-07-30 18:00:02 -04:00
Stas Bekman
a2f6d521c1 typos (#6162)
* 2 small typos

* more typos

* correct path
2020-07-30 17:18:27 -04:00
Sylvain Gugger
f3065abdb8 Doc tokenizer (#6110)
* Start doc tokenizers

* Tokenizer documentation

* Start doc tokenizers

* Tokenizer documentation

* Formatting after rebase

* Formatting after merge

* Update docs/source/main_classes/tokenizer.rst

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address comment

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>

* Address Thom's comments

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
2020-07-30 14:51:19 -04:00
guillaume-be
e642c78908 Addition of a DialoguePipeline (#5516)
* initial commit for pipeline implementation

Addition of input processing and history concatenation

* Conversation pipeline tested and working for single & multiple conversation inputs

* Added docstrings for dialogue pipeline

* Addition of dialogue pipeline integration tests

* Delete test_t5.py

* Fixed max code length

* Updated styling

* Fixed test broken by formatting tools

* Removed unused import

* Added unit test for DialoguePipeline

* Fixed Tensorflow compatibility

* Fixed multi-framework support using framework flag

* - Fixed docstring
- Added `min_length_for_response` as an initialization parameter
- Renamed `*args` to `conversations`, `conversations` being a `Conversation` or a `List[Conversation]`
- Updated truncation to truncate entire segments of conversations, instead of cutting in the middle of a user/bot input

* - renamed pipeline name from dialogue to conversational
- removed hardcoded default value of 1000 and use config.max_length instead
- added `append_response` and `set_history` method to the Conversation class to avoid direct fields mutation
- fixed bug in history truncation method

* - Updated ConversationalPipeline to accept only active conversations (otherwise a ValueError is raised)

* - Simplified input tensor conversion

* - Updated attention_mask value for Tensorflow compatibility

* - Updated last dialogue reference to conversational & fixed integration tests

* Fixed conflict with master

* Updates following review comments

* Updated formatting

* Added Conversation and ConversationalPipeline to the library __init__, addition of docstrings for Conversation, added both to the docs

* Update src/transformers/pipelines.py

Updated docsting following review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-07-30 14:11:39 -04:00
Lysandre Debut
ec0267475c Fix FlauBERT GPU test (#6142)
* Fix GPU test

* Remove legacy constructor
2020-07-30 11:11:48 -04:00
Sylvain Gugger
91cb95461e Switch from return_tuple to return_dict (#6138)
* Switch from return_tuple to return_dict

* Fix test

* [WIP] Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleC… (#5614)

* Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleChoice} models and tests

* AutoModels


Tiny tweaks

* Style

* Final changes before merge

* Re-order for simpler review

* Final fixes

* Addressing @sgugger's comments

* Test MultipleChoice

* Rework TF trainer (#6038)

* Fully rework training/prediction loops

* fix method name

* Fix variable name

* Fix property name

* Fix scope

* Fix method name

* Fix tuple index

* Fix tuple index

* Fix indentation

* Fix variable name

* fix eval before log

* Add drop remainder for test dataset

* Fix step number + fix logging datetime

* fix eval loss value

* use global step instead of step + fix logging at step 0

* Fix logging datetime

* Fix global_step usage

* Fix breaking loop + logging datetime

* Fix step in prediction loop

* Fix step breaking

* Fix train/test loops

* Force TF at least 2.2 for the trainer

* Use assert_cardinality to facilitate the dataset size computation

* Log steps per epoch

* Make tfds compliant with TPU

* Make tfds compliant with TPU

* Use TF dataset enumerate instead of the Python one

* revert previous commit

* Fix data_dir

* Apply style

* rebase on master

* Address Sylvain's comments

* Address Sylvain's and Lysandre comments

* Trigger CI

* Remove unused import

* Switch from return_tuple to return_dict

* Fix test

* Add recent model

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Julien Plu <plu.julien@gmail.com>
2020-07-30 09:17:00 -04:00
Sylvain Gugger
562b6369c4 Tf trainer cleanup (#6143)
* Clean up TFTrainer

* Add import

* Fix conflicts
2020-07-30 09:13:16 -04:00
Oren Amsalem
c127d055e6 add another e.g. to avoid confusion (#6055) 2020-07-30 08:53:35 -04:00
Oren Amsalem
d24ea708d7 Actually the extra_id are from 0-99 and not from 1-100 (#5967)
a = tokenizer.encode("we got a <extra_id_99>", return_tensors='pt',add_special_tokens=True)
print(a)
>tensor([[   62,   530,     3,     9, 32000]])
a = tokenizer.encode("we got a <extra_id_100>", return_tensors='pt',add_special_tokens=True)
print(a)
>tensor([[   62,   530,     3,     9,     3,     2, 25666,   834,    23,    26,
           834,  2915,  3155]])
2020-07-30 06:13:29 -04:00
Stas Bekman
3212b8850d [s2s] add support for overriding config params (#6149) 2020-07-30 01:09:46 -04:00
Julien Plu
54f9fbeff8 Rework TF trainer (#6038)
* Fully rework training/prediction loops

* fix method name

* Fix variable name

* Fix property name

* Fix scope

* Fix method name

* Fix tuple index

* Fix tuple index

* Fix indentation

* Fix variable name

* fix eval before log

* Add drop remainder for test dataset

* Fix step number + fix logging datetime

* fix eval loss value

* use global step instead of step + fix logging at step 0

* Fix logging datetime

* Fix global_step usage

* Fix breaking loop + logging datetime

* Fix step in prediction loop

* Fix step breaking

* Fix train/test loops

* Force TF at least 2.2 for the trainer

* Use assert_cardinality to facilitate the dataset size computation

* Log steps per epoch

* Make tfds compliant with TPU

* Make tfds compliant with TPU

* Use TF dataset enumerate instead of the Python one

* revert previous commit

* Fix data_dir

* Apply style

* rebase on master

* Address Sylvain's comments

* Address Sylvain's and Lysandre comments

* Trigger CI

* Remove unused import
2020-07-29 14:32:01 -04:00
Lysandre Debut
3f94170a10 [WIP] Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleC… (#5614)
* Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleChoice} models and tests

* AutoModels


Tiny tweaks

* Style

* Final changes before merge

* Re-order for simpler review

* Final fixes

* Addressing @sgugger's comments

* Test MultipleChoice
2020-07-29 14:26:26 -04:00
Sylvain Gugger
8a8ae27617 Use google style to document properties (#6130)
* Use google style to document properties

* Update src/transformers/configuration_utils.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-07-29 12:28:12 -04:00
Julien Plu
fc64559c45 Fix TF CTRL model naming (#6134) 2020-07-29 12:20:00 -04:00
Lysandre Debut
641b873c13 XLNet PLM Readme (#6121) 2020-07-29 11:38:15 -04:00
Timo Moeller
8d157c930b add deepset/xlm-roberta-large-squad2 model card (#6128)
* Add xlm-r QA model card

* Add tags
2020-07-29 17:34:16 +02:00
Funtowicz Morgan
6c002853a6 Added capability to quantize a model while exporting through ONNX. (#6089)
* Added capability to quantize a model while exporting through ONNX.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

We do not support multiple extensions

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Reformat files

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* More quality

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Ensure test_generate_identified_name compares the same object types

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added documentation everywhere on ONNX exporter

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Use pathlib.Path instead of plain-old string

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Use f-string everywhere

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Use the correct parameters for black formatting

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Use Python 3 super() style.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Use packaging.version to ensure installed onnxruntime version match requirements

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fixing imports sorting order.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Missing raise(s)

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added quantization documentation

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fix some spelling.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fix bad list header format

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-07-29 13:21:29 +02:00
Sylvain Gugger
25de74ccfe Use FutureWarning to deprecate (#6111) 2020-07-29 05:20:53 -04:00
Funtowicz Morgan
640550fc7a ONNX documentation (#5992)
* Move torchscript and add ONNX documentation under modle_export

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Let's follow guidelines by the gurus: Renamed torchscript.rst to serialization.rst

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Remove previously introduced tree element

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* WIP doc

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* ONNX documentation

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fix invalid link

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Improve spelling

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Final wording pass

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-07-29 11:02:35 +02:00
Sam Shleifer
92f8ce2ed6 Fix deebert tests (#6102) 2020-07-28 18:30:16 -04:00
Sam Shleifer
c49cd927f7 [Fix] position_ids tests again (#6100) 2020-07-28 18:29:35 -04:00
Sam Shleifer
40796c5801 [fix] add bart to LM_MAPPING (#6099) 2020-07-28 18:29:18 -04:00
Sam Shleifer
5abe50381a Fix #6096: MBartTokenizer's mask token (#6098) 2020-07-28 18:27:58 -04:00
Joe Davison
b1c8b76907 Fix zero-shot pipeline single seq output shape (#6104) 2020-07-28 14:46:03 -04:00
Lysandre Debut
06834bc332 Logs should not be hidden behind a logger.info (#6097) 2020-07-28 12:44:25 -04:00
Sam Shleifer
dafa296c95 [s2s] Delete useless method, log tokens_per_batch (#6081) 2020-07-28 11:24:23 -04:00
Tanmay Thakur
dc4755c6d5 create model-card for lordtt13/emo-mobilebert (#6030) 2020-07-28 10:00:23 -04:00
Sylvain Gugger
28931f81b7 Fix #6092 (#6093)
* Fix #6092

* Format
2020-07-28 09:48:39 -04:00
Manuel Romero
5e97c82940 Create README.md (#6076) 2020-07-28 09:36:00 -04:00
Clement
54f49af4ae Add inference widget examples (#5825) 2020-07-28 09:14:00 -04:00
Sylvain Gugger
0206efb4cf Make all data collators accept dict (#6065)
* Make all data collators accept dict

* Style
2020-07-28 09:08:20 -04:00
Sam Shleifer
31a5486e42 github issue template suggests who to tag (#5790)
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Teven <teven.lescao@gmail.com>
2020-07-28 08:41:27 -04:00
Stas Bekman
f0c70085c2 link to README.md (#6068)
* add a link to README.md

* Update README.md
2020-07-28 20:34:58 +08:00
Pavel Soriano
4f814fd587 [Model Card] camembert-base-squadFR-fquad-piaf (#6087) 2020-07-28 20:33:52 +08:00
Sam Shleifer
3c7fbf35a6 MBART: support summarization tasks where max_src_len > max_tgt_len (#6003)
* MBART: support summarization tasks

* fix test

* Style

* add tokenizer test
2020-07-28 08:18:11 -04:00
Tanmay Thakur
842eb45606 New Community NB Add (#5824)
Signed-off-by: lordtt13 <thakurtanmay72@yahoo.com>
2020-07-28 04:25:12 -04:00
Andrés Felipe Cruz
018d61fa24 Moving transformers package import statements to relative imports in some files (#5796)
* Moving rom transformers statements to relative imports in some files under src/

* Import order

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-07-28 04:19:17 -04:00
Lysandre Debut
7214954db4 Should return a tuple for serialization (#6061) 2020-07-28 03:14:31 -04:00
Sam Shleifer
7a68d40138 [s2s] Don't mention packed data in README (#6079) 2020-07-27 20:07:21 -04:00
Sam Shleifer
b7345d22d0 [fix] no warning for position_ids buffer (#6063) 2020-07-27 20:00:44 -04:00
Sam Shleifer
1e00ef681d [s2s] dont document packing because it hurts performance (#6077) 2020-07-27 18:26:00 -04:00
sgugger
9d0d3a6645 Pin TF while we wait for a fix 2020-07-27 18:03:09 -04:00
Ramsri Goutham Golla
769e6ba01f Create README.md (#6032)
Adding model card - readme
2020-07-27 16:25:37 -04:00
Sylvain Gugger
fd347e0da7 Add fire to setup.cfg to make isort happy (#6066) 2020-07-27 15:17:33 -04:00
Sam Shleifer
11792d7826 CL util to convert models to fp16 before upload (#5953) 2020-07-27 12:21:25 -04:00
Sam Shleifer
4302ace5bd [pack_dataset] don't sort before packing, only pack train (#5954) 2020-07-27 12:14:23 -04:00
Suraj Patil
c8bdf7f4ec Add new AutoModel classes in pipeline (#6062)
* use new AutoModel classed

* make style and quality
2020-07-27 11:50:08 -04:00
Cola
5779e5434d ✏️ Fix typo (#5734) 2020-07-27 10:55:15 -04:00
Suraj Patil
d1d15d6f2d [examples (seq2seq)] fix preparing decoder_input_ids for T5 (#5994) 2020-07-27 10:10:43 -04:00
Joe Davison
3deffc1d67 Zero shot classification pipeline (#5760)
* add initial zero-shot pipeline

* change default args

* update default template

* add label string splitting

* add str labels support, remove nli from name

* style

* add input validation and working tf defaults

* tests

* quality check

* add docstring to __call__

* add slow tests

* Change truncation to only_first

also lower precision on tests for readibility

* style
2020-07-27 09:42:58 -04:00
Sylvain Gugger
1246b20f6d Fix the return documentation rendering for all model outputs (#6022)
* Fix the return documentation rendering for all model outputs

* Formatting
2020-07-27 09:18:59 -04:00
Sylvain Gugger
3b64ad5d5c Remove unused file (#6023) 2020-07-27 08:31:24 -04:00
Xin Wen
b9b11795cf Update model_summary.rst (#5737)
Add '-' to make the reference of Transformer-XL more accurate and formal.
2020-07-27 05:34:02 -04:00
Gong Linyuan
b21993b362 Allow to set Adam beta1, beta2 in TrainingArgs (#5592)
* Add Adam beta1, beta2 to trainier

* Make style consistent
2020-07-27 05:31:37 -04:00
Pavel Soriano
7969e96f4a draft etalab QA model (#6040) 2020-07-27 05:15:08 -04:00
Vamsi995
a9585fd107 Model card for Vamsi/T5_Paraphrase_Paws (#6037)
* Model card for Vamsi/T5_Paraphrase_Paws

* Update model_cards/Vamsi/T5_Paraphrase_Paws/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-27 05:12:45 -04:00
Rodolfo De Nadai
f7f03b22dc Update README.md of my model (#6042) 2020-07-26 23:31:49 +02:00
Stas Bekman
fb0589a03d don't complain about missing W&B when WANDB_DISABLED=true (#6036)
* don't complain about missing W&B when WANDB_DISABLED=true

* reformat to elif

* typo
2020-07-26 14:29:54 -04:00
Stas Bekman
daa5dd1202 add a summary report flag for run_examples on CI (#6035)
Currently, it's hard to derive which example tests were run on CI, and which weren't. Adding `-rA` flag to `pytest`, will now include a summary like:

```
==================================================================== short test summary info =====================================================================
PASSED examples/test_examples.py::ExamplesTests::test_generation
PASSED examples/test_examples.py::ExamplesTests::test_run_glue
PASSED examples/test_examples.py::ExamplesTests::test_run_language_modeling
PASSED examples/test_examples.py::ExamplesTests::test_run_squad
FAILED examples/test_examples.py::ExamplesTests::test_run_pl_glue - AttributeError: 'Namespace' object has no attribute 'gpus'
============================================================ 1 failed, 4 passed, 8 warnings in 42.96s ============================================================
```
which makes it easier to validate whether some example is being covered by CI or not.
2020-07-26 14:09:14 -04:00
Sam Shleifer
c69ea5efc4 [CI] Don't test apex (#6021) 2020-07-24 15:34:16 -04:00
Sylvain Gugger
a884b7fa38 Update the new model template (#6019) 2020-07-24 14:15:37 -04:00
Julien Chaumond
295466aae6 [model_card] Sample input for rdenadai/BR_BERTo
cc @rdenadai
2020-07-24 14:14:10 -04:00
Manuel Romero
518361d69a Create model card for RuPERTa-base (#6016)
* Update README.md

* Update model_cards/mrm8488/RuPERTa-base/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-24 14:12:29 -04:00
Manuel Romero
bd51f0a7ab Create README.md (#5952) 2020-07-24 14:12:14 -04:00
Manuel Romero
87a779dfa8 Create README.md (#5951) 2020-07-24 14:12:09 -04:00
Rodolfo De Nadai
d115872b38 Create README.md (#6020)
* Create README.md

* Update README.md

* Update model_cards/rdenadai/BR_BERTo/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Update model_cards/rdenadai/BR_BERTo/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-24 14:11:14 -04:00
Sylvain Gugger
3996041d0a Fix question template (#6014) 2020-07-24 10:04:25 -04:00
Victor SANH
778e635fc9 [model_cards] roberta-base-finetuned-yelp-polarity (#6009)
* [model_cards] roberta-base-finetuned-yelp-polarity

* Update model_cards/VictorSanh/roberta-base-finetuned-yelp-polarity/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-24 09:45:21 -04:00
Funtowicz Morgan
614fef1691 Ensure OpenAI GPT position_ids is correctly initialized and registered at init. (#5773)
* Ensure OpenAI GPT position_ids is correctly initialized and registered as buffer at init.

This will make it compatible with TorchScript export.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fix missing slice operator on the tensor data accessor.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Style.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fixed BertEmbedding position_ids buffer created at forward.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Fixed MobileBertEmbedding position_ids buffer created at forward.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Fixed XLM position_ids buffer created at forward.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
2020-07-24 15:37:52 +02:00
Sylvain Gugger
3b44aa935a Model utils doc (#6005)
* Document TF modeling utils

* Document all model utils
2020-07-24 09:16:28 -04:00
sgugger
a540405213 Fix commit hash for stable doc 2020-07-24 09:07:40 -04:00
Qingqing Cao
fc0fe2a532 fix: model card readme clutter (#6008)
this removes the clutter line in the readme.md of model card `csarron/roberta-base-squad-v1`. It also fixes the result table.
2020-07-24 04:17:52 -04:00
Sylvain Gugger
f5b5c5bd7e Avoid unnecessary warnings when loading pretrained model (#5922)
* Avoid unnecessary warnings when loading pretrained model

* Fix test

* Add other keys to ignore

* keys_to_ignore_at_load -> authorized_missing_keys
2020-07-23 18:13:36 -04:00
Philip May
29afb5764f Bert german dbmdz uncased sentence stsb (#6000)
* Describe usage of sentence model

* fix typo usage

* add use and description to readme

* fix typo in readme

* readme formatting

* add training procedure to readme

* description name and company

* readme formatting

* dataset training readme

* typo

* readme
2020-07-23 17:56:45 -04:00
Qingqing Cao
2b5ef9706d Model cards: add roberta-base-squad-v1 and bert-base-uncased-squad-v1 (#6006)
* add: bert-base-uncased-squad-v1

* add: roberta-base-squad-v1
2020-07-23 17:53:47 -04:00
Sam Shleifer
9827d666eb MbartTokenizer: do not hardcode vocab size (#5998) 2020-07-23 15:41:14 -04:00
Sylvain Gugger
6e16195510 Fix #5974 (#5999) 2020-07-23 13:51:29 -04:00
Sylvain Gugger
e168488a74 Cleanup Trainer and expose customization points (#5982)
* Clean up Trainer and expose customization points

* Formatting

* eval_step -> prediction_step
2020-07-23 12:05:41 -04:00
Qingqing Cao
76f52324b1 add fine-tuned mobilebert squad v1 and squad v2 model cards (#5980)
* add mobilebert-uncased-squad-v2

* fix shell cmd, add creator info

* add mobilebert-uncased-squad-v1
2020-07-23 11:57:29 -04:00
GmailB
7e251ae039 Create README.md (#5989) 2020-07-23 11:41:33 -04:00
Sylvain Gugger
33d7506ea1 Update doc of the model page (#5985) 2020-07-22 18:14:57 -04:00
Sam Shleifer
c3206eef44 [test] partial coverage for train_mbart_enro_cc25.sh (#5976) 2020-07-22 14:34:49 -04:00
Stas Bekman
2c0da7803a minor doc fixes (#5831)
* minor doc fixes

correct superclass name and small grammar fixes

* correct the instance name in the error message

It appears to be `BaseTokenizer` from looking at:

`from tokenizers.implementations import BaseTokenizer as BaseTokenizerFast`

and not `Tokenizer` as it currently says.
2020-07-22 13:22:34 -04:00
Sam Shleifer
feeb956a19 [docs] Add integration test example to copy pasta template (#5961)
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-22 12:48:38 -04:00
Sam Shleifer
01116d3c5b T5 Model Cards (#5759)
* T5 Model Cards

* Fix paths

* Fix tags

* lang-en
2020-07-22 11:38:37 -04:00
Funtowicz Morgan
896300177b Expose padding_strategy on squad processor to fix QA pipeline performance regression (#5932)
* Attempt to fix the way squad_convert_examples_to_features pad the elements for the QA pipeline.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Quality

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make the code easier to read and avoid testing multiple test the same thing.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* missing enum value on truncation_strategy.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Rethinking for the easiest fix: expose the padding strategy on squad_convert_examples_to_features.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Remove unused imports.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
2020-07-22 16:11:57 +02:00
Sam Shleifer
ae67b2439f [CI] Install examples/requirements.txt (#5956) 2020-07-21 21:07:48 -04:00
Sylvain Gugger
e714412fe6 Update doc to new model outputs (#5946)
* Update doc to new model outputs

* Fix outputs in quicktour
2020-07-21 18:13:55 -04:00
Sam Shleifer
ddd40b3211 [CI] self-scheduled runner tests examples/ (#5927) 2020-07-21 17:01:07 -04:00
Sam Shleifer
9dab39feea seq2seq/run_eval.py can take decoder_start_token_id (#5949) 2020-07-21 16:58:45 -04:00
Sam Shleifer
5b193b39b0 [examples/seq2seq]: add --label_smoothing option (#5919) 2020-07-21 16:51:39 -04:00
Sam Shleifer
95d1962b9c [Doc] explaining romanian postprocessing for MBART BLEU hacking (#5943) 2020-07-21 14:12:48 -04:00
Jannes
604a2355dc Create README.md (#5876) 2020-07-21 13:28:22 -04:00
Jannes
77c718edef Create README.md (#5873) 2020-07-21 13:28:06 -04:00
Jannes
325b277db9 Create README.md (#5874) 2020-07-21 13:27:30 -04:00
Jannes
d15be2216c Create README.md (#5879) 2020-07-21 13:27:13 -04:00
Jannes
f3e23dd90a Create README.md (#5878) 2020-07-21 13:20:47 -04:00
Jannes
8b01d15c05 Create README.md (#5877) 2020-07-21 13:20:43 -04:00
Jannes
05bddf304e Create README.md (#5875) 2020-07-21 13:20:32 -04:00
Jannes
783a0c7ee9 Create README.md (#5872) 2020-07-21 13:20:21 -04:00
Jannes
e7844d60c2 Create README.md (#5871) 2020-07-21 13:19:48 -04:00
tuner007
b1ee69763c Create README.md (#5864) 2020-07-21 13:15:07 -04:00
Manuel Romero
5f809e4976 Update README.md (#5857)
Add nlp dataset used
2020-07-21 13:14:27 -04:00
Manuel Romero
4215f59c99 Update README.md (#5856)
Add dataset used as it is now part of nlp package
2020-07-21 13:11:08 -04:00
Ali Hamdi Ali Fadel
1d72460d55 Add ComVE model cards (#5884)
* Add ComVE model cards

* Apply suggestions from code review

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-21 12:54:29 -04:00
Aditya Soni
ccbf74a685 typos in seq2seq/readme (#5937) 2020-07-21 09:44:59 -04:00
BatJedi
d32279438a Created model card for my extreme summarization model (#5839)
* Created model card for my extreme summarization model

* Update model_cards/yuvraj/xSumm/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-21 03:54:57 -04:00
BatJedi
abf5c56e9d Created model card for my summarization model (#5838)
* Created model card for my summarization model

* Update model_cards/yuvraj/summarizer-cnndm/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-21 03:54:14 -04:00
Manuel Romero
d73baeebc5 Create README.md (#5921)
- Maybe the result of this query answers the question You did some days ago @julien-c ;-)
2020-07-21 03:52:52 -04:00
Manuel Romero
50acfc8717 Create README.md (#5924) 2020-07-21 03:41:37 -04:00
Manuel Romero
7249533404 Create README.md (#5920) 2020-07-21 03:31:42 -04:00
Sylvain Gugger
4781afd045 Clarify arg class (#5916) 2020-07-20 19:47:06 -04:00
Qingqing Cao
8e0bcb56ec DataParallel fix: multi gpu evaluation (#5926)
The DataParallel training was fixed in https://github.com/huggingface/transformers/pull/5733, this commit also fixes the evaluation. It's more convenient when the user enables both `do_train` and `do_eval`.
2020-07-20 17:54:08 -04:00
Sylvain Gugger
a20969170b Add AlbertForPretraining to doc (#5914) 2020-07-20 17:53:21 -04:00
Sam Shleifer
f1a4e06f1f [Fix] seq2seq pack_dataset.py actually packs (#5913)
Huge MT speedup!
2020-07-20 15:18:26 -04:00
Sylvain Gugger
32883b310b Improve doc of use_cache (#5912)
* Improve doc of use_cache

* Update src/transformers/configuration_xlnet.py

Co-authored-by: Teven <teven.lescao@gmail.com>

Co-authored-by: Teven <teven.lescao@gmail.com>
2020-07-20 11:50:41 -04:00
Clement
9ccb45a263 Update gpt2-README.md 2020-07-20 11:40:33 -04:00
Clement
f19751117d Create gpt2-medium-README.md 2020-07-20 10:47:42 -04:00
Clement
511523672b Create gpt2-large-README.md 2020-07-20 10:47:27 -04:00
Clement
182c611934 Update gpt2-README.md 2020-07-20 10:47:11 -04:00
Clement
a9ae27cd0f add link to write with transformers to model card 2020-07-20 10:46:10 -04:00
Sam Shleifer
01c40db4f8 [cleanup] squad processor (#5868) 2020-07-20 10:44:10 -04:00
Stas Bekman
35cb101eae DataParallel fixes (#5733)
* DataParallel fixes:

1. switched to a more precise check
-        if self.args.n_gpu > 1:
+        if isinstance(model, nn.DataParallel):

2. fix tests - require the same fixup under DataParallel as the training module

* another fix
2020-07-20 09:29:12 -04:00
Pradhy729
290b6e18ac Trainer support for iterabledataset (#5834)
* Don't pass sampler for iterable dataset

* Added check for test and eval dataloaders.

* Formatting

* Don't pass sampler for iterable dataset

* Added check for test and eval dataloaders.

* Formatting

* Cleaner if nesting.

* Added test for trainer and iterable dataset

* Formatting for test

* Fixed import when torch is available only.

* Added require torch decorator to helper class

* Moved dataset class inside unittest

* Removed nested if and changed model in test

* Checking torch availability for IterableDataset
2020-07-20 09:07:37 -04:00
Julien Chaumond
82dd96cae7 [model_cards] Dataset ids are case-sensitive
cc @lhoestq @thomwolf

Also cc'ing model author @nreimers => Model pages now properly link to the dataset pages (and in the future, eval results, etc.)
2020-07-20 12:47:28 +02:00
Manuel Romero
b01a8844a9 Create README.md (#5813) 2020-07-20 04:06:42 -04:00
Alan deLevie
223bad242d fix typo in (#5893) 2020-07-20 03:53:03 -04:00
Alan deLevie
d441f8d29d fix typo in training_args_tf.py (#5894) 2020-07-20 03:48:22 -04:00
Sam Shleifer
09a2f40684 Seq2SeqDataset uses linecache to save memory by @Pradhy729 (#5792)
Co-authored-by: Pradhy729 <49659913+Pradhy729@users.noreply.github.com>
2020-07-18 13:57:33 -04:00
Teven
4b506a37e3 Xlnet outputs (#5883)
Slightly breaking change, changes functionality for `use_cache` in XLNet: if use_cache is True and mem_len is 0 or None (which is the case in the base model config), the model behaves like GPT-2 and returns mems to be used as past in generation. At training time `use_cache` is overriden and always True.
2020-07-18 17:33:13 +02:00
Teven
a55809241f Revert "Xlnet outputs (#5881)" (#5882)
This reverts commit 13be487212.
2020-07-18 17:15:40 +02:00
Teven
13be487212 Xlnet outputs (#5881)
Slightly breaking change, changes functionality for `use_cache` in XLNet: if use_cache is True and mem_len is 0 or None (which is the case in the base model config), the model behaves like GPT-2 and returns mems to be used as past in generation. At training time `use_cache` is overriden and always True.
2020-07-18 16:53:29 +02:00
Sebastian
eae6d8d14f Update tokenizers to 0.8.1.rc to fix Mac OS X issues (#5867) 2020-07-18 08:20:11 -04:00
Sam Shleifer
dad5e12e54 [seq2seq] distillation.py accepts trainer arguments (#5865) 2020-07-18 07:43:57 -04:00
Sam Shleifer
ba2400189b [seq2seq] MAX_LEN env var for MT commands (#5837) 2020-07-17 22:51:31 -04:00
Nathan Raw
529850ae7b Lightning Updates for v0.8.5 (#5798)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-07-17 22:43:06 -04:00
Teven
615be03f9d Revert "XLNet use_cache refactor (#5770)" (#5854)
This reverts commit 0b2da0e592.
2020-07-17 20:33:44 +02:00
Teven
0b2da0e592 XLNet use_cache refactor (#5770)
Slightly breaking change, changes functionality for `use_cache` in XLNet: if use_cache is True and mem_len is 0 or None (which is the case in the base model config), the model behaves like GPT-2 and returns mems to be used as past in generation. At training time `use_cache` is overriden and always True.
2020-07-17 20:24:16 +02:00
Jannes
9750e1300c Create README.md (#5847) 2020-07-17 14:03:53 -04:00
Julien Chaumond
1bca4fbd39 [model_card] Fix metadata 2020-07-17 13:55:37 -04:00
Gianpaolo Di Pietro
a9d56a675a Added model card for neuraly/bert-base-italian-cased-sentiment (#5845)
* Added model card for neuraly/bert-base-italian-cased-sentiment

* Update model_cards/neuraly/bert-base-italian-cased-sentiment/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

Co-authored-by: Gianpy15 <g.dipietro@neuraly.ai>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-17 13:50:49 -04:00
Patrick von Platen
12f14710ce [Model card] Bert2Bert
Add Rouge2 results
2020-07-17 18:22:05 +02:00
Patrick von Platen
9d37c56bab [Reformer] - Cache hidden states and buckets to speed up inference (#5578)
* fix merge rebase

* add intermediate reformer code

* save intermediate caching results

* save intermediate

* save intermediate results

* save intermediate

* upload next step

* fix generate tests

* make tests work

* add named tuple output

* Apply suggestions from code review

* fix use_cache for False case

* fix tensor to gpu

* fix tensor to gpu

* refactor

* refactor and make style
2020-07-17 16:17:42 +02:00
Patrick von Platen
0b6c255a95 [Model card] Bert2Bert (#5841)
* Create README.md

* Update README.md

* Update README.md

* Update README.md
2020-07-17 11:41:56 +02:00
Sam Shleifer
3d9556a72b [cleanups] make Marian save as Marian (#5830) 2020-07-17 02:54:25 -04:00
Sam Shleifer
e238e3d55a [seq2seq] Don't copy self.source in sortishsampler (#5818) 2020-07-17 01:53:25 -04:00
Bayartsogt Yadamsuren
2e4624b415 language tag addition on albert-mongolian (#5828)
* language tag addition on albert-mongolian

* Update model_cards/bayartsogt/albert-mongolian/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-17 01:40:38 -04:00
Manuel Romero
d088d744ad Create README.md (#5821) 2020-07-16 15:18:31 -04:00
Nick Doiron
233072fc1e dv-wave (#5823) 2020-07-16 15:13:51 -04:00
Sam Shleifer
283500ff9f [seq2seq] pack_dataset.py rewrites dataset in max_tokens format (#5819) 2020-07-16 14:06:49 -04:00
Manuel Romero
c45d7a707d Update README.md (#5812)
Fix missig "-" in meta data
2020-07-16 10:25:50 -04:00
Patrick von Platen
057411c56a fix longformer slow down (#5811) 2020-07-16 16:19:37 +02:00
Patrick von Platen
89a78be51f fix benchmark for longformer (#5808) 2020-07-16 15:15:10 +02:00
Patrick von Platen
aefc0c0429 fix benchmark non standard model (#5801) 2020-07-16 12:13:10 +02:00
Martin Müller
8ce610bc96 Update README.md (#5789) 2020-07-16 05:26:17 -04:00
Julien Chaumond
6b6d035d8f [model_card] illuin/lepetit 2020-07-16 03:50:47 -04:00
HuYong
d1f74b9aff ADD ERNIE model (#5763)
* ERNIE model card

* Update Readme.md

* Update Readme.md

* Update Readme.md

* Rename Readme.md to README.md

* Update README.md

* Update Readme.md

* Update README.md

* Rename Readme.md to README.md

* Update Readme.md

* Update Readme.md

* Rename Readme.md to README.md

* Update and rename Readme.md to README.md

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
2020-07-16 11:03:05 +08:00
Clement
3b924fabee Create distilbert squad tags 2020-07-15 17:59:06 -04:00
Clement
067814102c fix readme 2020-07-15 17:50:46 -04:00
Clement
d179fd69ca test readme change 2020-07-15 17:48:22 -04:00
Manuel Romero
63761614eb Update README.md (#5776)
Add cherry picked example for the widget

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-15 16:19:21 -04:00
Manuel Romero
221e23c6c1 Create README.md (#5781)
* Create README.md

* Update model_cards/mrm8488/RoBasquERTa/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-15 16:17:25 -04:00
Manuel Romero
d4cda29af1 Create README.md (#5782)
* Create README.md

* Apply suggestions from code review

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-15 16:17:19 -04:00
Julien Chaumond
62ec28ce4f [model_cards] Fix pierreguillou/gpt2-small-portuguese 2020-07-15 22:14:52 +02:00
Pierre Guillou
a946724bbf metadata (#5758)
* metadata

* Update model_cards/pierreguillou/gpt2-small-portuguese/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-15 16:13:28 -04:00
Julien Chaumond
015dc51fe3 [model_card] bert-portuguese: add language meta
cc @rodrigonogueira4 @abiocapsouza @robertoalotufo

Also cc @piegu

Obrigado :)
2020-07-15 21:25:52 +02:00
Sam Shleifer
1a647abf0b [fix] check code quality (#5772) 2020-07-15 14:59:38 -04:00
Julien Chaumond
b23d3a5ad4 [model_cards] Switch all languages codes to ISO-639-{1,2,3} 2020-07-15 18:59:20 +02:00
Funtowicz Morgan
d533c7e9b9 [fix] T5 ONNX test: model.to(torch_device) (#5769)
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-07-15 10:11:22 -04:00
Sam Shleifer
d0486c8bc2 [cleanup] T5 test, warnings (#5761) 2020-07-15 08:23:22 -04:00
Patrick von Platen
ec0a945cf9 [AutoModels] Fix config params handling of all PT and TF AutoModels (#5665)
* fix auto model causal lm

* leverage given functionality

* apply unused kwargs to all auto models
2020-07-15 09:51:14 +02:00
Julien Chaumond
8ab565a4be [model_card] Fix syntax 2020-07-14 22:27:07 +02:00
Bashar Talafha
92dc959224 Update README.md (#5752) 2020-07-14 15:48:59 -04:00
Bashar Talafha
baf93b02c4 Update README.md (#5696) 2020-07-14 12:51:57 -04:00
Joe Davison
5d178954c9 tiny ppl doc typo fix (#5751) 2020-07-14 10:39:44 -06:00
Manuel Romero
ac921f0385 RuPERTa model card (#5743)
* Customize inference widget input

* Update model_cards/mrm8488/RuPERTa-base/README.md

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
2020-07-14 22:58:45 +08:00
dartrevan
21c1fe5290 RuDR-BERT model card (#5698) 2020-07-14 22:51:53 +08:00
Doron Adler
2db1cc807b Norod78/hewiki-articles-distilGPT2py-il model card (#5735)
Model card for hewiki-articles-distilGPT2py-il
A tiny GPT2 model for generating Hebrew text
2020-07-14 22:50:44 +08:00
Pierre Guillou
dae244ad89 GPorTuguese-2 model card (#5744) 2020-07-14 22:48:52 +08:00
Sam Shleifer
b2505f7db7 Cleanup bart caching logic (#5640) 2020-07-14 06:13:05 -04:00
Sam Shleifer
838950ee44 [fix] mbart_en_ro_generate test now identical to fairseq (#5731) 2020-07-14 06:12:24 -04:00
Boris Dayma
4d5a8d6557 docs(wandb): explain how to use W&B integration (#5607)
* docs(wandb): explain how to use W&B integration

fix #5262

* Also mention TensorBoard

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-14 05:12:33 -04:00
Gunnlaugur Thor Briem
cd30f98fd2 doc: fix apparent copy-paste error in docstring (#5626) 2020-07-14 09:47:41 +02:00
as-stevens
f867000f56 [Reformer classification head] Implement the reformer model classification head for text classification (#5198)
* Reformer model head classification implementation for text classification

* Reformat the reformer model classification code

* PR review comments, and test case implementation for reformer for classification head changes

* CI/CD reformer for classification head test import error fix

* CI/CD test case implementation  added ReformerForSequenceClassification to all_model_classes

* Code formatting- fixed

* Normal test cases added for reformer classification head

* Fix test cases implementation for the reformer classification head

* removed token_type_id parameter from the reformer classification head

* fixed the test case for reformer classification head

* merge conflict with master fixed

* merge conflict, changed reformer classification to accept the choice_label parameter added in latest code

* refactored the the reformer classification head test code

* reformer classification head, common transform test cases fixed

* final set of the review comment, rearranging the reformer classes and docstring add to classification forward method

* fixed the compilation error and text case fix for reformer classification head

* Apply suggestions from code review

Remove unnecessary dup

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-07-14 09:16:22 +02:00
Gaurav Mishra
f0bda06f43 Update tokenization_t5.py (#5717)
Minor doc fix.
2020-07-14 00:02:03 -04:00
Sam Shleifer
c3c61ea017 [Fix] github actions CI by reverting #5138 (#5686) 2020-07-13 17:12:18 -04:00
Stas Bekman
45addfe96d FlaubertForTokenClassification (#5644)
* implement FlaubertForTokenClassification as a subclass of XLMForTokenClassification

* fix mapping order

* add the doc

* add common tests
2020-07-13 14:59:53 -04:00
Patrick von Platen
7096e47513 [Longformer] fix longformer global attention output (#5659)
* fix longformer global attention output

* fix multi gpu problem

* replace -10000 with 0

* better comment

* make attention output equal local and global

* Update src/transformers/modeling_longformer.py
2020-07-13 17:23:22 +02:00
Sylvain Gugger
ce374ba877 Fix Trainer in DataParallel setting (#5685)
* Fix Trainer in DataParallel setting

* Fix typo

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-07-13 08:37:38 -04:00
Stas Bekman
0a19a49dfe doc improvements (#5688) 2020-07-13 18:10:17 +08:00
Stas Bekman
443b0cad96 rename the function to match the rest of the test convention (#5692) 2020-07-13 18:09:49 +08:00
onepointconsulting
74843695eb Added first description of the model (#5672)
Added general description, information about the tags and also some example usage code.
2020-07-13 02:53:48 -04:00
Kevin Canwen Xu
0befb51327 Pipeline model type check (#5679)
* Add model type check for pipelines

* Add model type check for pipelines

* rename func

* Fix the init parameters

* Fix format

* rollback unnecessary refactor
2020-07-12 12:34:21 +08:00
Kevin Canwen Xu
dc31a72f50 Add Microsoft's CodeBERT (#5683)
* Add Microsoft's CodeBERT

* link style

* single modal

* unused import
2020-07-11 21:37:30 +08:00
Sylvain Gugger
7fad617dc1 Document model outputs (#5673)
* Document model outputs

* Update docs/source/main_classes/output.rst

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-07-10 17:31:02 -04:00
Sylvain Gugger
df983b7483 Deprecate old past arguments (#5671) 2020-07-10 17:25:52 -04:00
Tomo Lazovich
cdf4cd7068 [squad] add version tag to squad cache (#5669) 2020-07-10 16:34:21 -04:00
Patrick von Platen
223084e42b Add Reformer to notebooks 2020-07-10 18:34:25 +02:00
Julien Chaumond
201d23f285 Update The Big Table of Tasks
Co-Authored-By: Suraj Patil <surajp815@gmail.com>
Co-Authored-By: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-07-10 18:07:29 +02:00
Bashar Talafha
82f7bbbd93 Update README.md (#5617)
* Update README.md

* Update README.md
2020-07-10 11:43:27 -04:00
Manuel Romero
bf497376ee Create README.md (#5572) 2020-07-10 11:42:49 -04:00
kolk
3653d01f2a Create README.md for electra-base-squad2 (#5574) 2020-07-10 11:39:44 -04:00
Txus
aa69c81f29 Add freshly trained base version (#5621) 2020-07-10 11:39:04 -04:00
Teven
227e0a406d Fixed use of memories in XLNet (caching for language generation + warning when loading improper memoryless model) (#5632)
* Pytorch gpu => cpu proper device

* Memoryless XLNet warning + fixed memories during generation

* Revert "Pytorch gpu => cpu proper device"

This reverts commit 93489b36

* made black happy

* TF generation with memories

* dim => axis

* added padding_text to TF XL models

* Added comment, added TF
2020-07-10 17:38:36 +02:00
Manuel Romero
3b7b646563 Create README.md (#5638) 2020-07-10 11:38:23 -04:00
Manuel Romero
0039b965db Create model card (#5655)
Create model card for T5-small fine-tuned on SQUAD v2
2020-07-10 11:38:11 -04:00
Nils Reimers
46982d612f Create README.md - Model card (#5657)
Model card for sentence-transformers/bert-base-nli-cls-token
2020-07-10 11:38:03 -04:00
Nils Reimers
c483803d1b Create README.md - Model card (#5658)
Model card for sentence-transformers/bert-base-nli-max-tokens
2020-07-10 11:37:56 -04:00
Sylvain Gugger
edfd82f5ff Change model outputs types to self-document outputs (#5438)
* [WIP] Proposal for model outputs

* All Bert models

* Make CI green maybe?

* Fix ONNX test

* Isolate ModelOutput from pt and tf

* Formatting

* Add Electra models

* Auto-generate docstrings from outputs

* Add TF outputs

* Add some BERT models

* Revert TF side

* Remove last traces of TF changes

* Fail with a clear error message

* Add Albert and work through Bart

* Add CTRL and DistilBert

* Formatting

* Progress on Bart

* Renames and finish Bart

* Formatting

* Fix last test

* Add DPR

* Finish Electra and add FlauBERT

* Add GPT2

* Add Longformer

* Add MMBT

* Add MobileBert

* Add GPT

* Formatting

* Add Reformer

* Add Roberta

* Add T5

* Add Transformer XL

* Fix test

* Add XLM + fix XLMForTokenClassification

* Style + XLMRoberta

* Add XLNet

* Formatting

* Add doc of return_tuple arg
2020-07-10 11:36:53 -04:00
Suraj Parmar
fa265230a2 Create Model card for RoBERTa-hindi-guj-san (#5661) 2020-07-10 11:34:23 -04:00
Sylvain Gugger
b2747af543 Improvements to PretrainedConfig documentation (#5642)
* Update PretrainedConfig doc

* Formatting

* Small fixes

* Forgotten args and more cleanup
2020-07-10 10:31:47 -04:00
Julien Chaumond
bfacb2e34f [model_card] BART for ELI5
cc @yjernite
2020-07-10 08:10:24 -04:00
Nils Reimers
2e6bb0e9c3 Create README.md (#5652) 2020-07-10 05:41:10 -04:00
Julien Chaumond
552e4591f5 [model_card] Add meta + fix link to image
(hotlinking to image works on GitHub but not on external sites)

cc @bashartalafha
2020-07-10 05:07:33 -04:00
Teven
02a0b43014 Fixed TextGenerationPipeline on torch + GPU (#5629)
* Pytorch gpu => cpu proper device

* Memoryless XLNet warning + fixed memories during generation

* Revert "Memoryless XLNet warning + fixed memories during generation"

This reverts commit 3d3251ff

* Took the operations on the generated_sequence out of the ensure_device scope
2020-07-09 16:29:32 -04:00
Sylvain Gugger
760f726e51 Add forum link in the docs (#5637) 2020-07-09 15:13:22 -04:00
Stas Bekman
bfeaae2235 fix 404 (#5616) 2020-07-09 15:12:29 -04:00
Lysandre Debut
b25f7802de Should check that torch TPU is available (#5636) 2020-07-09 13:54:32 -04:00
Lysandre Debut
3cc23eee06 More explicit error when failing to tensorize overflowing tokens (#5633) 2020-07-09 13:35:21 -04:00
Lysandre
b9d8af07e6 Update stable doc 2020-07-09 11:06:23 -04:00
Lysandre Debut
1158e56551 Correct extension (#5631) 2020-07-09 11:03:07 -04:00
Lysandre
5c82bf6831 Update stable doc 2020-07-09 10:16:13 -04:00
Lysandre Debut
0533cf4706 Test XLA examples (#5583)
* Test XLA examples

* Style

* Using `require_torch_tpu`

* Style

* No need for pytest
2020-07-09 09:19:19 -04:00
Funtowicz Morgan
3bd55199cd QA pipeline BART compatible (#5496)
* Ensure padding and question cannot have higher probs than context.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Add bart the the list of tokenizers adding two <sep> tokens for squad_convert_example_to_feature

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Format.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Addressing @patrickvonplaten comments.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Addressing @patrickvonplaten comments about masking non-context element when generating the answer.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Addressing @sshleifer comments.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make sure we mask CLS after handling impossible answers

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Mask in the correct vectors ...

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
2020-07-09 15:11:40 +02:00
Stas Bekman
fa5423b169 doc fixes (#5613) 2020-07-08 19:52:44 -04:00
Txus
7d0ef00420 Add newly trained calbert-tiny-uncased (#5599)
* Create README.md

Add newly trained `calbert-tiny-uncased` (complete rewrite with SentencePiece)

* Add Exbert link

* Apply suggestions from code review

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-08 17:54:51 -04:00
Lorenzo Ampil
0cc4eae0e6 Fix Inconsistent NER Grouping (Pipeline) (#4987)
* Add B I handling to grouping

* Add fix to include separate entity as last token

* move last_idx definition outside loop

* Use first entity in entity group as reference for entity type

* Add test cases

* Take out extra class accidentally added

* Return tf ner grouped test to original

* Take out redundant last entity

* Get last_idx safely

Co-authored-by: ColleterVi <36503688+ColleterVi@users.noreply.github.com>

* Fix first entity comment

* Create separate functions for group_sub_entities and group_entities (splitting call method to testable functions)

* Take out unnecessary last_idx

* Remove additional forward pass test

* Move token classification basic tests to separate class

* Move token classification basic tests back to monocolumninputtestcase

* Move base ner tests to nerpipelinetests

* Take out unused kwargs

* Add back mandatory_keys argument

* Add unitary tests for group_entities in _test_ner_pipeline

* Fix last entity handling

* Fix grouping fucntion used

* Add typing to group_sub_entities and group_entities

Co-authored-by: ColleterVi <36503688+ColleterVi@users.noreply.github.com>
2020-07-08 16:18:17 -04:00
Suraj Patil
82ce8488bb create model cards for qg models (#5610) 2020-07-08 16:08:56 -04:00
Bashar Talafha
d6b6ab11f0 Create README.md (#5601) 2020-07-08 16:07:48 -04:00
Patrick von Platen
40d98ebf50 Update benchmark notebook (#5603)
* Créé avec Colaboratory

* delete old file
2020-07-08 16:03:59 +02:00
Sylvain Gugger
281e394889 Update question template (#5585) 2020-07-08 08:46:35 -04:00
Patrick von Platen
f82a2a5e8e [Benchmark] Add benchmarks for TF Training (#5594)
* tf_train

* adapt timing for tpu

* fix timing

* fix timing

* fix timing

* fix timing

* update notebook

* add tests
2020-07-08 12:11:09 +02:00
Ji Xin
cfbb982974 Add DeeBERT (entropy-based early exiting for *BERT) (#5477)
* Add deebert code

* Add readme of deebert

* Add test for deebert

Update test for Deebert

* Update DeeBert (README, class names, function refactoring); remove requirements.txt

* Format update

* Update test

* Update readme and model init methods
2020-07-08 08:17:59 +08:00
Joe Davison
b4b33fdf25 Guide to fixed-length model perplexity evaluation (#5449)
* add first draft ppl guide

* upload imgs

* expand on strides

* ref typo

* rm superfluous past var

* add tokenization disclaimer
2020-07-07 16:04:15 -06:00
Patrick von Platen
fde217c679 readme for benchmark (#5363) 2020-07-07 23:21:23 +02:00
Sam Shleifer
d6eab53058 mbart.prepare_translation_batch: pass through kwargs (#5581) 2020-07-07 13:46:05 -04:00
Sam Shleifer
353b8f1e7a Add mbart-large-cc25, support translation finetuning (#5129)
improve unittests for finetuning, especially w.r.t testing frozen parameters
fix freeze_embeds for T5
add streamlit setup.cfg
2020-07-07 13:23:01 -04:00
Julien Chaumond
141492448b Create xlm-roberta-large-finetuned-conll03-german-README.md
cc @BramVanroy
2020-07-07 13:15:10 -04:00
Patrick von Platen
4dc65591b5 [Almost all TF models] TF clean up: add missing CLM / MLM loss; fix T5 naming and keras compile (#5395)
* add first version of clm tf

* make style

* add more tests for bert

* update tf clm loss

* fix tests

* correct tf ner script

* add mlm loss

* delete bogus file

* clean tf auto model + add tests

* finish adding clm loss everywhere

* fix training in distilbert

* fix flake8

* save intermediate

* fix tf t5 naming

* remove prints

* finish up

* up

* fix tf gpt2

* fix new test utils import

* fix flake8

* keep backward compatibility

* Update src/transformers/modeling_tf_albert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_auto.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_electra.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_roberta.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_mobilebert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_auto.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_distilbert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* apply sylvains suggestions

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-07-07 18:15:53 +02:00
Suraj Patil
33e43edddc [docs] fix model_doc links in model summary (#5566)
* fix model_doc links

* update model links
2020-07-07 11:06:12 -04:00
Quentin Lhoest
4fedc1256c Fix tests imports dpr (#5576)
* fix test imports

* fix max_length

* style

* fix tests
2020-07-07 16:35:12 +02:00
Sam Shleifer
d4886173b2 [Bart] enable test_torchscript, update test_tie_weights (#5457)
* Passing all but one torchscript test

* Style

* move comment

* remove unneeded assert
2020-07-07 10:06:48 -04:00
Suraj Patil
e49393c361 [examples] Add trainer support for question-answering (#4829)
* add SquadDataset

* add DataCollatorForQuestionAnswering

* update __init__

* add run_squad with  trainer

* add DataCollatorForQuestionAnswering in __init__

* pass data_collator to trainer

* doc tweak

* Update run_squad_trainer.py

* Update __init__.py

* Update __init__.py

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-07-07 08:57:08 -04:00
Quentin Lhoest
fbd8792195 Add DPR model (#5279)
* beginning of dpr modeling

* wip

* implement forward

* remove biencoder + better init weights

* export dpr model to embed model for nlp lib

* add new api

* remove old code

* make style

* fix dumb typo

* don't load bert weights

* docs

* docs

* style

* move the `k` parameter

* fix init_weights

* add pretrained configs

* minor

* update config names

* style

* better config

* style

* clean code based on PR comments

* change Dpr to DPR

* fix config

* switch encoder config to a dict

* style

* inheritance -> composition

* add messages in assert startements

* add dpr reader tokenizer

* one tokenizer per model

* fix base_model_prefix

* fix imports

* typo

* add convert script

* docs

* change tokenizers conf names

* style

* change tokenizers conf names

* minor

* minor

* fix wrong names

* minor

* remove unused convert functions

* rename convert script

* use return_tensors in tokenizers

* remove n_questions dim

* move generate logic to tokenizer

* style

* add docs

* docs

* quality

* docs

* add tests

* style

* add tokenization tests

* DPR full tests

* Stay true to the attention mask building

* update docs

* missing param in bert input docs

* docs

* style

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-07-07 08:56:12 -04:00
Savaş Yıldırım
d2a9399115 Update model card (#5491) 2020-07-07 18:43:49 +08:00
Savaş Yıldırım
2e653d89d7 Update model card (#5492) 2020-07-07 18:43:34 +08:00
Savaş Yıldırım
beaf60e589 bert-turkish-text-classification model card (#5493) 2020-07-07 18:43:09 +08:00
Manuel Romero
e6eba8419c electra-small-finetuned-squadv1 model card (#5430)
* Create model card

Create model card for electra-small-discriminator finetuned on SQUAD v1.1

* Set right model path in code example
2020-07-07 18:41:42 +08:00
Vitalii Radchenko
43b7ad5df5 ukr-roberta-base model card (#5514) 2020-07-07 18:40:23 +08:00
Manuel Romero
87aa857d7e roberta-base-1B-1-finetuned-squadv1 model card (#5515) 2020-07-07 18:39:09 +08:00
Moseli Motsoehli
c7d96b60e4 zuBERTa model card (#5536)
* Create README

* Update README.md

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
2020-07-07 18:38:15 +08:00
Manuel Romero
b95dfcf110 roberta-base-1B-1-finetuned-squadv2 model card (#5523) 2020-07-07 18:33:42 +08:00
Abel
6912265711 Make T5 compatible with ONNX (#5518)
* Default decoder inputs to encoder ones for T5 if neither are specified.

* Fixing typo, now all tests are passing.

* Changing einsum to operations supported by onnx

* Adding a test to ensure T5 can be exported to onnx op>9

* Modified test for onnx export to make it faster

* Styling changes.

* Styling changes.

* Changing notation for matrix multiplication

Co-authored-by: Abel Riboulot <tkai@protomail.com>
2020-07-07 11:32:29 +02:00
Patrick von Platen
989ae326b5 [Reformer] Adapt Reformer MaskedLM Attn mask (#5560)
* fix attention mask

* fix slow test

* refactor attn masks

* fix fp16 generate test
2020-07-07 10:48:06 +02:00
Shashank Gupta
3dcb748e31 Added data collator for permutation (XLNet) language modeling and related calls (#5522)
* Added data collator for XLNet language modeling and related calls

Added DataCollatorForXLNetLanguageModeling in data/data_collator.py
to generate necessary inputs for language modeling training with
XLNetLMHeadModel. Also added related arguments, logic and calls in
examples/language-modeling/run_language_modeling.py.

Resolves: #4739, #2008 (partially)

* Changed name to `DataCollatorForPermutationLanguageModeling`

Changed the name of `DataCollatorForXLNetLanguageModeling` to the more general `DataCollatorForPermutationLanguageModelling`.
Removed the `--mlm` flag requirement for the new collator and defined a separate `--plm_probability` flag for its use.
CTRL uses a CLM loss just like GPT and GPT-2, so should work out of the box with this script (provided `past` is taken care of
similar to `mems` for XLNet).
Changed calls and imports appropriately.

* Added detailed comments, changed variable names

Added more detailed comments to `DataCollatorForPermutationLanguageModeling` in `data/data_collator.py` to explain working. Also cleaned up variable names and made them more informative.

* Added tests for new data collator

Added tests in `tests/test_trainer.py` for DataCollatorForPermutationLanguageModeling based on those in DataCollatorForLanguageModeling. A specific test has been added to check for odd-length sequences.

* Fixed styling issues
2020-07-07 10:17:37 +02:00
Lysandre
1d2332861f Post v3.0.2 release commit 2020-07-06 18:56:47 -04:00
Lysandre
b0892fa0e8 Release: v3.0.2
Some checks failed
GitHub-hosted runner / check_code_quality (push) Has been cancelled
2020-07-06 18:49:44 -04:00
Sylvain Gugger
f1e2e423ab Fix fast tokenizers too (#5562) 2020-07-06 18:45:01 -04:00
Anthony MOI
5787e4c159 Various tokenizers fixes (#5558)
* BertTokenizerFast - Do not specify strip_accents by default

* Bump tokenizers to new version

* Add test for AddedToken serialization
2020-07-06 18:27:53 -04:00
Sylvain Gugger
21f28c34b7 Fix #5507 (#5559)
* Fix #5507

* Fix formatting
2020-07-06 17:26:48 -04:00
Lysandre Debut
9d9b872b66 The add_space_before_punct_symbol is only for TransfoXL (#5549) 2020-07-06 12:17:05 -04:00
Lysandre Debut
d6b0b9d451 GPT2 tokenizer should not output token type IDs (#5546)
* GPT2 tokenizer should not output token type IDs

* Same for OpenAIGPT
2020-07-06 11:33:57 -04:00
Sylvain Gugger
7833b21a5a Fix #5544 (#5551) 2020-07-06 11:22:24 -04:00
Thomas Wolf
c473484087 Fix the tokenization warning noted in #5505 (#5550)
* fix warning

* style and quality
2020-07-06 11:15:25 -04:00
Lysandre
1bbc28bee7 Imports organization 2020-07-06 10:27:10 -04:00
Mohamed Taher Alrefaie
1bc13697b1 Update convert_pytorch_checkpoint_to_tf2.py (#5531)
fixed ImportError: cannot import name 'hf_bucket_url'
2020-07-06 09:55:10 -04:00
Arnav Sharma
b2309cc6bf Typo fix in training doc (#5495) 2020-07-06 09:15:22 -04:00
ELanning
7ecff0ccbb Fix typo in training (#5510) 2020-07-06 09:14:57 -04:00
Sam Shleifer
58cca47c16 [cleanup] TF T5 tests only init t5-base once. (#5410) 2020-07-03 14:27:49 -04:00
Patrick von Platen
991172922f better error message (#5497) 2020-07-03 19:25:25 +02:00
Thomas Wolf
b58a15a31e unpining specific git versions in setup.py 2020-07-03 17:38:39 +02:00
Thomas Wolf
fedabcd154 Release: 3.0.1
Some checks failed
GitHub-hosted runner / check_code_quality (push) Has been cancelled
2020-07-03 17:02:44 +02:00
Lysandre Debut
17ade127b9 Exposing prepare_for_model for both slow & fast tokenizers (#5479)
* Exposing prepare_for_model for both slow & fast tokenizers

* Update method signature

* The traditional style commit

* Hide the warnings behind the verbose flag

* update default truncation strategy and prepare_for_model

* fix tests and prepare_for_models methods

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
2020-07-03 16:51:21 +02:00
Manuel Romero
814ed7ee76 Create model card (#5396)
Create model card for electicidad-small (Spanish Electra) fine-tuned on SQUAD-esv1
2020-07-03 08:29:09 -04:00
Moseli Motsoehli
49281ac939 grammar corrections and train data update (#5448)
- fixed grammar and spelling
- added an intro
- updated Training data references
2020-07-03 08:25:57 -04:00
chrisliu
97355339f6 Update upstream (#5456) 2020-07-03 08:16:27 -04:00
Manuel Romero
55b932a818 Create model card (#5464)
Create model card for electra-small-discriminator fine-tuned on SQUAD v2.0
2020-07-03 06:19:49 -04:00
Funtowicz Morgan
21cd8c4086 QA Pipelines fixes (#5429)
* Make QA pipeline supports models with more than 2 outputs such as BART assuming start/end are the two first outputs.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* When using the new padding/truncation paradigm setting padding="max_length" + max_length=X actually pads the input up to max_length.

This result in every sample going through QA pipelines to be of size 384 whatever the actual input size is making the overall pipeline very slow.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Mask padding & question before applying softmax. Softmax has been refactored to operate in log space for speed and stability.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Format.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Use PaddingStrategy.LONGEST instead of DO_NOT_PAD

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Revert "When using the new padding/truncation paradigm setting padding="max_length" + max_length=X actually pads the input up to max_length."

This reverts commit 1b00a9a2

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Trigger CI after unattended failure

* Trigger CI
2020-07-03 10:29:20 +02:00
Pierric Cistac
8438bab38e Fix roberta model ordering for TFAutoModel (#5414) 2020-07-02 19:23:55 -04:00
Sylvain Gugger
6b735a7253 Tokenizer summary (#5467)
* Work on tokenizer summary

* Finish tutorial

* Link to it

* Apply suggestions from code review

Co-authored-by: Anthony MOI <xn1t0x@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Add vocab definition

Co-authored-by: Anthony MOI <xn1t0x@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-07-02 17:07:42 -04:00
Shen
ef0e9d806c Update: ElectraDiscriminatorPredictions forward. (#5471)
`ElectraDiscriminatorPredictions.forward` should not need `attention_mask`.
2020-07-02 13:57:33 -04:00
Manuel Romero
13a8588f2d Create model card (#5432)
Create model card for electra-base-discriminator fine-tuned on SQUAD v1.1
2020-07-02 10:16:30 -04:00
Julien Chaumond
a0a6387a0d [model_cards] roberta-large-mnli: fix sep_token 2020-07-02 10:04:02 -04:00
Julien Chaumond
215db688da Create roberta-large-mnli-README.md 2020-07-02 09:43:54 -04:00
Lysandre Debut
69d313e808 Bans SentencePiece 0.1.92 (#5418) 2020-07-02 09:23:00 -04:00
George Ho
84e56669af Fix typo in glossary (#5466) 2020-07-02 09:19:33 -04:00
Teven
c6a510c6fa Fixing missing arguments for TransfoXL tokenizer when using TextGenerationPipeline (#5465)
* overriding _parse_and_tokenize in `TextGenerationPipeine` to allow for TransfoXl tokenizer arguments
2020-07-02 13:53:33 +02:00
Teven
6726416e4a Changed expected_output_ids in TransfoXL generation test (#5462)
* Changed expected_output_ids in TransfoXL generation test to match #4826 generation PR.

* making black happy

* making isort happy
2020-07-02 11:56:44 +02:00
tommccoy
812def00c9 fix use of mems in Transformer-XL (#4826)
Fixed duplicated memory use in Transformer-XL generation leading to bad predictions and performance.
2020-07-02 11:19:07 +02:00
Patrick von Platen
306f1a2695 Add Reformer MLM notebook (#5450)
* Add Reformer MLM notebook

* Update notebooks/README.md
2020-07-02 00:20:49 +02:00
Patrick von Platen
d16e36c7e5 [Reformer] Add Masked LM Reformer (#5426)
* fix conflicts

* fix

* happy rebasing
2020-07-01 22:43:18 +02:00
Funtowicz Morgan
f4323dbf8c Don't discard entity_group when token is the latest in the sequence. (#5439)
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
2020-07-01 20:30:42 +02:00
Joe Davison
35befd9ce3 Fix tensor label type inference in default collator (#5250)
* allow tensor label inputs to default collator

* replace try/except with type check
2020-07-01 10:40:14 -06:00
Patrick von Platen
fe81f7d12c finish reformer qa head (#5433) 2020-07-01 12:27:14 -04:00
Patrick von Platen
d697b6ca75 [Longformer] Major Refactor (#5219)
* refactor naming

* add small slow test

* refactor

* refactor naming

* rename selected to extra

* big global attention refactor

* make style

* refactor naming

* save intermed

* refactor functions

* finish function refactor

* fix tests

* fix longformer

* fix longformer

* fix longformer

* fix all tests but one

* finish longformer

* address sams and izs comments

* fix transpose
2020-07-01 17:43:32 +02:00
Sam Shleifer
e0d58ddb65 [fix] Marian tests import (#5442) 2020-07-01 11:42:22 -04:00
Funtowicz Morgan
608d5a7c44 Raises PipelineException on FillMaskPipeline when there are != 1 mask_token in the input (#5389)
* Added PipelineException

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* fill-mask pipeline raises exception when more than one mask_token detected.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Put everything in a function.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Added tests on pipeline fill-mask when input has != 1 mask_token

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Fix numel() computation for TF

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Addressing PR comments.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Remove function typing to avoid import on specific framework.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Quality.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Retry typing with @julien-c tip.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Quality².

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Simplify fill-mask mask_token checking.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Trigger CI
2020-07-01 17:27:47 +02:00
Sylvain Gugger
6c55e9fc32 Fix dropdown bug in searches (#5440)
* Trigger CI

* Fix dropdown bug in searches
2020-07-01 11:02:59 -04:00
Sylvain Gugger
734a28a767 Clean up diffs in Trainer/TFTrainer (#5417)
* Cleanup and unify Trainer/TFTrainer

* Forgot to adapt TFTrainingArgs

* In tf scripts n_gpu -> n_replicas

* Update src/transformers/training_args.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address review comments

* Formatting

* Fix typo

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-07-01 11:00:20 -04:00
Sam Shleifer
43cb03a93d MarianTokenizer.prepare_translation_batch uses new tokenizer API (#5182) 2020-07-01 10:32:50 -04:00
Sam Shleifer
13deb95a40 Move tests/utils.py -> transformers/testing_utils.py (#5350) 2020-07-01 10:31:17 -04:00
sgugger
9c219305f5 Trigger CI 2020-07-01 10:22:50 -04:00
Sylvain Gugger
64e3d966b1 Add support for past states (#5399)
* Add support for past states

* Style and forgotten self

* You mean, documenting is not enough? I have to actually add it too?

* Add memory support during evaluation

* Fix tests in eval and add TF support

* No need to change this line anymore
2020-07-01 08:11:55 -04:00
Sylvain Gugger
4ade7491f4 Fix examples titles and optimization doc page (#5408) 2020-07-01 08:11:25 -04:00
Moseli Motsoehli
d60d231ea4 Create README.md (#5422)
* Create README.md

* Update model_cards/MoseliMotsoehli/TswanaBert/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-01 05:01:51 -04:00
Jay
298bdab18a Create model card for schmidek/electra-small-cased (#5400) 2020-07-01 04:01:56 -04:00
Julien Plu
fcf0652460 Fix TensorFlow dataset generator (#4881)
* fix TensorFlow generator

* Better features handling

* Apply style

* Apply style

* Fix squad as well

* Apply style

* Better factorization of TF Tensors creation
2020-06-30 19:49:11 -04:00
Hong Xu
501040fd30 In the run_ner.py example, give the optional label arg a default value (#5326)
Otherwise, if label is not specified, the following error occurs:

	Traceback (most recent call last):
	  File "run_ner.py", line 303, in <module>
	    main()
	  File "run_ner.py", line 101, in main
	    model_args, data_args, training_args = parser.parse_json_file(json_file=os.path.abspath(sys.argv[1]))
	  File "/home/user/anaconda3/envs/bert/lib/python3.7/site-packages/transformers/hf_argparser.py", line 159, in parse_json_file
	    obj = dtype(**inputs)
	TypeError: __init__() missing 1 required positional argument: 'labels'
2020-06-30 19:45:35 -04:00
Sam Shleifer
b45e65efa0 Avoid deprecation warning for F.tanh (#5413) 2020-06-30 16:41:43 -04:00
Sam Shleifer
23231c0f78 [GH Runner] fix yaml indent (#5412) 2020-06-30 16:17:12 -04:00
Sam Shleifer
ac61114592 [CI] gh runner doesn't use -v, cats new result (#5409) 2020-06-30 16:12:14 -04:00
Sam Shleifer
27a7fe7a8d examples/seq2seq: never override $WANDB_PROJECT (#5407) 2020-06-30 15:29:13 -04:00
Sam Shleifer
32d2031458 [fix] slow fill_mask test failure (#5406) 2020-06-30 15:28:15 -04:00
Sam Shleifer
80aa4b8aa6 [CI] GH-runner stores artifacts like CircleCI (#5318) 2020-06-30 15:01:53 -04:00
Sylvain Gugger
87716a6d07 Documentation for the Trainer API (#5383)
* Documentation for the Trainer API

* Address review comments

* Address comments
2020-06-30 11:43:43 -04:00
Yacine Jernite
c4d4e8bdbd Move GenerationMixin to separate file (#5254)
* separate_generation_code

* isort

* renamed

* rename_files

* move_shapelit
2020-06-30 10:42:08 -04:00
Lysandre
90d13954c4 Repin versions 2020-06-30 09:16:36 -04:00
Sylvain Gugger
0607b88945 How to share model cards with the CLI (#5374)
* How to share model cards

* Switch the two options

* Fix bad copy/cut

* Julien's suggestion
2020-06-30 08:59:32 -04:00
Kevin Canwen Xu
331d8d2936 Upload DistilBART artwork (#5394) 2020-06-30 18:11:11 +08:00
Manuel Romero
09e841490c Model Card Fixing (#5369)
- Fix missing ```-``` in language meta
- T5 pic uploaded to a more permanent place
2020-06-30 18:02:24 +08:00
Manuel Romero
4c5bed192a Model Card Fixing (#5373)
- T5 pic uploaded to a more permanent place
2020-06-30 18:01:45 +08:00
Manuel Romero
02509d4b06 Model Card Fixing (#5371)
- Model pic uploaded to a more permanent place
2020-06-30 18:01:11 +08:00
Manuel Romero
79f0118c72 Model Card Fixing (#5370)
- Fix missing ```-``` in language meta
- T5 pic uploaded to a more permanent place
2020-06-30 18:00:29 +08:00
MichaelJanz
9a473f1e43 Update Bertabs example to work again (#5355)
* Fix the bug 'Attempted relative import with no known parent package' when using the bertabs example. Also change the used model from bertabs-finetuned-cnndm, since it seems not be accessible anymore

* Update run_summarization.py

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
2020-06-30 14:05:01 +08:00
Sylvain Gugger
7f60e93ac5 Mention openAI model card and merge content (#5378)
* Mention openAI model card and merge content

* Fix sentence
2020-06-29 18:27:36 -04:00
chrisliu
482a5993c2 Fix model card folder name so that it is consistent with model hub (#5368)
* Merge upstream

* Merge upstream

* Add generate.py link

* Merge upstream

* Merge upstream

* Fix folder name
2020-06-29 12:54:30 -04:00
chrisliu
97f24303e8 Add link to file and fix typos in model card (#5367)
* Merge upstream

* Merge upstream

* Add generate.py link
2020-06-29 11:34:52 -04:00
Lysandre Debut
b9ee87f5c7 Doc for v3.0.0 (#5366)
* Doc for v3.0.0

* Update docs/source/_static/js/custom.js

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/_static/js/custom.js

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-06-29 11:08:54 -04:00
Lysandre
b62ca59527 Release: v3.0.0
Some checks failed
GitHub-hosted runner / check_code_quality (push) Has been cancelled
2020-06-29 10:40:13 -04:00
Sam Shleifer
a316a6aaa8 [seq2seq docs] Move evaluation down, fix typo (#5365) 2020-06-29 10:36:04 -04:00
Patrick von Platen
4bcc35cd69 [Docs] Benchmark docs (#5360)
* first doc version

* add benchmark docs

* fix typos

* improve README

* Update docs/source/benchmarks.rst

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* fix naming and docs

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-06-29 16:08:57 +02:00
Sylvain Gugger
482c9178d3 Pin mecab for now (#5362) 2020-06-29 09:51:13 -04:00
Clement
2513fe0d02 added subtitle for recent contributors in readme (#5130) 2020-06-29 09:05:08 -04:00
Manuel Romero
30245c0c60 Fix table format fot test tesults (#5357) 2020-06-29 09:02:33 -04:00
Manuel Romero
c34010551a Create model card (#5356) 2020-06-29 09:01:55 -04:00
Ali Safaya
01aa0b8527 Create README.md (#5353) 2020-06-29 08:58:30 -04:00
chrisliu
96907367f1 arxiv-ai-gpt2 model card (#5337)
* Add model card and generation script for model arxiv_ai_gpt2

* Update arxiv-ai-gpt2 model card

Remove unnecessary lines

* Delete code in model cards
2020-06-29 08:53:20 -04:00
Ali Safaya
3cdf8b7ec2 Create model card for asafaya/bert-mini-arabic (#5352)
* Create README.md

* Update model_cards/asafaya/bert-mini-arabic/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-29 08:41:41 -04:00
Ali Safaya
9db1f41604 Create README.md (#5351) 2020-06-29 08:36:00 -04:00
Julien Chaumond
c950fef545 [docs] Small tweaks to #5323 2020-06-29 14:24:33 +02:00
Sylvain Gugger
4544f906e2 model cards for roberta and bert-multilingual (#5324)
* More model cards (cc @myleott)

* Apply suggestions from code review

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-29 05:06:05 -04:00
sgugger
92671532e7 More model cards 2020-06-29 10:58:54 +02:00
Pradhy729
9209d36f93 Added a model card README.md for my pretrained model. (#5325)
* Create README.md

* Removed unnecessary link from README.md

* Update README.md
2020-06-29 16:29:14 +08:00
Julien Plu
7cb52f53ef Fix LR decay in TF Trainer (#5269)
* Recover old PR

* Apply style

* Trigger CI
2020-06-29 14:38:32 +08:00
krevas
321c05abab Model cards for finance-koelectra models (#5313)
* Add finance-koelectra readme card

* Add finance-koelectra readme card

* Add finance-koelectra readme card

* Add finance-koelectra readme card
2020-06-29 13:47:44 +08:00
Sam Shleifer
28a690a80e [mBART] skip broken forward pass test, stronger integration test (#5327) 2020-06-28 15:08:28 -04:00
Sam Shleifer
45e26125de save_pretrained: mkdir(exist_ok=True) (#5258)
* all save_pretrained methods mkdir if not os.path.exists
2020-06-28 14:53:47 -04:00
Suraj Patil
12dfbd4f7a [examples] fix example links (#5344) 2020-06-28 12:54:54 -04:00
Patrick von Platen
98109464c1 clean reformer reverse sort (#5343) 2020-06-28 14:32:25 +02:00
Sylvain Gugger
1af58c0706 New model sharing tutorial (#5323) 2020-06-27 11:10:02 -04:00
Sylvain Gugger
efae6645e2 Fix xxx_length behavior when using XLNet in pipeline (#5319) 2020-06-27 11:09:51 -04:00
Sam Shleifer
393b8dc09a examples/seq2seq/run_eval.py fixes and docs (#5322) 2020-06-26 19:20:43 -04:00
Sam Shleifer
5543b30aa6 [pl_examples] default warmup steps=0 (#5316) 2020-06-26 15:03:41 -04:00
Sam Shleifer
bf0d12c220 CircleCI stores cleaner output at test_outputs.txt (#5291) 2020-06-26 13:59:31 -04:00
Thomas Wolf
601d4d699c [tokenizers] Updates data processors, docstring, examples and model cards to the new API (#5308)
* remove references to old API in docstring - update data processors

* style

* fix tests - better type checking error messages

* better type checking

* include awesome fix by @LysandreJik for #5310

* updated doc and examples
2020-06-26 19:48:14 +02:00
Kevin Canwen Xu
fd405e9a93 Add BART-base modeling and configuration (#5315) 2020-06-27 00:53:10 +08:00
Sam Shleifer
798dbff6a7 [pipelines] Change summarization default to distilbart-cnn-12-6 (#5289) 2020-06-26 11:43:23 -04:00
Patrick von Platen
834b6884c5 Add benchmark notebook (#5312)
* add notebook

* Créé avec Colaboratory

* move notebook to correct folder

* correct link

* correct filename

* correct filename

* better name
2020-06-26 17:38:13 +02:00
Patrick von Platen
08c9607c3d [Generation] fix docs for decoder_input_ids (#5306)
* fix docs

* Update src/transformers/modeling_utils.py

* Update src/transformers/modeling_tf_utils.py

* Update src/transformers/modeling_tf_utils.py

* Update src/transformers/modeling_utils.py

* Update src/transformers/modeling_tf_utils.py

* Update src/transformers/modeling_utils.py
2020-06-26 16:58:11 +02:00
Patrick von Platen
79a82cc06a [Benchmarks] improve Example Plotter (#5245)
* improve plotting

* better labels

* fix time plot
2020-06-26 15:00:14 +02:00
Sylvain Gugger
88d7f96e33 Gpt2 model card (#5283)
* Bert base model card

* Add metadata

* Adapt examples

* GPT2 model card

* Remove the BERT model card

* Change language code
2020-06-26 08:08:31 -04:00
Sylvain Gugger
fc5bce9e60 Bert base model card (#5276)
* Bert base model card

* Add metadata

* Adapt examples

* Comment on text generation

* Update model_cards/bert-base-uncased-README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-26 08:01:19 -04:00
Funtowicz Morgan
135791e8ef Add pad_to_multiple_of on tokenizers (reimport) (#5054)
* Add new parameter `pad_to_multiple_of` on tokenizers.

* unittest for pad_to_multiple_of

* Add .name when logging enum.

* Fix missing .items() on dict in tests.

* Add special check + warning if the tokenizer doesn't have proper pad_token.

* Use the correct logger format specifier.

* Ensure tokenizer with no pad_token do not modify the underlying padding strategy.

* Skip test if tokenizer doesn't have pad_token

* Fix RobertaTokenizer on empty input

* Format.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* fix and updating to simpler API

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
2020-06-26 11:55:57 +02:00
Lysandre Debut
7cc15bdd96 Closes #5218 2020-06-25 18:19:21 -04:00
Joe Davison
2ffef0d0c7 Training & fine-tuning quickstart (#5034)
* add initial fine-tuning guide

* split code blocks to smaller segments

* fix up trianer section of fine-tune doc

* a few last typos

* Update usage -> task summary link

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-06-25 15:11:11 -06:00
Lysandre Debut
364a5ae1f0 Refactor Code samples; Test code samples (#5036)
* Refactor code samples

* Test docstrings

* Style

* Tokenization examples

* Run rust of tests

* First step to testing source docs

* Style and BART comment

* Test the remainder of the code samples

* Style

* let to const

* Formatting fixes

* Ready for merge

* Fix fixture + Style

* Fix last tests

* Update docs/source/quicktour.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Addressing @sgugger's comments + Fix MobileBERT in TF

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-06-25 16:46:00 -04:00
Thomas Wolf
315f464b0a [tokenizers] Several small improvements and bug fixes (#5287)
* avoid recursion in id checks for fast tokenizers

* better typings and fix #5232

* align slow and fast tokenizers behaviors for Roberta and GPT2

* style and quality

* fix tests - improve typings
2020-06-25 22:17:14 +02:00
Sylvain Gugger
24f46ea3f3 Remove links for all docs (#5280) 2020-06-25 11:45:05 -04:00
Thomas Wolf
27cf1d97f0 [Tokenization] Fix #5181 - make #5155 more explicit - move back the default logging level in tests to WARNING (#5252)
* fix-5181

Padding to max sequence length while truncation to another length was wrong on slow tokenizers

* clean up and fix #5155

* fix XLM test

* Fix tests for Transfo-XL

* logging only above WARNING in tests

* switch slow tokenizers tests in @slow

* fix Marian truncation tokenization test

* style and quality

* make the test a lot faster by limiting the sequence length used in tests
2020-06-25 17:24:28 +02:00
Sam Shleifer
e008d520bb [examples/seq2seq] more README improvements (#5274) 2020-06-25 10:13:01 -04:00
Julien Chaumond
6a495cae00 [model_cards] Example of how to specify inputs for the widget 2020-06-25 15:58:25 +02:00
Anthony MOI
0e1fce3c01 Fix convert_graph_to_onnx (#5230) 2020-06-25 08:17:02 +02:00
Moumeneb1
5543efd5cc Create README.md (#5259) 2020-06-25 01:56:07 -04:00
Sam Shleifer
40457bcebb examples/seq2seq supports translation (#5202) 2020-06-24 23:58:11 -04:00
Sylvain Gugger
d12ceb48ba Tokenization tutorial (#5257)
* All done

* Link to the tutorial

* Typo fixes

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>

* Add metnion of the return_xxx args

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
2020-06-24 18:43:20 -04:00
Thomas Wolf
7ac9110711 Add more tests on tokenizers serialization - fix bugs (#5056)
* update tests for fast tokenizers + fix small bug in saving/loading

* better tests on serialization

* fixing serialization

* comment cleanup
2020-06-24 21:53:08 +02:00
Sylvain Gugger
0148c262e7 Fix first test (#5255) 2020-06-24 15:16:04 -04:00
Sylvain Gugger
70c1e1d2d5 Use master _static (#5253)
* Use _static from master everywhere

* Copy to existing too
2020-06-24 15:06:14 -04:00
Victor SANH
4965aee064 [HANS] Fix label_list for RoBERTa/BART (class flipping) (#5196)
* fix weirdness in roberta/bart for mnli trained checkpoints

* black compliance

* isort code check
2020-06-24 14:38:15 -04:00
Julien Chaumond
fc24a93e64 [HfApi] Add support for pipeline_tag 2020-06-24 16:54:00 +00:00
Setu Shah
0a3d0e02c5 Replace labels with -100 to skip loss calc (#4718) 2020-06-24 12:14:50 -04:00
Sylvain Gugger
6894b486d0 Fix version controller links (for realsies) (#5251) 2020-06-24 12:13:43 -04:00
Sai Saketh Aluru
1121ce9f98 Model cards for Hate-speech-CNERG models (#5236)
* Add dehatebert-mono-arabic readme card

* Update dehatebert-mono-arabic model card

* model cards for Hate-speech-CNERG models
2020-06-24 11:41:08 -04:00
Lysandre Debut
cf10d4cfdd Cleaning TensorFlow models (#5229)
* Cleaning TensorFlow models

Update all classes


stylr

* Don't average loss
2020-06-24 11:37:20 -04:00
Sylvain Gugger
609e0c583f Fix links (#5248) 2020-06-24 11:35:55 -04:00
Ali Modarressi
c9163a8d5a delay decay schedule until the end of warmup (#4940) 2020-06-24 11:18:29 -04:00
Sylvain Gugger
f216b60671 Fix deploy doc (#5246)
* Try with the same command

* Try like this
2020-06-24 10:59:06 -04:00
Sylvain Gugger
49f6e7a3c6 Add some prints to debug (#5244) 2020-06-24 10:37:01 -04:00
Patrick von Platen
c2a26ec8a6 [Use cache] Align logic of use_cache with output_attentions and output_hidden_states (#5194)
* fix use cache

* add bart use cache

* fix bart

* finish bart
2020-06-24 16:09:17 +02:00
Sylvain Gugger
64c393ee74 Don't recreate old docs (#5243) 2020-06-24 09:59:07 -04:00
Patrick von Platen
b29683736a fix print in benchmark (#5242) 2020-06-24 15:58:49 +02:00
Patrick von Platen
9fe09cec76 [Benchmark] Extend Benchmark to all model type extensions (#5241)
* add benchmark for all kinds of models

* improved import

* delete bogus files

* make style
2020-06-24 15:11:42 +02:00
Sylvain Gugger
7c41057d50 Add hugs (#5225) 2020-06-24 07:56:14 -04:00
Sylvain Gugger
5e85b324ec Use the script in utils (#5224) 2020-06-24 07:55:58 -04:00
flozi00
5e31a98ab7 Create README.md (#5108)
* Create README.md

* Update model_cards/a-ware/roberta-large-squad-classification/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-24 04:45:51 -04:00
Adriano Diniz
033124e5f8 Update README.md (#5199)
Fix/add information in README.md
2020-06-24 04:42:46 -04:00
ahotrod
7ca6627ec3 Create README.md (#5217)
electra_large_discriminator_squad2_512 Question Answering LM
2020-06-24 04:40:50 -04:00
Kevin Canwen Xu
54e9ce785d Fix PABEE division by zero error (#5233)
* Fix PABEE division by zero error

* patience=0 by default
2020-06-24 16:10:36 +08:00
Sylvain Gugger
9022ef021a Only put tensors on a device (#5223)
* Only put tensors on a device

* Type hint and unpack list comprehension
2020-06-23 17:30:17 -04:00
Sylvain Gugger
173528e368 Add version control menu (#5222)
* Add version control menu

* Constify things

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Apply suggestions from code review

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-23 17:05:12 -04:00
Sam Shleifer
76e5af4cfd [pl_examples] revert deletion of optimizer_step (#5227) 2020-06-23 16:40:45 -04:00
Julien Chaumond
c01480bba3 [file_utils] Type user-agent 2020-06-23 18:31:13 +02:00
Sam Shleifer
58918c76f4 [bart] add config.extra_pos_embeddings to facilitate reuse (#5190) 2020-06-23 11:35:42 -04:00
Thomas Wolf
b28b537131 More clear error message in the use-case of #5169 (#5184) 2020-06-23 13:37:29 +02:00
Thomas Wolf
11fdde0271 Tokenizers API developments (#5103)
* Add return lengths

* make pad a bit more flexible so it can be used as collate_fn

* check all kwargs sent to encoding method are known

* fixing kwargs in encodings

* New AddedToken class in python

This class let you specify specifique tokenization behaviors for some special tokens. Used in particular for GPT2 and Roberta, to control how white spaces are stripped around special tokens.

* style and quality

* switched to hugginface tokenizers library for AddedTokens

* up to tokenizer 0.8.0-rc3 - update API to use AddedToken state

* style and quality

* do not raise an error on additional or unused kwargs for tokenize() but only a warning

* transfo-xl pretrained model requires torch

* Update src/transformers/tokenization_utils.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-06-23 13:36:57 +02:00
Patrick von Platen
1ae132a07d [Reformer] Axial Pos Emb Improve mem usage reformer (#5209)
* improve mem handling

* improve mem for pos ax encodings
2020-06-23 10:49:18 +02:00
Sam Shleifer
5144104070 [fix] remove unused import (#5206) 2020-06-22 23:39:04 -04:00
Sam Shleifer
0d158e38c9 [fix] mobilebert had wrong path, causing slow test failure (#5205) 2020-06-22 23:31:36 -04:00
Sam Shleifer
f5c2a122e3 Upgrade examples to pl=0.8.1(#5146) 2020-06-22 20:40:10 -04:00
flozi00
06b60c8b05 [Modelcard] bart-squadv2 (#5011)
* [Modelcard] bart-squadv2

* Update README.md

* Update README.md
2020-06-22 18:40:19 -04:00
flozi00
35e0687256 Create README.md (#5013) 2020-06-22 18:40:00 -04:00
Fran Martinez
22d2c8ea2f Create README.md for finetuned BERT model (#5009)
* Create README.md

* changes in model usage section

* minor changes in output visualization

* minor errata in readme
2020-06-22 18:39:29 -04:00
furunkel
2589505693 Add model card for StackOBERTflow-comments-small (#5008)
* Create README.md

* Update README.md
2020-06-22 18:39:22 -04:00
bogdankostic
d8c26ed139 Specify dataset used for crossvalidation (#5175) 2020-06-22 18:26:12 -04:00
Adriano Diniz
a34fb91d54 Create README.md (#5149) 2020-06-22 18:00:53 -04:00
Adriano Diniz
ffabcf5249 Create README.md (#5160) 2020-06-22 17:59:54 -04:00
Adriano Diniz
3363a19b12 Create README.md (#5152)
* Create README.md

* Apply suggestions from code review

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-22 17:59:33 -04:00
Michaël Benesty
0cca61925c Add link to new comunity notebook (optimization) (#5195)
* Add link to new comunity notebook (optimization)

related to https://github.com/huggingface/transformers/issues/4842#event-3469184635

This notebook is about benchmarking model training with/without dynamic padding optimization. 
https://github.com/ELS-RD/transformers-notebook 

Using dynamic padding on MNLI provides a **4.7 times training time reduction**, with max pad length set to 512. The effect is strong because few examples are >> 400 tokens in this dataset. IRL, it will depend of the dataset, but it always bring improvement and, after more than 20 experiments listed in this [article](https://towardsdatascience.com/divide-hugging-face-transformers-training-time-by-2-or-more-21bf7129db9q-21bf7129db9e?source=friends_link&sk=10a45a0ace94b3255643d81b6475f409), it seems to not hurt performance.

Following advice from @patrickvonplaten I do the PR myself :-)

* Update notebooks/README.md

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-06-22 23:47:33 +02:00
Lee Haau-Sing
1c5cd8e5f5 Add README.md (nyu-mll) (#5174)
* nyu-mll: roberta on smaller datasets

* Update README.md

* Update README.md

Co-authored-by: Alex Warstadt <alexwarstadt@gmail.com>
2020-06-22 17:24:27 -04:00
Sylvain Gugger
c439752482 Switch master/stable doc and add older releases (#5193) 2020-06-22 16:38:53 -04:00
Sylvain Gugger
417e492f1e Quick tour (#5145)
* Quicktour part 1

* Update

* All done

* Typos

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>

* Address comments in quick tour

* Update docs/source/quicktour.rst

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update from feedback

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-06-22 16:08:09 -04:00
Thomas Wolf
75e1eed8d1 Cleaner warning when loading pretrained models (#4557)
* Cleaner warning when loading pretrained models

This make more explicit logging messages when using the various `from_pretrained` methods. It also make these messages as `logging.warning` because it's a common source of silent mistakes.

* Update src/transformers/modeling_utils.py

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Update src/transformers/modeling_utils.py

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* style and quality

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-22 21:58:47 +02:00
Lysandre Debut
4e741efa92 Have documentation fail on warning (#5189)
* Have documentation fail on warning

* Force ci failure

* Revert "Force ci failure"

This reverts commit f0a4666ec2eb4cd00a4da48af3357defc63324a0.
2020-06-22 15:49:50 -04:00
Sylvain Gugger
1262495a91 Add TF auto model to the docs + fix sphinx warnings (#5187) 2020-06-22 14:43:52 -04:00
Adriano Diniz
88429c57bc Create README.md (#5165) 2020-06-22 13:49:14 -04:00
Manuel Romero
76ee9c8bc9 Create README.md (#5107)
* Create README.md

@julien-c check out that dataset meta tag is right

* Fix typo

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-22 13:47:30 -04:00
Manuel Romero
bf493d5569 Model card for t5-base-finetuned-emotion (recognition) (#5179) 2020-06-22 13:45:45 -04:00
Patrick von Platen
e9ef21175e improve doc (#5185) 2020-06-22 19:00:11 +02:00
Thomas Wolf
ebc36108dc [tokenizers] Fix #5081 and improve backward compatibility (#5125)
* fix #5081 and improve backward compatibility (slightly)

* add nlp to setup.cfg - style and quality

* align default to previous default

* remove test that doesn't generalize
2020-06-22 17:25:43 +02:00
Malte
d2a7c86dc3 Check if text is set to avoid IndexError (#4209)
Fix for https://github.com/huggingface/transformers/issues/3809
2020-06-22 11:09:05 -04:00
Iz Beltagy
90f4b24520 Add support for gradient checkpointing in BERT (#4659)
* add support for gradient checkpointing in BERT

* fix unit tests

* isort

* black

* workaround for `torch.utils.checkpoint.checkpoint` not accepting bool

* Revert "workaround for `torch.utils.checkpoint.checkpoint` not accepting bool"

This reverts commit 5eb68bb804f5ffbfc7ba13c45a47717f72d04574.

* workaround for `torch.utils.checkpoint.checkpoint` not accepting bool

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-06-22 10:47:14 -04:00
Joseph Liu
f4e1f02210 Output hidden states (#4978)
* Configure all models to use output_hidden_states as argument passed to foward()

* Pass all tests

* Remove cast_bool_to_primitive in TF Flaubert model

* correct tf xlnet

* add pytorch test

* add tf test

* Fix broken tests

* Configure all models to use output_hidden_states as argument passed to foward()

* Pass all tests

* Remove cast_bool_to_primitive in TF Flaubert model

* correct tf xlnet

* add pytorch test

* add tf test

* Fix broken tests

* Refactor output_hidden_states for mobilebert

* Reset and remerge to master

Co-authored-by: Joseph Liu <joseph.liu@coinflex.com>
Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
2020-06-22 10:10:45 -04:00
Kevin Canwen Xu
866a8ccabb Add model cards for Microsoft's MiniLM (#5178)
* Add model cards for Microsoft's MiniLM

* XLMRobertaTokenizer

* format

* Add thumbnail

* finishing up
2020-06-22 21:48:14 +08:00
RafaelWO
b99ad457f4 Added feature to move added tokens in vocabulary for Transformer-XL (#4953)
* Fixed resize_token_embeddings for transfo_xl model

* Fixed resize_token_embeddings for transfo_xl.

Added custom methods to TransfoXLPreTrainedModel for resizing layers of
the AdaptiveEmbedding.

* Updated docstring

* Fixed resizinhg cutoffs; added check for new size of embedding layer.

* Added test for resize_token_embeddings

* Fixed code quality

* Fixed unchanged cutoffs in model.config

* Added feature to move added tokens in tokenizer.

* Fixed code quality

* Added feature to move added tokens in tokenizer.

* Fixed code quality

* Fixed docstring, renamed sym to 	oken.

Co-authored-by: Rafael Weingartner <rweingartner.its-b2015@fh-salzburg.ac.at>
2020-06-22 15:40:52 +02:00
Sylvain Gugger
eb0ca71ef6 Update glossary (#5148)
* Update glossary

* Update docs/source/glossary.rst

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-06-22 08:30:49 -04:00
Patrick von Platen
fa0be6d761 Benchmarks (#4912)
* finish benchmark

* fix isort

* fix setup cfg

* retab

* fix time measuring of tf graph mode

* fix tf cuda

* clean code

* better error message
2020-06-22 12:06:56 +02:00
Zihao Fu
18a0150bfa fix bart doc (#5132)
fix bart doc
2020-06-22 10:58:28 +02:00
Mikael Souza
3fe75c7f70 Fixing docs for Encoder Decoder Config (#5171) 2020-06-22 10:51:17 +02:00
flozi00
59345cc87f Typo (#5147) 2020-06-22 10:49:23 +02:00
Ilya Boytsov
bc3a0c0607 [examples] fixes arguments for summarization finetune scripts (#5157)
Authored-by: i.boytsov <i.boytsov@MAC867.local>
2020-06-21 11:51:21 -04:00
Tim Suchanek
68e19f1c22 Fix typo in root README (#5073) 2020-06-20 23:00:04 +08:00
Kevin Canwen Xu
c0c577cf8f Fix PABEE's result table (#5158) 2020-06-20 22:56:39 +08:00
Julien Chaumond
aa6a29bc25 SummarizationPipeline: init required task name (#5086)
* SummarizationPipeline: init required task name

* Update src/transformers/pipelines.py

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* Apply suggestions from code review

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-06-20 03:16:30 -04:00
Kevin Canwen Xu
2fd28d4363 Add BERT Loses Patience (Patience-based Early Exit) (#5078)
* Add BERT Loses Patience (Patience-based Early Exit)

* update model archive

* update format

* sort import

* flake8

* Add results

* full results

* align the table

* refactor to inherit

* default per gpu eval = 1

* Formatting

* Formatting

* isort

* modify readme

* Add check

* Fix format

* Fix format

* Doc strings

* ALBERT & BERT for sequence classification don't inherit from the original anymore

* Remove incorrect comments

* Remove incorrect comments

* Remove incorrect comments

* Sync up with new code

* Sync up with new code

* Add a test

* Add a test

* Add a test

* Add a test

* Add a test

* Add a test

* Finishing up!
2020-06-20 13:41:46 +08:00
Zhu Baohe
f1679d7c48 Fix dropout in TFMobileBert (#5150) 2020-06-20 13:21:19 +08:00
Kevin Canwen Xu
5ed94b2312 Update note to avoid confusion (#5131) 2020-06-20 10:13:34 +08:00
Lysandre
d97b4176e5 Correct device assignment 2020-06-19 21:58:28 -04:00
Vasily Shamporov
9a3f91088c Add MobileBert (#4901)
* Add MobileBert

* Quality + Conversion script

* style

* Update src/transformers/modeling_mobilebert.py

* Links to S3

* Style

* TFMobileBert

Slight fixes to the pytorch MobileBert
Style

* MobileBertForMaskedLM (PT + TF)

* MobileBertForNextSentencePrediction (PT + TF)

* MobileFor{MultipleChoice, TokenClassification} (PT + TF)


ss

* Tests + Auto

* Doc

* Tests

* Addressing @sgugger's comments

* Adressing @patrickvonplaten's comments

* Style

* Style

* Integration test

* style

* Model card

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-06-19 16:38:36 -04:00
Sam Shleifer
f45e873910 [bart-mnli] Fix class flipping bug (#5141) 2020-06-19 13:33:24 -04:00
Erick Rocha Fonseca
e33929ef1e Fix in Reformer Config documentation (#5138) 2020-06-19 15:41:31 +02:00
Sam Shleifer
84be482f66 AutoTokenizer supports mbart-large-en-ro (#5121) 2020-06-18 20:47:37 -04:00
Sam Shleifer
2db1e2f415 [cleanup] remove redundant code in SummarizationDataset (#5119) 2020-06-18 20:34:48 -04:00
Sylvain Gugger
5f721ad6e4 Fix #5114 (#5122) 2020-06-18 19:20:04 -04:00
Pri Oberoi
a258982af3 Add missing arg in 02-transformers notebook (#5085)
* Add missing arg when creating model

* Fix typos

* Remove from_tf flag when creating model
2020-06-18 19:04:04 -04:00
Deniz
32e94cff64 tf add resize_token_embeddings method (#4351)
* resize token embeddings

* add tokens

* add tokens

* add tokens

* add t5 token method

* add t5 token method

* add t5 token method

* typo

* debugging input

* debugging input

* debug

* debug

* debug

* trying to set embedding tokens properly

* set embeddings for generation head too

* set embeddings for generation head too

* debugging

* debugging

* enable generation

* add base method

* add base method

* add base method

* return logits in the main call

* reverting to generation

* revert back

* set embeddings for the bert main layer

* description

* fix conflicts

* logging

* set base model as self

* refactor

* tf_bert add method

* tf_bert add method

* tf_bert add method

* tf_bert add method

* tf_bert add method

* tf_bert add method

* tf_bert add method

* tf_bert add method

* v0

* v0

* finalize

* final

* black

* add tests

* revert back the emb call

* comments

* comments

* add the second test

* add vocab size condig

* add tf models

* add tf models. add common tests

* remove model specific embedding tests

* stylish

* remove files

* stylez

* Update src/transformers/modeling_tf_transfo_xl.py

change the error.

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* adding unchanged weight test

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-06-18 18:41:26 -04:00
Lysandre Debut
973433260e Pin sphinx-rtd-theme (#5128) 2020-06-18 18:07:59 -04:00
Sam Shleifer
8a377c3d6e [fix] Move _adjust_logits above postprocess to fix Marian.generate (#5126) 2020-06-18 18:06:27 -04:00
Sam Shleifer
3d3e605aff [cleanup] generate_beam_search comments (#5115) 2020-06-18 16:30:24 -04:00
Suraj Patil
ca2d0f98c4 ElectraForMultipleChoice (#4954)
* add ElectraForMultipleChoice

* add  test_for_multiple_choice

* add ElectraForMultipleChoice in auto model

* add ElectraForMultipleChoice in all_model_classes

* add SequenceSummary related parameters

* get rid pooler, use SequenceSummary instead

* add electra multiple choice test

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-06-18 14:59:35 -04:00
Ori Garin
279d8e24f7 support local_files_only option for tf models (#5116) 2020-06-18 13:47:05 -04:00
Julien Chaumond
355954ffca Create distilbert-base-uncased-distilled-squad-README.md 2020-06-18 05:17:45 -04:00
Suraj Patil
18177a1a60 lm_labels => labels (#5080) 2020-06-18 09:16:29 +02:00
Lysandre
efeb75b805 Remove misleading comment
closes #4958
2020-06-17 18:24:35 -04:00
Saurabh Misra
bb154ac50c Fixing TPU training by disabling wandb.watch gradients logging for TPU (#4926) 2020-06-17 18:04:11 -04:00
Suraj Patil
fb6cccb863 fix qa example (#4929) 2020-06-17 17:54:16 -04:00
Karthikeyan Singaravelan
38bba9cdd5 Fix deprecation warnings due to invalid escape sequences. (#4924) 2020-06-17 17:46:58 -04:00
Sam Shleifer
f1a3d03741 add pandas to setup.cfg (#5093) 2020-06-17 16:39:17 -04:00
Sam Shleifer
90c833870c [MarianTokenizer] Switch to sacremoses for punc normalization (#5092) 2020-06-17 16:31:05 -04:00
Pranav Dayanand Pawar
049e14f0e3 very minor spelling correction in script command (#5090)
actual script name - counts_parameters.py
2020-06-17 16:08:43 -04:00
Sylvain Gugger
20fa828984 Make default_data_collator more flexible and deprecate old behavior (#5060)
* Make default_data_collator more flexible

* Accept tensors for all features

* Document code

* Refactor

* Formatting
2020-06-17 15:24:51 -04:00
Yacine Jernite
5e06963394 Some changes to simplify the generation function (#5031)
* moving logits post-processing out of beam search

* moving logits post-processing out of beam search

* first step cache

* fix_Encoder_Decoder

* patrick_version_postprocess

* add_keyword_arg
2020-06-17 14:48:06 -04:00
Sylvain Gugger
204ebc25e6 Update installation page and add contributing to the doc (#5084)
* Update installation page and add contributing to the doc

* Remove mention of symlinks
2020-06-17 14:01:10 -04:00
Sam Shleifer
043f9f51f9 [examples] SummarizationModule improvements (#4951) 2020-06-17 13:51:34 -04:00
Sylvain Gugger
cd40f6564e Add header and fix command (#5082) 2020-06-17 11:45:05 -04:00
Julien Chaumond
70bc3ead4f [TextClassificationPipeline] Hotfix: make json serializable 2020-06-17 15:09:27 +00:00
Sylvain Gugger
7291ea0bff Reorganize documentation (#5064)
* Reorganize topics and add all models
2020-06-17 07:55:20 -04:00
Sylvain Gugger
e4aaa45805 Update pipeline examples to doctest syntax (#5030) 2020-06-16 18:14:58 -04:00
Sylvain Gugger
011cc0be51 Fix all sphynx warnings (#5068) 2020-06-16 16:50:02 -04:00
flozi00
af497b5672 Typo (#5069) 2020-06-16 16:46:20 -04:00
Yacine Jernite
49c5202522 Eli5 examples (#4968)
* add eli5 examples

* add dense query script

* query_di

* merging

* merging

* add_utils

* adds nearest neighbor wikipedia

* batch queries

* training_retriever

* new notebooks

* moved retriever traiing script

* finished wiki40b

* max_len_fix

* train_s2s

* retriever_batch_checkpointing

* cleanup

* merge

* dim_fix

* fix_indexer

* fix_wiki40b_snippets

* fix_embed_for_r

* fp32 index

* fix_sparse_q

* joint_training

* remove obsolete datasets

* add_passage_nn_results

* add_passage_nn_results

* add_batch_nn

* add_batch_nn

* add_data_scripts

* notebook

* notebook

* notebook

* fix_multi_gpu

* add_app

* full_caching

* full_caching

* notebook

* sparse_done

* images

* notebook

* add_image_gif

* with_Gif

* add_contr_image

* notebook

* notebook

* notebook

* train_functions

* notebook

* min_retrieval_length

* pandas_option

* notebook

* min_retrieval_length

* notebook

* notebook

* eval_Retriever

* notebook

* images

* notebook

* add_example

* add_example

* notebook

* fireworks

* notebook

* notebook

* joe's notebook comments

* app_update

* notebook

* notebook_link

* captions

* notebook

* assing RetriBert model

* add RetriBert to Auto

* change AutoLMHead to AutoSeq2Seq

* notebook downloads from hf models

* style_black

* style_black

* app_update

* app_update

* fix_app_update

* style

* style

* isort

* Delete WikiELI5training.ipynb

* Delete evaluate_eli5.py

* Delete WikiELI5explore.ipynb

* Delete ExploreWikiELI5Support.html

* Delete explainlikeimfive.py

* Delete wiki_snippets.py

* children before parent

* children before parent

* style_black

* style_black_only

* isort

* isort_new

* Update src/transformers/modeling_retribert.py

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* typo fixes

* app_without_asset

* cleanup

* Delete ELI5animation.gif

* Delete ELI5contrastive.svg

* Delete ELI5wiki_index.svg

* Delete choco_bis.svg

* Delete fireworks.gif

* Delete huggingface_logo.jpg

* Delete huggingface_logo.svg

* Delete Long_Form_Question_Answering_with_ELI5_and_Wikipedia.ipynb

* Delete eli5_app.py

* Delete eli5_utils.py

* readme

* Update README.md

* unused imports

* moved_info

* default_beam

* ftuned model

* disclaimer

* Update src/transformers/modeling_retribert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* black

* add_doc

* names

* isort_Examples

* isort_Examples

* Add doc to index

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-06-16 16:36:58 -04:00
Sam Shleifer
c3e607496c [cleanup] examples test_run_squad uses tiny model (#5059) 2020-06-16 14:06:45 -04:00
Sylvain Gugger
439aa1d6e9 Remove old section + caching in install (#5027) 2020-06-16 13:03:41 -04:00
Sam Shleifer
3d495c61ef Fix marian tokenizer save pretrained (#5043) 2020-06-16 09:48:19 -04:00
Sylvain Gugger
d5477baf7d Convert hans to Trainer (#5025)
* Convert hans to Trainer

* Tick box
2020-06-16 08:06:31 -04:00
Amil Khare
c852036b4a [cleanup] Hoist ModelTester objects to top level (#4939)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-06-16 08:03:43 -04:00
Manuel Romero
0c55a384f8 Add reference to NLP dataset (#5028)
* Add reference to NLP dataset

* Update README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-16 04:19:09 -04:00
Manuel Romero
0946d1209d Add reference to NLP (package) dataset (#5029)
* Add reference to NLP (package) dataset

* Update README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-16 04:17:46 -04:00
Boris Dayma
edcb3ac59a refactor(wandb): consolidate import (#5044) 2020-06-16 03:40:43 -04:00
Funtowicz Morgan
9e03364999 Ability to pickle/unpickle BatchEncoding pickle (reimport) (#5039)
* Added is_fast property on BatchEncoding to indicate if the object comes from a Fast Tokenizer.

* Added __get_state__() & __set_state__() to be pickable.

* Correct tokens() return type from List[int] to List[str]

* Added unittest for BatchEncoding pickle/unpickle

* Added unittest for BatchEncoding is_fast

* More careful checking on BatchEncoding unpickle tests.

* Formatting.

* is_fast should assertTrue on Rust tokenizers.

* Ensure tensorflow has correct way of checking array_equal

* More formatting.
2020-06-16 09:25:25 +02:00
Sylvain Gugger
f9f8a5312e Add DistilBertForMultipleChoice (#5032)
* Add `DistilBertForMultipleChoice`
2020-06-15 18:31:41 -04:00
Anthony MOI
36434220fc [HUGE] Refactoring tokenizers backend - padding - truncation - pre-tokenized pipeline - fast tokenizers - tests (#4510)
* Use tokenizers pre-tokenized pipeline

* failing pretrokenized test

* Fix is_pretokenized in python

* add pretokenized tests

* style and quality

* better tests for batched pretokenized inputs

* tokenizers clean up - new padding_strategy - split the files

* [HUGE] refactoring tokenizers - padding - truncation - tests

* style and quality

* bump up requied tokenizers version to 0.8.0-rc1

* switched padding/truncation API - simpler better backward compat

* updating tests for custom tokenizers

* style and quality - tests on pad

* fix QA pipeline

* fix backward compatibility for max_length only

* style and quality

* Various cleans up - add verbose

* fix tests

* update docstrings

* Fix tests

* Docs reformatted

* __call__ method documented

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-06-15 17:12:51 -04:00
Patrick von Platen
ebba39e4e1 [Bart] Question Answering Model is added to tests (#5024)
* fix test

* Update tests/test_modeling_common.py

* Update tests/test_modeling_common.py
2020-06-15 22:50:09 +02:00
Sylvain Gugger
bbad4c6989 Add position_ids (#5021) 2020-06-15 15:50:17 -04:00
Boris Dayma
1bf4098e03 feat(TFTrainer): improve logging (#4946)
* feat(tftrainer): improve logging

* fix(trainer): consider case with evaluation only

* refactor(tftrainer): address comments

* refactor(tftrainer): move self.epoch_logging to __init__
2020-06-15 14:06:17 -04:00
Funtowicz Morgan
7b5a1e7d51 Fix importing transformers on Windows (#4997) 2020-06-15 19:36:57 +02:00
Sam Shleifer
a9f1fc6c94 Add bart-base (#5014) 2020-06-15 13:29:26 -04:00
Funtowicz Morgan
7b685f5229 Increase pipeline support for ONNX export. (#5005)
* Increase pipeline support for ONNX export.

* Style.
2020-06-15 19:13:58 +02:00
Sylvain Gugger
1affde2f10 Make DataCollator a callable (#5015)
* Make DataCollator a callable

* Update src/transformers/data/data_collator.py

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-15 11:58:33 -04:00
Bram Vanroy
f7c93b3cee Possible fix to make AMP work with DDP in the trainer (#4728)
* manually set device in trainer args

* check if current device is cuda before set_device

* Explicitly set GPU ID when using single GPU

This addresses https://github.com/huggingface/transformers/issues/4657#issuecomment-642228099
2020-06-15 10:10:26 -04:00
ipuneetrathore
66bcfbb130 Create README.md (#4975)
* Create README.md

* Update model_cards/ipuneetrathore/bert-base-cased-finetuned-finBERT/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-15 08:43:50 -04:00
Stefan Schweter
d812e6d76e NER: fix construction of input examples for RoBERTa (#4943)
* utils_ner: do not add extra sep token for RoBERTa model

* run_pl_ner: do not add extra sep token for RoBERTa model
2020-06-15 08:30:40 -04:00
Suraj Patil
ebab096e86 [model card] model card for bart-large-finetuned-squadv1 (#4977)
* [model card] model card for bart-large-finetuned-squadv1

* add metadata link to the dataset
2020-06-15 05:39:41 -04:00
Funtowicz Morgan
9ad36ad57f Improve ONNX logging (#4999)
* Improve ONNX export logging to give more information about the generated graph.

* Correctly handle input and output in the logging.
2020-06-15 11:04:51 +02:00
ZhuBaohe
9931f817b7 fix (#4976) 2020-06-14 21:36:14 +02:00
Suraj Patil
9208f57b16 BartTokenizerFast (#4878) 2020-06-14 13:04:49 -04:00
Sylvain Gugger
403d309857 Hans data (#4854)
* Update hans data to be able to use Trainer

* Fixes

* Deal with tokenizer that don't have token_ids

* Clean up things

* Simplify data use

* Fix the input dict

* Formatting + proper path in README
2020-06-13 09:35:13 -04:00
Julien Chaumond
ca5e1cdf8e model_cards: we can now tag datasets
see corresponding model pages to see how it's rendered
2020-06-12 23:19:07 +02:00
Suraj Patil
e93ccb3290 BartForQuestionAnswering (#4908) 2020-06-12 15:47:57 -04:00
Sylvain Gugger
538531cde5 Add AlbertForMultipleChoice (#4959)
* Add AlbertForMultipleChoice

* Make up to date and add all models to common tests
2020-06-12 14:20:19 -04:00
Manuel Romero
fe24139702 Create README.md (#4865) 2020-06-12 09:03:43 -04:00
Yannis Papanikolaou
9aa219a1fe Create README.md (#4872) 2020-06-12 09:03:13 -04:00
Patrick von Platen
86578bb04c [AutoModel] Split AutoModelWithLMHead into clm, mlm, encoder-decoder (#4933)
* first commit

* add new auto models

* better naming

* fix bert automodel

* fix automodel for pretraining

* add models to init

* fix name typo

* fix typo

* better naming

* future warning instead of depreciation warning
2020-06-12 10:01:49 +02:00
Sam Shleifer
5620033115 [mbart] Fix fp16 testing logic (#4949) 2020-06-11 22:11:34 -04:00
VictorSanh
473808da0d update mvmt-pruning/saving_prunebert (updating torch to 1.5) 2020-06-11 19:42:45 +00:00
Patrick von Platen
caf3746678 fix indentation issue (#4941) 2020-06-11 21:28:01 +02:00
Suraj Patil
6293eb04df [Model card] model card for electra-base QA model (#4936) 2020-06-11 13:16:34 -04:00
Sam Shleifer
08b59d10e5 MBartTokenizer:add language codes (#3776) 2020-06-11 13:02:33 -04:00
Sylvain Gugger
20451195f0 Support multiple choice in tf common model tests (#4920)
* Support multiple choice in tf common model tests

* Add the input_embeds test
2020-06-11 10:31:26 -04:00
Setu Shah
699541c4b3 TFTrainer: Add dataloader_drop_last (#4925) 2020-06-11 02:11:22 -04:00
RafaelWO
e80d6c689b Fix resize_token_embeddings for Transformer-XL (#4759)
* Fixed resize_token_embeddings for transfo_xl model

* Fixed resize_token_embeddings for transfo_xl.

Added custom methods to TransfoXLPreTrainedModel for resizing layers of
the AdaptiveEmbedding.

* Updated docstring

* Fixed resizinhg cutoffs; added check for new size of embedding layer.

* Added test for resize_token_embeddings

* Fixed code quality

* Fixed unchanged cutoffs in model.config

Co-authored-by: Rafael Weingartner <rweingartner.its-b2015@fh-salzburg.ac.at>
2020-06-10 19:03:06 -04:00
Sylvain Gugger
d541938c48 Make multiple choice models work with input_embeds (#4921) 2020-06-10 18:38:34 -04:00
Sylvain Gugger
1e2631d6f8 Split LMBert model in two (#4874)
* Split LMBert model in two

* Fix example

* Remove lm_labels

* Adapt tests, refactor prepare_for_generation

* Fix merge

* Hide BeartLMHeadModel
2020-06-10 18:26:42 -04:00
Matthew Goldey
f6da8b2200 check type before logging in trainer to ensure values are scalars (#4883)
* check type before logging to ensure it's a scalar

* log when Trainer attempts to add a non-scalar value using TensorboardX's writer.add_scalar so we know what kinds of fixes are appropriate

* black it

* rephrase log message to clarify attribute was dropped

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-10 18:25:55 -04:00
Yannis Papanikolaou
1c986f42ff Create README.md (#4871) 2020-06-10 17:29:41 -04:00
Lysandre Debut
3ae2e86baf Run a single wandb instance per TPU run (#4851)
* Run a single wandb instance per TPU run

* wandb: self.is_world_master

* make style

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-10 16:28:18 -04:00
Lysandre Debut
466aa57a45 Don't init TPU device twice (#4916) 2020-06-10 15:53:15 -04:00
Suraj Patil
ef2dcdccaa ElectraForQuestionAnswering (#4913)
* ElectraForQuestionAnswering

* udate __init__

* add test for electra qa model

* add ElectraForQuestionAnswering in auto models

* add ElectraForQuestionAnswering in all_model_classes

* fix outputs, input_ids defaults to None

* add ElectraForQuestionAnswering in docs

* remove commented line
2020-06-10 15:17:52 -04:00
Amil Khare
5d63ca6c38 [ctrl] fix pruning of MultiHeadAttention (#4904) 2020-06-10 14:06:55 -04:00
Sylvain Gugger
4e10acb3e5 Add more models to common tests (#4910) 2020-06-10 13:19:53 -04:00
Patrick von Platen
3b3619a327 [All models] fix docs after adding output attentions to all forward functions (#4909)
* fix doc

* add format file

* add output attentions to all docs

* add also for bart

* fix naming

* re-add doc to config
2020-06-10 18:10:59 +02:00
Sylvain Gugger
ac99217e92 Fix the CI (#4903)
* Fix CI
2020-06-10 09:26:06 -04:00
Sylvain Gugger
0a375f5abd Deal with multiple choice in common tests (#4886)
* Deal with multiple choice in common tests
2020-06-10 08:10:20 -04:00
Sylvain Gugger
e8db8b845a Remove unused arguments in Multiple Choice example (#4853)
* Remove unused arguments

* Formatting

* Remove second todo comment
2020-06-09 20:05:09 -04:00
songyouwei
29c36e9f36 run_pplm.py bug fix (#4867)
`is_leaf` may become `False` after `.to(device=device)` function call.
2020-06-09 19:14:27 -04:00
Lysandre
13aa174112 uninstalled wandb raises AttributeError 2020-06-09 18:50:56 -04:00
Bharat Raghunathan
6e603cb789 [All models] Extend config.output_attentions with output_attentions function arguments (#4538)
* DOC: Replace instances of ``config.output_attentions`` with function argument ``output_attentions``

* DOC: Apply Black Formatting

* Fix errors where output_attentions was undefined

* Remove output_attentions in classes per review

* Fix regressions on tests having `output_attention`

* Fix further regressions in tests relating to `output_attentions`

Ensure proper propagation of `output_attentions` as a function parameter
to all model subclasses

* Fix more regressions in `test_output_attentions`

* Fix issues with BertEncoder

* Rename related variables to `output_attentions`

* fix pytorch tests

* fix bert and gpt2 tf

* Fix most TF tests for `test_output_attentions`

* Fix linter errors and more TF tests

* fix conflicts

* DOC: Apply Black Formatting

* Fix errors where output_attentions was undefined

* Remove output_attentions in classes per review

* Fix regressions on tests having `output_attention`

* fix conflicts

* fix conflicts

* fix conflicts

* fix conflicts

* fix pytorch tests

* fix conflicts

* fix conflicts

* Fix linter errors and more TF tests

* fix tf tests

* make style

* fix isort

* improve output_attentions

* improve tensorflow

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-06-09 23:39:06 +02:00
Sam Shleifer
f90bc44d9a [examples] Cleanup summarization docs (#4876) 2020-06-09 17:38:28 -04:00
Patrick von Platen
2cfb947f59 [Benchmark] add tpu and torchscipt for benchmark (#4850)
* add tpu and torchscipt for benchmark

* fix name in tests

* "fix email"

* make style

* better log message for tpu

* add more print and info for tpu

* allow possibility to print tpu metrics

* correct cpu usage

* fix test for non-install

* remove bugus file

* include psutil in testing

* run a couple of times before tracing in torchscript

* do not allow tpu memory tracing for now

* make style

* add torchscript to env

* better name for torch tpu

Co-authored-by: Patrick von Platen <patrick@huggingface.co>
2020-06-09 23:12:43 +02:00
Hamza Harkous
f0340b3031 Removes from the of the parent of TFRobertaClassificationHead (#4884)
Co-authored-by: Hamza Harkous <harkous@google.com>
2020-06-09 16:14:01 -04:00
Amil Khare
02e5f79662 [examples] consolidate summarization examples (#4837) 2020-06-09 11:14:12 -04:00
Julien Plu
9f5d5a531d Fix the __getattr__ method in BatchEncoding (#4772) 2020-06-09 09:44:00 +02:00
Sylvain Gugger
41a1d27cde Add XLMRobertaForQuestionAnswering (#4855)
* Add XLMRobertaForQuestionAnswering

* Formatting

* Make test happy
2020-06-08 21:22:37 -04:00
Sam Shleifer
a139d1a160 [cleanup] consolidate some prune_heads logic (#4799) 2020-06-08 17:08:04 -04:00
ZhuBaohe
4c7f564f9a fix (#4839) 2020-06-08 18:28:50 +02:00
Sylvain Gugger
37be3786cf Clean documentation (#4849)
* Clean documentation
2020-06-08 11:28:19 -04:00
Lysandre
42860e92a4 Turn off codecov patch for now 2020-06-08 09:47:13 -04:00
Julien Plu
36dfc317b3 TF Checkpoints (#4831)
* Align checkpoint dir with the PT trainer

* Use args for max to keep checkpoints
2020-06-08 09:45:23 -04:00
Patrick von Platen
439f1cab20 [Generate] beam search should generate without replacement (#4845)
* fix flaky beam search

* fix typo
2020-06-08 15:31:32 +02:00
Patrick von Platen
c0554776de fix PR (#4810) 2020-06-08 15:31:12 +02:00
Sylvain Gugger
e817747941 Expose classes used in documentation (#4808)
* Expose classes used in documentation

* Format code
2020-06-08 08:14:32 -04:00
daniel-shan
b6f365a8ed Updates args in tf squad example. (#4820)
Co-authored-by: Daniel Shan <daniel.shan@workday.com>
2020-06-08 05:36:09 -04:00
Bram Vanroy
e33fdc93b4 Export PretrainedBartModel from __init__ (#4819) 2020-06-07 11:55:10 -04:00
Sam Shleifer
c58e6c129a [marian tests ] pass device to pipeline (#4815) 2020-06-06 00:52:17 -04:00
Mr Ruben
ddf9a3dfc7 Updated path "cd examples/text-generation/pplm" (#4778)
https://github.com/huggingface/transformers/issues/4776
2020-06-05 21:16:48 -04:00
Sylvain Gugger
2d372a990b Explain how to preview the docs in a PR (#4795) 2020-06-05 20:47:02 -04:00
Sylvain Gugger
56d5d160cd Add model and doc badges (#4811)
* Add badges for models and docs
2020-06-05 18:45:42 -04:00
Sam Shleifer
4ab7424597 [cleanup/marian] pipelines test and new kwarg (#4812) 2020-06-05 18:45:19 -04:00
Sam Shleifer
875288b344 [isort] add matplotlib to known 3rd party dependencies (#4800) 2020-06-05 17:27:31 -04:00
Patrick von Platen
8cca875569 [EncoderDecoderConfig] automatically set decoder config to decoder (#4809)
* automatically set decoder config to decoder

* add more tests
2020-06-05 23:16:37 +02:00
Sylvain Gugger
f1fe18465d Use labels to remove deprecation warnings (#4807) 2020-06-05 16:41:46 -04:00
Sylvain Gugger
5c0cfc2cf0 Add link to community models (#4804) 2020-06-05 15:29:20 -04:00
Sylvain Gugger
4dd5cf2207 Fix argument label (#4792)
* Fix argument label

* Fix test
2020-06-05 15:20:29 -04:00
Sam Shleifer
3723f30a18 [cleanup] MarianTokenizer: delete unused constants (#4802) 2020-06-05 14:57:24 -04:00
Sylvain Gugger
acaa2e6267 Clean-up code (#4790) 2020-06-05 12:36:22 -04:00
Sylvain Gugger
fa661ce749 Add model summary (#4789)
* Add model summary

* Add link to pretrained models
2020-06-05 12:22:50 -04:00
Lysandre Debut
79ab881eb1 No silent error when d_head already in the configuration (#4747)
* No silent error when d_head already in the configuration

* Update src/transformers/configuration_xlnet.py

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-05 12:01:43 -04:00
Julien Chaumond
b9109f2de1 [doc] Make it clearer that text-generation does not involve training 2020-06-05 14:59:22 +02:00
Sylvain Gugger
ceaab8dd22 Add .vs to gitignore (#4774) 2020-06-05 07:56:11 -04:00
Julien Plu
f9414f7553 Tensorflow improvements (#4530)
* Better None gradients handling

* Apply Style

* Apply Style

* Create a loss class per task to compute its respective loss

* Add loss classes to the ALBERT TF models

* Add loss classes to the BERT TF models

* Add question answering and multiple choice to TF Camembert

* Remove prints

* Add multiple choice model to TF DistilBERT + loss computation

* Add question answering model to TF Electra + loss computation

* Add token classification, question answering and multiple choice models to TF Flaubert

* Add multiple choice model to TF Roberta + loss computation

* Add multiple choice model to TF XLM + loss computation

* Add multiple choice and question answering models to TF XLM-Roberta

* Add multiple choice model to TF XLNet + loss computation

* Remove unused parameters

* Add task loss classes

* Reorder TF imports + add new model classes

* Add new model classes

* Bugfix in TF T5 model

* Bugfix for TF T5 tests

* Bugfix in TF T5 model

* Fix TF T5 model tests

* Fix T5 tests + some renaming

* Fix inheritance issue in the AutoX tests

* Add tests for TF Flaubert and TF XLM Roberta

* Add tests for TF Flaubert and TF XLM Roberta

* Remove unused piece of code in the TF trainer

* bugfix and remove unused code

* Bugfix for TF 2.2

* Apply Style

* Divide TFSequenceClassificationAndMultipleChoiceLoss into their two respective name

* Apply style

* Mirror the PT Trainer in the TF one: fp16, optimizers and tb_writer as class parameter and better dataset handling

* Fix TF optimizations tests and apply style

* Remove useless parameter

* Bugfix and apply style

* Fix TF Trainer prediction

* Now the TF models return the loss such as their PyTorch couterparts

* Apply Style

* Ignore some tests output

* Take into account the SQuAD cls_index, p_mask and is_impossible parameters for the QuestionAnswering task models.

* Fix names for SQuAD data

* Apply Style

* Fix conflicts with 2.11 release

* Fix conflicts with 2.11

* Fix wrongname

* Add better documentation on the new create_optimizer function

* Fix isort

* logging_dir: use same default as PyTorch

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-04 19:45:53 -04:00
Théophile Blard
ccd26c2862 Create model card for tblard/allocine (#4775)
https://huggingface.co/tblard/tf-allocine
2020-06-04 19:15:07 -04:00
Stefan Schweter
2a4b9e09c0 NER: Add new WNUT’17 example (#4681)
* ner: add preprocessing script for examples that splits longer sentences

* ner: example shell scripts use local preprocessing now

* ner: add new example section for WNUT’17 NER task. Remove old English CoNLL-03 results

* ner: satisfy black and isort
2020-06-04 19:13:17 -04:00
Setu Shah
0e1869cc28 Add drop_last arg for data loader 2020-06-04 18:30:31 -04:00
prajjwal1
48a05026de removed deprecared use of Variable api from pplm example 2020-06-04 18:07:49 -04:00
Sylvain Gugger
12d0eb5f3e Don't access pad_token_id if there is no pad_token (#4773) 2020-06-04 17:57:04 -04:00
Manuel Romero
17a88d3192 Create model card for T5-base fine-tuned for Sentiment Span Extraction (#4737) 2020-06-04 16:59:56 -04:00
Oren Amsalem
fb52143cf6 Create README.md (#4743) 2020-06-04 16:59:37 -04:00
Suraj Parmar
5f077a3445 Model Card for RoBERTa trained on Sanskrit (#4763)
* Model cad for SanBERTa

Model Card for RoBERTa trained on Sanskrit

* Model card for SanBERTa

model card for RoBERTa trained on Sanskrit
2020-06-04 16:58:40 -04:00
Sylvain Gugger
cd4e07a85e Add note about doc generation (#4770) 2020-06-04 13:43:14 -04:00
Jason Phang
492b352ab6 Remove unnecessary model_type arg in example (#4771) 2020-06-04 13:41:24 -04:00
Lysandre Debut
e645b9ab94 Codecov setup (#4768)
* Codecov setup

* Understanding codecov
2020-06-04 11:44:38 -04:00
Sam Shleifer
2b8b6c929e [cleanup] PretrainedModel.generate: remove unused kwargs (#4761) 2020-06-04 08:13:52 -04:00
Funtowicz Morgan
5bf9afbf35 Introduce a new tensor type for return_tensors on tokenizer for NumPy (#4585)
* Refactor tensor creation in tokenizers.

* Make sure to convert string to TensorType

* Refactor convert_to_tensors_

* Introduce numpy tensor creation

* Format

* Add unittest for TensorType creation from str

* sorting imports

* Added unittests for numpy tensor conversion.

* Do not use in-place version for squeeze as numpy doesn't provide such feature.

* Added extra parameter prepend_batch_axis: bool on prepare_for_model.

* Ensure test_np_encode_plus_sent_to_model is not executed if encoder/decoder model.

* style.

* numpy tests require_torch for now while flax not merged.

* Hopefully will make flake8 happy.

* One more time 🎶
2020-06-04 06:57:01 +02:00
Funtowicz Morgan
efae154929 never_split on slow tokenizers should not split (#4723)
* Ensure tokens in never_split are not splitted when using basic tokenizer before wordpiece.

* never_split only use membership attempt to use a set() which is 10x faster for this operation.

* Use union to concatenate two sets.

* Updated docstring for never_split parameter.

* Avoid set.union() if never_split is None

* Added comments.

* Correct docstring format.
2020-06-03 16:48:28 -04:00
Lysandre Debut
2e4de76231 Update encode documentation (#4751) 2020-06-03 16:30:59 -04:00
Patrick von Platen
ed4df85572 fix beam search bug in tf as well (#4745) 2020-06-03 12:53:23 -04:00
Sylvain Gugger
1b5820a565 Unify label args (#4722)
* Deprecate masked_lm_labels argument

* Apply to all models

* Better error message
2020-06-03 09:36:26 -04:00
Abhishek Kumar Mishra
3e5928c57d Adding notebooks for Fine Tuning [Community Notebook] (#4732)
* Added links to more community notebooks

Added links to 3 more community notebooks from the git repo: https://github.com/abhimishra91/transformers-tutorials
Different Transformers models are fine tuned on Dataset using PyTorch

* Update README.md

* Update README.md

* Update README.md

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-06-03 11:07:26 +02:00
Julien Chaumond
99207bd112 Pipelines: miscellanea of QoL improvements and small features... (#4632)
* [hf_api] Attach all unknown attributes for future-proof compatibility

* [Pipeline] NerPipeline is really a TokenClassificationPipeline

* modelcard.py: I don't think we need to force the download

* Remove config, tokenizer from SUPPORTED_TASKS as we're moving to one model = one weight + one tokenizer

* FillMaskPipeline: also output token in string form

* TextClassificationPipeline: option to return all scores, not just the argmax

* Update docs/source/main_classes/pipelines.rst
2020-06-03 03:51:31 -04:00
David Mezzetti
8ed47aa10b bert-small-cord19 model cards (#4730)
* Create README.md

* Create README.md

* Create README.md
2020-06-03 03:40:14 -04:00
Patrick von Platen
9ca485734a [Reformer] Improved memory if input is shorter than chunk length (#4720)
* improve handling of short inputs for reformer

* correct typo in assert statement

* fix other tests
2020-06-02 23:08:39 +02:00
Jin Young Sohn
b231a413f5 Add cache_dir to save features in GLUE + Differentiate match/mismatch for MNLI metrics (#4621)
* Glue task cleaup

* Enable writing cache to cache_dir in case dataset lives in readOnly
filesystem.
* Differentiate match vs mismatch for MNLI metrics.

* Style

* Fix pytype

* Fix type

* Use cache_dir in mnli mismatch eval dataset

* Small Tweaks

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-02 13:40:14 -04:00
Sam Shleifer
70f7423436 TFRobertaModelIntegrationTest requires tf (#4726) 2020-06-02 12:59:00 -04:00
Lysandre
d976ef262e Repin versions 2020-06-02 10:27:15 -04:00
Julien Chaumond
b42586ea56 Fix CI after killing archive maps (#4724)
Some checks failed
GitHub-hosted runner / check_code_quality (push) Has been cancelled
* 🐛 Fix model ids for BART and Flaubert
2020-06-02 10:21:09 -04:00
Lysandre
b43c78e5d3 Release: v2.11.0 2020-06-02 09:49:09 -04:00
Julien Chaumond
d4c2cb402d Kill model archive maps (#4636)
* Kill model archive maps

* Fixup

* Also kill model_archive_map for MaskedBertPreTrainedModel

* Unhook config_archive_map

* Tokenizers: align with model id changes

* make style && make quality

* Fix CI
2020-06-02 09:39:33 -04:00
Patrick von Platen
47a551d17b [pipeline] Tokenizer should not add special tokens for text generation (#4686)
* allow to not add special tokens

* remove print
2020-06-02 11:03:46 +02:00
Funtowicz Morgan
f6d5046af1 Override get_vocab for fast tokenizer. (#4717) 2020-06-02 11:02:27 +02:00
Lysandre Debut
88762a2f8c Specify PyTorch versions for examples (#4710) 2020-06-02 04:29:28 -04:00
Lorenzo Ampil
d3ef14f931 Add community notebook for sentiment span extraction (#4700) 2020-06-02 09:59:53 +02:00
Sylvain Gugger
7677936316 Make docstring match args (#4711) 2020-06-01 15:22:51 -04:00
Lysandre
6449c494d0 close #4685 2020-06-01 12:57:52 -04:00
Julien Chaumond
ec8717d5d8 [config] Ensure that id2label always takes precedence over num_labels 2020-06-01 16:54:55 +02:00
Julien Chaumond
751a1e0890 [config] Ensure that id2label always takes precedence over num_labels
Fixes bug reported in https://github.com/huggingface/transformers/issues/4669

See #3967 for context
2020-06-01 16:25:56 +02:00
Rens
ec62b7d953 Fix onnx export input names order (#4641)
* pass on tokenizer to pipeline

* order input names when convert to onnx

* update style

* remove unused imports

* make ordered inputs list needs to be mutable

* add test custom bert model

* remove unused imports
2020-06-01 16:12:48 +02:00
Victor SANH
bf760c80b5 finish README 2020-06-01 09:23:31 -04:00
Victor SANH
9d7d9b3ae0 weird import 2020-06-01 09:23:31 -04:00
Victor SANH
2a3c88a659 Update examples/movement-pruning/README.md
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-01 09:23:31 -04:00
Victor SANH
4ac462bfb8 Update examples/movement-pruning/README.md
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-01 09:23:31 -04:00
Victor SANH
35fa0bbca0 clarify README 2020-06-01 09:23:31 -04:00
Victor SANH
cc746a5020 flake8 compliance 2020-06-01 09:23:31 -04:00
Victor SANH
b11386e158 less prints in saving prunebert 2020-06-01 09:23:31 -04:00
Victor SANH
8b5d4003ab complete README 2020-06-01 09:23:31 -04:00
Victor SANH
5c8e5b3709 commplying with isort 2020-06-01 09:23:31 -04:00
Victor SANH
db2a3b2e01 space 2020-06-01 09:23:31 -04:00
Victor SANH
5f8f2d849a add floppy bert model notebok 2020-06-01 09:23:31 -04:00
Victor SANH
b41948f5cd add requirements 2020-06-01 09:23:31 -04:00
Victor SANH
fb8f4277b2 add scripts 2020-06-01 09:23:31 -04:00
Victor SANH
d489a6d3d5 add masked_run_* 2020-06-01 09:23:31 -04:00
Victor SANH
e4c07faf0a add sparsity modules 2020-06-01 09:23:31 -04:00
Mehrdad Farahani
667003e447 Create README.md (#4665) 2020-06-01 08:29:09 -04:00
Mehrdad Farahani
ed23f5909e HooshvareLab readme parsbert-armananer (#4666)
Readme for HooshvareLab/bert-base-parsbert-armananer-uncased
2020-06-01 08:28:43 -04:00
Mehrdad Farahani
3750b9b0b0 HooshvareLab readme parsbert-peymaner (#4667)
Readme for HooshvareLab/bert-base-parsbert-peymaner-uncased
2020-06-01 08:28:25 -04:00
Mehrdad Farahani
036c2c6b02 Update HooshvareLab/bert-base-parsbert-uncased (#4687)
mBERT results added regarding NER datasets!
2020-06-01 08:27:00 -04:00
Manuel Romero
74872c19d3 Create README.md (#4684) 2020-06-01 05:45:54 -04:00
Patrick von Platen
0866669e75 [EncoderDecoder] Fix initialization and save/load bug (#4680)
* fix bug

* add more tests
2020-05-30 01:25:19 +02:00
Patrick von Platen
6f82aea66b Include nlp notebook for model evaluation (#4676) 2020-05-29 19:38:56 +02:00
Wei Fang
33b7532e69 Fix longformer attention mask type casting when using apex (#4574)
* Fix longformer attention mask casting when using apex

* remove extra type casting
2020-05-29 18:13:30 +02:00
Patrick von Platen
56ee2560be [Longformer] Better handling of global attention mask vs local attention mask (#4672)
* better api

* improve automatic setting of global attention mask

* fix longformer bug

* fix global attention mask in test

* fix global attn mask flatten

* fix slow tests

* update docstring

* update docs and make more robust

* improve attention mask
2020-05-29 17:58:42 +02:00
Simon Böhm
e2230ba77b Fix BERT example code for NSP and Multiple Choice (#3953)
Change the example code to use encode_plus since the token_type_id
wasn't being correctly set.
2020-05-29 11:55:55 -04:00
Zhangyx
3a5d1ea2a5 Fix two bugs: 1. Index of test data of SST-2. 2. Label index of MNLI data. (#4546) 2020-05-29 11:12:24 -04:00
Patrick von Platen
9c17256447 [Longformer] Multiple choice for longformer (#4645)
* add multiple choice for longformer

* add models to docs

* adapt docstring

* add test to longformer

* add longformer for mc in init and modeling auto

* fix tests
2020-05-29 13:46:08 +02:00
Iz Beltagy
91487cbb8e [Longformer] fix model name in examples (#4653)
* fix longformer model names in examples

* a better name for the notebook
2020-05-29 13:12:35 +02:00
flozi00
b5015a2a0f gpt2 typo (#4629)
* gpt2 typo

* Add files via upload
2020-05-28 16:44:43 -04:00
Iz Beltagy
fe5cb1a1c8 Adding community notebook (#4642)
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-05-28 22:35:15 +02:00
Suraj Patil
aecaaf73a4 [Community notebooks] add longformer-for-qa notebook (#4652) 2020-05-28 22:27:22 +02:00
Anthony MOI
5e737018e1 Fix add_special_tokens on fast tokenizers (#4531) 2020-05-28 10:54:45 -04:00
Suraj Patil
e444648a30 LongformerForTokenClassification (#4638) 2020-05-28 12:48:18 +02:00
Lavanya Shukla
3cc2c2a150 add 2 colab notebooks (#4505)
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-05-28 11:18:16 +02:00
Iz Beltagy
ef03ae874f [Longformer] more models + model cards (#4628)
* adding freeze roberta models

* model cards

* lint
2020-05-28 11:11:05 +02:00
Patrick von Platen
96f57c9ccb [Benchmark] Memory benchmark utils (#4198)
* improve memory benchmarking

* correct typo

* fix current memory

* check torch memory allocated

* better pytorch function

* add total cached gpu memory

* add total gpu required

* improve torch gpu usage

* update memory usage

* finalize memory tracing

* save intermediate benchmark class

* fix conflict

* improve benchmark

* improve benchmark

* finalize

* make style

* improve benchmarking

* correct typo

* make train function more flexible

* fix csv save

* better repr of bytes

* better print

* fix __repr__ bug

* finish plot script

* rename plot file

* delete csv and small improvements

* fix in plot

* fix in plot

* correct usage of timeit

* remove redundant line

* remove redundant line

* fix bug

* add hf parser tests

* add versioning and platform info

* make style

* add gpu information

* ensure backward compatibility

* finish adding all tests

* Update src/transformers/benchmark/benchmark_args.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/benchmark/benchmark_args_utils.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* delete csv files

* fix isort ordering

* add out of memory handling

* add better train memory handling

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-05-27 23:22:16 +02:00
Suraj Patil
ec4cdfdd05 LongformerForSequenceClassification (#4580)
* LongformerForSequenceClassification

* better naming x=>hidden_states, fix typo in doc

* Update src/transformers/modeling_longformer.py

* Update src/transformers/modeling_longformer.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-05-27 22:30:00 +02:00
Suraj Patil
4402879ee4 [Model Card] model card for longformer-base-4096-finetuned-squadv1 (#4625) 2020-05-27 18:48:03 +02:00
Lysandre Debut
6a17688021 per_device instead of per_gpu/error thrown when argument unknown (#4618)
* per_device instead of per_gpu/error thrown when argument unknown

* [docs] Restore examples.md symlink

* Correct absolute links so that symlink to the doc works correctly

* Update src/transformers/hf_argparser.py

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Warning + reorder

* Docs

* Style

* not for squad

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-27 11:36:55 -04:00
Mehrdad Farahani
1381b6d01d README for HooshvareLab (#4610)
HooshvareLab/bert-base-parsbert-uncased
2020-05-27 11:25:36 -04:00
Patrick von Platen
5acb4edf25 Update version command when contributing (#4614) 2020-05-27 17:19:11 +02:00
Darek Kłeczek
842588c12f uncased readme (#4608)
Co-authored-by: kldarek <darekmail>
2020-05-27 09:50:04 -04:00
Darek Kłeczek
ac1a612179 Create README.md (#4607)
Model card for cased model
2020-05-27 09:36:20 -04:00
Sam Shleifer
07797c4da4 [testing] LanguageModelGenerationTests require_tf or require_torch (#4616) 2020-05-27 09:10:26 -04:00
Hao Tan
a9aa7456ac Add back --do_lower_case to uncased models (#4245)
The option `--do_lower_case` is currently required by the uncased models (i.e., bert-base-uncased, bert-large-uncased).

Results:
BERT-BASE without --do_lower_case:  'exact': 73.83, 'f1': 82.22
BERT-BASE with --do_lower_case:  'exact': 81.02, 'f1': 88.34
2020-05-26 21:13:07 -04:00
Bayartsogt Yadamsuren
a801c7fd74 Creating a readme for ALBERT in Mongolian (#4603)
Here I am uploading Mongolian masked language model (ALBERT) on your platform.
https://en.wikipedia.org/wiki/Mongolia
2020-05-26 16:54:42 -04:00
Wissam Antoun
6458c0e268 updated model cards for both models at aubmindlab (#4604)
* updated aubmindlab/bert-base-arabert/ Model card

* updated aubmindlab/bert-base-arabertv01 model card
2020-05-26 16:52:43 -04:00
Oleksandr Bushkovskyi
ea4e7a53fa Improve model card for Tereveni-AI/gpt2-124M-uk-fiction (#4582)
Add language metadata, training and evaluation corpora details.
Add example output. Fix inconsistent use of quotes.
2020-05-26 16:51:40 -04:00
Manuel Romero
937930dcae Create README.md (#4591) 2020-05-26 16:50:08 -04:00
Manuel Romero
bac1cc4dc1 Remove MD emojis (#4602) 2020-05-26 16:38:39 -04:00
Patrick von Platen
003c477129 [GPT2, CTRL] Allow input of input_ids and past of variable length (#4581)
* revert convenience  method

* clean docs a bit
2020-05-26 19:43:58 +02:00
ohmeow
5ddd8d6531 Add BART fine-tuning summarization community notebook (#4539)
* adding BART summarization how-to community notebook

* Update notebooks/README.md

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-05-26 16:43:41 +02:00
Bram Vanroy
8cc6807e89 Make transformers-cli cross-platform (#4131)
* make transformers-cli cross-platform

Using "scripts" is a useful option in setup.py particularly when you want to get access to non-python scripts. However, in this case we want to have an entry point into some of our own Python scripts. To do this in a concise, cross-platfom way, we can use entry_points.console_scripts. This change is necessary to provide the CLI on different platforms, which "scripts" does not ensure. Usage remains the same, but the "transformers-cli" script has to be moved (be part of the library) and renamed (underscore + extension)

* make style & quality
2020-05-26 10:00:51 -04:00
Patrick von Platen
c589eae2b8 [Longformer For Question Answering] Conversion script, doc, small fixes (#4593)
* add new longformer for question answering model

* add new config as well

* fix links

* fix links part 2
2020-05-26 14:58:47 +02:00
ZhuBaohe
a163c9ca5b [T5] Fix Cross Attention position bias (#4499)
* fix

* fix1
2020-05-26 08:57:24 -04:00
ZhuBaohe
1d69028989 fix (#4410) 2020-05-26 08:51:28 -04:00
Sam Shleifer
b86e42e0ac [ci] fix 3 remaining slow GPU failures (#4584) 2020-05-25 19:20:50 -04:00
Julien Chaumond
365d452d4d [ci] Slow GPU tests run daily (#4465) 2020-05-25 17:28:02 -04:00
Patrick von Platen
3e3e552125 [Reformer] fix reformer num buckets (#4564)
* fix reformer num buckets

* fix

* adapt docs

* set num buckets in config
2020-05-25 16:04:45 -04:00
Elman Mansimov
3dea40b858 fixing tokenization of extra_id symbols in T5Tokenizer. Related to issue 4021 (#4353) 2020-05-25 16:04:30 -04:00
Suraj Patil
5139733623 LongformerTokenizerFast (#4547) 2020-05-25 16:03:55 -04:00
Oliver Guhr
c9c385c522 Updated the link to the paper (#4570)
I looks like the conference has changed the link to the paper.
2020-05-25 15:29:50 -04:00
Sho Arora
adab7f8332 Add nn.Module as superclass (#4533) 2020-05-25 15:29:33 -04:00
Manuel Romero
8f7c1c7672 Create model card (#4578) 2020-05-25 15:28:30 -04:00
Ali Safaya
4c6b218056 Update README.md (#4556) 2020-05-25 15:12:23 -04:00
Antonis Maronikolakis
50d1ce411f add DistilBERT to supported models (#4558) 2020-05-25 14:50:45 -04:00
Suraj Patil
03d8527de0 Longformer for question answering (#4500)
* added LongformerForQuestionAnswering

* add LongformerForQuestionAnswering

* fix import for LongformerForMaskedLM

* add LongformerForQuestionAnswering

* hardcoded sep_token_id

* compute attention_mask if not provided

* combine global_attention_mask with attention_mask when provided

* update example in  docstring

* add assert error messages, better attention combine

* add test for longformerForQuestionAnswering

* typo

* cast gloabl_attention_mask to long

* make style

* Update src/transformers/configuration_longformer.py

* Update src/transformers/configuration_longformer.py

* fix the code quality

* Merge branch 'longformer-for-question-answering' of https://github.com/patil-suraj/transformers into longformer-for-question-answering

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-05-25 18:43:36 +02:00
Bharat Raghunathan
a34a9896ac DOC: Fix typos in modeling_auto (#4534) 2020-05-23 09:40:59 -04:00
Bijay Gurung
e19b978151 Add Type Hints to modeling_utils.py Closes #3911 (#3948)
* Add Type Hints to modeling_utils.py Closes #3911

Add Type Hints to methods in `modeling_utils.py`

Note: The coverage isn't 100%. Mostly skipped internal methods.

* Reformat according to `black` and `isort`

* Use typing.Iterable instead of Sequence

* Parameterize Iterable by its generic type

* Use typing.Optional when None is the default value

* Adhere to style guideline

* Update src/transformers/modeling_utils.py

* Update src/transformers/modeling_utils.py

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-22 19:10:22 -04:00
Funtowicz Morgan
996f393a86 Warn the user about max_len being on the path to be deprecated. (#4528)
* Warn the user about max_len being on the path to be deprecated.

* Ensure better backward compatibility when max_len is provided to a tokenizer.

* Make sure to override the parameter and not the actual instance value.

* Format & quality
2020-05-22 18:08:30 -04:00
Patrick von Platen
0f6969b7e9 Better github link for Reformer Colab Notebook 2020-05-22 23:51:36 +02:00
Sam Shleifer
ab44630db2 [Summarization Pipeline]: Fix default tokenizer (#4506)
* Fix pipelines defaults bug

* one liner

* style
2020-05-22 17:49:45 -04:00
Julien Chaumond
2c1ebb8b50 Re-apply #4446 + add packaging dependency
As discussed w/ @lysandrejik

packaging is maintained by PyPA (the Python Packaging Authority), and should be lightweight and stable
2020-05-22 17:29:03 -04:00
Lysandre
e6aeb0d3e8 Style 2020-05-22 17:20:03 -04:00
Alexander Measure
95a26fcf2d link to paper was broken (#4526)
changed from https://https://arxiv.org/abs/2001.04451.pdf to https://arxiv.org/abs/2001.04451.pdf
2020-05-22 15:17:09 -04:00
HUSEIN ZOLKEPLI
89d795f180 Added huseinzol05/t5-small-bahasa-cased README.md (#4522) 2020-05-22 15:04:06 -04:00
Anthony MOI
35df911485 Fix convert_token_type_ids_from_sequences for fast tokenizers (#4503) 2020-05-22 12:45:10 -04:00
Julien Chaumond
f7677e1623 [model_cards] bart-large-cnn
cc @sshleifer
2020-05-22 12:20:54 -04:00
Patrick von Platen
12e6afe900 Add Reformer colab to community noteboos 2020-05-22 17:03:34 +02:00
Lysandre
ef22ba4836 Re-pin versions 2020-05-22 11:03:07 -04:00
Lysandre
10d72390c0 Revert #4446 Since it introduces a new dependency
Some checks failed
GitHub-hosted runner / check_code_quality (push) Has been cancelled
2020-05-22 10:49:45 -04:00
Lysandre
e0db6bbd65 Release: v2.10.0 2020-05-22 10:37:44 -04:00
Frankie Liuzzi
bd6e301832 added functionality for electra classification head (#4257)
* added functionality for electra classification head

* unneeded dropout

* Test ELECTRA for sequence classification

* Style

Co-authored-by: Frankie <frankie@frase.io>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-05-22 09:48:21 -04:00
Lysandre
a086527727 Unused Union should not be imported 2020-05-21 09:42:47 -04:00
Lysandre Debut
9d2ce253de TPU hangs when saving optimizer/scheduler (#4467)
* TPU hangs when saving optimizer/scheduler

* Style

* ParallelLoader is not a DataLoader

* Style

* Addressing @julien-c's comments
2020-05-21 09:18:27 -04:00
Zhangyx
49296533ca Adds predict stage for glue tasks, and generate result files which can be submitted to gluebenchmark.com (#4463)
* Adds predict stage for glue tasks, and generate result files which could be submitted to gluebenchmark.com website.

* Use Split enum + always output the label name

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-21 09:17:44 -04:00
Tobias Lee
271bedb485 [examples] fix no grad in second pruning in run_bertology (#4479)
* fix no grad in second pruning and typo

* fix prune heads attention mismatch problem

* fix

* fix

* fix

* run make style

* run make style
2020-05-21 09:17:03 -04:00
Julien Chaumond
865d4d595e [ci] Close #4481 2020-05-20 18:27:42 -04:00
Julien Chaumond
a3af8e86cb Update test_trainer_distributed.py 2020-05-20 18:26:51 -04:00
Cola
eacea530c1 🚨 Remove warning of deprecation (#4477)
Remove warning of deprecated overload of addcdiv_

Fix #4451
2020-05-20 16:48:29 -04:00
Julien Plu
fa2fbed3e5 Better None gradients handling in TF Trainer (#4469)
* Better None gradients handling

* Apply Style

* Apply Style
2020-05-20 16:46:21 -04:00
Oliver Åstrand
e708bb75bf Correct TF formatting to exclude LayerNorms from weight decay (#4448)
* Exclude LayerNorms from weight decay

* Include both formats of layer norm
2020-05-20 16:45:59 -04:00
Rens
49c06132df pass on tokenizer to pipeline (#4489) 2020-05-20 22:23:21 +02:00
Nathan Cooper
cacb654c7f Add Fine-tune DialoGPT on new datasets notebook (#4473) 2020-05-20 16:17:52 -04:00
Timo Moeller
30a09f3827 Adjust german bert model card, add new model card (#4488) 2020-05-20 16:08:29 -04:00
Lysandre Debut
14cb5b35fa Fix slow gpu tests lysandre (#4487)
* There is one missing key in BERT

* Correct device for CamemBERT model

* RoBERTa tokenization adding prefix space

* Style
2020-05-20 11:59:45 -04:00
Manuel Romero
6dc52c78d8 Create README.md (#4482) 2020-05-20 09:45:50 -04:00
Manuel Romero
ed5456daf4 Model card for RuPERTa-base fine-tuned for NER (#4466) 2020-05-20 09:45:24 -04:00
Oleksandr Bushkovskyi
c76450e20c Model card for Tereveni-AI/gpt2-124M-uk-fiction (#4470)
Create model card for "Tereveni-AI/gpt2-124M-uk-fiction" model
2020-05-20 09:44:26 -04:00
Hu Xu
9907dc523a add BERT trained from review corpus. (#4405)
* add model_cards for BERT trained on reviews.

* add link to repository.

* refine README.md for each review model
2020-05-20 09:42:35 -04:00
Sam Shleifer
efbc1c5a9d [MarianTokenizer] implement save_vocabulary and other common methods (#4389) 2020-05-19 19:45:49 -04:00
Sam Shleifer
956c4c4eb4 [gpu slow tests] fix mbart-large-enro gpu tests (#4472) 2020-05-19 19:45:31 -04:00
Patrick von Platen
48c3a70b4e [Longformer] Docs and clean API (#4464)
* add longformer docs

* improve docs
2020-05-19 21:52:36 +02:00
Patrick von Platen
aa925a52fa [Tests, GPU, SLOW] fix a bunch of GPU hardcoded tests in Pytorch (#4468)
* fix gpu slow tests in pytorch

* change model to device syntax
2020-05-19 21:35:04 +02:00
Suraj Patil
5856999a9f add T5 fine-tuning notebook [Community notebooks] (#4462)
* add T5 fine-tuning notebook [Community notebooks]

* Update README.md

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-05-19 18:26:28 +02:00
Sam Shleifer
07dd7c2fd8 [cleanup] test_tokenization_common.py (#4390) 2020-05-19 10:46:55 -04:00
Iz Beltagy
8f1d047148 Longformer (#4352)
* first commit

* bug fixes

* better examples

* undo padding

* remove wrong VOCAB_FILES_NAMES

* License

* make style

* make isort happy

* unit tests

* integration test

* make `black` happy by undoing `isort` changes!!

* lint

* no need for the padding value

* batch_size not bsz

* remove unused type casting

* seqlen not seq_len

* staticmethod

* `bert` selfattention instead of `n2`

* uint8 instead of bool + lints

* pad inputs_embeds using embeddings not a constant

* black

* unit test with padding

* fix unit tests

* remove redundant unit test

* upload model weights

* resolve todo

* simpler _mask_invalid_locations without lru_cache + backward compatible masked_fill_

* increase unittest coverage
2020-05-19 16:04:43 +02:00
Girishkumar
31eedff5a0 Refactored the README.md file (#4427) 2020-05-19 09:56:24 -04:00
Shaoyen
384f0eb2f9 Map optimizer to correct device after loading from checkpoint. (#4403)
* Map optimizer to correct device after loading from checkpoint.

* Make style test pass

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-18 23:16:05 -04:00
Julien Chaumond
bf14ef75f1 [Trainer] move model to device before setting optimizer (#4450) 2020-05-18 23:13:33 -04:00
Julien Chaumond
5e7fe8b585 Distributed eval: SequentialDistributedSampler + gather all results (#4243)
* Distributed eval: SequentialDistributedSampler + gather all results

* For consistency only write to disk from world_master

Close https://github.com/huggingface/transformers/issues/4272

* Working distributed eval

* Hook into scripts

* Fix #3721 again

* TPU.mesh_reduce: stay in tensor space

Thanks @jysohn23

* Just a small comment

* whitespace

* torch.hub: pip install packaging

* Add test scenarii
2020-05-18 22:02:39 -04:00
Julien Chaumond
4c06893610 Fix nn.DataParallel compatibility in PyTorch 1.5 (#4300)
* Test case for #3936

* multigpu tests pass on pytorch 1.4.0

* Fixup

* multigpu tests pass on pytorch 1.5.0

* Update src/transformers/modeling_utils.py

* Update src/transformers/modeling_utils.py

* rename multigpu to require_multigpu

* mode doc
2020-05-18 20:34:50 -04:00
Rakesh Chada
9de4afa897 Make get_last_lr in trainer backward compatible (#4446)
* makes fetching last learning late in trainer backward compatible

* split comment to multiple lines

* fixes black styling issue

* uses version to create a more explicit logic
2020-05-18 20:17:36 -04:00
Stefan Dumitrescu
42e8fbfc51 Added model cards for Romanian BERT models (#4437)
* Create README.md

* Create README.md

* Update README.md

* Update README.md

* Apply suggestions from code review

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-18 18:48:56 -04:00
Oliver Guhr
54065d68b8 added model card for german-sentiment-bert (#4435) 2020-05-18 18:44:41 -04:00
Martin Müller
e28b7e2311 Create README.md (#4433) 2020-05-18 18:41:34 -04:00
sy-wada
09b933f19d Update README.md (model_card) (#4424)
- add a citation.
- modify the table of the BLUE benchmark.

The table of the first version was not displayed correctly on https://huggingface.co/seiya/oubiobert-base-uncased.
Could you please confirm that this fix will allow you to display it correctly?
2020-05-18 18:18:17 -04:00
Manuel Romero
235777ccc9 Modify example of usage (#4413)
I followed the google example of usage for its electra small model but i have seen it is not meaningful, so i created a better example
2020-05-18 18:17:33 -04:00
Suraj Patil
9ddd3a6548 add model card for t5-base-squad (#4409)
* add model card for t5-base-squad

* Update model_cards/valhalla/t5-base-squad/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-18 18:17:14 -04:00
HUSEIN ZOLKEPLI
c5aa114392 Added README huseinzol05/t5-base-bahasa-cased (#4377)
* add bert bahasa readme

* update readme

* update readme

* added xlnet

* added tiny-bert and fix xlnet readme

* added albert base

* added albert tiny

* added electra model

* added gpt2 117m bahasa readme

* added gpt2 345m bahasa readme

* added t5-base-bahasa

* fix readme

* Update model_cards/huseinzol05/t5-base-bahasa-cased/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-18 18:10:23 -04:00
Funtowicz Morgan
ca4a3f4da9 Adding optimizations block from ONNXRuntime. (#4431)
* Adding optimizations block from ONNXRuntime.

* Turn off external data format by default for PyTorch export.

* Correct the way use_external_format is passed through the cmdline args.
2020-05-18 20:32:33 +02:00
Patrick von Platen
24538df919 [Community notebooks] General notebooks (#4441)
* Update README.md

* Update README.md

* Update README.md

* Update README.md
2020-05-18 20:23:57 +02:00
Sam Shleifer
a699525d25 [test_pipelines] Mark tests > 10s @slow, small speedups (#4421) 2020-05-18 12:23:21 -04:00
Boris Dayma
d9ece8233d fix(run_language_modeling): use arg overwrite_cache (#4407) 2020-05-18 11:37:35 -04:00
Patrick von Platen
d39bf0ac2d better naming in tf t5 (#4401) 2020-05-18 11:34:00 -04:00
Patrick von Platen
590adb130b improve docstring (#4422) 2020-05-18 11:31:35 -04:00
Patrick von Platen
026a5d0888 [T5 fp16] Fix fp16 in T5 (#4436)
* fix fp16 in t5

* make style

* refactor invert_attention_mask fn

* fix typo
2020-05-18 17:25:58 +02:00
Soham Chatterjee
fa6113f9a0 Fixed spelling of training (#4416) 2020-05-18 11:23:29 -04:00
Julien Chaumond
757baee846 Fix un-prefixed f-string
see https://github.com/huggingface/transformers/pull/4367#discussion_r426356693

Hat/tip @girishponkiya
2020-05-18 11:20:46 -04:00
Patrick von Platen
a27c795908 fix (#4419) 2020-05-18 15:51:40 +02:00
Funtowicz Morgan
31c799a0c9 Tag onnx export tests as slow (#4432) 2020-05-18 09:24:41 -04:00
Mehrad Moradshahi
8581a670e3 [MbartTokenizer] save to sentencepiece.bpe.model (#4335) 2020-05-18 08:54:04 -04:00
Lorenzo Ampil
18d233d525 Allow the creation of "entity groups" for NerPipeline #3548 (#3957)
* Add index to be returned by NerPipeline to allow for the creation of

* Add entity groups

* Convert entity list to dict

* Add entity to entity_group_disagg atfter updating entity gorups

* Change 'group' parameter to 'grouped_entities'

* Add unit tests for grouped NER pipeline case

* Correct variable name typo for NER_FINETUNED_MODELS

* Sync grouped tests to recent test updates
2020-05-17 09:25:17 +02:00
Julien Chaumond
3e0f062106 Fix addcmul_ 2020-05-15 17:44:17 -04:00
Julien Chaumond
fc2a4c88ce Fix: one more try 2020-05-15 17:38:48 -04:00
Julien Chaumond
55bda52555 Same fix for addcmul_ 2020-05-15 17:23:48 -04:00
Julien Chaumond
ad02c961c6 Fix UserWarning: This overload of add_ is deprecated in pytorch==1.5.0 2020-05-15 17:09:11 -04:00
Julien Chaumond
15550ce0d1 [skip ci] remove local rank 2020-05-15 17:08:38 -04:00
Nikita
62427d0815 rerun notebook 02-transformers (#4341) 2020-05-15 10:33:08 -04:00
Jared T Nielsen
34706ba050 Allow for None gradients in GradientAccumulator. (#4372) 2020-05-15 09:52:00 -04:00
Lysandre Debut
edf9ac11d4 Should return overflowing information for the log (#4385) 2020-05-15 09:49:11 -04:00
Funtowicz Morgan
b908f2e9dd Attempt to unpin torch version for Github Action. (#4384) 2020-05-15 15:47:15 +02:00
Julien Chaumond
af2e6bf87c [examples] Streamline doc 2020-05-14 20:34:31 -04:00
Lysandre Debut
7defc6670f p_mask in SQuAD pre-processing (#4049)
* Better p_mask building

* Adressing @mfuntowicz comments
2020-05-14 17:07:52 -04:00
Morgan Funtowicz
84894974bd Updated ONNX notebook link in README. 2020-05-14 22:40:59 +02:00
Funtowicz Morgan
db0076a9df Conversion script to export transformers models to ONNX IR. (#4253)
* Added generic ONNX conversion script for PyTorch model.

* WIP initial TF support.

* TensorFlow/Keras ONNX export working.

* Print framework version info

* Add possibility to check the model is correctly loading on ONNX runtime.

* Remove quantization option.

* Specify ONNX opset version when exporting.

* Formatting.

* Remove unused imports.

* Make functions more generally reusable from other part of the code.

* isort happy.

* flake happy

* Export only feature-extraction for now

* Correctly check inputs order / filter before export.

* Removed task variable

* Fix invalid args call in load_graph_from_args.

* Fix invalid args call in convert.

* Fix invalid args call in infer_shapes.

* Raise exception and catch in caller function instead of exit.

* Add 04-onnx-export.ipynb notebook

* More WIP on the notebook

* Remove unused imports

* Simplify & remove unused constants.

* Export with constant_folding in PyTorch

* Let's try to put function args in the right order this time ...

* Disable external_data_format temporary

* ONNX notebook draft ready.

* Updated notebooks charts + wording

* Correct error while exporting last chart in notebook.

* Adressing @LysandreJik comment.

* Set ONNX opset to 11 as default value.

* Set opset param mandatory

* Added ONNX export unittests

* Quality.

* flake8 happy

* Add keras2onnx dependency on extras["tf"]

* Pin keras2onnx on github master to v1.6.5

* Second attempt.

* Third attempt.

* Use the right repo URL this time ...

* Do the same for onnxconverter-common

* Added keras2onnx and onnxconveter-common to 1.7.0 to supports TF2.2

* Correct commit hash.

* Addressing PR review: Optimization are enabled by default.

* Addressing PR review: small changes in the notebook

* setup.py comment about keras2onnx versioning.
2020-05-14 16:35:52 -04:00
Suraj Patil
2d05480174 Fix trainer evaluation (#4363)
* fix loss calculation in evaluation

* fix evaluation on TPU when prediction_loss_only is True
2020-05-14 14:39:44 -04:00
Savaş Yıldırım
035678efdb Create README.md (#4359)
* Create README.md

* Update model_cards/savasy/bert-base-turkish-squad/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-14 14:07:32 -04:00
sy-wada
b9c9e05381 Create README.md (#4357) 2020-05-14 14:06:10 -04:00
Sam Shleifer
9535bf1977 Tokenizer.batch_decode convenience method (#4159) 2020-05-14 13:50:47 -04:00
Sam Shleifer
7822cd38a0 [tests] make pipelines tests faster with smaller models (#4238)
covers torch and tf. Also fixes a failing @slow test
2020-05-14 13:36:02 -04:00
Julien Chaumond
448c467256 Fix: unpin flake8 and fix cs errors (#4367)
* Fix: unpin flake8 and fix cs errors

* Ok we still need to quote those
2020-05-14 13:14:26 -04:00
Julien Chaumond
c547f15a17 Use Filelock to ensure distributed barriers
see context in https://github.com/huggingface/transformers/pull/4223
2020-05-14 11:58:32 -04:00
Julien Chaumond
015f7812ed [ci skip] Pin isort 2020-05-14 10:12:18 -04:00
Lysandre Debut
ef46ccb05c TPU needs a rendezvous (#4339) 2020-05-14 08:59:52 -04:00
Viktor Alm
94cb73c2d2 Add image and metadata (#4345)
Unfortunately i accidentally orphaned my other PR
2020-05-13 20:05:15 -04:00
Manuel Romero
a0eebdc404 Add link to W&B to see whole training logs (#4348) 2020-05-13 20:04:57 -04:00
1269 changed files with 179022 additions and 36558 deletions

View File

@@ -1,4 +1,66 @@
version: 2
version: 2.1
orbs:
gcp-gke: circleci/gcp-gke@1.0.4
go: circleci/go@1.3.0
# TPU REFERENCES
references:
checkout_ml_testing: &checkout_ml_testing
run:
name: Checkout ml-testing-accelerators
command: |
git clone https://github.com/GoogleCloudPlatform/ml-testing-accelerators.git
cd ml-testing-accelerators
git fetch origin 5e88ac24f631c27045e62f0e8d5dfcf34e425e25:stable
git checkout stable
build_push_docker: &build_push_docker
run:
name: Configure Docker
command: |
gcloud --quiet auth configure-docker
cd docker/transformers-pytorch-tpu
if [ -z "$CIRCLE_PR_NUMBER" ]; then docker build --tag "$GCR_IMAGE_PATH:$CIRCLE_WORKFLOW_JOB_ID" -f Dockerfile --build-arg "TEST_IMAGE=1" . ; else docker build --tag "$GCR_IMAGE_PATH:$CIRCLE_WORKFLOW_JOB_ID" -f Dockerfile --build-arg "TEST_IMAGE=1" --build-arg "GITHUB_REF=pull/$CIRCLE_PR_NUMBER/head" . ; fi
docker push "$GCR_IMAGE_PATH:$CIRCLE_WORKFLOW_JOB_ID"
deploy_cluster: &deploy_cluster
run:
name: Deploy the job on the kubernetes cluster
command: |
go get github.com/google/go-jsonnet/cmd/jsonnet && \
export PATH=$PATH:$HOME/go/bin && \
kubectl create -f docker/transformers-pytorch-tpu/dataset.yaml || true && \
job_name=$(jsonnet -J ml-testing-accelerators/ docker/transformers-pytorch-tpu/bert-base-cased.jsonnet --ext-str image=$GCR_IMAGE_PATH --ext-str image-tag=$CIRCLE_WORKFLOW_JOB_ID | kubectl create -f -) && \
job_name=${job_name#job.batch/} && \
job_name=${job_name% created} && \
echo "Waiting on kubernetes job: $job_name" && \
i=0 && \
# 30 checks spaced 30s apart = 900s total.
max_checks=30 && \
status_code=2 && \
# Check on the job periodically. Set the status code depending on what
# happened to the job in Kubernetes. If we try max_checks times and
# still the job hasn't finished, give up and return the starting
# non-zero status code.
while [ $i -lt $max_checks ]; do ((i++)); if kubectl get jobs $job_name -o jsonpath='Failed:{.status.failed}' | grep "Failed:1"; then status_code=1 && break; elif kubectl get jobs $job_name -o jsonpath='Succeeded:{.status.succeeded}' | grep "Succeeded:1" ; then status_code=0 && break; else echo "Job not finished yet"; fi; sleep 30; done && \
echo "Done waiting. Job status code: $status_code" && \
pod_name=$(kubectl get po -l controller-uid=`kubectl get job $job_name -o "jsonpath={.metadata.labels.controller-uid}"` | awk 'match($0,!/NAME/) {print $1}') && \
echo "GKE pod name: $pod_name" && \
kubectl logs -f $pod_name --container=train
echo "Done with log retrieval attempt." && \
gcloud container images delete "$GCR_IMAGE_PATH:$CIRCLE_WORKFLOW_JOB_ID" --force-delete-tags && \
exit $status_code
delete_gke_jobs: &delete_gke_jobs
run:
name: Delete GKE Jobs
command: |
# Match jobs whose age matches patterns like '1h' or '1d', i.e. any job
# that has been around longer than 1hr. First print all columns for
# matches, then execute the delete.
kubectl get job | awk 'match($4,/[0-9]+[dh]/) {print $0}'
kubectl delete job $(kubectl get job | awk 'match($4,/[0-9]+[dh]/) {print $1}')
jobs:
run_tests_torch_and_tf:
working_directory: ~/transformers
@@ -10,11 +72,23 @@ jobs:
parallelism: 1
steps:
- checkout
- run: sudo pip install .[sklearn,tf-cpu,torch,testing]
- run: sudo pip install codecov pytest-cov
- run: python -m pytest -n 8 --dist=loadfile -s -v ./tests/ --cov
- restore_cache:
keys:
- v0.3-torch_and_tf-{{ checksum "setup.py" }}
- v0.3-{{ checksum "setup.py" }}
- run: pip install --upgrade pip
- run: pip install git+https://github.com/huggingface/datasets
- run: pip install .[sklearn,tf-cpu,torch,testing]
- run: pip install codecov pytest-cov
- save_cache:
key: v0.3-{{ checksum "setup.py" }}
paths:
- '~/.cache/pip'
- run: python -m pytest -n 8 --dist=loadfile -rA -s ./tests/ --cov --durations=0 | tee output.txt
- run: codecov
- store_artifacts:
path: ~/transformers/output.txt
destination: test_output.txt
run_tests_torch:
working_directory: ~/transformers
docker:
@@ -25,10 +99,21 @@ jobs:
parallelism: 1
steps:
- checkout
- run: sudo pip install .[sklearn,torch,testing]
- run: sudo pip install codecov pytest-cov
- run: python -m pytest -n 8 --dist=loadfile -s -v ./tests/ --cov
- run: codecov
- restore_cache:
keys:
- v0.3-torch-{{ checksum "setup.py" }}
- v0.3-{{ checksum "setup.py" }}
- run: pip install --upgrade pip
- run: pip install git+https://github.com/huggingface/datasets
- run: pip install .[sklearn,torch,testing]
- save_cache:
key: v0.3-torch-{{ checksum "setup.py" }}
paths:
- '~/.cache/pip'
- run: python -m pytest -n 8 --dist=loadfile -rA -s ./tests/ | tee output.txt
- store_artifacts:
path: ~/transformers/output.txt
destination: test_output.txt
run_tests_tf:
working_directory: ~/transformers
docker:
@@ -39,10 +124,46 @@ jobs:
parallelism: 1
steps:
- checkout
- run: sudo pip install .[sklearn,tf-cpu,testing]
- run: sudo pip install codecov pytest-cov
- run: python -m pytest -n 8 --dist=loadfile -s -v ./tests/ --cov
- run: codecov
- restore_cache:
keys:
- v0.3-tf-{{ checksum "setup.py" }}
- v0.3-{{ checksum "setup.py" }}
- run: pip install --upgrade pip
- run: pip install git+https://github.com/huggingface/datasets
- run: pip install .[sklearn,tf-cpu,testing]
- save_cache:
key: v0.3-tf-{{ checksum "setup.py" }}
paths:
- '~/.cache/pip'
- run: python -m pytest -n 8 --dist=loadfile -rA -s ./tests/ | tee output.txt
- store_artifacts:
path: ~/transformers/output.txt
destination: test_output.txt
run_tests_flax:
working_directory: ~/transformers
docker:
- image: circleci/python:3.7
environment:
OMP_NUM_THREADS: 1
resource_class: xlarge
parallelism: 1
steps:
- checkout
- restore_cache:
keys:
- v0.3-flax-{{ checksum "setup.py" }}
- v0.3-{{ checksum "setup.py" }}
- run: pip install --upgrade pip
- run: pip install git+https://github.com/huggingface/datasets
- run: sudo pip install .[flax,sklearn,torch,testing]
- save_cache:
key: v0.3-flax-{{ checksum "setup.py" }}
paths:
- '~/.cache/pip'
- run: python -m pytest -n 8 --dist=loadfile -rA -s ./tests/ | tee output.txt
- store_artifacts:
path: ~/transformers/output.txt
destination: test_output.txt
run_tests_custom_tokenizers:
working_directory: ~/transformers
docker:
@@ -51,8 +172,21 @@ jobs:
RUN_CUSTOM_TOKENIZERS: yes
steps:
- checkout
- run: sudo pip install .[mecab,testing]
- run: python -m pytest -sv ./tests/test_tokenization_bert_japanese.py
- restore_cache:
keys:
- v0.3-custom_tokenizers-{{ checksum "setup.py" }}
- v0.3-{{ checksum "setup.py" }}
- run: pip install --upgrade pip
- run: pip install .[ja,testing]
- run: python -m unidic download
- save_cache:
key: v0.3-custom_tokenizers-{{ checksum "setup.py" }}
paths:
- '~/.cache/pip'
- run: python -m pytest -s ./tests/test_tokenization_bert_japanese.py | tee output.txt
- store_artifacts:
path: ~/transformers/output.txt
destination: test_output.txt
run_examples_torch:
working_directory: ~/transformers
docker:
@@ -63,17 +197,38 @@ jobs:
parallelism: 1
steps:
- checkout
- run: sudo pip install .[sklearn,torch,testing]
- run: sudo pip install -r examples/requirements.txt
- run: python -m pytest -n 8 --dist=loadfile -s -v ./examples/
- restore_cache:
keys:
- v0.3-torch_examples-{{ checksum "setup.py" }}
- v0.3-{{ checksum "setup.py" }}
- run: pip install --upgrade pip
- run: pip install .[sklearn,torch,testing]
- run: pip install -r examples/requirements.txt
- save_cache:
key: v0.3-torch_examples-{{ checksum "setup.py" }}
paths:
- '~/.cache/pip'
- run: python -m pytest -n 8 --dist=loadfile -rA -s ./examples/ | tee output.txt
- store_artifacts:
path: ~/transformers/output.txt
destination: test_output.txt
build_doc:
working_directory: ~/transformers
docker:
- image: circleci/python:3.6
steps:
- checkout
- run: sudo pip install .[tf,torch,docs]
- run: cd docs && make html
- restore_cache:
keys:
- v0.3-build_doc-{{ checksum "setup.py" }}
- v0.3-{{ checksum "setup.py" }}
- run: pip install --upgrade pip
- run: pip install .[tf,torch,sentencepiece,docs]
- save_cache:
key: v0.3-build_doc-{{ checksum "setup.py" }}
paths:
- '~/.cache/pip'
- run: cd docs && make html SPHINXOPTS="-W"
- store_artifacts:
path: ./docs/_build
deploy_doc:
@@ -85,7 +240,15 @@ jobs:
fingerprints:
- "5b:7a:95:18:07:8c:aa:76:4c:60:35:88:ad:60:56:71"
- checkout
- run: sudo pip install .[tf,torch,docs]
- restore_cache:
keys:
- v0.3-deploy_doc-{{ checksum "setup.py" }}
- v0.3-{{ checksum "setup.py" }}
- run: pip install .[tf,torch,sentencepiece,docs]
- save_cache:
key: v0.3-deploy_doc-{{ checksum "setup.py" }}
paths:
- '~/.cache/pip'
- run: ./.circleci/deploy.sh
check_code_quality:
working_directory: ~/transformers
@@ -95,12 +258,23 @@ jobs:
parallelism: 1
steps:
- checkout
# we need a version of isort with https://github.com/timothycrosley/isort/pull/1000
- run: sudo pip install git+git://github.com/timothycrosley/isort.git@e63ae06ec7d70b06df9e528357650281a3d3ec22#egg=isort
- run: sudo pip install .[tf,torch,quality]
- run: black --check --line-length 119 --target-version py35 examples templates tests src utils
- run: isort --check-only --recursive examples templates tests src utils
- restore_cache:
keys:
- v0.3-code_quality-{{ checksum "setup.py" }}
- v0.3-{{ checksum "setup.py" }}
- run: pip install --upgrade pip
- run: pip install isort
- run: pip install .[tf,torch,flax,quality]
- save_cache:
key: v0.3-code_quality-{{ checksum "setup.py" }}
paths:
- '~/.cache/pip'
- run: black --check examples templates tests src utils
- run: isort --check-only examples templates tests src utils
- run: flake8 examples templates tests src utils
- run: python utils/check_copies.py
- run: python utils/check_dummies.py
- run: python utils/check_repo.py
check_repository_consistency:
working_directory: ~/transformers
docker:
@@ -109,8 +283,37 @@ jobs:
parallelism: 1
steps:
- checkout
- run: sudo pip install requests
- run: pip install requests
- run: python ./utils/link_tester.py
# TPU JOBS
run_examples_tpu:
docker:
- image: circleci/python:3.6
environment:
OMP_NUM_THREADS: 1
resource_class: xlarge
parallelism: 1
steps:
- checkout
- go/install
- *checkout_ml_testing
- gcp-gke/install
- gcp-gke/update-kubeconfig-with-credentials:
cluster: $GKE_CLUSTER
perform-login: true
- setup_remote_docker
- *build_push_docker
- *deploy_cluster
cleanup-gke-jobs:
docker:
- image: circleci/python:3.6
steps:
- gcp-gke/install
- gcp-gke/update-kubeconfig-with-credentials:
cluster: $GKE_CLUSTER
perform-login: true
- *delete_gke_jobs
workflow_filters: &workflow_filters
filters:
branches:
@@ -127,5 +330,18 @@ workflows:
- run_tests_torch_and_tf
- run_tests_torch
- run_tests_tf
- run_tests_flax
- build_doc
- deploy_doc: *workflow_filters
tpu_testing_jobs:
triggers:
- schedule:
# Set to run at the first minute of every hour.
cron: "0 8 * * *"
filters:
branches:
only:
- master
jobs:
- cleanup-gke-jobs
- run_examples_tpu

View File

@@ -5,19 +5,31 @@ function deploy_doc(){
git checkout $1
if [ ! -z "$2" ]
then
if [ -d "$dir/$2" ]; then
if [ "$2" == "master" ]; then
echo "Pushing master"
make clean && make html && scp -r -oStrictHostKeyChecking=no _build/html/* $doc:$dir/$2/
cp -r _build/html/_static .
elif ssh -oStrictHostKeyChecking=no $doc "[ -d $dir/$2 ]"; then
echo "Directory" $2 "already exists"
scp -r -oStrictHostKeyChecking=no _static/* $doc:$dir/$2/_static/
else
echo "Pushing version" $2
make clean && make html && scp -r -oStrictHostKeyChecking=no _build/html $doc:$dir/$2
make clean && make html
rm -rf _build/html/_static
cp -r _static _build/html
scp -r -oStrictHostKeyChecking=no _build/html $doc:$dir/$2
fi
else
echo "Pushing master"
make clean && make html && scp -r -oStrictHostKeyChecking=no _build/html/* $doc:$dir
echo "Pushing stable"
make clean && make html
rm -rf _build/html/_static
cp -r _static _build/html
scp -r -oStrictHostKeyChecking=no _build/html/* $doc:$dir
fi
}
deploy_doc "master"
# You can find the commit for each tag on https://github.com/huggingface/transformers/tags
deploy_doc "master" master
deploy_doc "b33a385" v1.0.0
deploy_doc "fe02e45" v1.1.0
deploy_doc "89fd345" v1.2.0
@@ -27,3 +39,15 @@ deploy_doc "3616209" v2.2.0
deploy_doc "d0f8b9a" v2.3.0
deploy_doc "6664ea9" v2.4.0
deploy_doc "fb560dc" v2.5.0
deploy_doc "b90745c" v2.5.1
deploy_doc "fbc5bf1" v2.6.0
deploy_doc "6f5a12a" v2.7.0
deploy_doc "11c3257" v2.8.0
deploy_doc "e7cfc1a" v2.9.0
deploy_doc "7cb203f" v2.9.1
deploy_doc "10d7239" v2.10.0
deploy_doc "b42586e" v2.11.0
deploy_doc "7fb8bdf" v3.0.2
deploy_doc "4b3ee9c" v3.1.0
deploy_doc "3ebb1b3" v3.2.0
deploy_doc "0613f05" # v3.3.0 Latest stable release

View File

@@ -7,14 +7,53 @@ assignees: ''
---
# 🐛 Bug
## Environment info
<!-- You can run the command `transformers-cli env` and copy-and-paste its output below.
Don't forget to fill out the missing fields in that output! -->
- `transformers` version:
- Platform:
- Python version:
- PyTorch version (GPU?):
- Tensorflow version (GPU?):
- Using GPU in script?:
- Using distributed or parallel set-up in script?:
### Who can help
<!-- Your issue will be replied to more quickly if you can figure out the right person to tag with @
If you know how to use git blame, that is the easiest way, otherwise, here is a rough guide of **who to tag**.
Please tag fewer than 3 people.
albert, bert, GPT2, XLM: @LysandreJik
tokenizers: @mfuntowicz
Trainer: @sgugger
Speed and Memory Benchmarks: @patrickvonplaten
Model Cards: @julien-c
Translation: @sshleifer
Summarization: @sshleifer
TextGeneration: @TevenLeScao
examples/distillation: @VictorSanh
nlp datasets: [different repo](https://github.com/huggingface/nlp)
rust tokenizers: [different repo](https://github.com/huggingface/tokenizers)
Text Generation: @TevenLeScao
blenderbot: @mariamabarham
Bart: @sshleifer
Marian: @sshleifer
T5: @patrickvonplaten
Longformer/Reformer: @patrickvonplaten
TransfoXL/XLNet: @TevenLeScao
examples/seq2seq: @sshleifer
examples/bert-loses-patience: @JetRunner
tensorflow: @jplu
examples/token-classification: @stefan-it
documentation: @sgugger
-->
## Information
Model I am using (Bert, XLNet ...):
Language I am using the model on (English, Chinese ...):
The problem arises when using:
* [ ] the official example scripts: (give details below)
* [ ] my own modified scripts: (give details below)
@@ -38,15 +77,3 @@ Steps to reproduce the behavior:
## Expected behavior
<!-- A clear and concise description of what you would expect to happen. -->
## Environment info
<!-- You can run the command `transformers-cli env` and copy-and-paste its output below.
Don't forget to fill out the missing fields in that output! -->
- `transformers` version:
- Platform:
- Python version:
- PyTorch version (GPU?):
- Tensorflow version (GPU?):
- Using GPU in script?:
- Using distributed or parallel set-up in script?:

View File

@@ -1,6 +1,6 @@
---
name: "❓ Questions & Help"
about: Post your general questions on Stack Overflow tagged huggingface-transformers
about: Post your general questions on the Hugging Face forum or Stack Overflow tagged huggingface-transformers
title: ''
labels: ''
assignees: ''
@@ -11,19 +11,17 @@ assignees: ''
<!-- The GitHub issue tracker is primarly intended for bugs, feature requests,
new models and benchmarks, and migration questions. For all other questions,
we direct you to Stack Overflow (SO) where a whole community of PyTorch and
Tensorflow enthusiast can help you out. Make sure to tag your question with the
right deep learning framework as well as the huggingface-transformers tag:
we direct you to the Hugging Face forum: https://discuss.huggingface.co/ .
You can also try Stack Overflow (SO) where a whole community of PyTorch and
Tensorflow enthusiast can help you out. In this case, make sure to tag your
question with the right deep learning framework as well as the
huggingface-transformers tag:
https://stackoverflow.com/questions/tagged/huggingface-transformers
If your question wasn't answered after a period of time on Stack Overflow, you
can always open a question on GitHub. You should then link to the SO question
that you posted.
-->
## Details
<!-- Description of your issue -->
<!-- You should first ask your question on SO, and only if
<!-- You should first ask your question on the forum or SO, and only if
you didn't get an answer ask it here on GitHub. -->
**A link to original question on Stack Overflow**:
**A link to original question on the forum/Stack Overflow**:

63
.github/PULL_REQUEST_TEMPLATE.md vendored Normal file
View File

@@ -0,0 +1,63 @@
# What does this PR do?
<!--
Congratulations! You've made it this far! You're not quite done yet though.
Once merged, your PR is going to appear in the release notes with the title you set, so make sure it's a great title that fully reflects the extent of your awesome contribution.
Then, please replace this with a description of the change and which issue is fixed (if applicable). Please also include relevant motivation and context. List any dependencies (if any) that are required for this change.
Once you're done, someone will review your PR shortly (see the section "Who can review?" below to tag some potential reviewers). They may suggest changes to make the code even better. If no one reviewed your PR after a week has passed, don't hesitate to post a new comment @-mentioning the same persons---sometimes notifications get lost.
-->
<!-- Remove if not applicable -->
Fixes # (issue)
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Did you read the [contributor guideline](https://github.com/huggingface/transformers/blob/master/CONTRIBUTING.md#start-contributing-pull-requests),
Pull Request section?
- [ ] Was this discussed/approved via a Github issue or the [forum](https://discuss.huggingface.co/)? Please add a link
to the it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the
[documentation guidelines](https://github.com/huggingface/transformers/tree/master/docs), and
[here are tips on formatting docstrings](https://github.com/huggingface/transformers/tree/master/docs#writing-source-documentation).
- [ ] Did you write any new necessary tests?
## Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors which may be interested in your PR.
<!-- Your PR will be replied to more quickly if you can figure out the right person to tag with @
If you know how to use git blame, that is the easiest way, otherwise, here is a rough guide of **who to tag**.
Please tag fewer than 3 people.
albert, bert, XLM: @LysandreJik
GPT2: @LysandreJik, @patrickvonplaten
tokenizers: @mfuntowicz
Trainer: @sgugger
Benchmarks: @patrickvonplaten
Model Cards: @julien-c
Translation: @sshleifer
Summarization: @sshleifer
examples/distillation: @VictorSanh
nlp datasets: [different repo](https://github.com/huggingface/nlp)
rust tokenizers: [different repo](https://github.com/huggingface/tokenizers)
Text Generation: @patrickvonplaten, @TevenLeScao
Blenderbot, Bart, Marian, Pegasus: @sshleifer
T5: @patrickvonplaten
Rag: @patrickvonplaten, @lhoestq
EncoderDecoder: @patrickvonplaten
Longformer, Reformer: @patrickvonplaten
TransfoXL, XLNet: @TevenLeScao, @patrickvonplaten
examples/seq2seq: @sshleifer
examples/bert-loses-patience: @JetRunner
tensorflow: @jplu
examples/token-classification: @stefan-it
documentation: @sgugger
-->

View File

@@ -1,19 +0,0 @@
name: GitHub-hosted runner
on: push
jobs:
check_code_quality:
runs-on: ubuntu-18.04
steps:
- uses: actions/checkout@v2
- name: Set up Python
uses: actions/setup-python@v1
with:
python-version: 3.7
# - name: Install dependencies
# run: |
# pip install .[tf,torch,quality]

View File

@@ -18,10 +18,19 @@ jobs:
uses: actions/setup-python@v1
with:
python-version: 3.7
- name: Loading cache
uses: actions/cache@v2
id: cache
with:
path: ~/.cache/pip
key: v0-torch_hub-${{ hashFiles('setup.py') }}
- name: Install dependencies
run: |
pip install --upgrade pip
pip install torch
pip install numpy tokenizers filelock requests tqdm regex sentencepiece sacremoses
pip install numpy filelock protobuf requests tqdm regex sentencepiece sacremoses tokenizers packaging
- name: Torch hub list
run: |

View File

@@ -14,7 +14,7 @@ on:
jobs:
run_tests_torch_and_tf_gpu:
runs-on: self-hosted
runs-on: [self-hosted, single-gpu]
steps:
- uses: actions/checkout@v2
- name: Python version
@@ -25,6 +25,14 @@ jobs:
- name: Current dir
run: pwd
- run: nvidia-smi
- name: Loading cache.
uses: actions/cache@v2
id: cache
with:
path: .env
key: v0-tests_tf_torch_gpu-${{ hashFiles('setup.py') }}
- name: Create new python env (on self-hosted runners we have to handle isolation ourselves)
run: |
python -m venv .env
@@ -35,20 +43,72 @@ jobs:
- name: Install dependencies
run: |
source .env/bin/activate
pip install torch==1.4.0
pip install .[sklearn,testing]
pip install --upgrade pip
pip install torch!=1.6.0
pip install .[sklearn,testing,onnxruntime]
pip install git+https://github.com/huggingface/datasets
- name: Are GPUs recognized by our DL frameworks
run: |
source .env/bin/activate
python -c "import torch; print(torch.cuda.is_available())"
python -c "import torch; print('Cuda available:', torch.cuda.is_available())"
python -c "import torch; print('Number of GPUs available:', torch.cuda.device_count())"
- name: Run all non-slow tests on GPU
env:
TF_FORCE_GPU_ALLOW_GROWTH: "true"
# TF_GPU_MEMORY_LIMIT: 4096
OMP_NUM_THREADS: 1
USE_CUDA: yes
run: |
source .env/bin/activate
python -m pytest -n 2 --dist=loadfile -s -v ./tests/
python -m pytest -n 2 --dist=loadfile -s ./tests/
run_tests_torch_and_tf_multiple_gpu:
runs-on: [self-hosted, multi-gpu]
steps:
- uses: actions/checkout@v2
- name: Python version
run: |
which python
python --version
pip --version
- name: Current dir
run: pwd
- run: nvidia-smi
- name: Loading cache.
uses: actions/cache@v2
id: cache
with:
path: .env
key: v0-tests_tf_torch_multiple_gpu-${{ hashFiles('setup.py') }}
- name: Create new python env (on self-hosted runners we have to handle isolation ourselves)
run: |
python -m venv .env
source .env/bin/activate
which python
python --version
pip --version
- name: Install dependencies
run: |
source .env/bin/activate
pip install --upgrade pip
pip install torch!=1.6.0
pip install .[sklearn,testing,onnxruntime]
pip install git+https://github.com/huggingface/datasets
- name: Are GPUs recognized by our DL frameworks
run: |
source .env/bin/activate
python -c "import torch; print('Cuda available:', torch.cuda.is_available())"
python -c "import torch; print('Number of GPUs available:', torch.cuda.device_count())"
- name: Run all non-slow tests on GPU
env:
TF_FORCE_GPU_ALLOW_GROWTH: "true"
# TF_GPU_MEMORY_LIMIT: 4096
OMP_NUM_THREADS: 1
run: |
source .env/bin/activate
python -m pytest -n 2 --dist=loadfile -s ./tests/

View File

@@ -10,9 +10,17 @@ on:
jobs:
run_all_tests_torch_and_tf_gpu:
runs-on: self-hosted
runs-on: [self-hosted, single-gpu]
steps:
- uses: actions/checkout@v2
- name: Loading cache.
uses: actions/cache@v2
id: cache
with:
path: .env
key: v0-slow_tests_tf_torch_gpu-${{ hashFiles('setup.py') }}
- name: Python version
run: |
which python
@@ -22,6 +30,7 @@ jobs:
run: pwd
- run: nvidia-smi
- name: Create new python env (on self-hosted runners we have to handle isolation ourselves)
if: steps.cache.outputs.cache-hit != 'true'
run: |
python -m venv .env
source .env/bin/activate
@@ -31,21 +40,94 @@ jobs:
- name: Install dependencies
run: |
source .env/bin/activate
pip install .[sklearn,tf,torch,testing]
pip install --upgrade pip
pip install torch!=1.6.0
pip install .[sklearn,testing,onnxruntime]
pip install git+https://github.com/huggingface/datasets
- name: Are GPUs recognized by our DL frameworks
run: |
source .env/bin/activate
python -c "import torch; print(torch.cuda.is_available())"
python -c "import tensorflow as tf; print(tf.test.is_built_with_cuda(), tf.config.list_physical_devices('GPU'))"
python -c "import torch; print('Cuda available:', torch.cuda.is_available())"
python -c "import torch; print('Number of GPUs available:', torch.cuda.device_count())"
- name: Run all tests on GPU
env:
TF_FORCE_GPU_ALLOW_GROWTH: "true"
OMP_NUM_THREADS: 1
RUN_SLOW: yes
USE_CUDA: yes
run: |
source .env/bin/activate
python -m pytest -n 1 --dist=loadfile -s -v ./tests/
python -m pytest -n 1 --dist=loadfile -s ./tests/ --durations=0
- name: Run examples tests on GPU
env:
TF_FORCE_GPU_ALLOW_GROWTH: "true"
OMP_NUM_THREADS: 1
RUN_SLOW: yes
run: |
source .env/bin/activate
pip install -r examples/requirements.txt
python -m pytest -n 1 --dist=loadfile -s examples --durations=0
run_all_tests_torch_and_tf_multiple_gpu:
runs-on: [self-hosted, multi-gpu]
steps:
- uses: actions/checkout@v2
- name: Loading cache.
uses: actions/cache@v2
id: cache
with:
path: .env
key: v0-slow_tests_tf_torch_multi_gpu-${{ hashFiles('setup.py') }}
- name: Python version
run: |
which python
python --version
pip --version
- name: Current dir
run: pwd
- run: nvidia-smi
- name: Create new python env (on self-hosted runners we have to handle isolation ourselves)
if: steps.cache.outputs.cache-hit != 'true'
run: |
python -m venv .env
source .env/bin/activate
which python
python --version
pip --version
- name: Install dependencies
run: |
source .env/bin/activate
pip install --upgrade pip
pip install torch!=1.6.0
pip install .[sklearn,testing,onnxruntime]
pip install git+https://github.com/huggingface/datasets
- name: Are GPUs recognized by our DL frameworks
run: |
source .env/bin/activate
python -c "import torch; print('Cuda available:', torch.cuda.is_available())"
python -c "import torch; print('Number of GPUs available:', torch.cuda.device_count())"
- name: Run all tests on GPU
env:
TF_FORCE_GPU_ALLOW_GROWTH: "true"
OMP_NUM_THREADS: 1
RUN_SLOW: yes
run: |
source .env/bin/activate
python -m pytest -n 1 --dist=loadfile -s ./tests/ --durations=0
- name: Run examples tests on GPU
env:
TF_FORCE_GPU_ALLOW_GROWTH: "true"
OMP_NUM_THREADS: 1
RUN_SLOW: yes
run: |
source .env/bin/activate
pip install -r examples/requirements.txt
python -m pytest -n 1 --dist=loadfile -s examples --durations=0

12
.gitignore vendored
View File

@@ -8,6 +8,13 @@ __pycache__/
# C extensions
*.so
# tests and logs
tests/fixtures/*
!tests/fixtures/sample_text_no_unicode.txt
logs/
lightning_logs/
lang_code_data/
# Distribution / packaging
.Python
build/
@@ -116,6 +123,7 @@ dmypy.json
.pyre/
# vscode
.vs
.vscode
# Pycharm
@@ -134,6 +142,7 @@ runs
/wandb
/examples/runs
/examples/**/*.args
/examples/rag/sweep
# data
/data
@@ -148,3 +157,6 @@ debug.env
#ctags
tags
# pre-commit
.pre-commit*

129
CODE_OF_CONDUCT.md Normal file
View File

@@ -0,0 +1,129 @@
# Contributor Covenant Code of Conduct
## Our Pledge
We as members, contributors, and leaders pledge to make participation in our
community a harassment-free experience for everyone, regardless of age, body
size, visible or invisible disability, ethnicity, sex characteristics, gender
identity and expression, level of experience, education, socio-economic status,
nationality, personal appearance, race, religion, or sexual identity
and orientation.
We pledge to act and interact in ways that contribute to an open, welcoming,
diverse, inclusive, and healthy community.
## Our Standards
Examples of behavior that contributes to a positive environment for our
community include:
* Demonstrating empathy and kindness toward other people
* Being respectful of differing opinions, viewpoints, and experiences
* Giving and gracefully accepting constructive feedback
* Accepting responsibility and apologizing to those affected by our mistakes,
and learning from the experience
* Focusing on what is best not just for us as individuals, but for the
overall community
Examples of unacceptable behavior include:
* The use of sexualized language or imagery, and sexual attention or
advances of any kind
* Trolling, insulting or derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or email
address, without their explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting
## Enforcement Responsibilities
Community leaders are responsible for clarifying and enforcing our standards of
acceptable behavior and will take appropriate and fair corrective action in
response to any behavior that they deem inappropriate, threatening, offensive,
or harmful.
Community leaders have the right and responsibility to remove, edit, or reject
comments, commits, code, wiki edits, issues, and other contributions that are
not aligned to this Code of Conduct, and will communicate reasons for moderation
decisions when appropriate.
## Scope
This Code of Conduct applies within all community spaces, and also applies when
an individual is officially representing the community in public spaces.
Examples of representing our community include using an official e-mail address,
posting via an official social media account, or acting as an appointed
representative at an online or offline event.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported to the community leaders responsible for enforcement at
feedback@huggingface.co.
All complaints will be reviewed and investigated promptly and fairly.
All community leaders are obligated to respect the privacy and security of the
reporter of any incident.
## Enforcement Guidelines
Community leaders will follow these Community Impact Guidelines in determining
the consequences for any action they deem in violation of this Code of Conduct:
### 1. Correction
**Community Impact**: Use of inappropriate language or other behavior deemed
unprofessional or unwelcome in the community.
**Consequence**: A private, written warning from community leaders, providing
clarity around the nature of the violation and an explanation of why the
behavior was inappropriate. A public apology may be requested.
### 2. Warning
**Community Impact**: A violation through a single incident or series
of actions.
**Consequence**: A warning with consequences for continued behavior. No
interaction with the people involved, including unsolicited interaction with
those enforcing the Code of Conduct, for a specified period of time. This
includes avoiding interactions in community spaces as well as external channels
like social media. Violating these terms may lead to a temporary or
permanent ban.
### 3. Temporary Ban
**Community Impact**: A serious violation of community standards, including
sustained inappropriate behavior.
**Consequence**: A temporary ban from any sort of interaction or public
communication with the community for a specified period of time. No public or
private interaction with the people involved, including unsolicited interaction
with those enforcing the Code of Conduct, is allowed during this period.
Violating these terms may lead to a permanent ban.
### 4. Permanent Ban
**Community Impact**: Demonstrating a pattern of violation of community
standards, including sustained inappropriate behavior, harassment of an
individual, or aggression toward or disparagement of classes of individuals.
**Consequence**: A permanent ban from any sort of public interaction within
the community.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage],
version 2.0, available at
https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.
Community Impact Guidelines were inspired by [Mozilla's code of conduct
enforcement ladder](https://github.com/mozilla/diversity).
[homepage]: https://www.contributor-covenant.org
For answers to common questions about this code of conduct, see the FAQ at
https://www.contributor-covenant.org/faq. Translations are available at
https://www.contributor-covenant.org/translations.

View File

@@ -9,6 +9,9 @@ It also helps us if you spread the word: reference the library from blog posts
on the awesome projects it made possible, shout out on Twitter every time it has
helped you, or simply star the repo to say "thank you".
Whichever way you choose to contribute, please be mindful to respect our
[code of conduct](https://github.com/huggingface/transformers/blob/master/CODE_OF_CONDUCT.md).
## You can contribute in so many ways!
There are 4 ways you can contribute to transformers:
@@ -44,9 +47,16 @@ Did not find it? :( So we can act quickly on it, please follow these steps:
To get the OS and software versions automatically, you can run the following command:
```bash
python transformers-cli env
transformers-cli env
```
or from the root of the repository the following command:
```bash
python src/transformers/commands/transformers_cli.py env
```
### Do you want to implement a new model?
Awesome! Please provide the following information:
@@ -58,7 +68,8 @@ Awesome! Please provide the following information:
If you are willing to contribute the model yourself, let us know so we can best
guide you.
We have added a **detailed guide and templates** to guide you in the process of adding a new model. You can find them in the [`templates`](./templates) folder.
We have added a **detailed guide and templates** to guide you in the process of adding a new model. You can find them
in the [`templates`](https://github.com/huggingface/transformers/tree/master/templates) folder.
### Do you want a new feature (that is not a model)?
@@ -79,7 +90,9 @@ A world-class feature request addresses the following points:
If your issue is well written we're already 80% of the way there by the time you
post it.
We have added **templates** to guide you in the process of adding a new example script for training or testing the models in the library. You can find them in the [`templates`](./templates) folder.
We have added **templates** to guide you in the process of adding a new example script for training or testing the
models in the library. You can find them in the [`templates`](https://github.com/huggingface/transformers/tree/master/templates)
folder.
## Start contributing! (Pull Requests)
@@ -124,12 +137,18 @@ Follow these steps to start contributing:
it with `pip uninstall transformers` before reinstalling it in editable
mode with the `-e` flag.)
Right now, we need an unreleased version of `isort` to avoid a
[bug](https://github.com/timothycrosley/isort/pull/1000):
To run the full test suite, you might need the additional dependency on `datasets` which requires a separate source
install:
```bash
$ pip install -U git+git://github.com/timothycrosley/isort.git@e63ae06ec7d70b06df9e528357650281a3d3ec22#egg=isort
$ git clone https://github.com/huggingface/datasets
$ cd datasets
$ pip install -e .
```
If you have already cloned that repo, you might need to `git pull` to get the most recent changes in the `datasets`
library.
5. Develop the features on your branch.
As you work on the features, you should make sure that the test suite
@@ -139,6 +158,14 @@ Follow these steps to start contributing:
$ make test
```
Note, that this command uses `-n auto` pytest flag, therefore, it will start as many parallel `pytest` processes as the number of your computer's CPU-cores, and if you have lots of those and a few GPUs and not a great amount of RAM, it's likely to overload your computer. Therefore, to run the test suite, you may want to consider using this command instead:
```bash
$ python -m pytest -n 3 --dist=loadfile -s -v ./tests/
```
Adjust the value of `-n` to fit the load your hardware can support.
`transformers` relies on `black` and `isort` to format its source code
consistently. After you make changes, format them with:
@@ -146,12 +173,29 @@ Follow these steps to start contributing:
$ make style
```
`transformers` also uses `flake8` to check for coding mistakes. Quality
`transformers` also uses `flake8` and a few custom scripts to check for coding mistakes. Quality
control runs in CI, however you can also run the same checks with:
```bash
$ make quality
```
You can do the automatic style corrections and code verifications that can't be automated in one go:
```bash
$ make fixup
```
This target is also optimized to only work with files modified by the PR you're working on.
If you're modifying documents under `docs/source`, make sure to validate that
they can still be built. This check also runs in CI. To run a local check
make sure you have installed the documentation builder requirements, by
running `pip install .[tf,torch,docs]` once from the root of this repository
and then run:
```bash
$ make docs
```
Once you're happy with your changes, add changed files using `git add` and
make a commit with `git commit` to record your changes locally:
@@ -198,15 +242,22 @@ Follow these steps to start contributing:
are useful to avoid duplicated work, and to differentiate it from PRs ready
to be merged;
4. Make sure existing tests pass;
5. Add high-coverage tests. No quality test, no merge.
- If you are adding a new model, make sure that you use `ModelTester.all_model_classes = (MyModel, MyModelWithLMHead,...)`, which triggers the common tests.
- If you are adding new `@slow` tests, make sure they pass using `RUN_SLOW=1 python -m pytest tests/test_my_new_model.py`.
CircleCI does not run them.
6. All public methods must have informative docstrings;
5. Add high-coverage tests. No quality testing = no merge.
- If you are adding a new model, make sure that you use
`ModelTester.all_model_classes = (MyModel, MyModelWithLMHead,...)`, which triggers the common tests.
- If you are adding new `@slow` tests, make sure they pass using
`RUN_SLOW=1 python -m pytest tests/test_my_new_model.py`.
- If you are adding a new tokenizer, write tests, and make sure
`RUN_SLOW=1 python -m pytest tests/test_tokenization_{your_model_name}.py` passes.
CircleCI does not run the slow tests, but github actions does every night!
6. All public methods must have informative docstrings that work nicely with sphinx. See `modeling_ctrl.py` for an
example.
### Tests
You can run 🤗 Transformers tests with `unittest` or `pytest`.
An extensive test suite is included to test the library behavior and several examples. Library tests can be found in
the [tests folder](https://github.com/huggingface/transformers/tree/master/tests) and examples tests in the
[examples folder](https://github.com/huggingface/transformers/tree/master/examples).
We like `pytest` and `pytest-xdist` because it's faster. From the root of the
repository, here's how to run tests with `pytest` for the library:
@@ -221,8 +272,7 @@ and for the examples:
$ pip install -r examples/requirements.txt # only needed the first time
$ python -m pytest -n auto --dist=loadfile -s -v ./examples/
```
In fact, that's how `make test` and `make test-examples` are implemented!
In fact, that's how `make test` and `make test-examples` are implemented (sans the `pip install` line)!
You can specify a smaller set of tests in order to test only the feature
you're working on.
@@ -253,7 +303,8 @@ $ python -m unittest discover -s examples -t examples -v
### Style guide
For documentation strings, `transformers` follows the [google
style](https://google.github.io/styleguide/pyguide.html).
For documentation strings, `transformers` follows the [google style](https://google.github.io/styleguide/pyguide.html).
Check our [documentation writing guide](https://github.com/huggingface/transformers/tree/master/docs#writing-documentation---specification)
for more information.
#### This guide was heavily inspired by the awesome [scikit-learn guide to contributing](https://github.com/scikit-learn/scikit-learn/blob/master/CONTRIBUTING.md)

View File

@@ -1,17 +1,53 @@
.PHONY: quality style test test-examples
.PHONY: modified_only_fixup extra_quality_checks quality style fixup fix-copies test test-examples docs
check_dirs := examples templates tests src utils
# get modified files since the branch was made
fork_point_sha := $(shell git merge-base --fork-point master)
joined_dirs := $(shell echo $(check_dirs) | tr " " "|")
modified_py_files := $(shell git diff --name-only $(fork_point_sha) | egrep '^($(joined_dirs))' | egrep '\.py$$')
#$(info modified files are: $(modified_py_files))
modified_only_fixup:
@if [ -n "$(modified_py_files)" ]; then \
echo "Checking/fixing $(modified_py_files)"; \
black $(modified_py_files); \
isort $(modified_py_files); \
flake8 $(modified_py_files); \
else \
echo "No library .py files were modified"; \
fi
# Check that source code meets quality standards
quality:
black --check --line-length 119 --target-version py35 examples templates tests src utils
isort --check-only --recursive examples templates tests src utils
flake8 examples templates tests src utils
extra_quality_checks:
python utils/check_copies.py
python utils/check_dummies.py
python utils/check_repo.py
# Format source code automatically
# this target runs checks on all files
quality:
black --check $(check_dirs)
isort --check-only $(check_dirs)
flake8 $(check_dirs)
${MAKE} extra_quality_checks
# Format source code automatically and check is there are any problems left that need manual fixing
style:
black --line-length 119 --target-version py35 examples templates tests src utils
isort --recursive examples templates tests src utils
black $(check_dirs)
isort $(check_dirs)
# Super fast fix and check target that only works on relevant modified files since the branch was made
fixup: modified_only_fixup extra_quality_checks
# Make marked copies of snippets of codes conform to the original
fix-copies:
python utils/check_copies.py --fix_and_overwrite
python utils/check_dummies.py --fix_and_overwrite
# Run tests for the library
@@ -22,3 +58,8 @@ test:
test-examples:
python -m pytest -n auto --dist=loadfile -s -v ./examples/
# Check that docs can build
docs:
cd docs && make html SPHINXOPTS="-W"

796
README.md
View File

@@ -16,64 +16,134 @@
<a href="https://github.com/huggingface/transformers/releases">
<img alt="GitHub release" src="https://img.shields.io/github/release/huggingface/transformers.svg">
</a>
<a href="https://github.com/huggingface/transformers/blob/master/CODE_OF_CONDUCT.md">
<img alt="Contributor Covenant" src="https://img.shields.io/badge/Contributor%20Covenant-v2.0%20adopted-ff69b4.svg">
</a>
</p>
<h3 align="center">
<p>State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0
</h3>
🤗 Transformers (formerly known as `pytorch-transformers` and `pytorch-pretrained-bert`) provides state-of-the-art general-purpose architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet, T5, CTRL...) for Natural Language Understanding (NLU) and Natural Language Generation (NLG) with over thousands of pretrained models in 100+ languages and deep interoperability between PyTorch & TensorFlow 2.0.
🤗 Transformers provides thousands of pretrained models to perform tasks on texts such as classification, information extraction, question answering, summarization, translation, text generation, etc in 100+ languages. Its aim is to make cutting-edge NLP easier to use for everyone.
🤗 Transformers provides APIs to quickly download and use those pretrained models on a given text, fine-tune them on your own datasets then share them with the community on our [model hub](https://huggingface.co/models). At the same time, each python module defining an architecture can be used as a standalone and modified to enable quick research experiments.
🤗 Transformers is backed by the two most popular deep learning libraries, [PyTorch](https://pytorch.org/) and [TensorFlow](https://www.tensorflow.org/), with a seamless integration between them, allowing you to train your models with one then load it for inference with the other.
### Recent contributors
[![](https://sourcerer.io/fame/clmnt/huggingface/transformers/images/0)](https://sourcerer.io/fame/clmnt/huggingface/transformers/links/0)[![](https://sourcerer.io/fame/clmnt/huggingface/transformers/images/1)](https://sourcerer.io/fame/clmnt/huggingface/transformers/links/1)[![](https://sourcerer.io/fame/clmnt/huggingface/transformers/images/2)](https://sourcerer.io/fame/clmnt/huggingface/transformers/links/2)[![](https://sourcerer.io/fame/clmnt/huggingface/transformers/images/3)](https://sourcerer.io/fame/clmnt/huggingface/transformers/links/3)[![](https://sourcerer.io/fame/clmnt/huggingface/transformers/images/4)](https://sourcerer.io/fame/clmnt/huggingface/transformers/links/4)[![](https://sourcerer.io/fame/clmnt/huggingface/transformers/images/5)](https://sourcerer.io/fame/clmnt/huggingface/transformers/links/5)[![](https://sourcerer.io/fame/clmnt/huggingface/transformers/images/6)](https://sourcerer.io/fame/clmnt/huggingface/transformers/links/6)[![](https://sourcerer.io/fame/clmnt/huggingface/transformers/images/7)](https://sourcerer.io/fame/clmnt/huggingface/transformers/links/7)
### Features
- High performance on NLU and NLG tasks
- Low barrier to entry for educators and practitioners
## Online demos
State-of-the-art NLP for everyone
- Deep learning researchers
- Hands-on practitioners
- AI/ML/NLP teachers and educators
You can test most of our models directly on their pages from the [model hub](https://huggingface.co/models). We also offer an [inference API](https://huggingface.co/pricing) to use those models.
Lower compute costs, smaller carbon footprint
- Researchers can share trained models instead of always retraining
- Practitioners can reduce compute time and production costs
- Dozens of architectures with over 1,000 pretrained models, some in more than 100 languages
Here are a few examples:
- [Masked word completion with BERT](https://huggingface.co/bert-base-uncased?text=Paris+is+the+%5BMASK%5D+of+France)
- [Name Entity Recognition with Electra](https://huggingface.co/dbmdz/electra-large-discriminator-finetuned-conll03-english?text=My+name+is+Sarah+and+I+live+in+London+city)
- [Text generation with GPT-2](https://huggingface.co/gpt2?text=A+long+time+ago%2C+)
- [Natural Langugage Inference with RoBERTa](https://huggingface.co/roberta-large-mnli?text=The+dog+was+lost.+Nobody+lost+any+animal)
- [Summarization with BART](https://huggingface.co/facebook/bart-large-cnn?text=The+tower+is+324+metres+%281%2C063+ft%29+tall%2C+about+the+same+height+as+an+81-storey+building%2C+and+the+tallest+structure+in+Paris.+Its+base+is+square%2C+measuring+125+metres+%28410+ft%29+on+each+side.+During+its+construction%2C+the+Eiffel+Tower+surpassed+the+Washington+Monument+to+become+the+tallest+man-made+structure+in+the+world%2C+a+title+it+held+for+41+years+until+the+Chrysler+Building+in+New+York+City+was+finished+in+1930.+It+was+the+first+structure+to+reach+a+height+of+300+metres.+Due+to+the+addition+of+a+broadcasting+aerial+at+the+top+of+the+tower+in+1957%2C+it+is+now+taller+than+the+Chrysler+Building+by+5.2+metres+%2817+ft%29.+Excluding+transmitters%2C+the+Eiffel+Tower+is+the+second+tallest+free-standing+structure+in+France+after+the+Millau+Viaduct)
- [Question answering with DistilBERT](https://huggingface.co/distilbert-base-uncased-distilled-squad?text=Which+name+is+also+used+to+describe+the+Amazon+rainforest+in+English%3F&context=The+Amazon+rainforest+%28Portuguese%3A+Floresta+Amaz%C3%B4nica+or+Amaz%C3%B4nia%3B+Spanish%3A+Selva+Amaz%C3%B3nica%2C+Amazon%C3%ADa+or+usually+Amazonia%3B+French%3A+For%C3%AAt+amazonienne%3B+Dutch%3A+Amazoneregenwoud%29%2C+also+known+in+English+as+Amazonia+or+the+Amazon+Jungle%2C+is+a+moist+broadleaf+forest+that+covers+most+of+the+Amazon+basin+of+South+America.+This+basin+encompasses+7%2C000%2C000+square+kilometres+%282%2C700%2C000+sq+mi%29%2C+of+which+5%2C500%2C000+square+kilometres+%282%2C100%2C000+sq+mi%29+are+covered+by+the+rainforest.+This+region+includes+territory+belonging+to+nine+nations.+The+majority+of+the+forest+is+contained+within+Brazil%2C+with+60%25+of+the+rainforest%2C+followed+by+Peru+with+13%25%2C+Colombia+with+10%25%2C+and+with+minor+amounts+in+Venezuela%2C+Ecuador%2C+Bolivia%2C+Guyana%2C+Suriname+and+French+Guiana.+States+or+departments+in+four+nations+contain+%22Amazonas%22+in+their+names.+The+Amazon+represents+over+half+of+the+planet%27s+remaining+rainforests%2C+and+comprises+the+largest+and+most+biodiverse+tract+of+tropical+rainforest+in+the+world%2C+with+an+estimated+390+billion+individual+trees+divided+into+16%2C000+species)
- [Translation with T5](https://huggingface.co/t5-base?text=My+name+is+Wolfgang+and+I+live+in+Berlin)
Choose the right framework for every part of a model's lifetime
- Train state-of-the-art models in 3 lines of code
- Deep interoperability between TensorFlow 2.0 and PyTorch models
- Move a single model between TF2.0/PyTorch frameworks at will
- Seamlessly pick the right framework for training, evaluation, production
**[Write With Transformer](https://transformer.huggingface.co)**, built by the Hugging Face team, is the official demo of this repos text generation capabilities.
## Quick tour
| Section | Description |
|-|-|
| [Installation](#installation) | How to install the package |
| [Model architectures](#model-architectures) | Architectures (with pretrained weights) |
| [Online demo](#online-demo) | Experimenting with this repos text generation capabilities |
| [Quick tour: Usage](#quick-tour) | Tokenizers & models usage: Bert and GPT-2 |
| [Quick tour: TF 2.0 and PyTorch ](#Quick-tour-TF-20-training-and-PyTorch-interoperability) | Train a TF 2.0 model in 10 lines of code, load it in PyTorch |
| [Quick tour: pipelines](#quick-tour-of-pipelines) | Using Pipelines: Wrapper around tokenizer and models to use finetuned models |
| [Quick tour: Fine-tuning/usage scripts](#quick-tour-of-the-fine-tuningusage-scripts) | Using provided scripts: GLUE, SQuAD and Text generation |
| [Quick tour: Share your models ](#Quick-tour-of-model-sharing) | Upload and share your fine-tuned models with the community |
| [Migrating from pytorch-transformers to transformers](#Migrating-from-pytorch-transformers-to-transformers) | Migrating your code from pytorch-transformers to transformers |
| [Migrating from pytorch-pretrained-bert to pytorch-transformers](#Migrating-from-pytorch-pretrained-bert-to-transformers) | Migrating your code from pytorch-pretrained-bert to transformers |
| [Documentation][(v2.5.0)](https://huggingface.co/transformers/v2.5.0)[(v2.4.0/v2.4.1)](https://huggingface.co/transformers/v2.4.0)[(v2.3.0)](https://huggingface.co/transformers/v2.3.0)[(v2.2.0/v2.2.1/v2.2.2)](https://huggingface.co/transformers/v2.2.0) [(v2.1.1)](https://huggingface.co/transformers/v2.1.1) [(v2.0.0)](https://huggingface.co/transformers/v2.0.0) [(v1.2.0)](https://huggingface.co/transformers/v1.2.0) [(v1.1.0)](https://huggingface.co/transformers/v1.1.0) [(v1.0.0)](https://huggingface.co/transformers/v1.0.0) [(master)](https://huggingface.co/transformers) | Full API documentation and more |
To immediately use a model on a given text, we provide the `pipeline` API. Pipelines group together a pretrained model with the preprocessing that was used during that model training. Here is how to quickly use a pipeline to classify positive versus negative texts
```python
>>> from transformers import pipeline
# Allocate a pipeline for sentiment-analysis
>>> classifier = pipeline('sentiment-analysis')
>>> classifier('We are very happy to include pipeline into the transformers repository.')
[{'label': 'POSITIVE', 'score': 0.9978193640708923}]
```
The second line of code downloads and caches the pretrained model used by the pipeline, the third line evaluates it on the given text. Here the answer is "positive" with a confidence of 99.8%.
This is another example of pipeline used for that can extract question answers from some context:
``` python
>>> from transformers import pipeline
# Allocate a pipeline for question-answering
>>> question_answerer = pipeline('question-answering')
>>> question_answerer({
... 'question': 'What is the name of the repository ?',
... 'context': 'Pipeline have been included in the huggingface/transformers repository'
... })
{'score': 0.5135612454720828, 'start': 35, 'end': 59, 'answer': 'huggingface/transformers'}
```
On top of the answer, the pretrained model used here returned its confidence score, along with the start position and its end position in the tokenized sentence. You can learn more about the tasks supported by the `pipeline` API in [this tutorial](https://huggingface.co/transformers/task_summary.html).
To download and use any of the pretrained models on your given task, you just need to use those three lines of codes (PyTorch version):
```python
>>> from transformers import AutoTokenizer, AutoModel
>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
>>> model = AutoModel.from_pretrained("bert-base-uncased")
>>> inputs = tokenizer("Hello world!", return_tensors="pt")
>>> outputs = model(**inputs)
```
or for TensorFlow:
```python
>>> from transformers import AutoTokenizer, TFAutoModel
>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
>>> model = TFAutoModel.from_pretrained("bert-base-uncased")
>>> inputs = tokenizer("Hello world!", return_tensors="tf")
>>> outputs = model(**inputs)
```
The tokenizer is responsible for all the preprocessing the pretrained model expects, and can be called directly on one (or list) of texts (as we can see on the fourth line of both code examples). It will output a dictionary you can directly pass to your model (which is done on the fifth line).
The model itself is a regular [Pytorch `nn.Module`](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) or a [TensorFlow `tf.keras.Model`](https://www.tensorflow.org/api_docs/python/tf/keras/Model) (depending on your backend) which you can use normally. For instance, [this tutorial](https://huggingface.co/transformers/training.html) explains how to integrate such a model in classic PyTorch or TensorFlow training loop, or how to use our `Trainer` API to quickly fine-tune the on a new dataset.
## Why should I use transformers?
1. Easy-to-use state-of-the-art models:
- High performance on NLU and NLG tasks.
- Low barrier to entry for educators and practitioners.
- Few user-facing abstractions with just three classes to learn.
- A unified API for using all our pretrained models.
1. Lower compute costs, smaller carbon footprint:
- Researchers can share trained models instead of always retraining.
- Practitioners can reduce compute time and production costs.
- Dozens of architectures with over 2,000 pretrained models, some in more than 100 languages.
1. Choose the right framework for every part of a model's lifetime:
- Train state-of-the-art models in 3 lines of code.
- Move a single model between TF2.0/PyTorch frameworks at will.
- Seamlessly pick the right framework for training, evaluation, production.
1. Easily customize a model or an example to your needs:
- Examples for each architecture to reproduce the results by the official authors of said architecture.
- Expose the models internal as consistently as possible.
- Model files can be used independently of the library for quick experiments.
## Why shouldn't I use transformers?
- This library is not a modular toolbox of building blocks for neural nets. The code in the model files is not refactored with additional abstractions on purpose, so that researchers can quickly iterate on each of the models without diving in additional abstractions/files.
- The training API is not intended to work on any model but is optimized to work with the models provided by the library. For generic machine learning loops, you should use another library.
- While we strive to present as many use cases as possible, the scripts in our [examples folder](https://github.com/huggingface/transformers/tree/master/examples) are just that: examples. It is expected that they won't work out-of-the box on your specific problem and that you will be required to change a few lines of code to adapt them to your needs.
## Installation
This repo is tested on Python 3.6+, PyTorch 1.0.0+ and TensorFlow 2.0.
This repository is tested on Python 3.6+, PyTorch 1.0.0+ (PyTorch 1.3.1+ for [examples](https://github.com/huggingface/transformers/tree/master/examples)) and TensorFlow 2.0.
You should install 🤗 Transformers in a [virtual environment](https://docs.python.org/3/library/venv.html). If you're unfamiliar with Python virtual environments, check out the [user guide](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/).
Create a virtual environment with the version of Python you're going to use and activate it.
First, create a virtual environment with the version of Python you're going to use and activate it.
Now, if you want to use 🤗 Transformers, you can install it with pip. If you'd like to play with the examples, you must install it from source.
### With pip
First you need to install one of, or both, TensorFlow 2.0 and PyTorch.
Then, you will need to install one of, or both, TensorFlow 2.0 and PyTorch.
Please refer to [TensorFlow installation page](https://www.tensorflow.org/install/pip#tensorflow-2.0-rc-is-available) and/or [PyTorch installation page](https://pytorch.org/get-started/locally/#start-locally) regarding the specific install command for your platform.
When TensorFlow 2.0 and/or PyTorch has been installed, 🤗 Transformers can be installed using pip as follows:
@@ -82,612 +152,72 @@ When TensorFlow 2.0 and/or PyTorch has been installed, 🤗 Transformers can be
pip install transformers
```
### From source
If you'd like to play with the examples, you must [install the library from source](https://huggingface.co/transformers/installation.html#installing-from-source).
Here also, you first need to install one of, or both, TensorFlow 2.0 and PyTorch.
Please refer to [TensorFlow installation page](https://www.tensorflow.org/install/pip#tensorflow-2.0-rc-is-available) and/or [PyTorch installation page](https://pytorch.org/get-started/locally/#start-locally) regarding the specific install command for your platform.
## Models architectures
When TensorFlow 2.0 and/or PyTorch has been installed, you can install from source by cloning the repository and running:
```bash
git clone https://github.com/huggingface/transformers
cd transformers
pip install .
```
When you update the repository, you should upgrade the transformers installation and its dependencies as follows:
```bash
git pull
pip install --upgrade .
```
### Run the examples
Examples are included in the repository but are not shipped with the library.
Therefore, in order to run the latest versions of the examples, you need to install from source, as described above.
Look at the [README](https://github.com/huggingface/transformers/blob/master/examples/README.md) for how to run examples.
### Tests
A series of tests are included for the library and for some example scripts. Library tests can be found in the [tests folder](https://github.com/huggingface/transformers/tree/master/tests) and examples tests in the [examples folder](https://github.com/huggingface/transformers/tree/master/examples).
Depending on which framework is installed (TensorFlow 2.0 and/or PyTorch), the irrelevant tests will be skipped. Ensure that both frameworks are installed if you want to execute all tests.
Here's the easiest way to run tests for the library:
```bash
pip install -e ".[testing]"
make test
```
and for the examples:
```bash
pip install -e ".[testing]"
pip install -r examples/requirements.txt
make test-examples
```
For details, refer to the [contributing guide](https://github.com/huggingface/transformers/blob/master/CONTRIBUTING.md#tests).
### Do you want to run a Transformer model on a mobile device?
You should check out our [`swift-coreml-transformers`](https://github.com/huggingface/swift-coreml-transformers) repo.
It contains a set of tools to convert PyTorch or TensorFlow 2.0 trained Transformer models (currently contains `GPT-2`, `DistilGPT-2`, `BERT`, and `DistilBERT`) to CoreML models that run on iOS devices.
At some point in the future, you'll be able to seamlessly move from pre-training or fine-tuning models to productizing them in CoreML, or prototype a model or an app in CoreML then research its hyperparameters or architecture from TensorFlow 2.0 and/or PyTorch. Super exciting!
## Model architectures
🤗 Transformers currently provides the following NLU/NLG architectures:
🤗 Transformers currently provides the following architectures (see [here](https://huggingface.co/transformers/model_summary.html) for a high-level summary of each them):
1. **[ALBERT](https://huggingface.co/transformers/model_doc/albert.html)** (from Google Research and the Toyota Technological Institute at Chicago) released with the paper [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942), by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
1. **[BART](https://huggingface.co/transformers/model_doc/bart.html)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/pdf/1910.13461.pdf) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
1. **[BERT](https://huggingface.co/transformers/model_doc/bert.html)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
2. **[GPT](https://huggingface.co/transformers/model_doc/gpt.html)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
3. **[GPT-2](https://huggingface.co/transformers/model_doc/gpt2.html)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
4. **[Transformer-XL](https://huggingface.co/transformers/model_doc/transformerxl.html)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
5. **[XLNet](https://huggingface.co/transformers/model_doc/xlnet.html)** (from Google/CMU) released with the paper [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
6. **[XLM](https://huggingface.co/transformers/model_doc/xlm.html)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau.
7. **[RoBERTa](https://huggingface.co/transformers/model_doc/roberta.html)** (from Facebook), released together with the paper a [Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
8. **[DistilBERT](https://huggingface.co/transformers/model_doc/distilbert.html)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into [DistilGPT2](https://github.com/huggingface/transformers/tree/master/examples/distillation), RoBERTa into [DistilRoBERTa](https://github.com/huggingface/transformers/tree/master/examples/distillation), Multilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/master/examples/distillation) and a German version of DistilBERT.
9. **[CTRL](https://huggingface.co/transformers/model_doc/ctrl.html)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
10. **[CamemBERT](https://huggingface.co/transformers/model_doc/camembert.html)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
11. **[ALBERT](https://huggingface.co/transformers/model_doc/albert.html)** (from Google Research and the Toyota Technological Institute at Chicago) released with the paper [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942), by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
12. **[T5](https://huggingface.co/transformers/model_doc/t5.html)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
13. **[XLM-RoBERTa](https://huggingface.co/transformers/model_doc/xlmroberta.html)** (from Facebook AI), released together with the paper [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
14. **[MMBT](https://github.com/facebookresearch/mmbt/)** (from Facebook), released together with the paper a [Supervised Multimodal Bitransformers for Classifying Images and Text](https://arxiv.org/pdf/1909.02950.pdf) by Douwe Kiela, Suvrat Bhooshan, Hamed Firooz, Davide Testuggine.
15. **[FlauBERT](https://huggingface.co/transformers/model_doc/flaubert.html)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
16. **[BART](https://huggingface.co/transformers/model_doc/bart.html)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/pdf/1910.13461.pdf) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
17. **[ELECTRA](https://huggingface.co/transformers/model_doc/electra.html)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
18. **[DialoGPT](https://huggingface.co/transformers/model_doc/dialogpt.html)** (from Microsoft Research) released with the paper [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
19. **[Reformer](https://huggingface.co/transformers/model_doc/reformer.html)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
20. **[MarianMT](https://huggingface.co/transformers/model_doc/marian.html)** Machine translation models trained using [OPUS](http://opus.nlpl.eu/) data by Jörg Tiedemann. The [Marian Framework](https://marian-nmt.github.io/) is being developed by the Microsoft Translator Team.
21. **[Other community models](https://huggingface.co/models)**, contributed by the [community](https://huggingface.co/users).
22. Want to contribute a new model? We have added a **detailed guide and templates** to guide you in the process of adding a new model. You can find them in the [`templates`](./templates) folder of the repository. Be sure to check the [contributing guidelines](./CONTRIBUTING.md) and contact the maintainers or open an issue to collect feedbacks before starting your PR.
These implementations have been tested on several datasets (see the example scripts) and should match the performances of the original implementations (e.g. ~93 F1 on SQuAD for BERT Whole-Word-Masking, ~88 F1 on RocStories for OpenAI GPT, ~18.3 perplexity on WikiText 103 for Transformer-XL, ~0.916 Peason R coefficient on STS-B for XLNet). You can find more details on the performances in the Examples section of the [documentation](https://huggingface.co/transformers/examples.html).
## Online demo
**[Write With Transformer](https://transformer.huggingface.co)**, built by the Hugging Face team at transformer.huggingface.co, is the official demo of this repos text generation capabilities.
You can use it to experiment with completions generated by `GPT2Model`, `TransfoXLModel`, and `XLNetModel`.
> “🦄 Write with transformer is to writing what calculators are to calculus.”
![write_with_transformer](https://transformer.huggingface.co/front/assets/thumbnail-large.png)
## Quick tour
Let's do a very quick overview of the model architectures in 🤗 Transformers. Detailed examples for each model architecture (Bert, GPT, GPT-2, Transformer-XL, XLNet and XLM) can be found in the [full documentation](https://huggingface.co/transformers/).
```python
import torch
from transformers import *
# Transformers has a unified API
# for 10 transformer architectures and 30 pretrained weights.
# Model | Tokenizer | Pretrained weights shortcut
MODELS = [(BertModel, BertTokenizer, 'bert-base-uncased'),
(OpenAIGPTModel, OpenAIGPTTokenizer, 'openai-gpt'),
(GPT2Model, GPT2Tokenizer, 'gpt2'),
(CTRLModel, CTRLTokenizer, 'ctrl'),
(TransfoXLModel, TransfoXLTokenizer, 'transfo-xl-wt103'),
(XLNetModel, XLNetTokenizer, 'xlnet-base-cased'),
(XLMModel, XLMTokenizer, 'xlm-mlm-enfr-1024'),
(DistilBertModel, DistilBertTokenizer, 'distilbert-base-cased'),
(RobertaModel, RobertaTokenizer, 'roberta-base'),
(XLMRobertaModel, XLMRobertaTokenizer, 'xlm-roberta-base'),
]
# To use TensorFlow 2.0 versions of the models, simply prefix the class names with 'TF', e.g. `TFRobertaModel` is the TF 2.0 counterpart of the PyTorch model `RobertaModel`
# Let's encode some text in a sequence of hidden-states using each model:
for model_class, tokenizer_class, pretrained_weights in MODELS:
# Load pretrained model/tokenizer
tokenizer = tokenizer_class.from_pretrained(pretrained_weights)
model = model_class.from_pretrained(pretrained_weights)
# Encode text
input_ids = torch.tensor([tokenizer.encode("Here is some text to encode", add_special_tokens=True)]) # Add special tokens takes care of adding [CLS], [SEP], <s>... tokens in the right way for each model.
with torch.no_grad():
last_hidden_states = model(input_ids)[0] # Models outputs are now tuples
# Each architecture is provided with several class for fine-tuning on down-stream tasks, e.g.
BERT_MODEL_CLASSES = [BertModel, BertForPreTraining, BertForMaskedLM, BertForNextSentencePrediction,
BertForSequenceClassification, BertForTokenClassification, BertForQuestionAnswering]
# All the classes for an architecture can be initiated from pretrained weights for this architecture
# Note that additional weights added for fine-tuning are only initialized
# and need to be trained on the down-stream task
pretrained_weights = 'bert-base-uncased'
tokenizer = BertTokenizer.from_pretrained(pretrained_weights)
for model_class in BERT_MODEL_CLASSES:
# Load pretrained model/tokenizer
model = model_class.from_pretrained(pretrained_weights)
# Models can return full list of hidden-states & attentions weights at each layer
model = model_class.from_pretrained(pretrained_weights,
output_hidden_states=True,
output_attentions=True)
input_ids = torch.tensor([tokenizer.encode("Let's see all hidden-states and attentions on this text")])
all_hidden_states, all_attentions = model(input_ids)[-2:]
# Models are compatible with Torchscript
model = model_class.from_pretrained(pretrained_weights, torchscript=True)
traced_model = torch.jit.trace(model, (input_ids,))
# Simple serialization for models and tokenizers
model.save_pretrained('./directory/to/save/') # save
model = model_class.from_pretrained('./directory/to/save/') # re-load
tokenizer.save_pretrained('./directory/to/save/') # save
tokenizer = BertTokenizer.from_pretrained('./directory/to/save/') # re-load
# SOTA examples for GLUE, SQUAD, text generation...
```
## Quick tour TF 2.0 training and PyTorch interoperability
Let's do a quick example of how a TensorFlow 2.0 model can be trained in 12 lines of code with 🤗 Transformers and then loaded in PyTorch for fast inspection/tests.
```python
import tensorflow as tf
import tensorflow_datasets
from transformers import *
# Load dataset, tokenizer, model from pretrained model/vocabulary
tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
model = TFBertForSequenceClassification.from_pretrained('bert-base-cased')
data = tensorflow_datasets.load('glue/mrpc')
# Prepare dataset for GLUE as a tf.data.Dataset instance
train_dataset = glue_convert_examples_to_features(data['train'], tokenizer, max_length=128, task='mrpc')
valid_dataset = glue_convert_examples_to_features(data['validation'], tokenizer, max_length=128, task='mrpc')
train_dataset = train_dataset.shuffle(100).batch(32).repeat(2)
valid_dataset = valid_dataset.batch(64)
# Prepare training: Compile tf.keras model with optimizer, loss and learning rate schedule
optimizer = tf.keras.optimizers.Adam(learning_rate=3e-5, epsilon=1e-08, clipnorm=1.0)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
metric = tf.keras.metrics.SparseCategoricalAccuracy('accuracy')
model.compile(optimizer=optimizer, loss=loss, metrics=[metric])
# Train and evaluate using tf.keras.Model.fit()
history = model.fit(train_dataset, epochs=2, steps_per_epoch=115,
validation_data=valid_dataset, validation_steps=7)
# Load the TensorFlow model in PyTorch for inspection
model.save_pretrained('./save/')
pytorch_model = BertForSequenceClassification.from_pretrained('./save/', from_tf=True)
# Quickly test a few predictions - MRPC is a paraphrasing task, let's see if our model learned the task
sentence_0 = "This research was consistent with his findings."
sentence_1 = "His findings were compatible with this research."
sentence_2 = "His findings were not compatible with this research."
inputs_1 = tokenizer.encode_plus(sentence_0, sentence_1, add_special_tokens=True, return_tensors='pt')
inputs_2 = tokenizer.encode_plus(sentence_0, sentence_2, add_special_tokens=True, return_tensors='pt')
pred_1 = pytorch_model(inputs_1['input_ids'], token_type_ids=inputs_1['token_type_ids'])[0].argmax().item()
pred_2 = pytorch_model(inputs_2['input_ids'], token_type_ids=inputs_2['token_type_ids'])[0].argmax().item()
print("sentence_1 is", "a paraphrase" if pred_1 else "not a paraphrase", "of sentence_0")
print("sentence_2 is", "a paraphrase" if pred_2 else "not a paraphrase", "of sentence_0")
```
## Quick tour of the fine-tuning/usage scripts
**Important**
Before running the fine-tuning scripts, please read the
[instructions](#run-the-examples) on how to
setup your environment to run the examples.
The library comprises several example scripts with SOTA performances for NLU and NLG tasks:
- `run_glue.py`: an example fine-tuning sequence classification models on nine different GLUE tasks (*sequence-level classification*)
- `run_squad.py`: an example fine-tuning question answering models on the question answering dataset SQuAD 2.0 (*token-level classification*)
- `run_ner.py`: an example fine-tuning token classification models on named entity recognition (*token-level classification*)
- `run_generation.py`: an example using GPT, GPT-2, CTRL, Transformer-XL and XLNet for conditional language generation
- other model-specific examples (see the documentation).
Here are three quick usage examples for these scripts:
### `run_glue.py`: Fine-tuning on GLUE tasks for sequence classification
The [General Language Understanding Evaluation (GLUE) benchmark](https://gluebenchmark.com/) is a collection of nine sentence- or sentence-pair language understanding tasks for evaluating and analyzing natural language understanding systems.
Before running any of these GLUE tasks you should download the
[GLUE data](https://gluebenchmark.com/tasks) by running
[this script](https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e)
and unpack it to some directory `$GLUE_DIR`.
You should also install the additional packages required by the examples:
```shell
pip install -r ./examples/requirements.txt
```
```shell
export GLUE_DIR=/path/to/glue
export TASK_NAME=MRPC
python ./examples/text-classification/run_glue.py \
--model_name_or_path bert-base-uncased \
--task_name $TASK_NAME \
--do_train \
--do_eval \
--data_dir $GLUE_DIR/$TASK_NAME \
--max_seq_length 128 \
--per_gpu_eval_batch_size=8 \
--per_gpu_train_batch_size=8 \
--learning_rate 2e-5 \
--num_train_epochs 3.0 \
--output_dir /tmp/$TASK_NAME/
```
where task name can be one of CoLA, SST-2, MRPC, STS-B, QQP, MNLI, QNLI, RTE, WNLI.
The dev set results will be present within the text file 'eval_results.txt' in the specified output_dir. In case of MNLI, since there are two separate dev sets, matched and mismatched, there will be a separate output folder called '/tmp/MNLI-MM/' in addition to '/tmp/MNLI/'.
#### Fine-tuning XLNet model on the STS-B regression task
This example code fine-tunes XLNet on the STS-B corpus using parallel training on a server with 4 V100 GPUs.
Parallel training is a simple way to use several GPUs (but is slower and less flexible than distributed training, see below).
```shell
export GLUE_DIR=/path/to/glue
python ./examples/text-classification/run_glue.py \
--model_name_or_path xlnet-large-cased \
--do_train \
--do_eval \
--task_name=sts-b \
--data_dir=${GLUE_DIR}/STS-B \
--output_dir=./proc_data/sts-b-110 \
--max_seq_length=128 \
--per_gpu_eval_batch_size=8 \
--per_gpu_train_batch_size=8 \
--gradient_accumulation_steps=1 \
--max_steps=1200 \
--model_name=xlnet-large-cased \
--overwrite_output_dir \
--overwrite_cache \
--warmup_steps=120
```
On this machine we thus have a batch size of 32, please increase `gradient_accumulation_steps` to reach the same batch size if you have a smaller machine. These hyper-parameters should result in a Pearson correlation coefficient of `+0.917` on the development set.
#### Fine-tuning Bert model on the MRPC classification task
This example code fine-tunes the Bert Whole Word Masking model on the Microsoft Research Paraphrase Corpus (MRPC) corpus using distributed training on 8 V100 GPUs to reach a F1 > 92.
```bash
python -m torch.distributed.launch --nproc_per_node 8 ./examples/text-classification/run_glue.py \
--model_name_or_path bert-large-uncased-whole-word-masking \
--task_name MRPC \
--do_train \
--do_eval \
--data_dir $GLUE_DIR/MRPC/ \
--max_seq_length 128 \
--per_gpu_eval_batch_size=8 \
--per_gpu_train_batch_size=8 \
--learning_rate 2e-5 \
--num_train_epochs 3.0 \
--output_dir /tmp/mrpc_output/ \
--overwrite_output_dir \
--overwrite_cache \
```
Training with these hyper-parameters gave us the following results:
```bash
acc = 0.8823529411764706
acc_and_f1 = 0.901702786377709
eval_loss = 0.3418912578906332
f1 = 0.9210526315789473
global_step = 174
loss = 0.07231863956341798
```
### `run_squad.py`: Fine-tuning on SQuAD for question-answering
This example code fine-tunes BERT on the SQuAD dataset using distributed training on 8 V100 GPUs and Bert Whole Word Masking uncased model to reach a F1 > 93 on SQuAD:
```bash
python -m torch.distributed.launch --nproc_per_node=8 ./examples/question-answering/run_squad.py \
--model_type bert \
--model_name_or_path bert-large-uncased-whole-word-masking \
--do_train \
--do_eval \
--train_file $SQUAD_DIR/train-v1.1.json \
--predict_file $SQUAD_DIR/dev-v1.1.json \
--learning_rate 3e-5 \
--num_train_epochs 2 \
--max_seq_length 384 \
--doc_stride 128 \
--output_dir ../models/wwm_uncased_finetuned_squad/ \
--per_gpu_eval_batch_size=3 \
--per_gpu_train_batch_size=3 \
```
Training with these hyper-parameters gave us the following results:
```bash
python $SQUAD_DIR/evaluate-v1.1.py $SQUAD_DIR/dev-v1.1.json ../models/wwm_uncased_finetuned_squad/predictions.json
{"exact_match": 86.91579943235573, "f1": 93.1532499015869}
```
This is the model provided as `bert-large-uncased-whole-word-masking-finetuned-squad`.
### `run_generation.py`: Text generation with GPT, GPT-2, CTRL, Transformer-XL and XLNet
A conditional generation script is also included to generate text from a prompt.
The generation script includes the [tricks](https://github.com/rusiaaman/XLNet-gen#methodology) proposed by Aman Rusia to get high-quality generation with memory models like Transformer-XL and XLNet (include a predefined text to make short inputs longer).
Here is how to run the script with the small version of OpenAI GPT-2 model:
```shell
python ./examples/text-generation/run_generation.py \
--model_type=gpt2 \
--length=20 \
--model_name_or_path=gpt2 \
```
and from the Salesforce CTRL model:
```shell
python ./examples/text-generation/run_generation.py \
--model_type=ctrl \
--length=20 \
--model_name_or_path=ctrl \
--temperature=0 \
--repetition_penalty=1.2 \
```
## Quick tour of model sharing
Starting with `v2.2.2`, you can now upload and share your fine-tuned models with the community, using the <abbr title="Command-line interface">CLI</abbr> that's built-in to the library.
**First, create an account on [https://huggingface.co/join](https://huggingface.co/join)**. Optionally, join an existing organization or create a new one. Then:
```shell
transformers-cli login
# log in using the same credentials as on huggingface.co
```
Upload your model:
```shell
transformers-cli upload ./path/to/pretrained_model/
# ^^ Upload folder containing weights/tokenizer/config
# saved via `.save_pretrained()`
transformers-cli upload ./config.json [--filename folder/foobar.json]
# ^^ Upload a single file
# (you can optionally override its filename, which can be nested inside a folder)
```
If you want your model to be namespaced by your organization name rather than your username, add the following flag to any command:
```shell
--organization organization_name
```
Your model will then be accessible through its identifier, a concatenation of your username (or organization name) and the folder name above:
```python
"username/pretrained_model"
# or if an org:
"organization_name/pretrained_model"
```
**Please add a README.md model card** to the repo under `model_cards/` with: model description, training params (dataset, preprocessing, hardware used, hyperparameters), evaluation results, intended uses & limitations, etc.
Your model now has a page on huggingface.co/models 🔥
Anyone can load it from code:
```python
tokenizer = AutoTokenizer.from_pretrained("namespace/pretrained_model")
model = AutoModel.from_pretrained("namespace/pretrained_model")
```
List all your files on S3:
```shell
transformers-cli s3 ls
```
You can also delete unneeded files:
```shell
transformers-cli s3 rm …
```
## Quick tour of pipelines
New in version `v2.3`: `Pipeline` are high-level objects which automatically handle tokenization, running your data through a transformers model
and outputting the result in a structured object.
You can create `Pipeline` objects for the following down-stream tasks:
- `feature-extraction`: Generates a tensor representation for the input sequence
- `ner`: Generates named entity mapping for each word in the input sequence.
- `sentiment-analysis`: Gives the polarity (positive / negative) of the whole input sequence.
- `text-classification`: Initialize a `TextClassificationPipeline` directly, or see `sentiment-analysis` for an example.
- `question-answering`: Provided some context and a question refering to the context, it will extract the answer to the question in the context.
- `fill-mask`: Takes an input sequence containing a masked token (e.g. `<mask>`) and return list of most probable filled sequences, with their probabilities.
- `summarization`
- `translation_xx_to_yy`
```python
from transformers import pipeline
# Allocate a pipeline for sentiment-analysis
nlp = pipeline('sentiment-analysis')
nlp('We are very happy to include pipeline into the transformers repository.')
>>> {'label': 'POSITIVE', 'score': 0.99893874}
# Allocate a pipeline for question-answering
nlp = pipeline('question-answering')
nlp({
'question': 'What is the name of the repository ?',
'context': 'Pipeline have been included in the huggingface/transformers repository'
})
>>> {'score': 0.28756016668193496, 'start': 35, 'end': 59, 'answer': 'huggingface/transformers'}
```
## Migrating from pytorch-transformers to transformers
Here is a quick summary of what you should take care of when migrating from `pytorch-transformers` to `transformers`.
### Positional order of some models' keywords inputs (`attention_mask`, `token_type_ids`...) changed
To be able to use Torchscript (see #1010, #1204 and #1195) the specific order of some models **keywords inputs** (`attention_mask`, `token_type_ids`...) has been changed.
If you used to call the models with keyword names for keyword arguments, e.g. `model(inputs_ids, attention_mask=attention_mask, token_type_ids=token_type_ids)`, this should not cause any change.
If you used to call the models with positional inputs for keyword arguments, e.g. `model(inputs_ids, attention_mask, token_type_ids)`, you may have to double check the exact order of input arguments.
## Migrating from pytorch-pretrained-bert to transformers
Here is a quick summary of what you should take care of when migrating from `pytorch-pretrained-bert` to `transformers`.
### Models always output `tuples`
The main breaking change when migrating from `pytorch-pretrained-bert` to `transformers` is that every model's forward method always outputs a `tuple` with various elements depending on the model and the configuration parameters.
The exact content of the tuples for each model is detailed in the models' docstrings and the [documentation](https://huggingface.co/transformers/).
In pretty much every case, you will be fine by taking the first element of the output as the output you previously used in `pytorch-pretrained-bert`.
Here is a `pytorch-pretrained-bert` to `transformers` conversion example for a `BertForSequenceClassification` classification model:
```python
# Let's load our model
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
# If you used to have this line in pytorch-pretrained-bert:
loss = model(input_ids, labels=labels)
# Now just use this line in transformers to extract the loss from the output tuple:
outputs = model(input_ids, labels=labels)
loss = outputs[0]
# In transformers you can also have access to the logits:
loss, logits = outputs[:2]
# And even the attention weights if you configure the model to output them (and other outputs too, see the docstrings and documentation)
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', output_attentions=True)
outputs = model(input_ids, labels=labels)
loss, logits, attentions = outputs
```
### Using hidden states
By enabling the configuration option `output_hidden_states`, it was possible to retrieve the last hidden states of the encoder. In `pytorch-transformers` as well as `transformers` the return value has changed slightly: `all_hidden_states` now also includes the hidden state of the embeddings in addition to those of the encoding layers. This allows users to easily access the embeddings final state.
### Serialization
Breaking change in the `from_pretrained()` method:
1. Models are now set in evaluation mode by default when instantiated with the `from_pretrained()` method. To train them, don't forget to set them back in training mode (`model.train()`) to activate the dropout modules.
2. The additional `*input` and `**kwargs` arguments supplied to the `from_pretrained()` method used to be directly passed to the underlying model's class `__init__()` method. They are now used to update the model configuration attribute instead, which can break derived model classes built based on the previous `BertForSequenceClassification` examples. We are working on a way to mitigate this breaking change in [#866](https://github.com/huggingface/transformers/pull/866) by forwarding the the model's `__init__()` method (i) the provided positional arguments and (ii) the keyword arguments which do not match any configuration class attributes.
Also, while not a breaking change, the serialization methods have been standardized and you probably should switch to the new method `save_pretrained(save_directory)` if you were using any other serialization method before.
Here is an example:
```python
### Let's load a model and tokenizer
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
### Do some stuff to our model and tokenizer
# Ex: add new tokens to the vocabulary and embeddings of our model
tokenizer.add_tokens(['[SPECIAL_TOKEN_1]', '[SPECIAL_TOKEN_2]'])
model.resize_token_embeddings(len(tokenizer))
# Train our model
train(model)
### Now let's save our model and tokenizer to a directory
model.save_pretrained('./my_saved_model_directory/')
tokenizer.save_pretrained('./my_saved_model_directory/')
### Reload the model and the tokenizer
model = BertForSequenceClassification.from_pretrained('./my_saved_model_directory/')
tokenizer = BertTokenizer.from_pretrained('./my_saved_model_directory/')
```
### Optimizers: BertAdam & OpenAIAdam are now AdamW, schedules are standard PyTorch schedules
The two optimizers previously included, `BertAdam` and `OpenAIAdam`, have been replaced by a single `AdamW` optimizer which has a few differences:
- it only implements weights decay correction,
- schedules are now externals (see below),
- gradient clipping is now also external (see below).
The new optimizer `AdamW` matches PyTorch `Adam` optimizer API and let you use standard PyTorch or apex methods for the schedule and clipping.
The schedules are now standard [PyTorch learning rate schedulers](https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate) and not part of the optimizer anymore.
Here is a conversion examples from `BertAdam` with a linear warmup and decay schedule to `AdamW` and the same schedule:
```python
# Parameters:
lr = 1e-3
max_grad_norm = 1.0
num_training_steps = 1000
num_warmup_steps = 100
warmup_proportion = float(num_warmup_steps) / float(num_training_steps) # 0.1
### Previously BertAdam optimizer was instantiated like this:
optimizer = BertAdam(model.parameters(), lr=lr, schedule='warmup_linear', warmup=warmup_proportion, t_total=num_training_steps)
### and used like this:
for batch in train_data:
loss = model(batch)
loss.backward()
optimizer.step()
### In Transformers, optimizer and schedules are splitted and instantiated like this:
optimizer = AdamW(model.parameters(), lr=lr, correct_bias=False) # To reproduce BertAdam specific behavior set correct_bias=False
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=num_warmup_steps, num_training_steps=num_training_steps) # PyTorch scheduler
### and used like this:
for batch in train_data:
model.train()
loss = model(batch)
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm) # Gradient clipping is not in AdamW anymore (so you can use amp without issue)
optimizer.step()
scheduler.step()
optimizer.zero_grad()
```
1. **[BERT For Sequence Generation](https://huggingface.co/transformers/model_doc/bertgeneration.html)** (from Google) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
1. **[Blenderbot](https://huggingface.co/transformers/model_doc/blenderbot.html)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[CamemBERT](https://huggingface.co/transformers/model_doc/camembert.html)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
1. **[CTRL](https://huggingface.co/transformers/model_doc/ctrl.html)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
1. **[DeBERTa](https://huggingface.co/transformers/model_doc/deberta.html)** (from Microsoft Research) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[DialoGPT](https://huggingface.co/transformers/model_doc/dialogpt.html)** (from Microsoft Research) released with the paper [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
1. **[DistilBERT](https://huggingface.co/transformers/model_doc/distilbert.html)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into [DistilGPT2](https://github.com/huggingface/transformers/tree/master/examples/distillation), RoBERTa into [DistilRoBERTa](https://github.com/huggingface/transformers/tree/master/examples/distillation), Multilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/master/examples/distillation) and a German version of DistilBERT.
1. **[DPR](https://huggingface.co/transformers/model_doc/dpr.html)** (from Facebook) released with the paper [Dense Passage Retrieval
for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) by Vladimir Karpukhin, Barlas Oğuz, Sewon
Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
1. **[ELECTRA](https://huggingface.co/transformers/model_doc/electra.html)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
1. **[FlauBERT](https://huggingface.co/transformers/model_doc/flaubert.html)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
1. **[Funnel Transformer](https://huggingface.co/transformers/model_doc/funnel.html)** (from CMU/Google Brain) released with the paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
1. **[GPT](https://huggingface.co/transformers/model_doc/gpt.html)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
1. **[GPT-2](https://huggingface.co/transformers/model_doc/gpt2.html)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
1. **[LayoutLM](https://huggingface.co/transformers/model_doc/layoutlm.html)** (from Microsoft Research Asia) released with the paper [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
1. **[Longformer](https://huggingface.co/transformers/model_doc/longformer.html)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
1. **[LXMERT](https://huggingface.co/transformers/model_doc/lxmert.html)** (from UNC Chapel Hill) released with the paper [LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering](https://arxiv.org/abs/1908.07490) by Hao Tan and Mohit Bansal.
1. **[MarianMT](https://huggingface.co/transformers/model_doc/marian.html)** Machine translation models trained using [OPUS](http://opus.nlpl.eu/) data by Jörg Tiedemann. The [Marian Framework](https://marian-nmt.github.io/) is being developed by the Microsoft Translator Team.
1. **[MBart](https://huggingface.co/transformers/model_doc/mbart.html)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
1. **[Pegasus](https://huggingface.co/transformers/model_doc/pegasus.html)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777)> by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
1. **[ProphetNet](https://huggingface.co/transformers/model_doc/prophetnet.html)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
1. **[Reformer](https://huggingface.co/transformers/model_doc/reformer.html)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
1. **[RoBERTa](https://huggingface.co/transformers/model_doc/roberta.html)** (from Facebook), released together with the paper a [Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
ultilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/master/examples/distillation) and a German version of DistilBERT.
1. **[SqueezeBert](https://huggingface.co/transformers/model_doc/squeezebert.html)** released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
1. **[T5](https://huggingface.co/transformers/model_doc/t5.html)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
1. **[Transformer-XL](https://huggingface.co/transformers/model_doc/transformerxl.html)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
1. **[XLM](https://huggingface.co/transformers/model_doc/xlm.html)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau.
1. **[XLM-ProphetNet](https://huggingface.co/transformers/model_doc/xlmprophetnet.html)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
1. **[XLM-RoBERTa](https://huggingface.co/transformers/model_doc/xlmroberta.html)** (from Facebook AI), released together with the paper [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
1. **[XLNet](https://huggingface.co/transformers/model_doc/xlnet.html)** (from Google/CMU) released with the paper [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
1. **[Other community models](https://huggingface.co/models)**, contributed by the [community](https://huggingface.co/users).
1. Want to contribute a new model? We have added a **detailed guide and templates** to guide you in the process of adding a new model. You can find them in the [`templates`](./templates) folder of the repository. Be sure to check the [contributing guidelines](./CONTRIBUTING.md) and contact the maintainers or open an issue to collect feedbacks before starting your PR.
These implementations have been tested on several datasets (see the example scripts) and should match the performances of the original implementations. You can find more details on the performances in the Examples section of the [documentation](https://huggingface.co/transformers/examples.html).
## Learn more
| Section | Description |
|-|-|
| [Documentation](https://huggingface.co/transformers/) | Full API documentation and tutorials |
| [Task summary](https://huggingface.co/transformers/task_summary.html) | Tasks supported by 🤗 Transformers |
| [Preprocessing tutorial](https://huggingface.co/transformers/preprocessing.html) | Using the `Tokenizer` class to prepare data for the models |
| [Training and fine-tuning](https://huggingface.co/transformers/training.html) | Using the models provided by 🤗 Transformers in a PyTorch/TensorFlow training loop and the `Trainer` API |
| [Quick tour: Fine-tuning/usage scripts](https://github.com/huggingface/transformers/tree/master/examples) | Example scripts for fine-tuning models on a wide range of tasks |
| [Model sharing and uploading](https://huggingface.co/transformers/model_sharing.html) | Upload and share your fine-tuned models with the community |
| [Migration](https://huggingface.co/transformers/migration.html) | Migrate to 🤗 Transformers from `pytorch-transformers` or `pytorch-pretrained-bert` |
## Citation
We now have a paper you can cite for the 🤗 Transformers library:
We now have a [paper](https://arxiv.org/abs/1910.03771) you can cite for the 🤗 Transformers library:
```bibtex
@article{Wolf2019HuggingFacesTS,
title={HuggingFace's Transformers: State-of-the-art Natural Language Processing},
author={Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and R'emi Louf and Morgan Funtowicz and Jamie Brew},
author={Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and Rémi Louf and Morgan Funtowicz and Joe Davison and Sam Shleifer and Patrick von Platen and Clara Ma and Yacine Jernite and Julien Plu and Canwen Xu and Teven Le Scao and Sylvain Gugger and Mariama Drame and Quentin Lhoest and Alexander M. Rush},
journal={ArXiv},
year={2019},
volume={abs/1910.03771}

7
codecov.yml Normal file
View File

@@ -0,0 +1,7 @@
coverage:
status:
project:
default:
informational: true
patch: off
comment: false

View File

@@ -1,23 +0,0 @@
cd docs
function deploy_doc(){
echo "Creating doc at commit $1 and pushing to folder $2"
git checkout $1
if [ ! -z "$2" ]
then
echo "Pushing version" $2
make clean && make html && scp -r -oStrictHostKeyChecking=no _build/html $doc:$dir/$2
else
echo "Pushing master"
make clean && make html && scp -r -oStrictHostKeyChecking=no _build/html/* $doc:$dir
fi
}
deploy_doc "master"
deploy_doc "b33a385" v1.0.0
deploy_doc "fe02e45" v1.1.0
deploy_doc "89fd345" v1.2.0
deploy_doc "fc9faa8" v2.0.0
deploy_doc "3ddce1d" v2.1.1
deploy_doc "f2f3294" v2.2.0
deploy_doc "d0f8b9a" v2.3.0

View File

@@ -1,4 +1,4 @@
FROM nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
FROM nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04
LABEL maintainer="Hugging Face"
LABEL repository="transformers"
@@ -18,9 +18,14 @@ RUN python3 -m pip install --no-cache-dir --upgrade pip && \
tensorflow \
torch
RUN git clone https://github.com/NVIDIA/apex
RUN cd apex && \
python3 setup.py install && \
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
WORKDIR /workspace
COPY . transformers/
RUN cd transformers/ && \
python3 -m pip install --no-cache-dir .
CMD ["/bin/bash"]
CMD ["/bin/bash"]

View File

@@ -1,4 +1,4 @@
FROM nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
FROM nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04
LABEL maintainer="Hugging Face"
LABEL repository="transformers"
@@ -17,9 +17,14 @@ RUN python3 -m pip install --no-cache-dir --upgrade pip && \
mkl \
torch
RUN git clone https://github.com/NVIDIA/apex
RUN cd apex && \
python3 setup.py install && \
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
WORKDIR /workspace
COPY . transformers/
RUN cd transformers/ && \
python3 -m pip install --no-cache-dir .
CMD ["/bin/bash"]
CMD ["/bin/bash"]

View File

@@ -0,0 +1,65 @@
FROM google/cloud-sdk:slim
# Build args.
ARG GITHUB_REF=refs/heads/master
# TODO: This Dockerfile installs pytorch/xla 3.6 wheels. There are also 3.7
# wheels available; see below.
ENV PYTHON_VERSION=3.6
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
cmake \
git \
curl \
ca-certificates
# Install conda and python.
# NOTE new Conda does not forward the exit status... https://github.com/conda/conda/issues/8385
RUN curl -o ~/miniconda.sh https://repo.anaconda.com/miniconda/Miniconda3-4.7.12-Linux-x86_64.sh && \
chmod +x ~/miniconda.sh && \
~/miniconda.sh -b && \
rm ~/miniconda.sh
ENV PATH=/root/miniconda3/bin:$PATH
RUN conda create -y --name container python=$PYTHON_VERSION
# Run the rest of commands within the new conda env.
# Use absolute path to appease Codefactor.
SHELL ["/root/miniconda3/bin/conda", "run", "-n", "container", "/bin/bash", "-c"]
RUN conda install -y python=$PYTHON_VERSION mkl
RUN pip uninstall -y torch && \
# Python 3.7 wheels are available. Replace cp36-cp36m with cp37-cp37m
gsutil cp 'gs://tpu-pytorch/wheels/torch-nightly-cp${PYTHON_VERSION/./}-cp${PYTHON_VERSION/./}m-linux_x86_64.whl' . && \
gsutil cp 'gs://tpu-pytorch/wheels/torch_xla-nightly-cp${PYTHON_VERSION/./}-cp${PYTHON_VERSION/./}m-linux_x86_64.whl' . && \
gsutil cp 'gs://tpu-pytorch/wheels/torchvision-nightly-cp${PYTHON_VERSION/./}-cp${PYTHON_VERSION/./}m-linux_x86_64.whl' . && \
pip install 'torch-nightly-cp${PYTHON_VERSION/./}-cp${PYTHON_VERSION/./}m-linux_x86_64.whl' && \
pip install 'torch_xla-nightly-cp${PYTHON_VERSION/./}-cp${PYTHON_VERSION/./}m-linux_x86_64.whl' && \
pip install 'torchvision-nightly-cp${PYTHON_VERSION/./}-cp${PYTHON_VERSION/./}m-linux_x86_64.whl' && \
rm 'torch-nightly-cp${PYTHON_VERSION/./}-cp${PYTHON_VERSION/./}m-linux_x86_64.whl' && \
rm 'torch_xla-nightly-cp${PYTHON_VERSION/./}-cp${PYTHON_VERSION/./}m-linux_x86_64.whl' && \
rm 'torchvision-nightly-cp${PYTHON_VERSION/./}-cp${PYTHON_VERSION/./}m-linux_x86_64.whl' && \
apt-get install -y libomp5
ENV LD_LIBRARY_PATH=root/miniconda3/envs/container/lib
# Install huggingface/transformers at the current PR, plus dependencies.
RUN git clone https://github.com/huggingface/transformers.git && \
cd transformers && \
git fetch origin $GITHUB_REF:CI && \
git checkout CI && \
cd .. && \
pip install ./transformers && \
pip install -r ./transformers/examples/requirements.txt && \
pip install pytest
RUN python -c "import torch_xla; print(torch_xla.__version__)"
RUN python -c "import transformers as trf; print(trf.__version__)"
RUN conda init bash
COPY docker-entrypoint.sh /usr/local/bin/
RUN chmod +x /usr/local/bin/docker-entrypoint.sh
ENTRYPOINT ["/usr/local/bin/docker-entrypoint.sh"]
CMD ["bash"]

View File

@@ -0,0 +1,38 @@
local base = import 'templates/base.libsonnet';
local tpus = import 'templates/tpus.libsonnet';
local utils = import "templates/utils.libsonnet";
local volumes = import "templates/volumes.libsonnet";
local bertBaseCased = base.BaseTest {
frameworkPrefix: "hf",
modelName: "bert-base-cased",
mode: "example",
configMaps: [],
timeout: 3600, # 1 hour, in seconds
image: std.extVar('image'),
imageTag: std.extVar('image-tag'),
tpuSettings+: {
softwareVersion: "pytorch-nightly",
},
accelerator: tpus.v3_8,
volumeMap+: {
datasets: volumes.PersistentVolumeSpec {
name: "huggingface-cluster-disk",
mountPath: "/datasets",
},
},
command: utils.scriptCommand(
|||
python -m pytest -s transformers/examples/test_xla_examples.py -v
test_exit_code=$?
echo "\nFinished running commands.\n"
test $test_exit_code -eq 0
|||
),
};
bertBaseCased.oneshotJob

View File

@@ -0,0 +1,32 @@
apiVersion: v1
kind: PersistentVolume
metadata:
name: huggingface-cluster-disk
spec:
storageClassName: ""
capacity:
storage: 500Gi
accessModes:
- ReadOnlyMany
claimRef:
namespace: default
name: huggingface-cluster-disk-claim
gcePersistentDisk:
pdName: huggingface-cluster-disk
fsType: ext4
readOnly: true
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: huggingface-cluster-disk-claim
spec:
# Specify "" as the storageClassName so it matches the PersistentVolume's StorageClass.
# A nil storageClassName value uses the default StorageClass. For details, see
# https://kubernetes.io/docs/concepts/storage/persistent-volumes/#class-1
storageClassName: ""
accessModes:
- ReadOnlyMany
resources:
requests:
storage: 1Ki

View File

@@ -0,0 +1,8 @@
#!/bin/bash
source ~/.bashrc
echo "running docker-entrypoint.sh"
conda activate container
echo $KUBE_GOOGLE_CLOUD_TPU_ENDPOINTS
echo "printed TPU info"
export XRT_TPU_CONFIG="tpu_worker;0;${KUBE_GOOGLE_CLOUD_TPU_ENDPOINTS:7}"
exec "$@"#!/bin/bash

View File

@@ -7,6 +7,14 @@ you can install them with the following command, at the root of the code reposit
pip install -e ".[docs]"
```
---
**NOTE**
You only need to generate the documentation to inspect it locally (if you're planning changes and want to
check how they look like before committing for instance). You don't have to commit the built documentation.
---
## Packages installed
Here's an overview of all the packages installed. If you ran the previous command installing all packages from
@@ -34,20 +42,14 @@ pip install recommonmark
## Building the documentation
Make sure that there is a symlink from the `example` file (in /examples) inside the source folder. Run the following
command to generate it:
```bash
ln -s ../../examples/README.md examples.md
```
Once you have setup `sphinx`, you can build the documentation by running the following command in the `/docs` folder:
```bash
make html
```
A folder called ``_build/html`` should have been created. You can now open the file ``_build/html/index.html`` in your browser.
A folder called ``_build/html`` should have been created. You can now open the file ``_build/html/index.html`` in your
browser.
---
**NOTE**
@@ -68,26 +70,43 @@ It should build the static app that will be available under `/docs/_build/html`
Accepted files are reStructuredText (.rst) and Markdown (.md). Create a file with its extension and put it
in the source directory. You can then link it to the toc-tree by putting the filename without the extension.
## Preview the documentation in a pull request
Once you have made your pull request, you can check what the documentation will look like after it's merged by
following these steps:
- Look at the checks at the bottom of the conversation page of your PR (you may need to click on "show all checks" to
expand them).
- Click on "details" next to the `ci/circleci: build_doc` check.
- In the new window, click on the "Artifacts" tab.
- Locate the file "docs/_build/html/index.html" (or any specific page you want to check) and click on it to get a
preview.
## Writing Documentation - Specification
The `huggingface/transformers` documentation follows the
[Google documentation](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html) style. It is
mostly written in ReStructuredText
([Sphinx simple documentation](https://www.sphinx-doc.org/en/master/usage/restructuredtext/index.html),
[Sourceforge complete documentation](https://docutils.sourceforge.io/docs/ref/rst/restructuredtext.html))
[Sourceforge complete documentation](https://docutils.sourceforge.io/docs/ref/rst/restructuredtext.html)).
### Adding a new section
A section is a page held in the `Notes` toc-tree on the documentation. Adding a new section is done in two steps:
### Adding a new tutorial
Adding a new tutorial or section is done in two steps:
- Add a new file under `./source`. This file can either be ReStructuredText (.rst) or Markdown (.md).
- Link that file in `./source/index.rst` on the correct toc-tree.
Make sure to put your new file under the proper section. It's unlikely to go in the first section (*Get Started*), so
depending on the intended targets (beginners, more advanced users or researchers) it should go in section two, three or
four.
### Adding a new model
When adding a new model:
- Create a file `xxx.rst` under `./source/model_doc`.
- Create a file `xxx.rst` under `./source/model_doc` (don't hesitate to copy an existing file as template).
- Link that file in `./source/index.rst` on the `model_doc` toc-tree.
- Write a short overview of the model:
- Overview with paper & authors
@@ -106,18 +125,18 @@ When adding a new model:
These classes should be added using the RST syntax. Usually as follows:
```
XXXConfig
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XXXConfig
:members:
```
This will include every public method of the configuration. If for some reason you wish for a method not to be displayed
in the documentation, you can do so by specifying which methods should be in the docs:
This will include every public method of the configuration that is documented. If for some reason you wish for a method
not to be displayed in the documentation, you can do so by specifying which methods should be in the docs:
```
XXXTokenizer
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XXXTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
@@ -127,14 +146,18 @@ XXXTokenizer
### Writing source documentation
Values that should be put in `code` should either be surrounded by double backticks: \`\`like so\`\` or be written as an object
using the :obj: syntax: :obj:\`like so\`.
Values that should be put in `code` should either be surrounded by double backticks: \`\`like so\`\` or be written as
an object using the :obj: syntax: :obj:\`like so\`. Note that argument names and objects like True, None or any strings
should usually be put in `code`.
When mentionning a class, it is recommended to use the :class: syntax as the mentioned class will be automatically
linked by Sphinx: :class:\`transformers.XXXClass\`
linked by Sphinx: :class:\`~transformers.XXXClass\`
When mentioning a function, it is recommended to use the :func: syntax as the mentioned method will be automatically
linked by Sphinx: :func:\`transformers.XXXClass.method\`
When mentioning a function, it is recommended to use the :func: syntax as the mentioned function will be automatically
linked by Sphinx: :func:\`~transformers.function\`.
When mentioning a method, it is recommended to use the :meth: syntax as the mentioned method will be automatically
linked by Sphinx: :meth:\`~transformers.XXXClass.method\`.
Links should be done as so (note the double underscore at the end): \`text for the link <./local-link-or-global-link#loc>\`__
@@ -151,13 +174,34 @@ Here's an example showcasing everything so far:
input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`):
Indices of input sequence tokens in the vocabulary.
Indices can be obtained using :class:`transformers.AlbertTokenizer`.
See :func:`transformers.PreTrainedTokenizer.encode` and
:func:`transformers.PreTrainedTokenizer.encode_plus` for details.
Indices can be obtained using :class:`~transformers.AlbertTokenizer`.
See :meth:`~transformers.PreTrainedTokenizer.encode` and
:meth:`~transformers.PreTrainedTokenizer.__call__` for details.
`What are input IDs? <../glossary.html#input-ids>`__
```
For optional arguments or arguments with defaults we follow the following syntax: imagine we have a function with the
following signature:
```
def my_function(x: str = None, a: float = 1):
```
then its documentation should look like this:
```
Args:
x (:obj:`str`, `optional`):
This argument controls ...
a (:obj:`float`, `optional`, defaults to 1):
This argument is used to ...
```
Note that we always omit the "defaults to :obj:\`None\`" when None is the default for any argument. Also note that even
if the first line describing your argument type and its default gets long, you can't break it on several lines. You can
however write as many lines as you want in the indented description (see the example above with `input_ids`).
#### Writing a multi-line code block
Multi-line code blocks can be useful for displaying examples. They are done like so:
@@ -172,6 +216,9 @@ Example::
The `Example` string at the beginning can be replaced by anything as long as there are two semicolons following it.
We follow the [doctest](https://docs.python.org/3/library/doctest.html) syntax for the examples to automatically test
the results stay consistent with the library.
#### Writing a return block
Arguments should be defined with the `Args:` prefix, followed by a line return and an indentation.
@@ -193,5 +240,5 @@ Here's an example for a single value return:
```
Returns:
A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.
:obj:`List[int]`: A list of integers in the range [0, 1] --- 1 for a special token, 0 for a sequence token.
```

View File

@@ -9,4 +9,8 @@
.highlight .kn, .highlight .nv, .highlight .s2, .highlight .ow {
color: #6670FF;
}
.highlight .gp {
color: #FB8D68;
}

View File

@@ -1,9 +1,81 @@
/* Our DOM objects */
/* Colab dropdown */
.colab-dropdown {
position: relative;
display: inline-block;
}
.colab-dropdown-content {
display: none;
position: absolute;
background-color: #f9f9f9;
min-width: 117px;
box-shadow: 0px 8px 16px 0px rgba(0,0,0,0.2);
z-index: 1;
}
.colab-dropdown-content button {
color: #6670FF;
background-color: #f9f9f9;
font-size: 12px;
border: none;
min-width: 117px;
padding: 5px 5px;
text-decoration: none;
display: block;
}
.colab-dropdown-content button:hover {background-color: #eee;}
.colab-dropdown:hover .colab-dropdown-content {display: block;}
/* Version control */
.version-button {
background-color: #6670FF;
color: white;
border: none;
padding: 5px;
font-size: 15px;
cursor: pointer;
}
.version-button:hover, .version-button:focus {
background-color: #A6B0FF;
}
.version-dropdown {
display: none;
background-color: #6670FF;
min-width: 160px;
overflow: auto;
font-size: 15px;
}
.version-dropdown a {
color: white;
padding: 3px 4px;
text-decoration: none;
display: block;
}
.version-dropdown a:hover {
background-color: #A6B0FF;
}
.version-show {
display: block;
}
/* Framework selector */
.framework-selector {
display: flex;
flex-direction: row;
justify-content: flex-end;
margin-right: 30px;
}
.framework-selector > button {
@@ -20,6 +92,12 @@
padding: 5px;
}
/* Copy button */
a.copybtn {
margin: 3px;
}
/* The literal code blocks */
.rst-content tt.literal, .rst-content tt.literal, .rst-content code.literal {
color: #6670FF;
@@ -38,6 +116,7 @@
/* The research field on top of the toc tree */
.wy-side-nav-search{
padding-top: 0;
background-color: #6670FF;
}
@@ -46,6 +125,12 @@
background-color: #6670FF;
}
/* The section headers in the toc tree */
.wy-menu-vertical p.caption{
background-color: #4d59ff;
line-height: 40px;
}
/* The selected items in the toc tree */
.wy-menu-vertical li.current{
background-color: #A6B0FF;

View File

@@ -1,3 +1,42 @@
// These two things need to be updated at each release for the version selector.
// Last stable version
const stableVersion = "v3.3.0"
// Dictionary doc folder to label
const versionMapping = {
"master": "master",
"": "v3.3.0/v3.3.1",
"v3.2.0": "v3.2.0",
"v3.1.0": "v3.1.0 (stable)",
"v3.0.2": "v3.0.0/v3.0.1/v3.0.2",
"v2.11.0": "v2.11.0",
"v2.10.0": "v2.10.0",
"v2.9.1": "v2.9.0/v2.9.1",
"v2.8.0": "v2.8.0",
"v2.7.0": "v2.7.0",
"v2.6.0": "v2.6.0",
"v2.5.1": "v2.5.0/v2.5.1",
"v2.4.0": "v2.4.0/v2.4.1",
"v2.3.0": "v2.3.0",
"v2.2.0": "v2.2.0/v2.2.1/v2.2.2",
"v2.1.1": "v2.1.1",
"v2.0.0": "v2.0.0",
"v1.2.0": "v1.2.0",
"v1.1.0": "v1.1.0",
"v1.0.0": "v1.0.0"
}
// The page that have a notebook and therefore should have the open in colab badge.
const hasNotebook = [
"benchmarks",
"custom_datasets",
"multilingual",
"perplexity",
"preprocessing",
"quicktour",
"task_summary",
"tokenizer_summary",
"training"
];
function addIcon() {
const huggingFaceLogo = "https://huggingface.co/landing/assets/transformers-docs/huggingface_logo.svg";
const image = document.createElement("img");
@@ -58,11 +97,94 @@ function addGithubButton() {
document.querySelector(".wy-side-nav-search .icon-home").insertAdjacentHTML('afterend', div);
}
function addColabLink() {
const parts = location.toString().split('/');
const pageName = parts[parts.length - 1].split(".")[0];
if (hasNotebook.includes(pageName)) {
const baseURL = "https://colab.research.google.com/github/huggingface/notebooks/blob/master/transformers_doc/"
const linksColab = `
<div class="colab-dropdown">
<img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg">
<div class="colab-dropdown-content">
<button onclick=" window.open('${baseURL}${pageName}.ipynb')">Mixed</button>
<button onclick=" window.open('${baseURL}pytorch/${pageName}.ipynb')">PyTorch</button>
<button onclick=" window.open('${baseURL}tensorflow/${pageName}.ipynb')">TensorFlow</button>
</div>
</div>`
const leftMenu = document.querySelector(".wy-breadcrumbs-aside")
leftMenu.innerHTML = linksColab + '\n' + leftMenu.innerHTML
}
}
function addVersionControl() {
// To grab the version currently in view, we parse the url
const parts = location.toString().split('/');
let versionIndex = parts.length - 2;
// Index page may not have a last part with filename.html so we need to go up
if (parts[parts.length - 1] != "" && ! parts[parts.length - 1].match(/\.html$|^search.html?/)) {
versionIndex = parts.length - 1;
}
// Main classes and models are nested so we need to go deeper
else if (parts[versionIndex] == "main_classes" || parts[versionIndex] == "model_doc") {
versionIndex = versionIndex - 1;
}
const version = parts[versionIndex];
// Menu with all the links,
const versionMenu = document.createElement("div");
const htmlLines = [];
for (const [key, value] of Object.entries(versionMapping)) {
let baseUrlIndex = (version == "transformers") ? versionIndex + 1: versionIndex;
var urlParts = parts.slice(0, baseUrlIndex);
if (key != "") {
urlParts = urlParts.concat([key]);
}
urlParts = urlParts.concat(parts.slice(versionIndex+1));
htmlLines.push(`<a href="${urlParts.join('/')}">${value}</a>`);
}
versionMenu.classList.add("version-dropdown");
versionMenu.innerHTML = htmlLines.join('\n');
// Button for version selection
const versionButton = document.createElement("div");
versionButton.classList.add("version-button");
let label = (version == "transformers") ? stableVersion : version
versionButton.innerText = label.concat(" ▼");
// Toggle the menu when we click on the button
versionButton.addEventListener("click", () => {
versionMenu.classList.toggle("version-show");
});
// Hide the menu when we click elsewhere
window.addEventListener("click", (event) => {
if (event.target != versionButton){
versionMenu.classList.remove('version-show');
}
});
// Container
const div = document.createElement("div");
div.appendChild(versionButton);
div.appendChild(versionMenu);
div.style.paddingTop = '25px';
div.style.backgroundColor = '#6670FF';
div.style.display = 'block';
div.style.textAlign = 'center';
const scrollDiv = document.querySelector(".wy-side-scroll");
scrollDiv.insertBefore(div, scrollDiv.children[1]);
}
function addHfMenu() {
const div = `
<div class="menu">
<a href="/welcome">🔥 Sign in</a>
<a href="/models">🚀 Models</a>
<a href="http://discuss.huggingface.co">💬 Forum</a>
</div>
`;
document.body.insertAdjacentHTML('afterbegin', div);
@@ -72,6 +194,8 @@ function platformToggle() {
const codeBlocks = Array.from(document.getElementsByClassName("highlight"));
const pytorchIdentifier = "## PYTORCH CODE";
const tensorflowIdentifier = "## TENSORFLOW CODE";
const promptSpanIdentifier = `<span class="gp">&gt;&gt;&gt; </span>`
const pytorchSpanIdentifier = `<span class="c1">${pytorchIdentifier}</span>`;
const tensorflowSpanIdentifier = `<span class="c1">${tensorflowIdentifier}</span>`;
@@ -84,10 +208,22 @@ function platformToggle() {
let tensorflowSpans;
if(pytorchSpanPosition < tensorflowSpanPosition){
pytorchSpans = spans.slice(pytorchSpanPosition + pytorchSpanIdentifier.length + 1, tensorflowSpanPosition);
const isPrompt = spans.slice(
spans.indexOf(tensorflowSpanIdentifier) - promptSpanIdentifier.length,
spans.indexOf(tensorflowSpanIdentifier)
) == promptSpanIdentifier;
const finalTensorflowSpanPosition = isPrompt ? tensorflowSpanPosition - promptSpanIdentifier.length : tensorflowSpanPosition;
pytorchSpans = spans.slice(pytorchSpanPosition + pytorchSpanIdentifier.length + 1, finalTensorflowSpanPosition);
tensorflowSpans = spans.slice(tensorflowSpanPosition + tensorflowSpanIdentifier.length + 1, spans.length);
}else{
tensorflowSpans = spans.slice(tensorflowSpanPosition + tensorflowSpanIdentifier.length + 1, pytorchSpanPosition);
const isPrompt = spans.slice(
spans.indexOf(pytorchSpanIdentifier) - promptSpanIdentifier.length,
spans.indexOf(pytorchSpanIdentifier)
) == promptSpanIdentifier;
const finalPytorchSpanPosition = isPrompt ? pytorchSpanPosition - promptSpanIdentifier.length : pytorchSpanPosition;
tensorflowSpans = spans.slice(tensorflowSpanPosition + tensorflowSpanIdentifier.length + 1, finalPytorchSpanPosition);
pytorchSpans = spans.slice(pytorchSpanPosition + pytorchSpanIdentifier.length + 1, spans.length);
}
@@ -100,9 +236,11 @@ function platformToggle() {
const createFrameworkButtons = sample => {
const pytorchButton = document.createElement("button");
pytorchButton.classList.add('pytorch-button')
pytorchButton.innerText = "PyTorch";
const tensorflowButton = document.createElement("button");
tensorflowButton.classList.add('tensorflow-button')
tensorflowButton.innerText = "TensorFlow";
const selectorDiv = document.createElement("div");
@@ -117,22 +255,36 @@ function platformToggle() {
tensorflowButton.classList.remove("selected");
pytorchButton.addEventListener("click", () => {
sample.element.innerHTML = sample.pytorchSample;
pytorchButton.classList.add("selected");
tensorflowButton.classList.remove("selected");
for(const codeBlock of updatedCodeBlocks){
codeBlock.element.innerHTML = codeBlock.pytorchSample;
}
Array.from(document.getElementsByClassName('pytorch-button')).forEach(button => {
button.classList.add("selected");
})
Array.from(document.getElementsByClassName('tensorflow-button')).forEach(button => {
button.classList.remove("selected");
})
});
tensorflowButton.addEventListener("click", () => {
sample.element.innerHTML = sample.tensorflowSample;
tensorflowButton.classList.add("selected");
pytorchButton.classList.remove("selected");
for(const codeBlock of updatedCodeBlocks){
codeBlock.element.innerHTML = codeBlock.tensorflowSample;
}
Array.from(document.getElementsByClassName('tensorflow-button')).forEach(button => {
button.classList.add("selected");
})
Array.from(document.getElementsByClassName('pytorch-button')).forEach(button => {
button.classList.remove("selected");
})
});
};
codeBlocks
const updatedCodeBlocks = codeBlocks
.map(element => {return {element: element.firstChild, innerText: element.innerText}})
.filter(codeBlock => codeBlock.innerText.includes(pytorchIdentifier) && codeBlock.innerText.includes(tensorflowIdentifier))
.map(getFrameworkSpans)
.forEach(createFrameworkButtons);
updatedCodeBlocks
.forEach(createFrameworkButtons)
}
@@ -149,10 +301,12 @@ function parseGithubButtons (){"use strict";var e=window.document,t=e.location,o
function onLoad() {
addIcon();
addVersionControl();
addCustomFooter();
addGithubButton();
parseGithubButtons();
addHfMenu();
addColabLink();
platformToggle();
}

View File

@@ -1,54 +0,0 @@
# Benchmarks
This section is dedicated to the Benchmarks done by the library, both by maintainers, contributors and users. These
benchmark will help keep track of the preformance improvements that are brought to our models across versions.
## Benchmarking all models for inference
As of version 2.1 we have benchmarked all models for inference, across many different settings: using PyTorch, with
and without TorchScript, using TensorFlow, with and without XLA. All of those tests were done across CPUs (except for
TensorFlow XLA) and GPUs.
The approach is detailed in the [following blogpost](https://medium.com/huggingface/benchmarking-transformers-pytorch-and-tensorflow-e2917fb891c2)
The results are available [here](https://docs.google.com/spreadsheets/d/1sryqufw2D0XlUH4sq3e9Wnxu5EAQkaohzrJbd5HdQ_w/edit?usp=sharing).
## TF2 with mixed precision, XLA, Distribution (@tlkh)
This work was done by [Timothy Liu](https://github.com/tlkh).
There are very positive results to be gained from the various TensorFlow 2.0 features:
- Automatic Mixed Precision (AMP)
- XLA compiler
- Distribution strategies (multi-GPU)
The benefits are listed here (tested on CoLA, MRPC, SST-2):
- AMP: Between 1.4x to 1.6x decrease in overall time without change in batch size
- AMP+XLA: Up to 2.5x decrease in overall time on SST-2 (larger dataset)
- Distribution: Between 1.4x to 3.4x decrease in overall time on 4xV100
- Combined: Up to 5.7x decrease in overall training time, or 9.1x training throughput
The model quality (measured by the validation accuracy) fluctuates slightly. Taking an average of 4 training runs
on a single GPU gives the following results:
- CoLA: AMP results in slighter lower acc (0.820 vs 0.824)
- MRPC: AMP results in lower acc (0.823 vs 0.835)
- SST-2: AMP results in slighter lower acc (0.918 vs 0.922)
However, in a distributed setting with 4xV100 (4x batch size), AMP can yield in better results:
CoLA: AMP results in higher acc (0.828 vs 0.812)
MRPC: AMP results in lower acc (0.817 vs 0.827)
SST-2: AMP results in slightly lower acc (0.926 vs 0.929)
The benchmark script is available [here](https://github.com/NVAITC/benchmarking/blob/master/tf2/bert_dist.py).
Note: on some tasks (e.g. MRPC), the dataset is too small. The overhead due to the model compilation with XLA as well
as the distribution strategy setup does not speed things up. The XLA compile time is also the reason why although throughput
can increase a lot (e.g. 2.7x for single GPU), overall (end-to-end) training speed-up is not as fast (as low as 1.4x)
The benefits as seen on SST-2 (larger dataset) is much clear.
All results can be seen on this [Google Sheet](https://docs.google.com/spreadsheets/d/1538MN224EzjbRL239sqSiUy6YY-rAjHyXhTzz_Zptls/edit#gid=960868445).

322
docs/source/benchmarks.rst Normal file
View File

@@ -0,0 +1,322 @@
Benchmarks
=======================================================================================================================
Let's take a look at how 🤗 Transformer models can be benchmarked, best practices, and already available benchmarks.
A notebook explaining in more detail how to benchmark 🤗 Transformer models can be found `here <https://github.com/huggingface/transformers/blob/master/notebooks/05-benchmark.ipynb>`__.
How to benchmark 🤗 Transformer models
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The classes :class:`~transformers.PyTorchBenchmark` and :class:`~transformers.TensorFlowBenchmark` allow to flexibly benchmark 🤗 Transformer models.
The benchmark classes allow us to measure the `peak memory usage` and `required time` for both
`inference` and `training`.
.. note::
Hereby, `inference` is defined by a single forward pass, and `training` is defined by a single forward pass and backward pass.
The benchmark classes :class:`~transformers.PyTorchBenchmark` and :class:`~transformers.TensorFlowBenchmark` expect an object of type :class:`~transformers.PyTorchBenchmarkArguments` and :class:`~transformers.TensorFlowBenchmarkArguments`, respectively, for instantiation. :class:`~transformers.PyTorchBenchmarkArguments` and :class:`~transformers.TensorFlowBenchmarkArguments` are data classes and contain all relevant configurations for their corresponding benchmark class.
In the following example, it is shown how a BERT model of type `bert-base-cased` can be benchmarked.
.. code-block::
>>> ## PYTORCH CODE
>>> from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments
>>> args = PyTorchBenchmarkArguments(models=["bert-base-uncased"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512])
>>> benchmark = PyTorchBenchmark(args)
>>> ## TENSORFLOW CODE
>>> from transformers import TensorFlowBenchmark, TensorFlowBenchmarkArguments
>>> args = TensorFlowBenchmarkArguments(models=["bert-base-uncased"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512])
>>> benchmark = TensorFlowBenchmark(args)
Here, three arguments are given to the benchmark argument data classes, namely ``models``, ``batch_sizes``, and ``sequence_lengths``. The argument ``models`` is required and expects a :obj:`list` of model identifiers from the `model hub <https://huggingface.co/models>`__
The :obj:`list` arguments ``batch_sizes`` and ``sequence_lengths`` define the size of the ``input_ids`` on which the model is benchmarked.
There are many more parameters that can be configured via the benchmark argument data classes. For more detail on these one can either directly consult the files
``src/transformers/benchmark/benchmark_args_utils.py``, ``src/transformers/benchmark/benchmark_args.py`` (for PyTorch) and ``src/transformers/benchmark/benchmark_args_tf.py`` (for Tensorflow).
Alternatively, running the following shell commands from root will print out a descriptive list of all configurable parameters for PyTorch and Tensorflow respectively.
.. code-block:: bash
## PYTORCH CODE
python examples/benchmarking/run_benchmark.py --help
## TENSORFLOW CODE
python examples/benchmarking/run_benchmark_tf.py --help
An instantiated benchmark object can then simply be run by calling ``benchmark.run()``.
.. code-block::
>>> ## PYTORCH CODE
>>> results = benchmark.run()
>>> print(results)
==================== INFERENCE - SPEED - RESULT ====================
--------------------------------------------------------------------------------
Model Name Batch Size Seq Length Time in s
--------------------------------------------------------------------------------
bert-base-uncased 8 8 0.006
bert-base-uncased 8 32 0.006
bert-base-uncased 8 128 0.018
bert-base-uncased 8 512 0.088
--------------------------------------------------------------------------------
==================== INFERENCE - MEMORY - RESULT ====================
--------------------------------------------------------------------------------
Model Name Batch Size Seq Length Memory in MB
--------------------------------------------------------------------------------
bert-base-uncased 8 8 1227
bert-base-uncased 8 32 1281
bert-base-uncased 8 128 1307
bert-base-uncased 8 512 1539
--------------------------------------------------------------------------------
==================== ENVIRONMENT INFORMATION ====================
- transformers_version: 2.11.0
- framework: PyTorch
- use_torchscript: False
- framework_version: 1.4.0
- python_version: 3.6.10
- system: Linux
- cpu: x86_64
- architecture: 64bit
- date: 2020-06-29
- time: 08:58:43.371351
- fp16: False
- use_multiprocessing: True
- only_pretrain_model: False
- cpu_ram_mb: 32088
- use_gpu: True
- num_gpus: 1
- gpu: TITAN RTX
- gpu_ram_mb: 24217
- gpu_power_watts: 280.0
- gpu_performance_state: 2
- use_tpu: False
>>> ## TENSORFLOW CODE
>>> results = benchmark.run()
>>> print(results)
==================== INFERENCE - SPEED - RESULT ====================
--------------------------------------------------------------------------------
Model Name Batch Size Seq Length Time in s
--------------------------------------------------------------------------------
bert-base-uncased 8 8 0.005
bert-base-uncased 8 32 0.008
bert-base-uncased 8 128 0.022
bert-base-uncased 8 512 0.105
--------------------------------------------------------------------------------
==================== INFERENCE - MEMORY - RESULT ====================
--------------------------------------------------------------------------------
Model Name Batch Size Seq Length Memory in MB
--------------------------------------------------------------------------------
bert-base-uncased 8 8 1330
bert-base-uncased 8 32 1330
bert-base-uncased 8 128 1330
bert-base-uncased 8 512 1770
--------------------------------------------------------------------------------
==================== ENVIRONMENT INFORMATION ====================
- transformers_version: 2.11.0
- framework: Tensorflow
- use_xla: False
- framework_version: 2.2.0
- python_version: 3.6.10
- system: Linux
- cpu: x86_64
- architecture: 64bit
- date: 2020-06-29
- time: 09:26:35.617317
- fp16: False
- use_multiprocessing: True
- only_pretrain_model: False
- cpu_ram_mb: 32088
- use_gpu: True
- num_gpus: 1
- gpu: TITAN RTX
- gpu_ram_mb: 24217
- gpu_power_watts: 280.0
- gpu_performance_state: 2
- use_tpu: False
By default, the `time` and the `required memory` for `inference` are benchmarked.
In the example output above the first two sections show the result corresponding to `inference time` and `inference memory`.
In addition, all relevant information about the computing environment, `e.g.` the GPU type, the system, the library versions, etc... are printed out in the third section under `ENVIRONMENT INFORMATION`.
This information can optionally be saved in a `.csv` file when adding the argument :obj:`save_to_csv=True` to :class:`~transformers.PyTorchBenchmarkArguments` and :class:`~transformers.TensorFlowBenchmarkArguments` respectively.
In this case, every section is saved in a separate `.csv` file. The path to each `.csv` file can optionally be defined via the argument data classes.
Instead of benchmarking pre-trained models via their model identifier, `e.g.` `bert-base-uncased`, the user can alternatively benchmark an arbitrary configuration of any available model class.
In this case, a :obj:`list` of configurations must be inserted with the benchmark args as follows.
.. code-block::
>>> ## PYTORCH CODE
>>> from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments, BertConfig
>>> args = PyTorchBenchmarkArguments(models=["bert-base", "bert-384-hid", "bert-6-lay"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512])
>>> config_base = BertConfig()
>>> config_384_hid = BertConfig(hidden_size=384)
>>> config_6_lay = BertConfig(num_hidden_layers=6)
>>> benchmark = PyTorchBenchmark(args, configs=[config_base, config_384_hid, config_6_lay])
>>> benchmark.run()
==================== INFERENCE - SPEED - RESULT ====================
--------------------------------------------------------------------------------
Model Name Batch Size Seq Length Time in s
--------------------------------------------------------------------------------
bert-base 8 128 0.006
bert-base 8 512 0.006
bert-base 8 128 0.018
bert-base 8 512 0.088
bert-384-hid 8 8 0.006
bert-384-hid 8 32 0.006
bert-384-hid 8 128 0.011
bert-384-hid 8 512 0.054
bert-6-lay 8 8 0.003
bert-6-lay 8 32 0.004
bert-6-lay 8 128 0.009
bert-6-lay 8 512 0.044
--------------------------------------------------------------------------------
==================== INFERENCE - MEMORY - RESULT ====================
--------------------------------------------------------------------------------
Model Name Batch Size Seq Length Memory in MB
--------------------------------------------------------------------------------
bert-base 8 8 1277
bert-base 8 32 1281
bert-base 8 128 1307
bert-base 8 512 1539
bert-384-hid 8 8 1005
bert-384-hid 8 32 1027
bert-384-hid 8 128 1035
bert-384-hid 8 512 1255
bert-6-lay 8 8 1097
bert-6-lay 8 32 1101
bert-6-lay 8 128 1127
bert-6-lay 8 512 1359
--------------------------------------------------------------------------------
==================== ENVIRONMENT INFORMATION ====================
- transformers_version: 2.11.0
- framework: PyTorch
- use_torchscript: False
- framework_version: 1.4.0
- python_version: 3.6.10
- system: Linux
- cpu: x86_64
- architecture: 64bit
- date: 2020-06-29
- time: 09:35:25.143267
- fp16: False
- use_multiprocessing: True
- only_pretrain_model: False
- cpu_ram_mb: 32088
- use_gpu: True
- num_gpus: 1
- gpu: TITAN RTX
- gpu_ram_mb: 24217
- gpu_power_watts: 280.0
- gpu_performance_state: 2
- use_tpu: False
>>> ## TENSORFLOW CODE
>>> from transformers import TensorFlowBenchmark, TensorFlowBenchmarkArguments, BertConfig
>>> args = TensorFlowBenchmarkArguments(models=["bert-base", "bert-384-hid", "bert-6-lay"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512])
>>> config_base = BertConfig()
>>> config_384_hid = BertConfig(hidden_size=384)
>>> config_6_lay = BertConfig(num_hidden_layers=6)
>>> benchmark = TensorFlowBenchmark(args, configs=[config_base, config_384_hid, config_6_lay])
>>> benchmark.run()
==================== INFERENCE - SPEED - RESULT ====================
--------------------------------------------------------------------------------
Model Name Batch Size Seq Length Time in s
--------------------------------------------------------------------------------
bert-base 8 8 0.005
bert-base 8 32 0.008
bert-base 8 128 0.022
bert-base 8 512 0.106
bert-384-hid 8 8 0.005
bert-384-hid 8 32 0.007
bert-384-hid 8 128 0.018
bert-384-hid 8 512 0.064
bert-6-lay 8 8 0.002
bert-6-lay 8 32 0.003
bert-6-lay 8 128 0.0011
bert-6-lay 8 512 0.074
--------------------------------------------------------------------------------
==================== INFERENCE - MEMORY - RESULT ====================
--------------------------------------------------------------------------------
Model Name Batch Size Seq Length Memory in MB
--------------------------------------------------------------------------------
bert-base 8 8 1330
bert-base 8 32 1330
bert-base 8 128 1330
bert-base 8 512 1770
bert-384-hid 8 8 1330
bert-384-hid 8 32 1330
bert-384-hid 8 128 1330
bert-384-hid 8 512 1540
bert-6-lay 8 8 1330
bert-6-lay 8 32 1330
bert-6-lay 8 128 1330
bert-6-lay 8 512 1540
--------------------------------------------------------------------------------
==================== ENVIRONMENT INFORMATION ====================
- transformers_version: 2.11.0
- framework: Tensorflow
- use_xla: False
- framework_version: 2.2.0
- python_version: 3.6.10
- system: Linux
- cpu: x86_64
- architecture: 64bit
- date: 2020-06-29
- time: 09:38:15.487125
- fp16: False
- use_multiprocessing: True
- only_pretrain_model: False
- cpu_ram_mb: 32088
- use_gpu: True
- num_gpus: 1
- gpu: TITAN RTX
- gpu_ram_mb: 24217
- gpu_power_watts: 280.0
- gpu_performance_state: 2
- use_tpu: False
Again, `inference time` and `required memory` for `inference` are measured, but this time for customized configurations of the :obj:`BertModel` class. This feature can especially be helpful when
deciding for which configuration the model should be trained.
Benchmark best practices
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This section lists a couple of best practices one should be aware of when benchmarking a model.
- Currently, only single device benchmarking is supported. When benchmarking on GPU, it is recommended that the user
specifies on which device the code should be run by setting the ``CUDA_VISIBLE_DEVICES`` environment variable in the shell, `e.g.` ``export CUDA_VISIBLE_DEVICES=0`` before running the code.
- The option :obj:`no_multi_processing` should only be set to :obj:`True` for testing and debugging. To ensure accurate memory measurement it is recommended to run each memory benchmark in a separate process by making sure :obj:`no_multi_processing` is set to :obj:`True`.
- One should always state the environment information when sharing the results of a model benchmark. Results can vary heavily between different GPU devices, library versions, etc., so that benchmark results on their own are not very useful for the community.
Sharing your benchmark
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Previously all available core models (10 at the time) have been benchmarked for `inference time`, across many different settings: using PyTorch, with
and without TorchScript, using TensorFlow, with and without XLA. All of those tests were done across CPUs (except for
TensorFlow XLA) and GPUs.
The approach is detailed in the `following blogpost <https://medium.com/huggingface/benchmarking-transformers-pytorch-and-tensorflow-e2917fb891c2>`__ and the results are available `here <https://docs.google.com/spreadsheets/d/1sryqufw2D0XlUH4sq3e9Wnxu5EAQkaohzrJbd5HdQ_w/edit?usp=sharing>`__.
With the new `benchmark` tools, it is easier than ever to share your benchmark results with the community `here <https://github.com/huggingface/transformers/blob/master/examples/benchmarking/README.md>`__.

View File

@@ -1,5 +1,5 @@
BERTology
---------
-----------------------------------------------------------------------------------------------------------------------
There is a growing field of study concerned with investigating the inner working of large-scale transformers like BERT (that some call "BERTology"). Some good examples of this field are:

View File

@@ -26,7 +26,7 @@ author = u'huggingface'
# The short X.Y version
version = u''
# The full version, including alpha/beta/rc tags
release = u'2.9.1'
release = u'3.4.0'
# -- General configuration ---------------------------------------------------
@@ -44,7 +44,8 @@ extensions = [
'sphinx.ext.napoleon',
'recommonmark',
'sphinx.ext.viewcode',
'sphinx_markdown_tables'
'sphinx_markdown_tables',
'sphinx_copybutton'
]
# Add any paths that contain templates here, relative to this directory.
@@ -74,6 +75,9 @@ exclude_patterns = [u'_build', 'Thumbs.db', '.DS_Store']
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = None
# Remove the prompt when copying examples
copybutton_prompt_text = r">>> |\.\.\. "
copybutton_prompt_is_regexp = True
# -- Options for HTML output -------------------------------------------------
@@ -187,8 +191,8 @@ epub_title = project
epub_exclude_files = ['search.html']
def setup(app):
app.add_stylesheet('css/huggingface.css')
app.add_stylesheet('css/code-snippets.css')
app.add_css_file('css/huggingface.css')
app.add_css_file('css/code-snippets.css')
app.add_js_file('js/custom.js')
# -- Extension configuration -------------------------------------------------

1
docs/source/contributing.md Symbolic link
View File

@@ -0,0 +1 @@
../../CONTRIBUTING.md

View File

@@ -1,5 +1,5 @@
Converting Tensorflow Checkpoints
================================================
=======================================================================================================================
A command-line interface is provided to convert original Bert/GPT/GPT-2/Transformer-XL/XLNet/XLM checkpoints in models than be loaded using the ``from_pretrained`` methods of the library.
@@ -10,7 +10,7 @@ A command-line interface is provided to convert original Bert/GPT/GPT-2/Transfor
The documentation below reflects the **transformers-cli convert** command format.
BERT
^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You can convert any TensorFlow checkpoint for BERT (in particular `the pre-trained models released by Google <https://github.com/google-research/bert#pre-trained-models>`_\ ) in a PyTorch save file by using the `convert_bert_original_tf_checkpoint_to_pytorch.py <https://github.com/huggingface/transformers/blob/master/src/transformers/convert_bert_original_tf_checkpoint_to_pytorch.py>`_ script.
@@ -34,7 +34,7 @@ Here is an example of the conversion process for a pre-trained ``BERT-Base Uncas
You can download Google's pre-trained models for the conversion `here <https://github.com/google-research/bert#pre-trained-models>`__.
ALBERT
^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Convert TensorFlow model checkpoints of ALBERT to PyTorch using the `convert_albert_original_tf_checkpoint_to_pytorch.py <https://github.com/huggingface/transformers/blob/master/src/transformers/convert_bert_original_tf_checkpoint_to_pytorch.py>`_ script.
@@ -54,7 +54,7 @@ Here is an example of the conversion process for the pre-trained ``ALBERT Base``
You can download Google's pre-trained models for the conversion `here <https://github.com/google-research/albert#pre-trained-models>`__.
OpenAI GPT
^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here is an example of the conversion process for a pre-trained OpenAI GPT model, assuming that your NumPy checkpoint save as the same format than OpenAI pretrained model (see `here <https://github.com/openai/finetune-transformer-lm>`__\ )
@@ -70,7 +70,7 @@ Here is an example of the conversion process for a pre-trained OpenAI GPT model,
OpenAI GPT-2
^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here is an example of the conversion process for a pre-trained OpenAI GPT-2 model (see `here <https://github.com/openai/gpt-2>`__\ )
@@ -85,7 +85,7 @@ Here is an example of the conversion process for a pre-trained OpenAI GPT-2 mode
[--finetuning_task_name OPENAI_GPT2_FINETUNED_TASK]
Transformer-XL
^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here is an example of the conversion process for a pre-trained Transformer-XL model (see `here <https://github.com/kimiyoung/transformer-xl/tree/master/tf#obtain-and-evaluate-pretrained-sota-models>`__\ )
@@ -101,7 +101,7 @@ Here is an example of the conversion process for a pre-trained Transformer-XL mo
XLNet
^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here is an example of the conversion process for a pre-trained XLNet model:
@@ -118,7 +118,7 @@ Here is an example of the conversion process for a pre-trained XLNet model:
XLM
^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here is an example of the conversion process for a pre-trained XLM model:

View File

@@ -0,0 +1,715 @@
Fine-tuning with custom datasets
=======================================================================================================================
.. note::
The datasets used in this tutorial are available and can be more easily accessed using the
`🤗 NLP library <https://github.com/huggingface/nlp>`_. We do not use this library to access the datasets here
since this tutorial meant to illustrate how to work with your own data. A brief of introduction can be found
at the end of the tutorial in the section ":ref:`nlplib`".
This tutorial will take you through several examples of using 🤗 Transformers models with your own datasets. The
guide shows one of many valid workflows for using these models and is meant to be illustrative rather than
definitive. We show examples of reading in several data formats, preprocessing the data for several types of tasks,
and then preparing the data into PyTorch/TensorFlow ``Dataset`` objects which can easily be used either with
:class:`~transformers.Trainer`/:class:`~transformers.TFTrainer` or with native PyTorch/TensorFlow.
We include several examples, each of which demonstrates a different type of common downstream task:
- :ref:`seq_imdb`
- :ref:`tok_ner`
- :ref:`qa_squad`
- :ref:`resources`
.. _seq_imdb:
Sequence Classification with IMDb Reviews
-----------------------------------------------------------------------------------------------------------------------
.. note::
This dataset can be explored in the Hugging Face model hub (`IMDb <https://huggingface.co/datasets/imdb>`_), and can
be alternatively downloaded with the 🤗 NLP library with ``load_dataset("imdb")``.
In this example, we'll show how to download, tokenize, and train a model on the IMDb reviews dataset. This task
takes the text of a review and requires the model to predict whether the sentiment of the review is positive or
negative. Let's start by downloading the dataset from the
`Large Movie Review Dataset <http://ai.stanford.edu/~amaas/data/sentiment/>`_ webpage.
.. code-block:: bash
wget http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
tar -xf aclImdb_v1.tar.gz
This data is organized into ``pos`` and ``neg`` folders with one text file per example. Let's write a function that can
read this in.
.. code-block:: python
from pathlib import Path
def read_imdb_split(split_dir):
split_dir = Path(split_dir)
texts = []
labels = []
for label_dir in ["pos", "neg"]:
for text_file in (split_dir/label_dir).iterdir():
texts.append(text_file.read_text())
labels.append(0 if label_dir is "neg" else 1)
return texts, labels
train_texts, train_labels = read_imdb_split('aclImdb/train')
test_texts, test_labels = read_imdb_split('aclImdb/test')
We now have a train and test dataset, but let's also also create a validation set which we can use for for
evaluation and tuning without training our test set results. Sklearn has a convenient utility for creating such
splits:
.. code-block:: python
from sklearn.model_selection import train_test_split
train_texts, val_texts, train_labels, val_labels = train_test_split(train_texts, train_labels, test_size=.2)
Alright, we've read in our dataset. Now let's tackle tokenization. We'll eventually train a classifier using
pre-trained DistilBert, so let's use the DistilBert tokenizer.
.. code-block:: python
from transformers import DistilBertTokenizerFast
tokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-uncased')
Now we can simply pass our texts to the tokenizer. We'll pass ``truncation=True`` and ``padding=True``, which will
ensure that all of our sequences are padded to the same length and are truncated to be no longer model's maximum
input length. This will allow us to feed batches of sequences into the model at the same time.
.. code-block:: python
train_encodings = tokenizer(train_texts, truncation=True, padding=True)
val_encodings = tokenizer(val_texts, truncation=True, padding=True)
test_encodings = tokenizer(test_texts, truncation=True, padding=True)
Now, let's turn our labels and encodings into a Dataset object. In PyTorch, this is done by subclassing a
``torch.utils.data.Dataset`` object and implementing ``__len__`` and ``__getitem__``. In TensorFlow, we pass our input encodings and
labels to the ``from_tensor_slices`` constructor method. We put the data in this format so that the data can be
easily batched such that each key in the batch encoding corresponds to a named parameter of the
:meth:`~transformers.DistilBertForSequenceClassification.forward` method of the model we will train.
.. code-block:: python
## PYTORCH CODE
import torch
class IMDbDataset(torch.utils.data.Dataset):
def __init__(self, encodings, labels):
self.encodings = encodings
self.labels = labels
def __getitem__(self, idx):
item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
item['labels'] = torch.tensor(self.labels[idx])
return item
def __len__(self):
return len(self.labels)
train_dataset = IMDbDataset(train_encodings, train_labels)
val_dataset = IMDbDataset(val_encodings, val_labels)
test_dataset = IMDbDataset(test_encodings, test_labels)
## TENSORFLOW CODE
import tensorflow as tf
train_dataset = tf.data.Dataset.from_tensor_slices((
dict(train_encodings),
train_labels
))
val_dataset = tf.data.Dataset.from_tensor_slices((
dict(val_encodings),
val_labels
))
test_dataset = tf.data.Dataset.from_tensor_slices((
dict(test_encodings),
test_labels
))
Now that our datasets our ready, we can fine-tune a model either with the 🤗
:class:`~transformers.Trainer`/:class:`~transformers.TFTrainer` or with native PyTorch/TensorFlow. See
:doc:`training <training>`.
.. _ft_trainer:
Fine-tuning with Trainer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The steps above prepared the datasets in the way that the trainer is expected. Now all we need to do is create a
model to fine-tune, define the :class:`~transformers.TrainingArguments`/:class:`~transformers.TFTrainingArguments`
and instantiate a :class:`~transformers.Trainer`/:class:`~transformers.TFTrainer`.
.. code-block:: python
## PYTORCH CODE
from transformers import DistilBertForSequenceClassification, Trainer, TrainingArguments
training_args = TrainingArguments(
output_dir='./results', # output directory
num_train_epochs=3, # total number of training epochs
per_device_train_batch_size=16, # batch size per device during training
per_device_eval_batch_size=64, # batch size for evaluation
warmup_steps=500, # number of warmup steps for learning rate scheduler
weight_decay=0.01, # strength of weight decay
logging_dir='./logs', # directory for storing logs
logging_steps=10,
)
model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased")
trainer = Trainer(
model=model, # the instantiated 🤗 Transformers model to be trained
args=training_args, # training arguments, defined above
train_dataset=train_dataset, # training dataset
eval_dataset=val_dataset # evaluation dataset
)
trainer.train()
## TENSORFLOW CODE
from transformers import TFDistilBertForSequenceClassification, TFTrainer, TFTrainingArguments
training_args = TFTrainingArguments(
output_dir='./results', # output directory
num_train_epochs=3, # total number of training epochs
per_device_train_batch_size=16, # batch size per device during training
per_device_eval_batch_size=64, # batch size for evaluation
warmup_steps=500, # number of warmup steps for learning rate scheduler
weight_decay=0.01, # strength of weight decay
logging_dir='./logs', # directory for storing logs
logging_steps=10,
)
with training_args.strategy.scope():
model = TFDistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased")
trainer = TFTrainer(
model=model, # the instantiated 🤗 Transformers model to be trained
args=training_args, # training arguments, defined above
train_dataset=train_dataset, # training dataset
eval_dataset=val_dataset # evaluation dataset
)
trainer.train()
.. _ft_native:
Fine-tuning with native PyTorch/TensorFlow
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We can also train use native PyTorch or TensorFlow:
.. code-block:: python
## PYTORCH CODE
from torch.utils.data import DataLoader
from transformers import DistilBertForSequenceClassification, AdamW
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
model = DistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')
model.to(device)
model.train()
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
optim = AdamW(model.parameters(), lr=5e-5)
for epoch in range(3):
for batch in train_loader:
optim.zero_grad()
input_ids = batch['input_ids'].to(device)
attention_mask = batch['attention_mask'].to(device)
labels = batch['labels'].to(device)
outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
loss = outputs[0]
loss.backward()
optim.step()
model.eval()
## TENSORFLOW CODE
from transformers import TFDistilBertForSequenceClassification
model = TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')
optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
model.compile(optimizer=optimizer, loss=model.compute_loss) # can also use any keras loss fn
model.fit(train_dataset.shuffle(1000).batch(16), epochs=3, batch_size=16)
.. _tok_ner:
Token Classification with W-NUT Emerging Entities
-----------------------------------------------------------------------------------------------------------------------
.. note::
This dataset can be explored in the Hugging Face model hub (`WNUT-17 <https://huggingface.co/datasets/wnut_17>`_), and can
be alternatively downloaded with the 🤗 NLP library with ``load_dataset("wnut_17")``.
Next we will look at token classification. Rather than classifying an entire sequence, this task classifies token by
token. We'll demonstrate how to do this with
`Named Entity Recognition <http://nlpprogress.com/english/named_entity_recognition.html>`_, which involves
identifying tokens which correspond to a predefined set of "entities". Specifically, we'll use the
`W-NUT Emerging and Rare entities <http://noisy-text.github.io/2017/emerging-rare-entities.html>`_ corpus. The data
is given as a collection of pre-tokenized documents where each token is assigned a tag.
Let's start by downloading the data.
.. code-block:: bash
wget http://noisy-text.github.io/2017/files/wnut17train.conll
In this case, we'll just download the train set, which is a single text file. Each line of the file contains either
(1) a word and tag separated by a tab, or (2) a blank line indicating the end of a document. Let's write a
function to read this in. We'll take in the file path and return ``token_docs`` which is a list of lists of token
strings, and ``token_tags`` which is a list of lists of tag strings.
.. code-block:: python
from pathlib import Path
import re
def read_wnut(file_path):
file_path = Path(file_path)
raw_text = file_path.read_text().strip()
raw_docs = re.split(r'\n\t?\n', raw_text)
token_docs = []
tag_docs = []
for doc in raw_docs:
tokens = []
tags = []
for line in doc.split('\n'):
token, tag = line.split('\t')
tokens.append(token)
tags.append(tag)
token_docs.append(tokens)
tag_docs.append(tags)
return token_docs, tag_docs
texts, tags = read_wnut('wnut17train.conll')
Just to see what this data looks like, let's take a look at a segment of the first document.
.. code-block:: python
>>> print(texts[0][10:17], tags[0][10:17], sep='\n')
['for', 'two', 'weeks', '.', 'Empire', 'State', 'Building']
['O', 'O', 'O', 'O', 'B-location', 'I-location', 'I-location']
``location`` is an entity type, ``B-`` indicates the beginning of an entity, and ``I-`` indicates consecutive positions of
the same entity ("Empire State Building" is considered one entity). ``O`` indicates the token does not correspond to
any entity.
Now that we've read the data in, let's create a train/validation split:
.. code-block:: python
from sklearn.model_selection import train_test_split
train_texts, val_texts, train_tags, val_tags = train_test_split(texts, tags, test_size=.2)
Next, let's create encodings for our tokens and tags. For the tags, we can start by just create a simple mapping
which we'll use in a moment:
.. code-block:: python
unique_tags = set(tag for doc in tags for tag in doc)
tag2id = {tag: id for id, tag in enumerate(unique_tags)}
id2tag = {id: tag for tag, id in tag2id.items()}
To encode the tokens, we'll use a pre-trained DistilBert tokenizer. We can tell the tokenizer that we're dealing
with ready-split tokens rather than full sentence strings by passing ``is_split_into_words=True``. We'll also pass
``padding=True`` and ``truncation=True`` to pad the sequences to be the same length. Lastly, we can tell the model
to return information about the tokens which are split by the wordpiece tokenization process, which we will need in
a moment.
.. code-block:: python
from transformers import DistilBertTokenizerFast
tokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-cased')
train_encodings = tokenizer(train_texts, is_split_into_words=True, return_offsets_mapping=True, padding=True, truncation=True)
val_encodings = tokenizer(val_texts, is_split_into_words=True, return_offsets_mapping=True, padding=True, truncation=True)
Great, so now our tokens are nicely encoded in the format that they need to be in to feed them into our DistilBert
model below.
Now we arrive at a common obstacle with using pre-trained models for token-level classification: many of the tokens
in the W-NUT corpus are not in DistilBert's vocabulary. Bert and many models like it use a method called WordPiece
Tokenization, meaning that single words are split into multiple tokens such that each token is likely to be in
the vocabulary. For example, DistilBert's tokenizer would split the Twitter handle ``@huggingface`` into the tokens
``['@', 'hugging', '##face']``. This is a problem for us because we have exactly one tag per token. If the tokenizer
splits a token into multiple sub-tokens, then we will end up with a mismatch between our tokens and our labels.
One way to handle this is to only train on the tag labels for the first subtoken of a split token. We can do this in
🤗 Transformers by setting the labels we wish to ignore to ``-100``. In the example above, if the label for
``@HuggingFace`` is ``3`` (indexing ``B-corporation``), we would set the labels of ``['@', 'hugging', '##face']`` to
``[3, -100, -100]``.
Let's write a function to do this. This is where we will use the ``offset_mapping`` from the tokenizer as mentioned
above. For each sub-token returned by the tokenizer, the offset mapping gives us a tuple indicating the sub-token's
start position and end position relative to the original token it was split from. That means that if the first
position in the tuple is anything other than ``0``, we will set its corresponding label to ``-100``. While we're at
it, we can also set labels to ``-100`` if the second position of the offset mapping is ``0``, since this means it must
be a special token like ``[PAD]`` or ``[CLS]``.
.. note::
Due to a recently fixed bug, -1 must be used instead of -100 when using TensorFlow in 🤗 Transformers <= 3.02.
.. code-block:: python
import numpy as np
def encode_tags(tags, encodings):
labels = [[tag2id[tag] for tag in doc] for doc in tags]
encoded_labels = []
for doc_labels, doc_offset in zip(labels, encodings.offset_mapping):
# create an empty array of -100
doc_enc_labels = np.ones(len(doc_offset),dtype=int) * -100
arr_offset = np.array(doc_offset)
# set labels whose first offset position is 0 and the second is not 0
doc_enc_labels[(arr_offset[:,0] == 0) & (arr_offset[:,1] != 0)] = doc_labels
encoded_labels.append(doc_enc_labels.tolist())
return encoded_labels
train_labels = encode_tags(train_tags, train_encodings)
val_labels = encode_tags(val_tags, val_encodings)
The hard part is now done. Just as in the sequence classification example above, we can create a dataset object:
.. code-block:: python
## PYTORCH CODE
import torch
class WNUTDataset(torch.utils.data.Dataset):
def __init__(self, encodings, labels):
self.encodings = encodings
self.labels = labels
def __getitem__(self, idx):
item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
item['labels'] = torch.tensor(self.labels[idx])
return item
def __len__(self):
return len(self.labels)
train_encodings.pop("offset_mapping") # we don't want to pass this to the model
val_encodings.pop("offset_mapping")
train_dataset = WNUTDataset(train_encodings, train_labels)
val_dataset = WNUTDataset(val_encodings, val_labels)
## TENSORFLOW CODE
import tensorflow as tf
train_encodings.pop("offset_mapping") # we don't want to pass this to the model
val_encodings.pop("offset_mapping")
train_dataset = tf.data.Dataset.from_tensor_slices((
dict(train_encodings),
train_labels
))
val_dataset = tf.data.Dataset.from_tensor_slices((
dict(val_encodings),
val_labels
))
Now load in a token classification model and specify the number of labels:
.. code-block:: python
## PYTORCH CODE
from transformers import DistilBertForTokenClassification
model = DistilBertForTokenClassification.from_pretrained('distilbert-base-cased', num_labels=len(unique_tags))
## TENSORFLOW CODE
from transformers import TFDistilBertForTokenClassification
model = TFDistilBertForTokenClassification.from_pretrained('distilbert-base-cased', num_labels=len(unique_tags))
The data and model are both ready to go. You can train the model either with
:class:`~transformers.Trainer`/:class:`~transformers.TFTrainer` or with native PyTorch/TensorFlow, exactly as in the
sequence classification example above.
- :ref:`ft_trainer`
- :ref:`ft_native`
.. _qa_squad:
Question Answering with SQuAD 2.0
-----------------------------------------------------------------------------------------------------------------------
.. note::
This dataset can be explored in the Hugging Face model hub (`SQuAD V2 <https://huggingface.co/datasets/squad_v2>`_), and can
be alternatively downloaded with the 🤗 NLP library with ``load_dataset("squad_v2")``.
Question answering comes in many forms. In this example, we'll look at the particular type of extractive QA that
involves answering a question about a passage by highlighting the segment of the passage that answers the question.
This involves fine-tuning a model which predicts a start position and an end position in the passage. We will use the
`Stanford Question Answering Dataset (SQuAD) 2.0 <https://rajpurkar.github.io/SQuAD-explorer/>`_.
We will start by downloading the data:
.. code-block:: bash
mkdir squad
wget https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v2.0.json -O squad/train-v2.0.json
wget https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v2.0.json -O squad/dev-v2.0.json
Each split is in a structured json file with a number of questions and answers for each passage (or context). We'll
take this apart into parallel lists of contexts, questions, and answers (note that the contexts here are repeated
since there are multiple questions per context):
.. code-block:: python
import json
from pathlib import Path
def read_squad(path):
path = Path(path)
with open(path, 'rb') as f:
squad_dict = json.load(f)
contexts = []
questions = []
answers = []
for group in squad_dict['data']:
for passage in group['paragraphs']:
context = passage['context']
for qa in passage['qas']:
question = qa['question']
for answer in qa['answers']:
contexts.append(context)
questions.append(question)
answers.append(answer)
return contexts, questions, answers
train_contexts, train_questions, train_answers = read_squad('squad/train-v2.0.json')
val_contexts, val_questions, val_answers = read_squad('squad/dev-v2.0.json')
The contexts and questions are just strings. The answers are dicts containing the subsequence of the passage with
the correct answer as well as an integer indicating the character at which the answer begins. In order to train a
model on this data we need (1) the tokenized context/question pairs, and (2) integers indicating at which *token*
positions the answer begins and ends.
First, let's get the *character* position at which the answer ends in the passage (we are given the starting
position). Sometimes SQuAD answers are off by one or two characters, so we will also adjust for that.
.. code-block:: python
def add_end_idx(answers, contexts):
for answer, context in zip(answers, contexts):
gold_text = answer['text']
start_idx = answer['answer_start']
end_idx = start_idx + len(gold_text)
# sometimes squad answers are off by a character or two fix this
if context[start_idx:end_idx] == gold_text:
answer['answer_end'] = end_idx
elif context[start_idx-1:end_idx-1] == gold_text:
answer['answer_start'] = start_idx - 1
answer['answer_end'] = end_idx - 1 # When the gold label is off by one character
elif context[start_idx-2:end_idx-2] == gold_text:
answer['answer_start'] = start_idx - 2
answer['answer_end'] = end_idx - 2 # When the gold label is off by two characters
add_end_idx(train_answers, train_contexts)
add_end_idx(val_answers, val_contexts)
Now ``train_answers`` and ``val_answers`` include the character end positions and the corrected start positions.
Next, let's tokenize our context/question pairs. 🤗 Tokenizers can accept parallel lists of sequences and encode
them together as sequence pairs.
.. code-block:: python
from transformers import DistilBertTokenizerFast
tokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-uncased')
train_encodings = tokenizer(train_contexts, train_questions, truncation=True, padding=True)
val_encodings = tokenizer(val_contexts, val_questions, truncation=True, padding=True)
Next we need to convert our character start/end positions to token start/end positions. When using 🤗 Fast
Tokenizers, we can use the built in :func:`~transformers.BatchEncoding.char_to_token` method.
.. code-block:: python
def add_token_positions(encodings, answers):
start_positions = []
end_positions = []
for i in range(len(answers)):
start_positions.append(encodings.char_to_token(i, answers[i]['answer_start']))
end_positions.append(encodings.char_to_token(i, answers[i]['answer_end'] - 1))
# if None, the answer passage has been truncated
if start_positions[-1] is None:
start_positions[-1] = tokenizer.model_max_length
if end_positions[-1] is None:
end_positions[-1] = tokenizer.model_max_length
encodings.update({'start_positions': start_positions, 'end_positions': end_positions})
add_token_positions(train_encodings, train_answers)
add_token_positions(val_encodings, val_answers)
Our data is ready. Let's just put it in a PyTorch/TensorFlow dataset so that we can easily use it for
training. In PyTorch, we define a custom ``Dataset`` class. In TensorFlow, we pass a tuple of
``(inputs_dict, labels_dict)`` to the ``from_tensor_slices`` method.
.. code-block:: python
## PYTORCH CODE
import torch
class SquadDataset(torch.utils.data.Dataset):
def __init__(self, encodings):
self.encodings = encodings
def __getitem__(self, idx):
return {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
def __len__(self):
return len(self.encodings.input_ids)
train_dataset = SquadDataset(train_encodings)
val_dataset = SquadDataset(val_encodings)
## TENSORFLOW CODE
import tensorflow as tf
train_dataset = tf.data.Dataset.from_tensor_slices((
{key: train_encodings[key] for key in ['input_ids', 'attention_mask']},
{key: train_encodings[key] for key in ['start_positions', 'end_positions']}
))
val_dataset = tf.data.Dataset.from_tensor_slices((
{key: val_encodings[key] for key in ['input_ids', 'attention_mask']},
{key: val_encodings[key] for key in ['start_positions', 'end_positions']}
))
Now we can use a DistilBert model with a QA head for training:
.. code-block:: python
## PYTORCH CODE
from transformers import DistilBertForQuestionAnswering
model = DistilBertForQuestionAnswering.from_pretrained("distilbert-base-uncased")
## TENSORFLOW CODE
from transformers import TFDistilBertForQuestionAnswering
model = TFDistilBertForQuestionAnswering.from_pretrained("distilbert-base-uncased")
The data and model are both ready to go. You can train the model with
:class:`~transformers.Trainer`/:class:`~transformers.TFTrainer` exactly as in the sequence classification example
above. If using native PyTorch, replace ``labels`` with ``start_positions`` and ``end_positions`` in the training
example. If using Keras's ``fit``, we need to make a minor modification to handle this example since it involves
multiple model outputs.
- :ref:`ft_trainer`
.. code-block:: python
## PYTORCH CODE
from torch.utils.data import DataLoader
from transformers import AdamW
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
model.to(device)
model.train()
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
optim = AdamW(model.parameters(), lr=5e-5)
for epoch in range(3):
for batch in train_loader:
optim.zero_grad()
input_ids = batch['input_ids'].to(device)
attention_mask = batch['attention_mask'].to(device)
start_positions = batch['start_positions'].to(device)
end_positions = batch['end_positions'].to(device)
outputs = model(input_ids, attention_mask=attention_mask, start_positions=start_positions, end_positions=end_positions)
loss = outputs[0]
loss.backward()
optim.step()
model.eval()
## TENSORFLOW CODE
# Keras will expect a tuple when dealing with labels
train_dataset = train_dataset.map(lambda x, y: (x, (y['start_positions'], y['end_positions'])))
# Keras will assign a separate loss for each output and add them together. So we'll just use the standard CE loss
# instead of using the built-in model.compute_loss, which expects a dict of outputs and averages the two terms.
# Note that this means the loss will be 2x of when using TFTrainer since we're adding instead of averaging them.
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.distilbert.return_dict = False # if using 🤗 Transformers >3.02, make sure outputs are tuples
optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
model.compile(optimizer=optimizer, loss=loss) # can also use any keras loss fn
model.fit(train_dataset.shuffle(1000).batch(16), epochs=3, batch_size=16)
.. _resources:
Additional Resources
-----------------------------------------------------------------------------------------------------------------------
- `How to train a new language model from scratch using Transformers and Tokenizers
<https://huggingface.co/blog/how-to-train>`_. Blog post showing the steps to load in Esperanto data and train a
masked language model from scratch.
- :doc:`Preprocessing <preprocessing>`. Docs page on data preprocessing.
- :doc:`Training <training>`. Docs page on training and fine-tuning.
.. _nlplib:
Using the 🤗 NLP Datasets & Metrics library
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This tutorial demonstrates how to read in datasets from various raw text formats and prepare them for training with
🤗 Transformers so that you can do the same thing with your own custom datasets. However, we recommend users use the
`🤗 NLP library <https://github.com/huggingface/nlp>`_ for working with the 150+ datasets included in the
`hub <https://huggingface.co/datasets>`_, including the three datasets used in this tutorial. As a very brief overview,
we will show how to use the NLP library to download and prepare the IMDb dataset from the first example,
:ref:`seq_imdb`.
Start by downloading the dataset:
.. code-block:: python
from nlp import load_dataset
train = load_dataset("imdb", split="train")
Each dataset has multiple columns corresponding to different features. Let's see what our columns are.
.. code-block:: python
>>> print(train.column_names)
['label', 'text']
Great. Now let's tokenize the text. We can do this using the ``map`` method. We'll also rename the ``label`` column
to ``labels`` to match the model's input arguments.
.. code-block:: python
train = train.map(lambda batch: tokenizer(batch["text"], truncation=True, padding=True), batched=True)
train.rename_column_("label", "labels")
Lastly, we can use the ``set_format`` method to determine which columns and in what data format we want to access
dataset elements.
.. code-block:: python
## PYTORCH CODE
>>> train.set_format("torch", columns=["input_ids", "attention_mask", "labels"])
>>> {key: val.shape for key, val in train[0].items()})
{'labels': torch.Size([]), 'input_ids': torch.Size([512]), 'attention_mask': torch.Size([512])}
## TENSORFLOW CODE
>>> train.set_format("tensorflow", columns=["input_ids", "attention_mask", "labels"])
>>> {key: val.shape for key, val in train[0].items()})
{'labels': TensorShape([]), 'input_ids': TensorShape([512]), 'attention_mask': TensorShape([512])}
We now have a fully-prepared dataset. Check out `the 🤗 NLP docs <https://huggingface.co/nlp/processing.html>`_ for
a more thorough introduction.

View File

@@ -1,649 +0,0 @@
# Examples
In this section a few examples are put together. All of these examples work for several models, making use of the very
similar API between the different models.
**Important**
To run the latest versions of the examples, you have to install from source and install some specific requirements for the examples.
Execute the following steps in a new virtual environment:
```bash
git clone https://github.com/huggingface/transformers
cd transformers
pip install .
pip install -r ./examples/requirements.txt
```
| Section | Description |
|----------------------------|------------------------------------------------------------------------------------------------------------------------------------------
| [TensorFlow 2.0 models on GLUE](#TensorFlow-2.0-Bert-models-on-GLUE) | Examples running BERT TensorFlow 2.0 model on the GLUE tasks. |
| [Running on TPUs](#running-on-tpus) | Examples on running fine-tuning tasks on Google TPUs to accelerate workloads. |
| [Language Model training](#language-model-training) | Fine-tuning (or training from scratch) the library models for language modeling on a text dataset. Causal language modeling for GPT/GPT-2, masked language modeling for BERT/RoBERTa. |
| [Language Generation](#language-generation) | Conditional text generation using the auto-regressive models of the library: GPT, GPT-2, Transformer-XL and XLNet. |
| [GLUE](#glue) | Examples running BERT/XLM/XLNet/RoBERTa on the 9 GLUE tasks. Examples feature distributed training as well as half-precision. |
| [SQuAD](#squad) | Using BERT/RoBERTa/XLNet/XLM for question answering, examples with distributed training. |
| [Multiple Choice](#multiple-choice) | Examples running BERT/XLNet/RoBERTa on the SWAG/RACE/ARC tasks. |
| [Named Entity Recognition](https://github.com/huggingface/transformers/tree/master/examples/token-classification) | Using BERT for Named Entity Recognition (NER) on the CoNLL 2003 dataset, examples with distributed training. |
| [XNLI](#xnli) | Examples running BERT/XLM on the XNLI benchmark. |
| [Adversarial evaluation of model performances](#adversarial-evaluation-of-model-performances) | Testing a model with adversarial evaluation of natural language inference on the Heuristic Analysis for NLI Systems (HANS) dataset (McCoy et al., 2019.) |
## TensorFlow 2.0 Bert models on GLUE
Based on the script [`run_tf_glue.py`](https://github.com/huggingface/transformers/blob/master/examples/text-classification/run_tf_glue.py).
Fine-tuning the library TensorFlow 2.0 Bert model for sequence classification on the MRPC task of the GLUE benchmark: [General Language Understanding Evaluation](https://gluebenchmark.com/).
This script has an option for mixed precision (Automatic Mixed Precision / AMP) to run models on Tensor Cores (NVIDIA Volta/Turing GPUs) and future hardware and an option for XLA, which uses the XLA compiler to reduce model runtime.
Options are toggled using `USE_XLA` or `USE_AMP` variables in the script.
These options and the below benchmark are provided by @tlkh.
Quick benchmarks from the script (no other modifications):
| GPU | Mode | Time (2nd epoch) | Val Acc (3 runs) |
| --------- | -------- | ----------------------- | ----------------------|
| Titan V | FP32 | 41s | 0.8438/0.8281/0.8333 |
| Titan V | AMP | 26s | 0.8281/0.8568/0.8411 |
| V100 | FP32 | 35s | 0.8646/0.8359/0.8464 |
| V100 | AMP | 22s | 0.8646/0.8385/0.8411 |
| 1080 Ti | FP32 | 55s | - |
Mixed precision (AMP) reduces the training time considerably for the same hardware and hyper-parameters (same batch size was used).
## Running on TPUs
You can accelerate your workloads on Google's TPUs. For information on how to setup your TPU environment refer to this
[README](https://github.com/pytorch/xla/blob/master/README.md).
The following are some examples of running the `*_tpu.py` finetuning scripts on TPUs. All steps for data preparation are
identical to your normal GPU + Huggingface setup.
### GLUE
Before running anyone of these GLUE tasks you should download the
[GLUE data](https://gluebenchmark.com/tasks) by running
[this script](https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e)
and unpack it to some directory `$GLUE_DIR`.
For running your GLUE task on MNLI dataset you can run something like the following:
```
export XRT_TPU_CONFIG="tpu_worker;0;$TPU_IP_ADDRESS:8470"
export GLUE_DIR=/path/to/glue
export TASK_NAME=MNLI
python run_glue_tpu.py \
--model_type bert \
--model_name_or_path bert-base-cased \
--task_name $TASK_NAME \
--do_train \
--do_eval \
--data_dir $GLUE_DIR/$TASK_NAME \
--max_seq_length 128 \
--train_batch_size 32 \
--learning_rate 3e-5 \
--num_train_epochs 3.0 \
--output_dir /tmp/$TASK_NAME \
--overwrite_output_dir \
--logging_steps 50 \
--save_steps 200 \
--num_cores=8 \
--only_log_master
```
## Language model training
Based on the script [`run_language_modeling.py`](https://github.com/huggingface/transformers/blob/master/examples/language-modeling/run_language_modeling.py).
Fine-tuning (or training from scratch) the library models for language modeling on a text dataset for GPT, GPT-2, BERT and RoBERTa (DistilBERT
to be added soon). GPT and GPT-2 are fine-tuned using a causal language modeling (CLM) loss while BERT and RoBERTa
are fine-tuned using a masked language modeling (MLM) loss.
Before running the following example, you should get a file that contains text on which the language model will be
trained or fine-tuned. A good example of such text is the [WikiText-2 dataset](https://blog.einstein.ai/the-wikitext-long-term-dependency-language-modeling-dataset/).
We will refer to two different files: `$TRAIN_FILE`, which contains text for training, and `$TEST_FILE`, which contains
text that will be used for evaluation.
### GPT-2/GPT and causal language modeling
The following example fine-tunes GPT-2 on WikiText-2. We're using the raw WikiText-2 (no tokens were replaced before
the tokenization). The loss here is that of causal language modeling.
```bash
export TRAIN_FILE=/path/to/dataset/wiki.train.raw
export TEST_FILE=/path/to/dataset/wiki.test.raw
python run_language_modeling.py \
--output_dir=output \
--model_type=gpt2 \
--model_name_or_path=gpt2 \
--do_train \
--train_data_file=$TRAIN_FILE \
--do_eval \
--eval_data_file=$TEST_FILE
```
This takes about half an hour to train on a single K80 GPU and about one minute for the evaluation to run. It reaches
a score of ~20 perplexity once fine-tuned on the dataset.
### RoBERTa/BERT and masked language modeling
The following example fine-tunes RoBERTa on WikiText-2. Here too, we're using the raw WikiText-2. The loss is different
as BERT/RoBERTa have a bidirectional mechanism; we're therefore using the same loss that was used during their
pre-training: masked language modeling.
In accordance to the RoBERTa paper, we use dynamic masking rather than static masking. The model may, therefore, converge
slightly slower (over-fitting takes more epochs).
We use the `--mlm` flag so that the script may change its loss function.
```bash
export TRAIN_FILE=/path/to/dataset/wiki.train.raw
export TEST_FILE=/path/to/dataset/wiki.test.raw
python run_language_modeling.py \
--output_dir=output \
--model_type=roberta \
--model_name_or_path=roberta-base \
--do_train \
--train_data_file=$TRAIN_FILE \
--do_eval \
--eval_data_file=$TEST_FILE \
--mlm
```
## Language generation
Based on the script [`run_generation.py`](https://github.com/huggingface/transformers/blob/master/examples/text-generation/run_generation.py).
Conditional text generation using the auto-regressive models of the library: GPT, GPT-2, Transformer-XL, XLNet, CTRL.
A similar script is used for our official demo [Write With Transfomer](https://transformer.huggingface.co), where you
can try out the different models available in the library.
Example usage:
```bash
python run_generation.py \
--model_type=gpt2 \
--model_name_or_path=gpt2
```
## GLUE
Based on the script [`run_glue.py`](https://github.com/huggingface/transformers/blob/master/examples/text-classification/run_glue.py).
Fine-tuning the library models for sequence classification on the GLUE benchmark: [General Language Understanding
Evaluation](https://gluebenchmark.com/). This script can fine-tune the following models: BERT, XLM, XLNet and RoBERTa.
GLUE is made up of a total of 9 different tasks. We get the following results on the dev set of the benchmark with an
uncased BERT base model (the checkpoint `bert-base-uncased`). All experiments ran single V100 GPUs with a total train
batch sizes between 16 and 64. Some of these tasks have a small dataset and training can lead to high variance in the results
between different runs. We report the median on 5 runs (with different seeds) for each of the metrics.
| Task | Metric | Result |
|-------|------------------------------|-------------|
| CoLA | Matthew's corr | 49.23 |
| SST-2 | Accuracy | 91.97 |
| MRPC | F1/Accuracy | 89.47/85.29 |
| STS-B | Person/Spearman corr. | 83.95/83.70 |
| QQP | Accuracy/F1 | 88.40/84.31 |
| MNLI | Matched acc./Mismatched acc. | 80.61/81.08 |
| QNLI | Accuracy | 87.46 |
| RTE | Accuracy | 61.73 |
| WNLI | Accuracy | 45.07 |
Some of these results are significantly different from the ones reported on the test set
of GLUE benchmark on the website. For QQP and WNLI, please refer to [FAQ #12](https://gluebenchmark.com/faq) on the webite.
Before running any one of these GLUE tasks you should download the
[GLUE data](https://gluebenchmark.com/tasks) by running
[this script](https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e)
and unpack it to some directory `$GLUE_DIR`.
```bash
export GLUE_DIR=/path/to/glue
export TASK_NAME=MRPC
python run_glue.py \
--model_type bert \
--model_name_or_path bert-base-cased \
--task_name $TASK_NAME \
--do_train \
--do_eval \
--data_dir $GLUE_DIR/$TASK_NAME \
--max_seq_length 128 \
--per_gpu_train_batch_size 32 \
--learning_rate 2e-5 \
--num_train_epochs 3.0 \
--output_dir /tmp/$TASK_NAME/
```
where task name can be one of CoLA, SST-2, MRPC, STS-B, QQP, MNLI, QNLI, RTE, WNLI.
The dev set results will be present within the text file `eval_results.txt` in the specified output_dir.
In case of MNLI, since there are two separate dev sets (matched and mismatched), there will be a separate
output folder called `/tmp/MNLI-MM/` in addition to `/tmp/MNLI/`.
The code has not been tested with half-precision training with apex on any GLUE task apart from MRPC, MNLI,
CoLA, SST-2. The following section provides details on how to run half-precision training with MRPC. With that being
said, there shouldnt be any issues in running half-precision training with the remaining GLUE tasks as well,
since the data processor for each task inherits from the base class DataProcessor.
### MRPC
#### Fine-tuning example
The following examples fine-tune BERT on the Microsoft Research Paraphrase Corpus (MRPC) corpus and runs in less
than 10 minutes on a single K-80 and in 27 seconds (!) on single tesla V100 16GB with apex installed.
Before running any one of these GLUE tasks you should download the
[GLUE data](https://gluebenchmark.com/tasks) by running
[this script](https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e)
and unpack it to some directory `$GLUE_DIR`.
```bash
export GLUE_DIR=/path/to/glue
python run_glue.py \
--model_name_or_path bert-base-cased \
--task_name MRPC \
--do_train \
--do_eval \
--data_dir $GLUE_DIR/MRPC/ \
--max_seq_length 128 \
--per_gpu_train_batch_size 32 \
--learning_rate 2e-5 \
--num_train_epochs 3.0 \
--output_dir /tmp/mrpc_output/
```
Our test ran on a few seeds with [the original implementation hyper-
parameters](https://github.com/google-research/bert#sentence-and-sentence-pair-classification-tasks) gave evaluation
results between 84% and 88%.
#### Using Apex and mixed-precision
Using Apex and 16 bit precision, the fine-tuning on MRPC only takes 27 seconds. First install
[apex](https://github.com/NVIDIA/apex), then run the following example:
```bash
export GLUE_DIR=/path/to/glue
python run_glue.py \
--model_name_or_path bert-base-cased \
--task_name MRPC \
--do_train \
--do_eval \
--data_dir $GLUE_DIR/MRPC/ \
--max_seq_length 128 \
--per_gpu_train_batch_size 32 \
--learning_rate 2e-5 \
--num_train_epochs 3.0 \
--output_dir /tmp/mrpc_output/ \
--fp16
```
#### Distributed training
Here is an example using distributed training on 8 V100 GPUs. The model used is the BERT whole-word-masking and it
reaches F1 > 92 on MRPC.
```bash
export GLUE_DIR=/path/to/glue
python -m torch.distributed.launch \
--nproc_per_node 8 run_glue.py \
--model_name_or_path bert-base-cased \
--task_name MRPC \
--do_train \
--do_eval \
--data_dir $GLUE_DIR/MRPC/ \
--max_seq_length 128 \
--per_gpu_train_batch_size 8 \
--learning_rate 2e-5 \
--num_train_epochs 3.0 \
--output_dir /tmp/mrpc_output/
```
Training with these hyper-parameters gave us the following results:
```bash
acc = 0.8823529411764706
acc_and_f1 = 0.901702786377709
eval_loss = 0.3418912578906332
f1 = 0.9210526315789473
global_step = 174
loss = 0.07231863956341798
```
### MNLI
The following example uses the BERT-large, uncased, whole-word-masking model and fine-tunes it on the MNLI task.
```bash
export GLUE_DIR=/path/to/glue
python -m torch.distributed.launch \
--nproc_per_node 8 run_glue.py \
--model_name_or_path bert-base-cased \
--task_name mnli \
--do_train \
--do_eval \
--data_dir $GLUE_DIR/MNLI/ \
--max_seq_length 128 \
--per_gpu_train_batch_size 8 \
--learning_rate 2e-5 \
--num_train_epochs 3.0 \
--output_dir output_dir \
```
The results are the following:
```bash
***** Eval results *****
acc = 0.8679706601466992
eval_loss = 0.4911287787382479
global_step = 18408
loss = 0.04755385363816904
***** Eval results *****
acc = 0.8747965825874695
eval_loss = 0.45516540421714036
global_step = 18408
loss = 0.04755385363816904
```
## Multiple Choice
Based on the script [`run_multiple_choice.py`]().
#### Fine-tuning on SWAG
Download [swag](https://github.com/rowanz/swagaf/tree/master/data) data
```bash
#training on 4 tesla V100(16GB) GPUS
export SWAG_DIR=/path/to/swag_data_dir
python ./examples/multiple-choice/run_multiple_choice.py \
--task_name swag \
--model_name_or_path roberta-base \
--do_train \
--do_eval \
--data_dir $SWAG_DIR \
--learning_rate 5e-5 \
--num_train_epochs 3 \
--max_seq_length 80 \
--output_dir models_bert/swag_base \
--per_gpu_eval_batch_size=16 \
--per_gpu_train_batch_size=16 \
--gradient_accumulation_steps 2 \
--overwrite_output
```
Training with the defined hyper-parameters yields the following results:
```
***** Eval results *****
eval_acc = 0.8338998300509847
eval_loss = 0.44457291918821606
```
## SQuAD
Based on the script [`run_squad.py`](https://github.com/huggingface/transformers/blob/master/examples/question-answering/run_squad.py).
#### Fine-tuning BERT on SQuAD1.0
This example code fine-tunes BERT on the SQuAD1.0 dataset. It runs in 24 min (with BERT-base) or 68 min (with BERT-large)
on a single tesla V100 16GB. The data for SQuAD can be downloaded with the following links and should be saved in a
$SQUAD_DIR directory.
* [train-v1.1.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json)
* [dev-v1.1.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json)
* [evaluate-v1.1.py](https://github.com/allenai/bi-att-flow/blob/master/squad/evaluate-v1.1.py)
And for SQuAD2.0, you need to download:
- [train-v2.0.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v2.0.json)
- [dev-v2.0.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v2.0.json)
- [evaluate-v2.0.py](https://worksheets.codalab.org/rest/bundles/0x6b567e1cf2e041ec80d7098f031c5c9e/contents/blob/)
```bash
export SQUAD_DIR=/path/to/SQUAD
python run_squad.py \
--model_type bert \
--model_name_or_path bert-base-uncased \
--do_train \
--do_eval \
--train_file $SQUAD_DIR/train-v1.1.json \
--predict_file $SQUAD_DIR/dev-v1.1.json \
--per_gpu_train_batch_size 12 \
--learning_rate 3e-5 \
--num_train_epochs 2.0 \
--max_seq_length 384 \
--doc_stride 128 \
--output_dir /tmp/debug_squad/
```
Training with the previously defined hyper-parameters yields the following results:
```bash
f1 = 88.52
exact_match = 81.22
```
#### Distributed training
Here is an example using distributed training on 8 V100 GPUs and Bert Whole Word Masking uncased model to reach a F1 > 93 on SQuAD1.1:
```bash
python -m torch.distributed.launch --nproc_per_node=8 ./examples/question-answering/run_squad.py \
--model_type bert \
--model_name_or_path bert-large-uncased-whole-word-masking \
--do_train \
--do_eval \
--train_file $SQUAD_DIR/train-v1.1.json \
--predict_file $SQUAD_DIR/dev-v1.1.json \
--learning_rate 3e-5 \
--num_train_epochs 2 \
--max_seq_length 384 \
--doc_stride 128 \
--output_dir ./examples/models/wwm_uncased_finetuned_squad/ \
--per_gpu_eval_batch_size=3 \
--per_gpu_train_batch_size=3 \
```
Training with the previously defined hyper-parameters yields the following results:
```bash
f1 = 93.15
exact_match = 86.91
```
This fine-tuned model is available as a checkpoint under the reference
`bert-large-uncased-whole-word-masking-finetuned-squad`.
#### Fine-tuning XLNet on SQuAD
This example code fine-tunes XLNet on both SQuAD1.0 and SQuAD2.0 dataset. See above to download the data for SQuAD .
##### Command for SQuAD1.0:
```bash
export SQUAD_DIR=/path/to/SQUAD
python run_squad.py \
--model_type xlnet \
--model_name_or_path xlnet-large-cased \
--do_train \
--do_eval \
--train_file $SQUAD_DIR/train-v1.1.json \
--predict_file $SQUAD_DIR/dev-v1.1.json \
--learning_rate 3e-5 \
--num_train_epochs 2 \
--max_seq_length 384 \
--doc_stride 128 \
--output_dir ./wwm_cased_finetuned_squad/ \
--per_gpu_eval_batch_size=4 \
--per_gpu_train_batch_size=4 \
--save_steps 5000
```
##### Command for SQuAD2.0:
```bash
export SQUAD_DIR=/path/to/SQUAD
python run_squad.py \
--model_type xlnet \
--model_name_or_path xlnet-large-cased \
--do_train \
--do_eval \
--version_2_with_negative \
--train_file $SQUAD_DIR/train-v2.0.json \
--predict_file $SQUAD_DIR/dev-v2.0.json \
--learning_rate 3e-5 \
--num_train_epochs 4 \
--max_seq_length 384 \
--doc_stride 128 \
--output_dir ./wwm_cased_finetuned_squad/ \
--per_gpu_eval_batch_size=2 \
--per_gpu_train_batch_size=2 \
--save_steps 5000
```
Larger batch size may improve the performance while costing more memory.
##### Results for SQuAD1.0 with the previously defined hyper-parameters:
```python
{
"exact": 85.45884578997162,
"f1": 92.5974600601065,
"total": 10570,
"HasAns_exact": 85.45884578997162,
"HasAns_f1": 92.59746006010651,
"HasAns_total": 10570
}
```
##### Results for SQuAD2.0 with the previously defined hyper-parameters:
```python
{
"exact": 80.4177545691906,
"f1": 84.07154997729623,
"total": 11873,
"HasAns_exact": 76.73751686909581,
"HasAns_f1": 84.05558584352873,
"HasAns_total": 5928,
"NoAns_exact": 84.0874684608915,
"NoAns_f1": 84.0874684608915,
"NoAns_total": 5945
}
```
## XNLI
Based on the script [`run_xnli.py`](https://github.com/huggingface/transformers/blob/master/examples/text-classification/run_xnli.py).
[XNLI](https://www.nyu.edu/projects/bowman/xnli/) is crowd-sourced dataset based on [MultiNLI](http://www.nyu.edu/projects/bowman/multinli/). It is an evaluation benchmark for cross-lingual text representations. Pairs of text are labeled with textual entailment annotations for 15 different languages (including both high-resource language such as English and low-resource languages such as Swahili).
#### Fine-tuning on XNLI
This example code fine-tunes mBERT (multi-lingual BERT) on the XNLI dataset. It runs in 106 mins
on a single tesla V100 16GB. The data for XNLI can be downloaded with the following links and should be both saved (and un-zipped) in a
`$XNLI_DIR` directory.
* [XNLI 1.0](https://www.nyu.edu/projects/bowman/xnli/XNLI-1.0.zip)
* [XNLI-MT 1.0](https://www.nyu.edu/projects/bowman/xnli/XNLI-MT-1.0.zip)
```bash
export XNLI_DIR=/path/to/XNLI
python run_xnli.py \
--model_type bert \
--model_name_or_path bert-base-multilingual-cased \
--language de \
--train_language en \
--do_train \
--do_eval \
--data_dir $XNLI_DIR \
--per_gpu_train_batch_size 32 \
--learning_rate 5e-5 \
--num_train_epochs 2.0 \
--max_seq_length 128 \
--output_dir /tmp/debug_xnli/ \
--save_steps -1
```
Training with the previously defined hyper-parameters yields the following results on the **test** set:
```bash
acc = 0.7093812375249501
```
## MM-IMDb
Based on the script [`run_mmimdb.py`](https://github.com/huggingface/transformers/blob/master/examples/contrib/mm-imdb/run_mmimdb.py).
[MM-IMDb](http://lisi1.unal.edu.co/mmimdb/) is a Multimodal dataset with around 26,000 movies including images, plots and other metadata.
### Training on MM-IMDb
```
python run_mmimdb.py \
--data_dir /path/to/mmimdb/dataset/ \
--model_type bert \
--model_name_or_path bert-base-uncased \
--output_dir /path/to/save/dir/ \
--do_train \
--do_eval \
--max_seq_len 512 \
--gradient_accumulation_steps 20 \
--num_image_embeds 3 \
--num_train_epochs 100 \
--patience 5
```
## Adversarial evaluation of model performances
Here is an example on evaluating a model using adversarial evaluation of natural language inference with the Heuristic Analysis for NLI Systems (HANS) dataset [McCoy et al., 2019](https://arxiv.org/abs/1902.01007). The example was gracefully provided by [Nafise Sadat Moosavi](https://github.com/ns-moosavi).
The HANS dataset can be downloaded from [this location](https://github.com/tommccoy1/hans).
This is an example of using test_hans.py:
```bash
export HANS_DIR=path-to-hans
export MODEL_TYPE=type-of-the-model-e.g.-bert-roberta-xlnet-etc
export MODEL_PATH=path-to-the-model-directory-that-is-trained-on-NLI-e.g.-by-using-run_glue.py
python examples/hans/test_hans.py \
--task_name hans \
--model_type $MODEL_TYPE \
--do_eval \
--data_dir $HANS_DIR \
--model_name_or_path $MODEL_PATH \
--max_seq_length 128 \
--output_dir $MODEL_PATH \
```
This will create the hans_predictions.txt file in MODEL_PATH, which can then be evaluated using hans/evaluate_heur_output.py from the HANS dataset.
The results of the BERT-base model that is trained on MNLI using batch size 8 and the random seed 42 on the HANS dataset is as follows:
```bash
Heuristic entailed results:
lexical_overlap: 0.9702
subsequence: 0.9942
constituent: 0.9962
Heuristic non-entailed results:
lexical_overlap: 0.199
subsequence: 0.0396
constituent: 0.118
```

1
docs/source/examples.md Symbolic link
View File

@@ -0,0 +1 @@
../../examples/README.md

View File

@@ -1,11 +1,41 @@
Glossary
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
General terms
-----------------------------------------------------------------------------------------------------------------------
- autoencoding models: see MLM
- autoregressive models: see CLM
- CLM: causal language modeling, a pretraining task where the model reads the texts in order and has to predict the
next word. It's usually done by reading the whole sentence but using a mask inside the model to hide the future
tokens at a certain timestep.
- MLM: masked language modeling, a pretraining task where the model sees a corrupted version of the texts, usually done
by masking some tokens randomly, and has to predict the original text.
- multimodal: a task that combines texts with another kind of inputs (for instance images).
- NLG: natural language generation, all tasks related to generating text ( for instance talk with transformers,
translation)
- NLP: natural language processing, a generic way to say "deal with texts".
- NLU: natural language understanding, all tasks related to understanding what is in a text (for instance classifying
the whole text, individual words)
- pretrained model: a model that has been pretrained on some data (for instance all of Wikipedia). Pretraining methods
involve a self-supervised objective, which can be reading the text and trying to predict the next word (see CLM) or
masking some words and trying to predict them (see MLM).
- RNN: recurrent neural network, a type of model that uses a loop over a layer to process texts.
- seq2seq or sequence-to-sequence: models that generate a new sequence from an input, like translation models, or
summarization models (such as :doc:`Bart </model_doc/bart>` or :doc:`T5 </model_doc/t5>`).
- token: a part of a sentence, usually a word, but can also be a subword (non-common words are often split in subwords)
or a punctuation symbol.
Model inputs
-----------------------------------------------------------------------------------------------------------------------
Every model is different yet bears similarities with the others. Therefore most models use the same inputs, which are
detailed here alongside usage examples.
.. _input-ids:
Input IDs
--------------------------
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The input ids are often the only required parameters to be passed to the model as input. *They are token indices,
numerical representations of tokens building the sequences that will be used as input by the model*.
@@ -13,144 +43,245 @@ numerical representations of tokens building the sequences that will be used as
Each tokenizer works differently but the underlying mechanism remains the same. Here's an example using the BERT
tokenizer, which is a `WordPiece <https://arxiv.org/pdf/1609.08144.pdf>`__ tokenizer:
::
.. code-block::
from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
>>> from transformers import BertTokenizer
>>> tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
sequence = "A Titan RTX has 24GB of VRAM"
>>> sequence = "A Titan RTX has 24GB of VRAM"
The tokenizer takes care of splitting the sequence into tokens available in the tokenizer vocabulary.
::
.. code-block::
# Continuation of the previous script
tokenized_sequence = tokenizer.tokenize(sequence)
assert tokenized_sequence == ['A', 'Titan', 'R', '##T', '##X', 'has', '24', '##GB', 'of', 'V', '##RA', '##M']
>>> tokenized_sequence = tokenizer.tokenize(sequence)
These tokens can then be converted into IDs which are understandable by the model. Several methods are available for
this, the recommended being `encode` or `encode_plus`, which leverage the Rust implementation of
The tokens are either words or subwords. Here for instance, "VRAM" wasn't in the model vocabulary, so it's been split
in "V", "RA" and "M". To indicate those tokens are not separate words but parts of the same word, a double-hash prefix is
added for "RA" and "M":
.. code-block::
>>> print(tokenized_sequence)
['A', 'Titan', 'R', '##T', '##X', 'has', '24', '##GB', 'of', 'V', '##RA', '##M']
These tokens can then be converted into IDs which are understandable by the model. This can be done by directly feeding
the sentence to the tokenizer, which leverages the Rust implementation of
`huggingface/tokenizers <https://github.com/huggingface/tokenizers>`__ for peak performance.
::
.. code-block::
# Continuation of the previous script
encoded_sequence = tokenizer.encode(sequence)
assert encoded_sequence == [101, 138, 18696, 155, 1942, 3190, 1144, 1572, 13745, 1104, 159, 9664, 2107, 102]
>>> inputs = tokenizer(sequence)
The `encode` and `encode_plus` methods automatically add "special tokens" which are special IDs the model uses.
The tokenizer returns a dictionary with all the arguments necessary for its corresponding model to work properly. The
token indices are under the key "input_ids":
.. code-block::
>>> encoded_sequence = inputs["input_ids"]
>>> print(encoded_sequence)
[101, 138, 18696, 155, 1942, 3190, 1144, 1572, 13745, 1104, 159, 9664, 2107, 102]
Note that the tokenizer automatically adds "special tokens" (if the associated model relies on them) which are special
IDs the model sometimes uses.
If we decode the previous sequence of ids,
.. code-block::
>>> decoded_sequence = tokenizer.decode(encoded_sequence)
we will see
.. code-block::
>>> print(decoded_sequence)
[CLS] A Titan RTX has 24GB of VRAM [SEP]
because this is the way a :class:`~transformers.BertModel` is going to expect its inputs.
.. _attention-mask:
Attention mask
--------------------------
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The attention mask is an optional argument used when batching sequences together. This argument indicates to the
model which tokens should be attended to, and which should not.
For example, consider these two sequences:
::
.. code-block::
from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
>>> from transformers import BertTokenizer
>>> tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
sequence_a = "This is a short sequence."
sequence_b = "This is a rather long sequence. It is at least longer than the sequence A."
>>> sequence_a = "This is a short sequence."
>>> sequence_b = "This is a rather long sequence. It is at least longer than the sequence A."
encoded_sequence_a = tokenizer.encode(sequence_a)
assert len(encoded_sequence_a) == 8
>>> encoded_sequence_a = tokenizer(sequence_a)["input_ids"]
>>> encoded_sequence_b = tokenizer(sequence_b)["input_ids"]
encoded_sequence_b = tokenizer.encode(sequence_b)
assert len(encoded_sequence_b) == 19
The encoded versions have different lengths:
These two sequences have different lengths and therefore can't be put together in a same tensor as-is. The first
sequence needs to be padded up to the length of the second one, or the second one needs to be truncated down to
the length of the first one.
.. code-block::
In the first case, the list of IDs will be extended by the padding indices:
>>> len(encoded_sequence_a), len(encoded_sequence_b)
(8, 19)
::
Therefore, we can't put them together in the same tensor as-is. The first sequence needs to be padded up to the length
of the second one, or the second one needs to be truncated down to the length of the first one.
# Continuation of the previous script
padded_sequence_a = tokenizer.encode(sequence_a, max_length=19, pad_to_max_length=True)
In the first case, the list of IDs will be extended by the padding indices. We can pass a list to the tokenizer and ask
it to pad like this:
assert padded_sequence_a == [101, 1188, 1110, 170, 1603, 4954, 119, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
assert encoded_sequence_b == [101, 1188, 1110, 170, 1897, 1263, 4954, 119, 1135, 1110, 1120, 1655, 2039, 1190, 1103, 4954, 138, 119, 102]
.. code-block::
These can then be converted into a tensor in PyTorch or TensorFlow. The attention mask is a binary tensor indicating
>>> padded_sequences = tokenizer([sequence_a, sequence_b], padding=True)
We can see that 0s have been added on the right of the first sentence to make it the same length as the second one:
.. code-block::
>>> padded_sequences["input_ids"]
[[101, 1188, 1110, 170, 1603, 4954, 119, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [101, 1188, 1110, 170, 1897, 1263, 4954, 119, 1135, 1110, 1120, 1655, 2039, 1190, 1103, 4954, 138, 119, 102]]
This can then be converted into a tensor in PyTorch or TensorFlow. The attention mask is a binary tensor indicating
the position of the padded indices so that the model does not attend to them. For the
:class:`~transformers.BertTokenizer`, :obj:`1` indicate a value that should be attended to while :obj:`0` indicate
a padded value.
:class:`~transformers.BertTokenizer`, :obj:`1` indicates a value that should be attended to, while :obj:`0` indicates
a padded value. This attention mask is in the dictionary returned by the tokenizer under the key "attention_mask":
The method :func:`~transformers.PreTrainedTokenizer.encode_plus` may be used to obtain the attention mask directly:
.. code-block::
::
# Continuation of the previous script
sequence_a_dict = tokenizer.encode_plus(sequence_a, max_length=19, pad_to_max_length=True)
assert sequence_a_dict['input_ids'] == [101, 1188, 1110, 170, 1603, 4954, 119, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
assert sequence_a_dict['attention_mask'] == [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
>>> padded_sequences["attention_mask"]
[[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]
.. _token-type-ids:
Token Type IDs
--------------------------
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Some models' purpose is to do sequence classification or question answering. These require two different sequences to
be encoded in the same input IDs. They are usually separated by special tokens, such as the classifier and separator
be joined in a single "input_ids" entry, which usually is performed with the help of special tokens, such as the classifier (``[CLS]``) and separator (``[SEP]``)
tokens. For example, the BERT model builds its two sequence input as such:
::
.. code-block::
from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
>>> # [CLS] SEQUENCE_A [SEP] SEQUENCE_B [SEP]
# [CLS] SEQ_A [SEP] SEQ_B [SEP]
We can use our tokenizer to automatically generate such a sentence by passing the two sequences to ``tokenizer`` as two arguments (and
not a list, like before) like this:
sequence_a = "HuggingFace is based in NYC"
sequence_b = "Where is HuggingFace based?"
.. code-block::
encoded_sequence = tokenizer.encode(sequence_a, sequence_b)
assert tokenizer.decode(encoded_sequence) == "[CLS] HuggingFace is based in NYC [SEP] Where is HuggingFace based? [SEP]"
>>> from transformers import BertTokenizer
>>> tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
>>> sequence_a = "HuggingFace is based in NYC"
>>> sequence_b = "Where is HuggingFace based?"
This is enough for some models to understand where one sequence ends and where another begins. However, other models
such as BERT have an additional mechanism, which are the segment IDs. The Token Type IDs are a binary mask identifying
the different sequences in the model.
>>> encoded_dict = tokenizer(sequence_a, sequence_b)
>>> decoded = tokenizer.decode(encoded_dict["input_ids"])
We can leverage :func:`~transformers.PreTrainedTokenizer.encode_plus` to output the Token Type IDs for us:
which will return:
::
.. code-block::
# Continuation of the previous script
encoded_dict = tokenizer.encode_plus(sequence_a, sequence_b)
>>> print(decoded)
[CLS] HuggingFace is based in NYC [SEP] Where is HuggingFace based? [SEP]
assert encoded_dict['input_ids'] == [101, 20164, 10932, 2271, 7954, 1110, 1359, 1107, 17520, 102, 2777, 1110, 20164, 10932, 2271, 7954, 1359, 136, 102]
assert encoded_dict['token_type_ids'] == [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1]
This is enough for some models to understand where one sequence ends and where another begins. However, other models,
such as BERT, also deploy token type IDs (also called segment IDs). They are represented as a binary
mask identifying the two types of sequence in the model.
The first sequence, the "context" used for the question, has all its tokens represented by :obj:`0`, whereas the
question has all its tokens represented by :obj:`1`. Some models, like :class:`~transformers.XLNetModel` use an
additional token represented by a :obj:`2`.
The tokenizer returns this mask as the "token_type_ids" entry:
.. code-block::
>>> encoded_dict['token_type_ids']
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1]
The first sequence, the "context" used for the question, has all its tokens represented by a :obj:`0`, whereas the
second sequence, corresponding to the "question", has all its tokens represented by a :obj:`1`.
Some models, like :class:`~transformers.XLNetModel` use an additional token represented by a :obj:`2`.
.. _position-ids:
Position IDs
--------------------------
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The position IDs are used by the model to identify which token is at which position. Contrary to RNNs that have the
position of each token embedded within them, transformers are unaware of the position of each token. The position
IDs are created for this purpose.
Contrary to RNNs that have the position of each token embedded within them,
transformers are unaware of the position of each token. Therefore, the position IDs (``position_ids``) are used by the model to identify each token's position in the list of tokens.
They are an optional parameter. If no position IDs are passed to the model, they are automatically created as absolute
They are an optional parameter. If no ``position_ids`` is passed to the model, the IDs are automatically created as absolute
positional embeddings.
Absolute positional embeddings are selected in the range ``[0, config.max_position_embeddings - 1]``. Some models
use other types of positional embeddings, such as sinusoidal position embeddings or relative position embeddings.
.. _labels:
Labels
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The labels are an optional argument which can be passed in order for the model to compute the loss itself. These labels
should be the expected prediction of the model: it will use the standard loss in order to compute the loss between
its predictions and the expected value (the label).
These labels are different according to the model head, for example:
- For sequence classification models (e.g., :class:`~transformers.BertForSequenceClassification`), the model expects
a tensor of dimension :obj:`(batch_size)` with each value of the batch corresponding to the expected label of the
entire sequence.
- For token classification models (e.g., :class:`~transformers.BertForTokenClassification`), the model expects
a tensor of dimension :obj:`(batch_size, seq_length)` with each value corresponding to the expected label of each
individual token.
- For masked language modeling (e.g., :class:`~transformers.BertForMaskedLM`), the model expects
a tensor of dimension :obj:`(batch_size, seq_length)` with each value corresponding to the expected label of each
individual token: the labels being the token ID for the masked token, and values to be ignored for the rest (usually
-100).
- For sequence to sequence tasks,(e.g., :class:`~transformers.BartForConditionalGeneration`,
:class:`~transformers.MBartForConditionalGeneration`), the model expects a tensor of dimension
:obj:`(batch_size, tgt_seq_length)` with each value corresponding to the target sequences associated with each
input sequence. During training, both `BART` and `T5` will make the appropriate `decoder_input_ids` and decoder
attention masks internally. They usually do not need to be supplied. This does not apply to models leveraging the
Encoder-Decoder framework.
See the documentation of each model for more information on each specific model's labels.
The base models (e.g., :class:`~transformers.BertModel`) do not accept labels, as these are the base transformer models,
simply outputting features.
.. _decoder-input-ids:
Decoder input IDs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This input is specific to encoder-decoder models, and contains the input IDs that will be fed to the decoder.
These inputs should be used for sequence to sequence tasks, such as translation or summarization, and are usually
built in a way specific to each model.
Most encoder-decoder models (BART, T5) create their :obj:`decoder_input_ids` on their own from the :obj:`labels`.
In such models, passing the :obj:`labels` is the preferred way to handle training.
Please check each model's docs to see how they handle these input IDs for sequence to sequence training.
.. _feed-forward-chunking:
Feed Forward Chunking
--------------------------
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In transformers two feed forward layers usually follows the self attention layer in each residual attention block. The intermediate embedding size of the feed forward layers is often bigger than the hidden size of the model (*e.g.* for ``bert-base-uncased``).
In each residual attention block in transformers the self-attention layer is usually followed by 2 feed forward layers.
The intermediate embedding size of the feed forward layers is often bigger than the hidden size of the model (e.g.,
for ``bert-base-uncased``).
For an input of size ``[batch_size, sequence_length]``, the memory required to store the intermediate feed forward embeddings ``[batch_size, sequence_length, config.intermediate_size]`` can account for a large fraction of the memory use. The authors of `Reformer: The Efficient Transformer <https://arxiv.org/abs/2001.04451>`_ noticed that since the computation is independent of the ``sequence_length`` dimension, it is mathematically equivalent to compute the output embeddings of both feed forward layers ``[batch_size, config.hidden_size]_0, ..., [batch_size, config.hidden_size]_n`` individually and concat them afterward to ``[batch_size, sequence_length, config.hidden_size]`` with ``n = sequence_length``, which trades increased computation time against reduced memory use, but yields a mathematically **equivalent** result.
For an input of size ``[batch_size, sequence_length]``, the memory required to store the intermediate feed forward
embeddings ``[batch_size, sequence_length, config.intermediate_size]`` can account for a large fraction of the memory
use. The authors of `Reformer: The Efficient Transformer <https://arxiv.org/abs/2001.04451>`_ noticed that since the
computation is independent of the ``sequence_length`` dimension, it is mathematically equivalent to compute the output
embeddings of both feed forward layers ``[batch_size, config.hidden_size]_0, ..., [batch_size, config.hidden_size]_n``
individually and concat them afterward to ``[batch_size, sequence_length, config.hidden_size]`` with
``n = sequence_length``, which trades increased computation time against reduced memory use, but yields a
mathematically **equivalent** result.
For models employing the function :func:`~.transformers.apply_chunking_to_forward`, the ``chunk_size`` defines the number of output embeddings that are computed in parallel and thus defines the trade-off between memory and time complexity.
If ``chunk_size`` is set to 0, no feed forward chunking is done.
For models employing the function :func:`~.transformers.apply_chunking_to_forward`, the ``chunk_size`` defines the
number of output embeddings that are computed in parallel and thus defines the trade-off between memory and time
complexity. If ``chunk_size`` is set to 0, no feed forward chunking is done.

Binary file not shown.

After

Width:  |  Height:  |  Size: 27 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 352 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 418 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 373 KiB

View File

@@ -1,17 +1,18 @@
Transformers
================================================================================================================================================
=======================================================================================================================
🤗 Transformers (formerly known as `pytorch-transformers` and `pytorch-pretrained-bert`) provides general-purpose architectures
(BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet...) for Natural Language Understanding (NLU) and Natural Language Generation
(NLG) with over 32+ pretrained models in 100+ languages and deep interoperability between TensorFlow 2.0 and PyTorch.
State-of-the-art Natural Language Processing for Pytorch and TensorFlow 2.0.
This is the documentation of our repository `transformers <https://github.com/huggingface/transformers>`__.
🤗 Transformers (formerly known as `pytorch-transformers` and `pytorch-pretrained-bert`) provides general-purpose
architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet...) for Natural Language Understanding (NLU) and Natural
Language Generation (NLG) with over 32+ pretrained models in 100+ languages and deep interoperability between
TensorFlow 2.0 and PyTorch.
This is the documentation of our repository `transformers <https://github.com/huggingface/transformers>`_.
Features
---------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
- As easy to use as pytorch-transformers
- As powerful and concise as Keras
- High performance on NLU and NLG tasks
- Low barrier to entry for educators and practitioners
@@ -35,77 +36,238 @@ Choose the right framework for every part of a model's lifetime:
- Seamlessly pick the right framework for training, evaluation, production
Contents
---------------------------------
-----------------------------------------------------------------------------------------------------------------------
The library currently contains PyTorch and Tensorflow implementations, pre-trained model weights, usage scripts and conversion utilities for the following models:
The documentation is organized in five parts:
1. `BERT <https://github.com/google-research/bert>`_ (from Google) released with the paper `BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding <https://arxiv.org/abs/1810.04805>`_ by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
2. `GPT <https://github.com/openai/finetune-transformer-lm>`_ (from OpenAI) released with the paper `Improving Language Understanding by Generative Pre-Training <https://blog.openai.com/language-unsupervised>`_ by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
3. `GPT-2 <https://blog.openai.com/better-language-models>`_ (from OpenAI) released with the paper `Language Models are Unsupervised Multitask Learners <https://blog.openai.com/better-language-models>`_ by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
4. `Transformer-XL <https://github.com/kimiyoung/transformer-xl>`_ (from Google/CMU) released with the paper `Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context <https://arxiv.org/abs/1901.02860>`_ by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
5. `XLNet <https://github.com/zihangdai/xlnet>`_ (from Google/CMU) released with the paper `XLNet: Generalized Autoregressive Pretraining for Language Understanding <https://arxiv.org/abs/1906.08237>`_ by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
6. `XLM <https://github.com/facebookresearch/XLM>`_ (from Facebook) released together with the paper `Cross-lingual Language Model Pretraining <https://arxiv.org/abs/1901.07291>`_ by Guillaume Lample and Alexis Conneau.
7. `RoBERTa <https://github.com/pytorch/fairseq/tree/master/examples/roberta>`_ (from Facebook), released together with the paper a `Robustly Optimized BERT Pretraining Approach <https://arxiv.org/abs/1907.11692>`_ by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
8. `DistilBERT <https://huggingface.co/transformers/model_doc/distilbert.html>`_ (from HuggingFace) released together with the paper `DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter <https://arxiv.org/abs/1910.01108>`_ by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into `DistilGPT2 <https://github.com/huggingface/transformers/tree/master/examples/distillation>`_.
9. `CTRL <https://github.com/pytorch/fairseq/tree/master/examples/ctrl>`_ (from Salesforce), released together with the paper `CTRL: A Conditional Transformer Language Model for Controllable Generation <https://www.github.com/salesforce/ctrl>`_ by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
10. `CamemBERT <https://huggingface.co/transformers/model_doc/camembert.html>`_ (from FAIR, Inria, Sorbonne Université) released together with the paper `CamemBERT: a Tasty French Language Model <https://arxiv.org/abs/1911.03894>`_ by Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suarez, Yoann Dupont, Laurent Romary, Eric Villemonte de la Clergerie, Djame Seddah, and Benoît Sagot.
11. `ALBERT <https://github.com/google-research/ALBERT>`_ (from Google Research), released together with the paper a `ALBERT: A Lite BERT for Self-supervised Learning of Language Representations <https://arxiv.org/abs/1909.11942>`_ by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
12. `XLM-RoBERTa <https://github.com/pytorch/fairseq/tree/master/examples/xlmr>`_ (from Facebook AI), released together with the paper `Unsupervised Cross-lingual Representation Learning at Scale <https://arxiv.org/abs/1911.02116>`_ by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
13. `FlauBERT <https://github.com/getalp/Flaubert>`_ (from CNRS) released with the paper `FlauBERT: Unsupervised Language Model Pre-training for French <https://arxiv.org/abs/1912.05372>`_ by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
- **GET STARTED** contains a quick tour, the installation instructions and some useful information about our philosophy
and a glossary.
- **USING 🤗 TRANSFORMERS** contains general tutorials on how to use the library.
- **ADVANCED GUIDES** contains more advanced guides that are more specific to a given script or part of the library.
- **RESEARCH** focuses on tutorials that have less to do with how to use the library but more about general resarch in
transformers model
- The three last section contain the documentation of each public class and function, grouped in:
- **MAIN CLASSES** for the main classes exposing the important APIs of the library.
- **MODELS** for the classes and functions related to each model implemented in the library.
- **INTERNAL HELPERS** for the classes and functions we use internally.
The library currently contains PyTorch and Tensorflow implementations, pre-trained model weights, usage scripts and
conversion utilities for the following models:
..
This list is updated automatically from the README with `make fix-copies`. Do not update manually!
1. :doc:`ALBERT <model_doc/albert>` (from Google Research and the Toyota Technological Institute at Chicago) released
with the paper `ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
<https://arxiv.org/abs/1909.11942>`__, by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush
Sharma, Radu Soricut.
2. :doc:`BART <model_doc/bart>` (from Facebook) released with the paper `BART: Denoising Sequence-to-Sequence
Pre-training for Natural Language Generation, Translation, and Comprehension
<https://arxiv.org/pdf/1910.13461.pdf>`__ by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman
Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
3. :doc:`BERT <model_doc/bert>` (from Google) released with the paper `BERT: Pre-training of Deep Bidirectional
Transformers for Language Understanding <https://arxiv.org/abs/1810.04805>`__ by Jacob Devlin, Ming-Wei Chang,
Kenton Lee and Kristina Toutanova.
4. :doc:`BERT For Sequence Generation <model_doc/bertgeneration>` (from Google) released with the paper `Leveraging
Pre-trained Checkpoints for Sequence Generation Tasks <https://arxiv.org/abs/1907.12461>`__ by Sascha Rothe, Shashi
Narayan, Aliaksei Severyn.
5. :doc:`Blenderbot <model_doc/blenderbot>` (from Facebook) released with the paper `Recipes for building an
open-domain chatbot <https://arxiv.org/abs/2004.13637>`__ by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary
Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
6. :doc:`CamemBERT <model_doc/camembert>` (from Inria/Facebook/Sorbonne) released with the paper `CamemBERT: a Tasty
French Language Model <https://arxiv.org/abs/1911.03894>`__ by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz
Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
7. :doc:`CTRL <model_doc/ctrl>` (from Salesforce) released with the paper `CTRL: A Conditional Transformer Language
Model for Controllable Generation <https://arxiv.org/abs/1909.05858>`__ by Nitish Shirish Keskar*, Bryan McCann*,
Lav R. Varshney, Caiming Xiong and Richard Socher.
8. :doc:`DeBERTa <model_doc/deberta>` (from Microsoft Research) released with the paper `DeBERTa: Decoding-enhanced
BERT with Disentangled Attention <https://arxiv.org/abs/2006.03654>`__ by Pengcheng He, Xiaodong Liu, Jianfeng Gao,
Weizhu Chen.
9. :doc:`DialoGPT <model_doc/dialogpt>` (from Microsoft Research) released with the paper `DialoGPT: Large-Scale
Generative Pre-training for Conversational Response Generation <https://arxiv.org/abs/1911.00536>`__ by Yizhe Zhang,
Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
10. :doc:`DistilBERT <model_doc/distilbert>` (from HuggingFace), released together with the paper `DistilBERT, a
distilled version of BERT: smaller, faster, cheaper and lighter <https://arxiv.org/abs/1910.01108>`__ by Victor
Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into `DistilGPT2
<https://github.com/huggingface/transformers/tree/master/examples/distillation>`__, RoBERTa into `DistilRoBERTa
<https://github.com/huggingface/transformers/tree/master/examples/distillation>`__, Multilingual BERT into
`DistilmBERT <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__ and a German
version of DistilBERT.
11. :doc:`DPR <model_doc/dpr>` (from Facebook) released with the paper `Dense Passage Retrieval for Open-Domain
Question Answering <https://arxiv.org/abs/2004.04906>`__ by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick
Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
12. :doc:`ELECTRA <model_doc/electra>` (from Google Research/Stanford University) released with the paper `ELECTRA:
Pre-training text encoders as discriminators rather than generators <https://arxiv.org/abs/2003.10555>`__ by Kevin
Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
13. :doc:`FlauBERT <model_doc/flaubert>` (from CNRS) released with the paper `FlauBERT: Unsupervised Language Model
Pre-training for French <https://arxiv.org/abs/1912.05372>`__ by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne,
Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
14. :doc:`Funnel Transformer <model_doc/funnel>` (from CMU/Google Brain) released with the paper `Funnel-Transformer:
Filtering out Sequential Redundancy for Efficient Language Processing <https://arxiv.org/abs/2006.03236>`__ by
Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
15. :doc:`GPT <model_doc/gpt>` (from OpenAI) released with the paper `Improving Language Understanding by Generative
Pre-Training <https://blog.openai.com/language-unsupervised/>`__ by Alec Radford, Karthik Narasimhan, Tim Salimans
and Ilya Sutskever.
16. :doc:`GPT-2 <model_doc/gpt2>` (from OpenAI) released with the paper `Language Models are Unsupervised Multitask
Learners <https://blog.openai.com/better-language-models/>`__ by Alec Radford*, Jeffrey Wu*, Rewon Child, David
Luan, Dario Amodei** and Ilya Sutskever**.
17. :doc:`LayoutLM <model_doc/layoutlm>` (from Microsoft Research Asia) released with the paper `LayoutLM: Pre-training
of Text and Layout for Document Image Understanding <https://arxiv.org/abs/1912.13318>`__ by Yiheng Xu, Minghao Li,
Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
18. :doc:`Longformer <model_doc/longformer>` (from AllenAI) released with the paper `Longformer: The Long-Document
Transformer <https://arxiv.org/abs/2004.05150>`__ by Iz Beltagy, Matthew E. Peters, Arman Cohan.
19. :doc:`LXMERT <model_doc/lxmert>` (from UNC Chapel Hill) released with the paper `LXMERT: Learning Cross-Modality
Encoder Representations from Transformers for Open-Domain Question Answering <https://arxiv.org/abs/1908.07490>`__
by Hao Tan and Mohit Bansal.
20. :doc:`MarianMT <model_doc/marian>` Machine translation models trained using `OPUS <http://opus.nlpl.eu/>`__ data by
Jörg Tiedemann. The `Marian Framework <https://marian-nmt.github.io/>`__ is being developed by the Microsoft
Translator Team.
21. :doc:`MBart <model_doc/mbart>` (from Facebook) released with the paper `Multilingual Denoising Pre-training for
Neural Machine Translation <https://arxiv.org/abs/2001.08210>`__ by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li,
Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
22. :doc:`Pegasus <model_doc/pegasus>` (from Google) released with the paper `PEGASUS: Pre-training with Extracted
Gap-sentences for Abstractive Summarization <https://arxiv.org/abs/1912.08777>`__> by Jingqing Zhang, Yao Zhao,
Mohammad Saleh and Peter J. Liu.
23. :doc:`ProphetNet <model_doc/prophetnet>` (from Microsoft Research) released with the paper `ProphetNet: Predicting
Future N-gram for Sequence-to-Sequence Pre-training <https://arxiv.org/abs/2001.04063>`__ by Yu Yan, Weizhen Qi,
Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
24. :doc:`Reformer <model_doc/reformer>` (from Google Research) released with the paper `Reformer: The Efficient
Transformer <https://arxiv.org/abs/2001.04451>`__ by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
25. :doc:`RoBERTa <model_doc/roberta>` (from Facebook), released together with the paper a `Robustly Optimized BERT
Pretraining Approach <https://arxiv.org/abs/1907.11692>`__ by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar
Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov. ultilingual BERT into `DistilmBERT
<https://github.com/huggingface/transformers/tree/master/examples/distillation>`__ and a German version of
DistilBERT.
26. :doc:`SqueezeBert <model_doc/squeezebert>` released with the paper `SqueezeBERT: What can computer vision teach NLP
about efficient neural networks? <https://arxiv.org/abs/2006.11316>`__ by Forrest N. Iandola, Albert E. Shaw, Ravi
Krishna, and Kurt W. Keutzer.
27. :doc:`T5 <model_doc/t5>` (from Google AI) released with the paper `Exploring the Limits of Transfer Learning with a
Unified Text-to-Text Transformer <https://arxiv.org/abs/1910.10683>`__ by Colin Raffel and Noam Shazeer and Adam
Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
28. :doc:`Transformer-XL <model_doc/transformerxl>` (from Google/CMU) released with the paper `Transformer-XL:
Attentive Language Models Beyond a Fixed-Length Context <https://arxiv.org/abs/1901.02860>`__ by Zihang Dai*,
Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
29. :doc:`XLM <model_doc/xlm>` (from Facebook) released together with the paper `Cross-lingual Language Model
Pretraining <https://arxiv.org/abs/1901.07291>`__ by Guillaume Lample and Alexis Conneau.
30. :doc:`XLM-ProphetNet <model_doc/xlmprophetnet>` (from Microsoft Research) released with the paper `ProphetNet:
Predicting Future N-gram for Sequence-to-Sequence Pre-training <https://arxiv.org/abs/2001.04063>`__ by Yu Yan,
Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
31. :doc:`XLM-RoBERTa <model_doc/xlmroberta>` (from Facebook AI), released together with the paper `Unsupervised
Cross-lingual Representation Learning at Scale <https://arxiv.org/abs/1911.02116>`__ by Alexis Conneau*, Kartikay
Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke
Zettlemoyer and Veselin Stoyanov.
32. :doc:`XLNet <model_doc/xlnet>` (from Google/CMU) released with the paper `XLNet: Generalized Autoregressive
Pretraining for Language Understanding <https://arxiv.org/abs/1906.08237>`__ by Zhilin Yang*, Zihang Dai*, Yiming
Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
33. `Other community models <https://huggingface.co/models>`__, contributed by the `community
<https://huggingface.co/users>`__.
.. toctree::
:maxdepth: 2
:caption: Notes
:caption: Get started
quicktour
installation
quickstart
philosophy
glossary
pretrained_models
usage
.. toctree::
:maxdepth: 2
:caption: Using 🤗 Transformers
task_summary
model_summary
preprocessing
training
model_sharing
tokenizer_summary
multilingual
.. toctree::
:maxdepth: 2
:caption: Advanced guides
pretrained_models
examples
custom_datasets
notebooks
serialization
converting_tensorflow_models
migration
contributing
testing
serialization
.. toctree::
:maxdepth: 2
:caption: Research
bertology
torchscript
multilingual
perplexity
benchmarks
.. toctree::
:maxdepth: 2
:caption: Main classes
:caption: Main Classes
main_classes/callback
main_classes/configuration
main_classes/logging
main_classes/model
main_classes/tokenizer
main_classes/pipelines
main_classes/optimizer_schedules
main_classes/output
main_classes/pipelines
main_classes/processors
main_classes/tokenizer
main_classes/trainer
.. toctree::
:maxdepth: 2
:caption: Package Reference
:caption: Models
model_doc/auto
model_doc/encoderdecoder
model_doc/bert
model_doc/gpt
model_doc/transformerxl
model_doc/gpt2
model_doc/xlm
model_doc/xlnet
model_doc/roberta
model_doc/distilbert
model_doc/ctrl
model_doc/camembert
model_doc/albert
model_doc/xlmroberta
model_doc/flaubert
model_doc/auto
model_doc/bart
model_doc/t5
model_doc/electra
model_doc/bert
model_doc/bertgeneration
model_doc/blenderbot
model_doc/camembert
model_doc/ctrl
model_doc/deberta
model_doc/dialogpt
model_doc/reformer
model_doc/distilbert
model_doc/dpr
model_doc/electra
model_doc/encoderdecoder
model_doc/flaubert
model_doc/fsmt
model_doc/funnel
model_doc/layoutlm
model_doc/longformer
model_doc/lxmert
model_doc/marian
model_doc/mbart
model_doc/mobilebert
model_doc/gpt
model_doc/gpt2
model_doc/pegasus
model_doc/prophetnet
model_doc/rag
model_doc/reformer
model_doc/retribert
model_doc/roberta
model_doc/squeezebert
model_doc/t5
model_doc/transformerxl
model_doc/xlm
model_doc/xlmprophetnet
model_doc/xlmroberta
model_doc/xlnet
.. toctree::
:maxdepth: 2
:caption: Internal Helpers
internal/modeling_utils
internal/pipelines_utils
internal/tokenization_utils
internal/trainer_utils

View File

@@ -1,51 +1,102 @@
# Installation
Transformers is tested on Python 3.6+ and PyTorch 1.1.0
🤗 Transformers is tested on Python 3.6+, and PyTorch 1.1.0+ or TensorFlow 2.0+.
## With pip
You should install 🤗 Transformers in a [virtual environment](https://docs.python.org/3/library/venv.html). If you're
unfamiliar with Python virtual environments, check out the [user guide](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/). Create a virtual environment with the version of Python you're going
to use and activate it.
PyTorch Transformers can be installed using pip as follows:
Now, if you want to use 🤗 Transformers, you can install it with pip. If you'd like to play with the examples, you
must install it from source.
``` bash
## Installation with pip
First you need to install one of, or both, TensorFlow 2.0 and PyTorch.
Please refer to [TensorFlow installation page](https://www.tensorflow.org/install/pip#tensorflow-2.0-rc-is-available)
and/or [PyTorch installation page](https://pytorch.org/get-started/locally/#start-locally) regarding the specific
install command for your platform.
When TensorFlow 2.0 and/or PyTorch has been installed, 🤗 Transformers can be installed using pip as follows:
```bash
pip install transformers
```
## From source
Alternatively, for CPU-support only, you can install 🤗 Transformers and PyTorch in one line with:
To install from source, clone the repository and install with:
```bash
pip install transformers[torch]
```
or 🤗 Transformers and TensorFlow 2.0 in one line with:
```bash
pip install transformers[tf-cpu]
```
To check 🤗 Transformers is properly installed, run the following command:
```bash
python -c "from transformers import pipeline; print(pipeline('sentiment-analysis')('we love you'))"
```
It should download a pretrained model then print something like
```bash
[{'label': 'POSITIVE', 'score': 0.9998704791069031}]
```
(Note that TensorFlow will print additional stuff before that last statement.)
## Installing from source
To install from source, clone the repository and install with the following commands:
``` bash
git clone https://github.com/huggingface/transformers.git
cd transformers
pip install .
pip install -e .
```
## Tests
Again, you can run
An extensive test suite is included to test the library behavior and several examples. Library tests can be found in the [tests folder](https://github.com/huggingface/transformers/tree/master/tests) and examples tests in the [examples folder](https://github.com/huggingface/transformers/tree/master/examples).
Refer to the [contributing guide](https://github.com/huggingface/transformers/blob/master/CONTRIBUTING.md#tests) for details about running tests.
## OpenAI GPT original tokenization workflow
If you want to reproduce the original tokenization process of the `OpenAI GPT` paper, you will need to install `ftfy` and `SpaCy`:
``` bash
pip install spacy ftfy==4.4.3
python -m spacy download en
```bash
python -c "from transformers import pipeline; print(pipeline('sentiment-analysis')('I hate you'))"
```
If you don't install `ftfy` and `SpaCy`, the `OpenAI GPT` tokenizer will default to tokenize using BERT's `BasicTokenizer` followed by Byte-Pair Encoding (which should be fine for most usage, don't worry).
to check 🤗 Transformers is properly installed.
## Note on model downloads (Continuous Integration or large-scale deployments)
## Caching models
If you expect to be downloading large volumes of models (more than 1,000) from our hosted bucket (for instance through your CI setup, or a large-scale production deployment), please cache the model files on your end. It will be way faster, and cheaper. Feel free to contact us privately if you need any help.
This library provides pretrained models that will be downloaded and cached locally. Unless you specify a location with
`cache_dir=...` when you use methods like `from_pretrained`, these models will automatically be downloaded in the
folder given by the shell environment variable ``TRANSFORMERS_CACHE``. The default value for it will be the PyTorch
cache home followed by ``/transformers/`` (even if you don't have PyTorch installed). This is (by order of priority):
* shell environment variable ``TORCH_HOME``
* shell environment variable ``XDG_CACHE_HOME`` + ``/torch/``
* default: ``~/.cache/torch/``
So if you don't have any specific environment variable set, the cache directory will be at
``~/.cache/torch/transformers/``.
**Note:** If you have set a shell enviromnent variable for one of the predecessors of this library
(``PYTORCH_TRANSFORMERS_CACHE`` or ``PYTORCH_PRETRAINED_BERT_CACHE``), those will be used if there is no shell
enviromnent variable for ``TRANSFORMERS_CACHE``.
### Note on model downloads (Continuous Integration or large-scale deployments)
If you expect to be downloading large volumes of models (more than 1,000) from our hosted bucket (for instance through
your CI setup, or a large-scale production deployment), please cache the model files on your end. It will be way
faster, and cheaper. Feel free to contact us privately if you need any help.
## Do you want to run a Transformer model on a mobile device?
You should check out our [swift-coreml-transformers](https://github.com/huggingface/swift-coreml-transformers) repo.
It contains a set of tools to convert PyTorch or TensorFlow 2.0 trained Transformer models (currently contains `GPT-2`, `DistilGPT-2`, `BERT`, and `DistilBERT`) to CoreML models that run on iOS devices.
It contains a set of tools to convert PyTorch or TensorFlow 2.0 trained Transformer models (currently contains `GPT-2`,
`DistilGPT-2`, `BERT`, and `DistilBERT`) to CoreML models that run on iOS devices.
At some point in the future, you'll be able to seamlessly move from pre-training or fine-tuning models in PyTorch to productizing them in CoreML,
or prototype a model or an app in CoreML then research its hyperparameters or architecture from PyTorch. Super exciting!
At some point in the future, you'll be able to seamlessly move from pre-training or fine-tuning models in PyTorch or
TensorFlow 2.0 to productizing them in CoreML, or prototype a model or an app in CoreML then research its
hyperparameters or architecture from PyTorch or TensorFlow 2.0. Super exciting!

View File

@@ -0,0 +1,88 @@
Custom Layers and Utilities
-----------------------------------------------------------------------------------------------------------------------
This page lists all the custom layers used by the library, as well as the utility functions it provides for modeling.
Most of those are only useful if you are studying the code of the models in the library.
Pytorch custom modules
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_utils.Conv1D
.. autoclass:: transformers.modeling_utils.PoolerStartLogits
:members: forward
.. autoclass:: transformers.modeling_utils.PoolerEndLogits
:members: forward
.. autoclass:: transformers.modeling_utils.PoolerAnswerClass
:members: forward
.. autoclass:: transformers.modeling_utils.SquadHeadOutput
.. autoclass:: transformers.modeling_utils.SQuADHead
:members: forward
.. autoclass:: transformers.modeling_utils.SequenceSummary
:members: forward
PyTorch Helper Functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: transformers.apply_chunking_to_forward
.. autofunction:: transformers.modeling_utils.find_pruneable_heads_and_indices
.. autofunction:: transformers.modeling_utils.prune_layer
.. autofunction:: transformers.modeling_utils.prune_conv1d_layer
.. autofunction:: transformers.modeling_utils.prune_linear_layer
TensorFlow custom layers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_utils.TFConv1D
.. autoclass:: transformers.modeling_tf_utils.TFSharedEmbeddings
:members: call
.. autoclass:: transformers.modeling_tf_utils.TFSequenceSummary
:members: call
TensorFlow loss functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_utils.TFCausalLanguageModelingLoss
:members:
.. autoclass:: transformers.modeling_tf_utils.TFMaskedLanguageModelingLoss
:members:
.. autoclass:: transformers.modeling_tf_utils.TFMultipleChoiceLoss
:members:
.. autoclass:: transformers.modeling_tf_utils.TFQuestionAnsweringLoss
:members:
.. autoclass:: transformers.modeling_tf_utils.TFSequenceClassificationLoss
:members:
.. autoclass:: transformers.modeling_tf_utils.TFTokenClassificationLoss
:members:
TensorFlow Helper Functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: transformers.modeling_tf_utils.cast_bool_to_primitive
.. autofunction:: transformers.modeling_tf_utils.get_initializer
.. autofunction:: transformers.modeling_tf_utils.keras_serializable
.. autofunction:: transformers.modeling_tf_utils.shape_list

View File

@@ -0,0 +1,40 @@
Utilities for pipelines
-----------------------------------------------------------------------------------------------------------------------
This page lists all the utility functions the library provides for pipelines.
Most of those are only useful if you are studying the code of the models in the library.
Argument handling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.pipelines.ArgumentHandler
.. autoclass:: transformers.pipelines.ZeroShotClassificationArgumentHandler
.. autoclass:: transformers.pipelines.QuestionAnsweringArgumentHandler
Data format
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.pipelines.PipelineDataFormat
:members:
.. autoclass:: transformers.pipelines.CsvPipelineDataFormat
:members:
.. autoclass:: transformers.pipelines.JsonPipelineDataFormat
:members:
.. autoclass:: transformers.pipelines.PipedPipelineDataFormat
:members:
Utilities
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: transformers.pipelines.get_framework
.. autoclass:: transformers.pipelines.PipelineException

View File

@@ -0,0 +1,38 @@
Utilities for Tokenizers
-----------------------------------------------------------------------------------------------------------------------
This page lists all the utility functions used by the tokenizers, mainly the class
:class:`~transformers.tokenization_utils_base.PreTrainedTokenizerBase` that implements the common methods between
:class:`~transformers.PreTrainedTokenizer` and :class:`~transformers.PreTrainedTokenizerFast` and the mixin
:class:`~transformers.tokenization_utils_base.SpecialTokensMixin`.
Most of those are only useful if you are studying the code of the tokenizers in the library.
PreTrainedTokenizerBase
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.tokenization_utils_base.PreTrainedTokenizerBase
:special-members: __call__
:members:
SpecialTokensMixin
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.tokenization_utils_base.SpecialTokensMixin
:members:
Enums and namedtuples
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.tokenization_utils_base.ExplicitEnum
.. autoclass:: transformers.tokenization_utils_base.PaddingStrategy
.. autoclass:: transformers.tokenization_utils_base.TensorType
.. autoclass:: transformers.tokenization_utils_base.TruncationStrategy
.. autoclass:: transformers.tokenization_utils_base.CharSpan
.. autoclass:: transformers.tokenization_utils_base.TokenSpan

View File

@@ -0,0 +1,27 @@
Utilities for Trainer
-----------------------------------------------------------------------------------------------------------------------
This page lists all the utility functions used by :class:`~transformers.Trainer`.
Most of those are only useful if you are studying the code of the Trainer in the library.
Utilities
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.EvalPrediction
.. autofunction:: transformers.set_seed
.. autofunction:: transformers.torch_distributed_zero_first
Callbacks internals
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.trainer_callback.CallbackHandler
Distributed Evaluation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.trainer_pt_utils.DistributedTensorGatherer
:members:

View File

@@ -0,0 +1,68 @@
Callbacks
-----------------------------------------------------------------------------------------------------------------------
Callbacks are objects that can customize the behavior of the training loop in the PyTorch
:class:`~transformers.Trainer` (this feature is not yet implemented in TensorFlow) that can inspect the training loop
state (for progress reporting, logging on TensorBoard or other ML platforms...) and take decisions (like early
stopping).
Callbacks are "read only" pieces of code, apart from the :class:`~transformers.TrainerControl` object they return, they
cannot change anything in the training loop. For customizations that require changes in the training loop, you should
subclass :class:`~transformers.Trainer` and override the methods you need (see :doc:`trainer` for examples).
By default a :class:`~transformers.Trainer` will use the following callbacks:
- :class:`~transformers.DefaultFlowCallback` which handles the default behavior for logging, saving and evaluation.
- :class:`~transformers.PrinterCallback` or :class:`~transformers.ProrgressCallback` to display progress and print the
logs (the first one is used if you deactivate tqdm through the :class:`~transformers.TrainingArguments`, otherwise
it's the second one).
- :class:`~transformers.integrations.TensorBoardCallback` if tensorboard is accessible (either through PyTorch >= 1.4
or tensorboardX).
- :class:`~transformers.integrations.WandbCallback` if `wandb <https://www.wandb.com/>`__ is installed.
- :class:`~transformers.integrations.CometCallback` if `comet_ml <https://www.comet.ml/site/>`__ is installed.
The main class that implements callbacks is :class:`~transformers.TrainerCallback`. It gets the
:class:`~transformers.TrainingArguments` used to instantiate the :class:`~transformers.Trainer`, can access that
Trainer's internal state via :class:`~transformers.TrainerState`, and can take some actions on the training loop via
:class:`~transformers.TrainerControl`.
Available Callbacks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Here is the list of the available :class:`~transformers.TrainerCallback` in the library:
.. autoclass:: transformers.integrations.CometCallback
:members: setup
.. autoclass:: transformers.DefaultFlowCallback
.. autoclass:: transformers.PrinterCallback
.. autoclass:: transformers.ProgressCallback
.. autoclass:: transformers.integrations.TensorBoardCallback
.. autoclass:: transformers.integrations.WandbCallback
:members: setup
TrainerCallback
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TrainerCallback
:members:
TrainerState
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TrainerState
:members:
TrainerControl
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TrainerControl
:members:

View File

@@ -1,10 +1,13 @@
Configuration
----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
The base class ``PretrainedConfig`` implements the common methods for loading/saving a configuration either from a local file or directory, or from a pretrained model configuration provided by the library (downloaded from HuggingFace's AWS S3 repository).
The base class :class:`~transformers.PretrainedConfig` implements the common methods for loading/saving a configuration
either from a local file or directory, or from a pretrained model configuration provided by the library (downloaded
from HuggingFace's AWS S3 repository).
``PretrainedConfig``
~~~~~~~~~~~~~~~~~~~~~
PretrainedConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.PretrainedConfig
:members:

View File

@@ -0,0 +1,58 @@
Logging
-----------------------------------------------------------------------------------------------------------------------
🤗 Transformers has a centralized logging system, so that you can setup the verbosity of the library easily.
Currently the default verbosity of the library is ``WARNING``.
To change the level of verbosity, just use one of the direct setters. For instance, here is how to change the verbosity
to the INFO level.
.. code-block:: python
import transformers
transformers.logging.set_verbosity_info()
You can also use the environment variable ``TRANSFORMERS_VERBOSITY`` to override the default verbosity. You can set it
to one of the following: ``debug``, ``info``, ``warning``, ``error``, ``critical``. For example:
.. code-block:: bash
TRANSFORMERS_VERBOSITY=error ./myprogram.py
All the methods of this logging module are documented below, the main ones are
:func:`transformers.logging.get_verbosity` to get the current level of verbosity in the logger and
:func:`transformers.logging.set_verbosity` to set the verbosity to the level of your choice. In order (from the least
verbose to the most verbose), those levels (with their corresponding int values in parenthesis) are:
- :obj:`transformers.logging.CRITICAL` or :obj:`transformers.logging.FATAL` (int value, 50): only report the most
critical errors.
- :obj:`transformers.logging.ERROR` (int value, 40): only report errors.
- :obj:`transformers.logging.WARNING` or :obj:`transformers.logging.WARN` (int value, 30): only reports error and
warnings. This the default level used by the library.
- :obj:`transformers.logging.INFO` (int value, 20): reports error, warnings and basic information.
- :obj:`transformers.logging.DEBUG` (int value, 10): report all information.
Base setters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: transformers.logging.set_verbosity_error
.. autofunction:: transformers.logging.set_verbosity_warning
.. autofunction:: transformers.logging.set_verbosity_info
.. autofunction:: transformers.logging.set_verbosity_debug
Other functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: transformers.logging.get_verbosity
.. autofunction:: transformers.logging.set_verbosity
.. autofunction:: transformers.logging.get_logger
.. autofunction:: transformers.logging.enable_explicit_format
.. autofunction:: transformers.logging.reset_format

View File

@@ -1,27 +1,55 @@
Models
----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
The base class ``PreTrainedModel`` implements the common methods for loading/saving a model either from a local file or directory, or from a pretrained model configuration provided by the library (downloaded from HuggingFace's AWS S3 repository).
The base classes :class:`~transformers.PreTrainedModel` and :class:`~transformers.TFPreTrainedModel` implement the
common methods for loading/saving a model either from a local file or directory, or from a pretrained model
configuration provided by the library (downloaded from HuggingFace's AWS S3 repository).
``PreTrainedModel`` also implements a few methods which are common among all the models to:
:class:`~transformers.PreTrainedModel` and :class:`~transformers.TFPreTrainedModel` also implement a few methods which
are common among all the models to:
- resize the input token embeddings when new tokens are added to the vocabulary
- prune the attention heads of the model.
``PreTrainedModel``
~~~~~~~~~~~~~~~~~~~~~
The other methods that are common to each model are defined in :class:`~transformers.modeling_utils.ModuleUtilsMixin`
(for the PyTorch models) and :class:`~transformers.modeling_tf_utils.TFModuleUtilsMixin` (for the TensorFlow models) or
for text generation, :class:`~transformers.generation_utils.GenerationMixin` (for the PyTorch models) and
:class:`~transformers.generation_tf_utils.TFGenerationMixin` (for the TensorFlow models)
PreTrainedModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.PreTrainedModel
:members:
``Helper Functions``
~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: transformers.apply_chunking_to_forward
ModuleUtilsMixin
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_utils.ModuleUtilsMixin
:members:
``TFPreTrainedModel``
~~~~~~~~~~~~~~~~~~~~~
TFPreTrainedModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFPreTrainedModel
:members:
TFModelUtilsMixin
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_utils.TFModelUtilsMixin
:members:
Generative models
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.generation_utils.GenerationMixin
:members:
.. autoclass:: transformers.generation_tf_utils.TFGenerationMixin
:members:

View File

@@ -1,5 +1,5 @@
Optimizer
----------------------------------------------------
Optimization
-----------------------------------------------------------------------------------------------------------------------
The ``.optimization`` module provides:
@@ -7,25 +7,30 @@ The ``.optimization`` module provides:
- several schedules in the form of schedule objects that inherit from ``_LRSchedule``:
- a gradient accumulation class to accumulate the gradients of multiple batches
``AdamW``
~~~~~~~~~~~~~~~~
AdamW (PyTorch)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AdamW
:members:
``AdamWeightDecay``
~~~~~~~~~~~~~~~~~~~
AdaFactor (PyTorch)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.Adafactor
AdamWeightDecay (TensorFlow)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AdamWeightDecay
:members:
.. autofunction:: transformers.create_optimizer
Schedules
----------------------------------------------------
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Learning Rate Schedules (Pytorch)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Learning Rate Schedules
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: transformers.get_constant_schedule
@@ -57,16 +62,16 @@ Learning Rate Schedules
:target: /imgs/warmup_linear_schedule.png
:alt:
``Warmup``
~~~~~~~~~~~~~~~~
Warmup (TensorFlow)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. autoclass:: transformers.WarmUp
:members:
Gradient Strategies
----------------------------------------------------
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
``GradientAccumulator``
~~~~~~~~~~~~~~~~~~~~~~~
GradientAccumulator (TensorFlow)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. autoclass:: transformers.GradientAccumulator

View File

@@ -0,0 +1,260 @@
Model outputs
-----------------------------------------------------------------------------------------------------------------------
PyTorch models have outputs that are instances of subclasses of :class:`~transformers.file_utils.ModelOutput`. Those
are data structures containing all the information returned by the model, but that can also be used as tuples or
dictionaries.
Let's see of this looks on an example:
.. code-block::
from transformers import BertTokenizer, BertForSequenceClassification
import torch
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
labels = torch.tensor([1]).unsqueeze(0) # Batch size 1
outputs = model(**inputs, labels=labels)
The ``outputs`` object is a :class:`~transformers.modeling_outputs.SequenceClassifierOutput`, as we can see in the
documentation of that class below, it means it has an optional ``loss``, a ``logits`` an optional ``hidden_states`` and
an optional ``attentions`` attribute. Here we have the ``loss`` since we passed along ``labels``, but we don't have
``hidden_states`` and ``attentions`` because we didn't pass ``output_hidden_states=True`` or
``output_attentions=True``.
You can access each attribute as you would usually do, and if that attribute has not been returned by the model, you
will get ``None``. Here for instance ``outputs.loss`` is the loss computed by the model, and ``outputs.attentions`` is
``None``.
When considering our ``outputs`` object as tuple, it only considers the attributes that don't have ``None`` values.
Here for instance, it has two elements, ``loss`` then ``logits``, so
.. code-block::
outputs[:2]
will return the tuple ``(outputs.loss, outputs.logits)`` for instance.
When considering our ``outputs`` object as dictionary, it only considers the attributes that don't have ``None``
values. Here for instance, it has two keys that are ``loss`` and ``logits``.
We document here the generic model outputs that are used by more than one model type. Specific output types are
documented on their corresponding model page.
ModelOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.file_utils.ModelOutput
:members:
BaseModelOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.BaseModelOutput
:members:
BaseModelOutputWithPooling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.BaseModelOutputWithPooling
:members:
BaseModelOutputWithPast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.BaseModelOutputWithPast
:members:
Seq2SeqModelOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.Seq2SeqModelOutput
:members:
CausalLMOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.CausalLMOutput
:members:
CausalLMOutputWithPast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.CausalLMOutputWithPast
:members:
MaskedLMOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.MaskedLMOutput
:members:
Seq2SeqLMOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.Seq2SeqLMOutput
:members:
NextSentencePredictorOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.NextSentencePredictorOutput
:members:
SequenceClassifierOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.SequenceClassifierOutput
:members:
Seq2SeqSequenceClassifierOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.Seq2SeqSequenceClassifierOutput
:members:
MultipleChoiceModelOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.MultipleChoiceModelOutput
:members:
TokenClassifierOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.TokenClassifierOutput
:members:
QuestionAnsweringModelOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.QuestionAnsweringModelOutput
:members:
Seq2SeqQuestionAnsweringModelOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.Seq2SeqQuestionAnsweringModelOutput
:members:
TFBaseModelOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_outputs.TFBaseModelOutput
:members:
TFBaseModelOutputWithPooling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_outputs.TFBaseModelOutputWithPooling
:members:
TFBaseModelOutputWithPast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_outputs.TFBaseModelOutputWithPast
:members:
TFSeq2SeqModelOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_outputs.TFSeq2SeqModelOutput
:members:
TFCausalLMOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_outputs.TFCausalLMOutput
:members:
TFCausalLMOutputWithPast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_outputs.TFCausalLMOutputWithPast
:members:
TFMaskedLMOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_outputs.TFMaskedLMOutput
:members:
TFSeq2SeqLMOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_outputs.TFSeq2SeqLMOutput
:members:
TFNextSentencePredictorOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_outputs.TFNextSentencePredictorOutput
:members:
TFSequenceClassifierOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_outputs.TFSequenceClassifierOutput
:members:
TFSeq2SeqSequenceClassifierOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_outputs.TFSeq2SeqSequenceClassifierOutput
:members:
TFMultipleChoiceModelOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_outputs.TFMultipleChoiceModelOutput
:members:
TFTokenClassifierOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_outputs.TFTokenClassifierOutput
:members:
TFQuestionAnsweringModelOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_outputs.TFQuestionAnsweringModelOutput
:members:
TFSeq2SeqQuestionAnsweringModelOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_outputs.TFSeq2SeqQuestionAnsweringModelOutput
:members:

View File

@@ -1,74 +1,120 @@
Pipelines
----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
The pipelines are a great and easy way to use models for inference. These pipelines are objects that abstract most
of the complex code from the library, offering a simple API dedicated to several tasks, including Named Entity
Recognition, Masked Language Modeling, Sentiment Analysis, Feature Extraction and Question Answering.
Recognition, Masked Language Modeling, Sentiment Analysis, Feature Extraction and Question Answering. See the
:doc:`task summary <../task_summary>` for examples of use.
There are two categories of pipeline abstractions to be aware about:
- The :class:`~transformers.pipeline` which is the most powerful object encapsulating all other pipelines
- The other task-specific pipelines, such as :class:`~transformers.NerPipeline`
or :class:`~transformers.QuestionAnsweringPipeline`
- The :func:`~transformers.pipeline` which is the most powerful object encapsulating all other pipelines.
- The other task-specific pipelines:
- :class:`~transformers.ConversationalPipeline`
- :class:`~transformers.FeatureExtractionPipeline`
- :class:`~transformers.FillMaskPipeline`
- :class:`~transformers.QuestionAnsweringPipeline`
- :class:`~transformers.SummarizationPipeline`
- :class:`~transformers.TextClassificationPipeline`
- :class:`~transformers.TextGenerationPipeline`
- :class:`~transformers.TokenClassificationPipeline`
- :class:`~transformers.TranslationPipeline`
- :class:`~transformers.ZeroShotClassificationPipeline`
- :class:`~transformers.Text2TextGenerationPipeline`
The pipeline abstraction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The `pipeline` abstraction is a wrapper around all the other available pipelines. It is instantiated as any
other pipeline but requires an additional argument which is the `task`.
.. autoclass:: transformers.pipeline
:members:
.. autofunction:: transformers.pipeline
The task specific pipelines
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Parent class: Pipeline
=========================================
ConversationalPipeline
=======================================================================================================================
.. autoclass:: transformers.Pipeline
:members: predict, transform, save_pretrained
.. autoclass:: transformers.Conversation
NerPipeline
==========================================
.. autoclass:: transformers.NerPipeline
TokenClassificationPipeline
==========================================
This class is an alias of the :class:`~transformers.NerPipeline` defined above. Please refer to that pipeline for
documentation and usage examples.
FillMaskPipeline
==========================================
.. autoclass:: transformers.FillMaskPipeline
.. autoclass:: transformers.ConversationalPipeline
:special-members: __call__
:members:
FeatureExtractionPipeline
==========================================
=======================================================================================================================
.. autoclass:: transformers.FeatureExtractionPipeline
:special-members: __call__
:members:
TextClassificationPipeline
==========================================
FillMaskPipeline
=======================================================================================================================
.. autoclass:: transformers.TextClassificationPipeline
.. autoclass:: transformers.FillMaskPipeline
:special-members: __call__
:members:
NerPipeline
=======================================================================================================================
This class is an alias of the :class:`~transformers.TokenClassificationPipeline` defined below. Please refer to that
pipeline for documentation and usage examples.
QuestionAnsweringPipeline
==========================================
=======================================================================================================================
.. autoclass:: transformers.QuestionAnsweringPipeline
:special-members: __call__
:members:
SummarizationPipeline
==========================================
=======================================================================================================================
.. autoclass:: transformers.SummarizationPipeline
:special-members: __call__
:members:
TextClassificationPipeline
=======================================================================================================================
.. autoclass:: transformers.TextClassificationPipeline
:special-members: __call__
:members:
TextGenerationPipeline
==========================================
=======================================================================================================================
.. autoclass:: transformers.TextGenerationPipeline
:special-members: __call__
:members:
Text2TextGenerationPipeline
=======================================================================================================================
.. autoclass:: transformers.Text2TextGenerationPipeline
:special-members: __call__
:members:
TokenClassificationPipeline
=======================================================================================================================
.. autoclass:: transformers.TokenClassificationPipeline
:special-members: __call__
:members:
ZeroShotClassificationPipeline
=======================================================================================================================
.. autoclass:: transformers.ZeroShotClassificationPipeline
:special-members: __call__
:members:
Parent class: :obj:`Pipeline`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.Pipeline
:members:

View File

@@ -1,11 +1,11 @@
Processors
----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
This library includes processors for several traditional tasks. These processors can be used to process a dataset into
examples that can be fed to a model.
Processors
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All processors follow the same architecture which is that of the
:class:`~transformers.data.processors.utils.DataProcessor`. The processor returns a list
@@ -26,7 +26,7 @@ of :class:`~transformers.data.processors.utils.InputExample`. These
GLUE
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
`General Language Understanding Evaluation (GLUE) <https://gluebenchmark.com/>`__ is a benchmark that evaluates
the performance of models across a diverse set of existing NLU tasks. It was released together with the paper
@@ -52,13 +52,13 @@ Additionally, the following method can be used to load values from a data file
.. automethod:: transformers.data.processors.glue.glue_convert_examples_to_features
Example usage
^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
An example using these processors is given in the `run_glue.py <https://github.com/huggingface/pytorch-transformers/blob/master/examples/text-classification/run_glue.py>`__ script.
XNLI
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
`The Cross-Lingual NLI Corpus (XNLI) <https://www.nyu.edu/projects/bowman/xnli/>`__ is a benchmark that evaluates
the quality of cross-lingual text representations.
@@ -78,7 +78,7 @@ An example using these processors is given in the
SQuAD
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
`The Stanford Question Answering Dataset (SQuAD) <https://rajpurkar.github.io/SQuAD-explorer//>`__ is a benchmark that evaluates
the performance of models on question answering. Two versions are available, v1.1 and v2.0. The first version (v1.1) was released together with the paper
@@ -88,7 +88,7 @@ the paper `Know What You Don't Know: Unanswerable Questions for SQuAD <https://a
This library hosts a processor for each of the two versions:
Processors
^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Those processors are:
- :class:`~transformers.data.processors.utils.SquadV1Processor`
@@ -109,7 +109,7 @@ Examples are given below.
Example usage
^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here is an example using the processors as well as the conversion method using data files:
Example::

View File

@@ -1,38 +1,59 @@
Tokenizer
----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
A tokenizer is in charge of preparing the inputs for a model. The library comprise tokenizers for all the models. Most of the tokenizers are available in two flavors: a full python implementation and a "Fast" implementation based on the Rust library `tokenizers`. The "Fast" implementations allows (1) a significant speed-up in particular when doing batched tokenization and (2) additional methods to map between the original string (character and words) and the token space (e.g. getting the index of the token comprising a given character or the span of characters corresponding to a given token). Currently no "Fast" implementation is available for the SentencePiece-based tokenizers (for T5, ALBERT, CamemBERT, XLMRoBERTa and XLNet models).
A tokenizer is in charge of preparing the inputs for a model. The library contains tokenizers for all the models. Most
of the tokenizers are available in two flavors: a full python implementation and a "Fast" implementation based on the
Rust library `tokenizers <https://github.com/huggingface/tokenizers>`__. The "Fast" implementations allows:
The base classes ``PreTrainedTokenizer`` and ``PreTrainedTokenizerFast`` implements the common methods for encoding string inputs in model inputs (see below) and instantiating/saving python and "Fast" tokenizers either from a local file or directory or from a pretrained tokenizer provided by the library (downloaded from HuggingFace's AWS S3 repository).
1. a significant speed-up in particular when doing batched tokenization and
2. additional methods to map between the original string (character and words) and the token space (e.g. getting the
index of the token comprising a given character or the span of characters corresponding to a given token). Currently
no "Fast" implementation is available for the SentencePiece-based tokenizers (for T5, ALBERT, CamemBERT, XLMRoBERTa
and XLNet models).
``PreTrainedTokenizer`` and ``PreTrainedTokenizerFast`` thus implements the main methods for using all the tokenizers:
The base classes :class:`~transformers.PreTrainedTokenizer` and :class:`~transformers.PreTrainedTokenizerFast`
implement the common methods for encoding string inputs in model inputs (see below) and instantiating/saving python and
"Fast" tokenizers either from a local file or directory or from a pretrained tokenizer provided by the library
(downloaded from HuggingFace's AWS S3 repository). They both rely on
:class:`~transformers.tokenization_utils_base.PreTrainedTokenizerBase` that contains the common methods, and
:class:`~transformers.tokenization_utils_base.SpecialTokensMixin`.
- tokenizing (spliting strings in sub-word token strings), converting tokens strings to ids and back, and encoding/decoding (i.e. tokenizing + convert to integers),
- adding new tokens to the vocabulary in a way that is independant of the underlying structure (BPE, SentencePiece...),
- managing special tokens like mask, beginning-of-sentence, etc tokens (adding them, assigning them to attributes in the tokenizer for easy access and making sure they are not split during tokenization)
:class:`~transformers.PreTrainedTokenizer` and :class:`~transformers.PreTrainedTokenizerFast` thus implement the main
methods for using all the tokenizers:
``BatchEncoding`` holds the output of the tokenizer's encoding methods (``encode_plus`` and ``batch_encode_plus``) and is derived from a Python dictionary. When the tokenizer is a pure python tokenizer, this class behave just like a standard python dictionary and hold the various model inputs computed by these methodes (``input_ids``, ``attention_mask``...). When the tokenizer is a "Fast" tokenizer (i.e. backed by HuggingFace tokenizers library), this class provides in addition several advanced alignement methods which can be used to map between the original string (character and words) and the token space (e.g. getting the index of the token comprising a given character or the span of characters corresponding to a given token).
- Tokenizing (splitting strings in sub-word token strings), converting tokens strings to ids and back, and
encoding/decoding (i.e., tokenizing and converting to integers).
- Adding new tokens to the vocabulary in a way that is independent of the underlying structure (BPE, SentencePiece...).
- Managing special tokens (like mask, beginning-of-sentence, etc.): adding them, assigning them to attributes in the
tokenizer for easy access and making sure they are not split during tokenization.
``PreTrainedTokenizer``
~~~~~~~~~~~~~~~~~~~~~~~~
:class:`~transformers.BatchEncoding` holds the output of the tokenizer's encoding methods (``__call__``,
``encode_plus`` and ``batch_encode_plus``) and is derived from a Python dictionary. When the tokenizer is a pure python
tokenizer, this class behaves just like a standard python dictionary and holds the various model inputs computed by these
methods (``input_ids``, ``attention_mask``...). When the tokenizer is a "Fast" tokenizer (i.e., backed by HuggingFace
`tokenizers library <https://github.com/huggingface/tokenizers>`__), this class provides in addition several advanced
alignment methods which can be used to map between the original string (character and words) and the token space (e.g.,
getting the index of the token comprising a given character or the span of characters corresponding to a given token).
PreTrainedTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.PreTrainedTokenizer
:special-members: __call__
:members:
``PreTrainedTokenizerFast``
~~~~~~~~~~~~~~~~~~~~~~~~
PreTrainedTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.PreTrainedTokenizerFast
:special-members: __call__
:members:
``BatchEncoding``
~~~~~~~~~~~~~~~~~~~~~~~~
BatchEncoding
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BatchEncoding
:members:
``SpecialTokensMixin``
~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.SpecialTokensMixin
:members:

View File

@@ -0,0 +1,72 @@
Trainer
-----------------------------------------------------------------------------------------------------------------------
The :class:`~transformers.Trainer` and :class:`~transformers.TFTrainer` classes provide an API for feature-complete
training in most standard use cases. It's used in most of the :doc:`example scripts <../examples>`.
Before instantiating your :class:`~transformers.Trainer`/:class:`~transformers.TFTrainer`, create a
:class:`~transformers.TrainingArguments`/:class:`~transformers.TFTrainingArguments` to access all the points of
customization during training.
The API supports distributed training on multiple GPUs/TPUs, mixed precision through `NVIDIA Apex
<https://github.com/NVIDIA/apex>`__ for PyTorch and :obj:`tf.keras.mixed_precision` for TensorFlow.
Both :class:`~transformers.Trainer` and :class:`~transformers.TFTrainer` contain the basic training loop supporting the
previous features. To inject custom behavior you can subclass them and override the following methods:
- **get_train_dataloader**/**get_train_tfdataset** -- Creates the training DataLoader (PyTorch) or TF Dataset.
- **get_eval_dataloader**/**get_eval_tfdataset** -- Creates the evaluation DataLoader (PyTorch) or TF Dataset.
- **get_test_dataloader**/**get_test_tfdataset** -- Creates the test DataLoader (PyTorch) or TF Dataset.
- **log** -- Logs information on the various objects watching training.
- **create_optimizer_and_scheduler** -- Setups the optimizer and learning rate scheduler if they were not passed at
init.
- **compute_loss** - Computes the loss on a batch of training inputs.
- **training_step** -- Performs a training step.
- **prediction_step** -- Performs an evaluation/test step.
- **run_model** (TensorFlow only) -- Basic pass through the model.
- **evaluate** -- Runs an evaluation loop and returns metrics.
- **predict** -- Returns predictions (with metrics if labels are available) on a test set.
Here is an example of how to customize :class:`~transformers.Trainer` using a custom loss function:
.. code-block:: python
from transformers import Trainer
class MyTrainer(Trainer):
def compute_loss(self, model, inputs):
labels = inputs.pop("labels")
outputs = models(**inputs)
logits = outputs[0]
return my_custom_loss(logits, labels)
Another way to customize the training loop behavior for the PyTorch :class:`~transformers.Trainer` is to use
:doc:`callbacks <callback>` that can inspect the training loop state (for progress reporting, logging on TensorBoard or
other ML platforms...) and take decisions (like early stopping).
Trainer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.Trainer
:members:
TFTrainer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFTrainer
:members:
TrainingArguments
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TrainingArguments
:members:
TFTrainingArguments
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFTrainingArguments
:members:

View File

@@ -1,8 +1,8 @@
# Migrating from previous packages
## Migrating from pytorch-transformers to transformers
## Migrating from pytorch-transformers to 🤗 Transformers
Here is a quick summary of what you should take care of when migrating from `pytorch-transformers` to `transformers`.
Here is a quick summary of what you should take care of when migrating from `pytorch-transformers` to 🤗 Transformers.
### Positional order of some models' keywords inputs (`attention_mask`, `token_type_ids`...) changed
@@ -14,17 +14,17 @@ If you used to call the models with positional inputs for keyword arguments, e.g
## Migrating from pytorch-pretrained-bert
Here is a quick summary of what you should take care of when migrating from `pytorch-pretrained-bert` to `transformers`
Here is a quick summary of what you should take care of when migrating from `pytorch-pretrained-bert` to 🤗 Transformers
### Models always output `tuples`
The main breaking change when migrating from `pytorch-pretrained-bert` to `transformers` is that the models forward method always outputs a `tuple` with various elements depending on the model and the configuration parameters.
The main breaking change when migrating from `pytorch-pretrained-bert` to 🤗 Transformers is that the models forward method always outputs a `tuple` with various elements depending on the model and the configuration parameters.
The exact content of the tuples for each model are detailled in the models' docstrings and the [documentation](https://huggingface.co/transformers/).
In pretty much every case, you will be fine by taking the first element of the output as the output you previously used in `pytorch-pretrained-bert`.
Here is a `pytorch-pretrained-bert` to `transformers` conversion example for a `BertForSequenceClassification` classification model:
Here is a `pytorch-pretrained-bert` to 🤗 Transformers conversion example for a `BertForSequenceClassification` classification model:
```python
# Let's load our model
@@ -33,11 +33,11 @@ model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
# If you used to have this line in pytorch-pretrained-bert:
loss = model(input_ids, labels=labels)
# Now just use this line in transformers to extract the loss from the output tuple:
# Now just use this line in 🤗 Transformers to extract the loss from the output tuple:
outputs = model(input_ids, labels=labels)
loss = outputs[0]
# In transformers you can also have access to the logits:
# In 🤗 Transformers you can also have access to the logits:
loss, logits = outputs[:2]
# And even the attention weights if you configure the model to output them (and other outputs too, see the docstrings and documentation)
@@ -109,7 +109,7 @@ for batch in train_data:
loss.backward()
optimizer.step()
### In Transformers, optimizer and schedules are splitted and instantiated like this:
### In 🤗 Transformers, optimizer and schedules are splitted and instantiated like this:
optimizer = AdamW(model.parameters(), lr=lr, correct_bias=False) # To reproduce BertAdam specific behavior set correct_bias=False
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=num_warmup_steps, num_training_steps=num_training_steps) # PyTorch scheduler
### and used like this:

View File

@@ -1,15 +1,16 @@
ALBERT
----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The ALBERT model was proposed in `ALBERT: A Lite BERT for Self-supervised Learning of Language Representations <https://arxiv.org/abs/1909.11942>`_
by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut. It presents
two parameter-reduction techniques to lower memory consumption and increase the trainig speed of BERT:
The ALBERT model was proposed in `ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
<https://arxiv.org/abs/1909.11942>`__ by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma,
Radu Soricut. It presents two parameter-reduction techniques to lower memory consumption and increase the training
speed of BERT:
- Splitting the embedding matrix into two smaller matrices
- Using repeating layers split among groups
- Splitting the embedding matrix into two smaller matrices.
- Using repeating layers split among groups.
The abstract from the paper is the following:
@@ -30,67 +31,126 @@ Tips:
similar to a BERT-like architecture with the same number of hidden layers as it has to iterate through the same
number of (repeating) layers.
The original code can be found `here <https://github.com/google-research/ALBERT>`_.
The original code can be found `here <https://github.com/google-research/ALBERT>`__.
AlbertConfig
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AlbertConfig
:members:
AlbertTokenizer
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AlbertTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary
Albert specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_albert.AlbertForPreTrainingOutput
:members:
.. autoclass:: transformers.modeling_tf_albert.TFAlbertForPreTrainingOutput
:members:
AlbertModel
~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AlbertModel
:members:
:members: forward
AlbertForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AlbertForPreTraining
:members: forward
AlbertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AlbertForMaskedLM
:members:
:members: forward
AlbertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AlbertForSequenceClassification
:members: forward
AlbertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AlbertForMultipleChoice
:members:
AlbertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AlbertForTokenClassification
:members: forward
AlbertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AlbertForQuestionAnswering
:members:
:members: forward
TFAlbertModel
~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAlbertModel
:members:
:members: call
TFAlbertForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAlbertForPreTraining
:members: call
TFAlbertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAlbertForMaskedLM
:members:
:members: call
TFAlbertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAlbertForSequenceClassification
:members:
:members: call
TFAlbertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAlbertForMultipleChoice
:members: call
TFAlbertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAlbertForTokenClassification
:members: call
TFAlbertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAlbertForQuestionAnswering
:members: call

View File

@@ -1,65 +1,131 @@
AutoModels
-----------
AutoClasses
-----------------------------------------------------------------------------------------------------------------------
In many cases, the architecture you want to use can be guessed from the name or the path of the pretrained model you are supplying to the ``from_pretrained`` method.
In many cases, the architecture you want to use can be guessed from the name or the path of the pretrained model you
are supplying to the :obj:`from_pretrained()` method.
AutoClasses are here to do this job for you so that you automatically retrieve the relevant model given the name/path
to the pretrained weights/config/vocabulary.
AutoClasses are here to do this job for you so that you automatically retrieve the relevant model given the name/path to the pretrained weights/config/vocabulary:
Instantiating one of ``AutoModel``, ``AutoConfig`` and ``AutoTokenizer`` will directly create a class of the relevant architecture (ex: ``model = AutoModel.from_pretrained('bert-base-cased')`` will create a instance of ``BertModel``).
Instantiating one of :class:`~transformers.AutoConfig`, :class:`~transformers.AutoModel`, and
:class:`~transformers.AutoTokenizer` will directly create a class of the relevant architecture. For instance
``AutoConfig``
~~~~~~~~~~~~~~~~~~~~~
.. code-block:: python
model = AutoModel.from_pretrained('bert-base-cased')
will create a model that is an instance of :class:`~transformers.BertModel`.
There is one class of :obj:`AutoModel` for each task, and for each backend (PyTorch or TensorFlow).
AutoConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AutoConfig
:members:
``AutoTokenizer``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
AutoTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AutoTokenizer
:members:
``AutoModel``
~~~~~~~~~~~~~~~~~~~~~
AutoModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AutoModel
:members:
``AutoModelForPreTraining``
~~~~~~~~~~~~~~~~~~~~~
AutoModelForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AutoModelForPreTraining
:members:
``AutoModelWithLMHead``
~~~~~~~~~~~~~~~~~~~~~
AutoModelWithLMHead
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AutoModelWithLMHead
:members:
``AutoModelForSequenceClassification``
~~~~~~~~~~~~~~~~~~~~~
AutoModelForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AutoModelForSequenceClassification
:members:
``AutoModelForQuestionAnswering``
~~~~~~~~~~~~~~~~~~~~~
AutoModelForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AutoModelForMultipleChoice
:members:
AutoModelForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AutoModelForTokenClassification
:members:
AutoModelForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AutoModelForQuestionAnswering
:members:
``AutoModelForTokenClassification``
~~~~~~~~~~~~~~~~~~~~~
TFAutoModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AutoModelForTokenClassification
.. autoclass:: transformers.TFAutoModel
:members:
TFAutoModelForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAutoModelForPreTraining
:members:
TFAutoModelWithLMHead
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAutoModelWithLMHead
:members:
TFAutoModelForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAutoModelForSequenceClassification
:members:
TFAutoModelForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAutoModelForMultipleChoice
:members:
TFAutoModelForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAutoModelForTokenClassification
:members:
TFAutoModelForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAutoModelForQuestionAnswering
:members:

View File

@@ -1,33 +1,65 @@
Bart
----------------------------------------------------
**DISCLAIMER:** If you see something strange,
file a `Github Issue <https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`__ and assign
BART
-----------------------------------------------------------------------------------------------------------------------
**DISCLAIMER:** If you see something strange, file a `Github Issue
<https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`__ and assign
@sshleifer
Paper
~~~~~
The Bart model was `proposed <https://arxiv.org/abs/1910.13461>`_ by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer on 29 Oct, 2019.
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The Bart model was proposed in `BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation,
Translation, and Comprehension <https://arxiv.org/abs/1910.13461>`__ by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan
Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer on 29 Oct, 2019.
According to the abstract,
- Bart uses a standard seq2seq/machine translation architecture with a bidirectional encoder (like BERT) and a left-to-right decoder (like GPT).
- The pretraining task involves randomly shuffling the order of the original sentences and a novel in-filling scheme, where spans of text are replaced with a single mask token.
- BART is particularly effective when fine tuned for text generation but also works well for comprehension tasks. It matches the performance of RoBERTa with comparable training resources on GLUE and SQuAD, achieves new state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks, with gains of up to 6 ROUGE.
- Bart uses a standard seq2seq/machine translation architecture with a bidirectional encoder (like BERT) and a
left-to-right decoder (like GPT).
- The pretraining task involves randomly shuffling the order of the original sentences and a novel in-filling scheme,
where spans of text are replaced with a single mask token.
- BART is particularly effective when fine tuned for text generation but also works well for comprehension tasks. It
matches the performance of RoBERTa with comparable training resources on GLUE and SQuAD, achieves new
state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks, with gains
of up to 6 ROUGE.
The Authors' code can be found `here <https://github.com/pytorch/fairseq/tree/master/examples/bart>`_
The Authors' code can be found `here <https://github.com/pytorch/fairseq/tree/master/examples/bart>`__.
Implementation Notes
~~~~~~~~~~~~~~~~~~~~
- Bart doesn't use :obj:`token_type_ids` for sequence classification. Use BartTokenizer.encode to get the proper splitting.
- The forward pass of ``BartModel`` will create decoder inputs (using the helper function ``transformers.modeling_bart._prepare_bart_decoder_inputs``) if they are not passed. This is different than some other modeling APIs.
- Model predictions are intended to be identical to the original implementation. This only works, however, if the string you pass to ``fairseq.encode`` starts with a space.
- ``BartForConditionalGeneration.generate`` should be used for conditional generation tasks like summarization, see the example in that docstrings
- Models that load the ``"bart-large-cnn"`` weights will not have a ``mask_token_id``, or be able to perform mask filling tasks.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Bart doesn't use :obj:`token_type_ids` for sequence classification. Use :class:`~transformers.BartTokenizer`
or :meth:`~transformers.BartTokenizer.encode` to get the proper splitting.
- The forward pass of :class:`~transformers.BartModel` will create decoder inputs (using the helper function
:func:`transformers.modeling_bart._prepare_bart_decoder_inputs`) if they are not passed. This is different than some
other modeling APIs.
- Model predictions are intended to be identical to the original implementation. This only works, however, if the
string you pass to :func:`fairseq.encode` starts with a space.
- :meth:`~transformers.BartForConditionalGeneration.generate` should be used for conditional generation tasks like
summarization, see the example in that docstrings.
- Models that load the `facebook/bart-large-cnn` weights will not have a :obj:`mask_token_id`, or be able to perform
mask-filling tasks.
- For training/forward passes that don't involve beam search, pass :obj:`use_cache=False`.
BartConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BartConfig
:members:
BartTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BartTokenizer
:members:
BartModel
~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BartModel
:members: forward
@@ -36,21 +68,21 @@ BartModel
BartForConditionalGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BartForConditionalGeneration
:members: generate, forward
:members: forward
BartForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BartForSequenceClassification
:members: forward
BartConfig
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BartConfig
:members:
BartForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BartForQuestionAnswering
:members: forward

View File

@@ -1,13 +1,13 @@
BERT
----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The BERT model was proposed in `BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding <https://arxiv.org/abs/1810.04805>`__
by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova. It's a bidirectional transformer
pre-trained using a combination of masked language modeling objective and next sentence prediction
on a large corpus comprising the Toronto Book Corpus and Wikipedia.
The BERT model was proposed in `BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
<https://arxiv.org/abs/1810.04805>`__ by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova. It's a
bidirectional transformer pretrained using a combination of masked language modeling objective and next sentence
prediction on a large corpus comprising the Toronto Book Corpus and Wikipedia.
The abstract from the paper is the following:
@@ -27,25 +27,20 @@ Tips:
- BERT is a model with absolute position embeddings so it's usually advised to pad the inputs on
the right rather than the left.
- BERT was trained with a masked language modeling (MLM) objective. It is therefore efficient at predicting masked
tokens and at NLU in general, but is not optimal for text generation. Models trained with a causal language
modeling (CLM) objective are better in that regard.
- Alongside MLM, BERT was trained using a next sentence prediction (NSP) objective using the [CLS] token as a sequence
approximate. The user may use this token (the first token in a sequence built with special tokens) to get a sequence
prediction rather than a token prediction. However, averaging over the sequence may yield better results than using
the [CLS] token.
- BERT was trained with the masked language modeling (MLM) and next sentence prediction (NSP) objectives. It is
efficient at predicting masked tokens and at NLU in general, but is not optimal for text generation.
The original code can be found `here <https://github.com/google-research/bert>`_.
The original code can be found `here <https://github.com/google-research/bert>`__.
BertConfig
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertConfig
:members:
BertTokenizer
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
@@ -53,120 +48,143 @@ BertTokenizer
BertTokenizerFast
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertTokenizerFast
:members:
Bert specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_bert.BertForPreTrainingOutput
:members:
.. autoclass:: transformers.modeling_tf_bert.TFBertForPreTrainingOutput
:members:
BertModel
~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertModel
:members:
:members: forward
BertForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertForPreTraining
:members:
:members: forward
BertModelLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertLMHeadModel
:members: forward
BertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertForMaskedLM
:members:
:members: forward
BertForNextSentencePrediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertForNextSentencePrediction
:members:
:members: forward
BertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertForSequenceClassification
:members:
:members: forward
BertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertForMultipleChoice
:members:
:members: forward
BertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertForTokenClassification
:members:
:members: forward
BertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertForQuestionAnswering
:members:
:members: forward
TFBertModel
~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBertModel
:members:
:members: call
TFBertForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBertForPreTraining
:members:
:members: call
TFBertModelLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBertLMHeadModel
:members: call
TFBertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBertForMaskedLM
:members:
:members: call
TFBertForNextSentencePrediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBertForNextSentencePrediction
:members:
:members: call
TFBertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBertForSequenceClassification
:members:
:members: call
TFBertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBertForMultipleChoice
:members:
:members: call
TFBertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBertForTokenClassification
:members:
:members: call
TFBertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBertForQuestionAnswering
:members:
:members: call

View File

@@ -0,0 +1,96 @@
BertGeneration
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The BertGeneration model is a BERT model that can be leveraged for sequence-to-sequence tasks using
:class:`~transformers.EncoderDecoderModel` as proposed in `Leveraging Pre-trained Checkpoints for Sequence Generation
Tasks <https://arxiv.org/abs/1907.12461>`__ by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
The abstract from the paper is the following:
*Unsupervised pre-training of large neural models has recently revolutionized Natural Language Processing. By
warm-starting from the publicly released checkpoints, NLP practitioners have pushed the state-of-the-art on multiple
benchmarks while saving significant amounts of compute time. So far the focus has been mainly on the Natural Language
Understanding tasks. In this paper, we demonstrate the efficacy of pre-trained checkpoints for Sequence Generation. We
developed a Transformer-based sequence-to-sequence model that is compatible with publicly available pre-trained BERT,
GPT-2 and RoBERTa checkpoints and conducted an extensive empirical study on the utility of initializing our model, both
encoder and decoder, with these checkpoints. Our models result in new state-of-the-art results on Machine Translation,
Text Summarization, Sentence Splitting, and Sentence Fusion.*
Usage:
- The model can be used in combination with the :class:`~transformers.EncoderDecoderModel` to leverage two pretrained
BERT checkpoints for subsequent fine-tuning.
:: code-block
# leverage checkpoints for Bert2Bert model...
# use BERT's cls token as BOS token and sep token as EOS token
encoder = BertGenerationEncoder.from_pretrained("bert-large-uncased", bos_token_id=101, eos_token_id=102)
# add cross attention layers and use BERT's cls token as BOS token and sep token as EOS token
decoder = BertGenerationDecoder.from_pretrained("bert-large-uncased", add_cross_attention=True, is_decoder=True, bos_token_id=101, eos_token_id=102)
bert2bert = EncoderDecoderModel(encoder=encoder, decoder=decoder)
# create tokenizer...
tokenizer = BertTokenizer.from_pretrained("bert-large-uncased")
input_ids = tokenizer('This is a long article to summarize', add_special_tokens=False, return_tensors="pt").input_ids
labels = tokenizer('This is a short summary', return_tensors="pt").input_ids
# train...
loss = bert2bert(input_ids=input_ids, decoder_input_ids=labels, labels=labels, return_dict=True).loss
loss.backward()
- Pretrained :class:`~transformers.EncoderDecoderModel` are also directly available in the model hub, e.g.,
:: code-block
# instantiate sentence fusion model
sentence_fuser = EncoderDecoderModel.from_pretrained("google/roberta2roberta_L-24_discofuse")
tokenizer = AutoTokenizer.from_pretrained("google/roberta2roberta_L-24_discofuse")
input_ids = tokenizer('This is the first sentence. This is the second sentence.', add_special_tokens=False, return_tensors="pt").input_ids
outputs = sentence_fuser.generate(input_ids)
print(tokenizer.decode(outputs[0]))
Tips:
- :class:`~transformers.BertGenerationEncoder` and :class:`~transformers.BertGenerationDecoder` should be used in
combination with :class:`~transformers.EncoderDecoder`.
- For summarization, sentence splitting, sentence fusion and translation, no special tokens are required for the input.
Therefore, no EOS token should be added to the end of the input.
The original code can be found `here <https://tfhub.dev/s?module-type=text-generation&subtype=module,placeholder>`__.
BertGenerationConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertGenerationConfig
:members:
BertGenerationTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertGenerationTokenizer
:members: save_vocabulary
BertGenerationEncoder
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertGenerationEncoder
:members: forward
BertGenerationDecoder
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertGenerationDecoder
:members: forward

View File

@@ -0,0 +1,75 @@
Blenderbot
-----------------------------------------------------------------------------------------------------------------------
**DISCLAIMER:** If you see something strange,
file a `Github Issue <https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`__ .
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The Blender chatbot model was proposed in `Recipes for building an open-domain chatbot <https://arxiv.org/pdf/2004.13637.pdf>`__ Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston on 30 Apr 2020.
The abstract of the paper is the following:
*Building open-domain chatbots is a challenging area for machine learning research. While prior work has shown that scaling neural models in the number of parameters and the size of the data they are trained on gives improved results, we show that other ingredients are important for a high-performing chatbot. Good conversation requires a number of skills that an expert conversationalist blends in a seamless way: providing engaging talking points and listening to their partners, and displaying knowledge, empathy and personality appropriately, while maintaining a consistent persona. We show that large scale models can learn these skills when given appropriate training data and choice of generation strategy. We build variants of these recipes with 90M, 2.7B and 9.4B parameter models, and make our models and code publicly available. Human evaluations show our best models are superior to existing approaches in multi-turn dialogue in terms of engagingness and humanness measurements. We then discuss the limitations of this work by analyzing failure cases of our models.*
The authors' code can be found `here <https://github.com/facebookresearch/ParlAI>`__ .
Implementation Notes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Blenderbot uses a standard `seq2seq model transformer <https://arxiv.org/pdf/1706.03762.pdf>`__ based architecture.
- It inherits completely from :class:`~transformers.BartForConditionalGeneration`
- Even though blenderbot is one model, it uses two tokenizers :class:`~transformers.BlenderbotSmallTokenizer` for 90M checkpoint and :class:`~transformers.BlenderbotTokenizer` for all other checkpoints.
- :class:`~transformers.BlenderbotSmallTokenizer` will always return :class:`~transformers.BlenderbotSmallTokenizer`, regardless of checkpoint. To use the 3B parameter checkpoint, you must call :class:`~transformers.BlenderbotTokenizer` directly.
- Available checkpoints can be found in the `model hub <https://huggingface.co/models?search=blenderbot>`__.
Usage
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Model Usage:
>>> from transformers import BlenderbotSmallTokenizer, BlenderbotForConditionalGeneration
>>> mname = 'facebook/blenderbot-90M'
>>> model = BlenderbotForConditionalGeneration.from_pretrained(mname)
>>> tokenizer = BlenderbotSmallTokenizer.from_pretrained(mname)
>>> UTTERANCE = "My friends are cool but they eat too many carbs."
>>> inputs = tokenizer([UTTERANCE], return_tensors='pt')
>>> reply_ids = model.generate(**inputs)
>>> print([tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False) for g in reply_ids])
See Config Values:
>>> from transformers import BlenderbotConfig
>>> config_90 = BlenderbotConfig.from_pretrained("facebook/blenderbot-90M")
>>> config_90.to_diff_dict() # show interesting Values.
>>> configuration_3B = BlenderbotConfig("facebook/blenderbot-3B")
>>> configuration_3B.to_diff_dict()
BlenderbotConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BlenderbotConfig
:members:
BlenderbotTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BlenderbotTokenizer
:members: build_inputs_with_special_tokens
BlenderbotSmallTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BlenderbotSmallTokenizer
:members:
BlenderbotForConditionalGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
See :obj:`transformers.BartForConditionalGeneration` for arguments to `forward` and `generate`
.. autoclass:: transformers.BlenderbotForConditionalGeneration
:members:

View File

@@ -1,5 +1,8 @@
CamemBERT
----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The CamemBERT model was proposed in `CamemBERT: a Tasty French Language Model <https://arxiv.org/abs/1911.03894>`__
by Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suárez, Yoann Dupont, Laurent Romary, Éric Villemonte de la
@@ -19,20 +22,20 @@ pretrained model for CamemBERT hoping to foster research and downstream applicat
Tips:
- This implementation is the same as RoBERTa. Refer to the `documentation of RoBERTa <./roberta.html>`__ for usage
- This implementation is the same as RoBERTa. Refer to the :doc:`documentation of RoBERTa <roberta>` for usage
examples as well as the information relative to the inputs and outputs.
The original code can be found `here <https://camembert-model.fr/>`_.
The original code can be found `here <https://camembert-model.fr/>`__.
CamembertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CamembertConfig
:members:
CamembertTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CamembertTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
@@ -40,63 +43,91 @@ CamembertTokenizer
CamembertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CamembertModel
:members:
CamembertForCausalLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CamembertForCausalLM
:members:
CamembertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CamembertForMaskedLM
:members:
CamembertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CamembertForSequenceClassification
:members:
CamembertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CamembertForMultipleChoice
:members:
CamembertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CamembertForTokenClassification
:members:
CamembertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CamembertForQuestionAnswering
:members:
TFCamembertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFCamembertModel
:members:
TFCamembertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFCamembertForMaskedLM
:members:
TFCamembertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFCamembertForSequenceClassification
:members:
TFCamembertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFCamembertForMultipleChoice
:members:
TFCamembertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFCamembertForTokenClassification
:members:
TFCamembertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFCamembertForQuestionAnswering
:members:

View File

@@ -1,9 +1,12 @@
CTRL
----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
CTRL model was proposed in `CTRL: A Conditional Transformer Language Model for Controllable Generation <https://arxiv.org/abs/1909.05858>`_
by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
It's a causal (unidirectional) transformer pre-trained using language modeling on a very large
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
CTRL model was proposed in `CTRL: A Conditional Transformer Language Model for Controllable Generation
<https://arxiv.org/abs/1909.05858>`_ by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and
Richard Socher. It's a causal (unidirectional) transformer pre-trained using language modeling on a very large
corpus of ~140 GB of text data with the first token reserved as a control code (such as Links, Books, Wikipedia etc.).
The abstract from the paper is the following:
@@ -28,50 +31,50 @@ Tips:
it can be observed in the `run_generation.py` example script.
- The PyTorch models can take the `past` as input, which is the previously computed key/value attention pairs. Using
this `past` value prevents the model from re-computing pre-computed values in the context of text generation.
See `reusing the past in generative models <../quickstart.html#using-the-past>`_ for more information on the usage
See `reusing the past in generative models <../quickstart.html#using-the-past>`__ for more information on the usage
of this argument.
The original code can be found `here <https://github.com/salesforce/ctrl>`_.
The original code can be found `here <https://github.com/salesforce/ctrl>`__.
CTRLConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CTRLConfig
:members:
CTRLTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CTRLTokenizer
:members: save_vocabulary
CTRLModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CTRLModel
:members:
:members: forward
CTRLLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CTRLLMHeadModel
:members:
:members: forward
TFCTRLModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFCTRLModel
:members:
:members: call
TFCTRLLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFCTRLLMHeadModel
:members:
:members: call

View File

@@ -0,0 +1,62 @@
DeBERTa
----------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
The DeBERTa model was proposed in `DeBERTa: Decoding-enhanced BERT with Disentangled Attention <https://arxiv.org/abs/2006.03654>`__
by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen
It is based on Google's BERT model released in 2018 and Facebook's RoBERTa model released in 2019.
It builds on RoBERTa with disentangled attention and enhanced mask decoder training with half of the data used in RoBERTa.
The abstract from the paper is the following:
*Recent progress in pre-trained neural language models has significantly improved the performance of many natural language processing (NLP) tasks.
In this paper we propose a new model architecture DeBERTa (Decoding-enhanced BERT with disentangled attention) that improves the BERT and RoBERTa
models using two novel techniques. The first is the disentangled attention mechanism, where each word is represented using two vectors that encode
its content and position, respectively, and the attention weights among words are computed using disentangled matrices on their contents and
relative positions. Second, an enhanced mask decoder is used to replace the output softmax layer to predict the masked tokens for model pretraining.
We show that these two techniques significantly improve the efficiency of model pre-training and performance of downstream tasks. Compared to
RoBERTa-Large, a DeBERTa model trained on half of the training data performs consistently better on a wide range of NLP tasks, achieving improvements
on MNLI by +0.9% (90.2% vs. 91.1%), on SQuAD v2.0 by +2.3% (88.4% vs. 90.7%) and RACE by +3.6% (83.2% vs. 86.8%). The DeBERTa code and pre-trained
models will be made publicly available at https://github.com/microsoft/DeBERTa.*
The original code can be found `here <https://github.com/microsoft/DeBERTa>`__.
DebertaConfig
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DebertaConfig
:members:
DebertaTokenizer
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DebertaTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary
DebertaModel
~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DebertaModel
:members:
DebertaPreTrainedModel
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DebertaPreTrainedModel
:members:
DebertaForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DebertaForSequenceClassification
:members:

View File

@@ -1,8 +1,8 @@
DialoGPT
----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
DialoGPT was proposed in
`DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation <https://arxiv.org/abs/1911.00536>`_

View File

@@ -1,11 +1,15 @@
DistilBERT
----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The DistilBERT model was proposed in the blog post
`Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT <https://medium.com/huggingface/distilbert-8cf3380435b5>`__,
and the paper `DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter <https://arxiv.org/abs/1910.01108>`__.
DistilBERT is a small, fast, cheap and light Transformer model trained by distilling Bert base. It has 40% less
parameters than `bert-base-uncased`, runs 60% faster while preserving over 95% of Bert's performances as measured on
`Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT
<https://medium.com/huggingface/distilbert-8cf3380435b5>`__, and the paper `DistilBERT, a distilled version of BERT:
smaller, faster, cheaper and lighter <https://arxiv.org/abs/1910.01108>`__.
DistilBERT is a small, fast, cheap and light Transformer model trained by distilling BERT base. It has 40% less
parameters than `bert-base-uncased`, runs 60% faster while preserving over 95% of BERT's performances as measured on
the GLUE language understanding benchmark.
The abstract from the paper is the following:
@@ -24,83 +28,115 @@ on-device study.*
Tips:
- DistilBert doesn't have `token_type_ids`, you don't need to indicate which token belongs to which segment. Just separate your segments with the separation token `tokenizer.sep_token` (or `[SEP]`)
- DistilBert doesn't have options to select the input positions (`position_ids` input). This could be added if necessary though, just let's us know if you need this option.
- DistilBERT doesn't have :obj:`token_type_ids`, you don't need to indicate which token belongs to which segment. Just
separate your segments with the separation token :obj:`tokenizer.sep_token` (or :obj:`[SEP]`).
- DistilBERT doesn't have options to select the input positions (:obj:`position_ids` input). This could be added if
necessary though, just let us know if you need this option.
The original code can be found `here <https://github.com/huggingface/transformers/tree/master/examples/distillation>`_.
The original code can be found `here <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__.
DistilBertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DistilBertConfig
:members:
DistilBertTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DistilBertTokenizer
:members:
DistilBertTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DistilBertTokenizerFast
:members:
DistilBertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DistilBertModel
:members:
:members: forward
DistilBertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DistilBertForMaskedLM
:members:
:members: forward
DistilBertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DistilBertForSequenceClassification
:members:
:members: forward
DistilBertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DistilBertForMultipleChoice
:members: forward
DistilBertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DistilBertForTokenClassification
:members: forward
DistilBertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DistilBertForQuestionAnswering
:members:
:members: forward
TFDistilBertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFDistilBertModel
:members:
:members: call
TFDistilBertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFDistilBertForMaskedLM
:members:
:members: call
TFDistilBertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFDistilBertForSequenceClassification
:members:
:members: call
TFDistilBertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFDistilBertForMultipleChoice
:members: call
TFDistilBertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFDistilBertForTokenClassification
:members: call
TFDistilBertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFDistilBertForQuestionAnswering
:members:
:members: call

View File

@@ -0,0 +1,101 @@
DPR
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Dense Passage Retrieval (DPR) is a set of tools and models for state-of-the-art open-domain Q&A research.
It was intorduced in `Dense Passage Retrieval for Open-Domain Question Answering <https://arxiv.org/abs/2004.04906>`__
by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, Wen-tau Yih.
The abstract from the paper is the following:
*Open-domain question answering relies on efficient passage retrieval to select candidate contexts, where traditional
sparse vector space models, such as TF-IDF or BM25, are the de facto method. In this work, we show that retrieval can
be practically implemented using dense representations alone, where embeddings are learned from a small number of
questions and passages by a simple dual-encoder framework. When evaluated on a wide range of open-domain QA datasets,
our dense retriever outperforms a strong Lucene-BM25 system largely by 9%-19% absolute in terms of top-20 passage
retrieval accuracy, and helps our end-to-end QA system establish new state-of-the-art on multiple open-domain QA
benchmarks.*
The original code can be found `here <https://github.com/facebookresearch/DPR>`__.
DPRConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DPRConfig
:members:
DPRContextEncoderTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DPRContextEncoderTokenizer
:members:
DPRContextEncoderTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DPRContextEncoderTokenizerFast
:members:
DPRQuestionEncoderTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DPRQuestionEncoderTokenizer
:members:
DPRQuestionEncoderTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DPRQuestionEncoderTokenizerFast
:members:
DPRReaderTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DPRReaderTokenizer
:members:
DPRReaderTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DPRReaderTokenizerFast
:members:
DPR specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_dpr.DPRContextEncoderOutput
:members:
.. autoclass:: transformers.modeling_dpr.DPRQuestionEncoderOutput
:members:
.. autoclass:: transformers.modeling_dpr.DPRReaderOutput
:members:
DPRContextEncoder
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DPRContextEncoder
:members: forward
DPRQuestionEncoder
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DPRQuestionEncoder
:members: forward
DPRReader
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DPRReader
:members: forward

View File

@@ -1,11 +1,14 @@
ELECTRA
----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
The ELECTRA model was proposed in the paper.
`ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators <https://openreview.net/pdf?id=r1xMH1BtvB>`__.
ELECTRA is a new pre-training approach which trains two transformer models: the generator and the discriminator. The
generator's role is to replace tokens in a sequence, and is therefore trained as a masked language model. The discriminator,
which is the model we're interested in, tries to identify which tokens were replaced by the generator in the sequence.
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The ELECTRA model was proposed in the paper `ELECTRA: Pre-training Text Encoders as Discriminators Rather Than
Generators <https://openreview.net/pdf?id=r1xMH1BtvB>`__. ELECTRA is a new pretraining approach which trains two
transformer models: the generator and the discriminator. The generator's role is to replace tokens in a sequence, and
is therefore trained as a masked language model. The discriminator, which is the model we're interested in, tries to
identify which tokens were replaced by the generator in the sequence.
The abstract from the paper is the following:
@@ -32,93 +35,146 @@ compute and outperforms them when using the same amount of compute.*
Tips:
- ELECTRA is the pre-training approach, therefore there is nearly no changes done to the underlying model: BERT. The
only change is the separation of the embedding size and the hidden size -> The embedding size is generally smaller,
- ELECTRA is the pretraining approach, therefore there is nearly no changes done to the underlying model: BERT. The
only change is the separation of the embedding size and the hidden size: the embedding size is generally smaller,
while the hidden size is larger. An additional projection layer (linear) is used to project the embeddings from
their embedding size to the hidden size. In the case where the embedding size is the same as the hidden size, no
projection layer is used.
- The ELECTRA checkpoints saved using `Google Research's implementation <https://github.com/google-research/electra>`__
contain both the generator and discriminator. The conversion script requires the user to name which model to export
into the correct architecture. Once converted to the HuggingFace format, these checkpoints may be loaded into all
available ELECTRA models, however. This means that the discriminator may be loaded in the `ElectraForMaskedLM` model,
and the generator may be loaded in the `ElectraForPreTraining` model (the classification head will be randomly
initialized as it doesn't exist in the generator).
available ELECTRA models, however. This means that the discriminator may be loaded in the
:class:`~transformers.ElectraForMaskedLM` model, and the generator may be loaded in the
:class:`~transformers.ElectraForPreTraining` model (the classification head will be randomly initialized as it
doesn't exist in the generator).
The original code can be found `here <https://github.com/google-research/electra>`_.
The original code can be found `here <https://github.com/google-research/electra>`__.
ElectraConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ElectraConfig
:members:
ElectraTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ElectraTokenizer
:members:
ElectraTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ElectraTokenizerFast
:members:
Electra specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_electra.ElectraForPreTrainingOutput
:members:
.. autoclass:: transformers.modeling_tf_electra.TFElectraForPreTrainingOutput
:members:
ElectraModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ElectraModel
:members:
:members: forward
ElectraForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ElectraForPreTraining
:members:
:members: forward
ElectraForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ElectraForMaskedLM
:members:
:members: forward
ElectraForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ElectraForSequenceClassification
:members: forward
ElectraForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ElectraForMultipleChoice
:members: forward
ElectraForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ElectraForTokenClassification
:members:
:members: forward
ElectraForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ElectraForQuestionAnswering
:members: forward
TFElectraModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFElectraModel
:members:
:members: call
TFElectraForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFElectraForPreTraining
:members:
:members: call
TFElectraForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFElectraForMaskedLM
:members:
:members: call
TFElectraForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFElectraForSequenceClassification
:members: call
TFElectraForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFElectraForMultipleChoice
:members: call
TFElectraForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFElectraForTokenClassification
:members:
:members: call
TFElectraForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFElectraForQuestionAnswering
:members: call

View File

@@ -1,23 +1,30 @@
Encoder Decoder Models
-----------
-----------------------------------------------------------------------------------------------------------------------
This class can wrap an encoder model, such as ``BertModel`` and a decoder modeling with a language modeling head, such as ``BertForMaskedLM`` into a encoder-decoder model.
The :class:`~transformers.EncoderDecoderModel` can be used to initialize a sequence-to-sequence model with any
pretrained autoencoding model as the encoder and any pretrained autoregressive model as the decoder.
The ``EncoderDecoderModel`` class allows to instantiate a encoder decoder model using the ``from_encoder_decoder_pretrain`` class method taking a pretrained encoder and pretrained decoder model as an input.
The ``EncoderDecoderModel`` is saved using the standard ``save_pretrained()`` method and can also again be loaded using the standard ``from_pretrained()`` method.
The effectiveness of initializing sequence-to-sequence models with pretrained checkpoints for sequence generation tasks
was shown in `Leveraging Pre-trained Checkpoints for Sequence Generation Tasks <https://arxiv.org/abs/1907.12461>`__ by
Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
An application of this architecture could be *summarization* using two pretrained Bert models as is shown in the paper: `Text Summarization with Pretrained Encoders <https://arxiv.org/abs/1910.13461>`_ by Yang Liu and Mirella Lapata.
After such an :class:`~transformers.EncoderDecoderModel` has been trained/fine-tuned, it can be saved/loaded just like
any other models (see the examples for more information).
An application of this architecture could be to leverage two pretrained :class:`~transformers.BertModel` as the encoder
and decoder for a summarization model as was shown in: `Text Summarization with Pretrained Encoders
<https://arxiv.org/abs/1908.08345>`__ by Yang Liu and Mirella Lapata.
``EncoderDecoderConfig``
~~~~~~~~~~~~~~~~~~~~~
EncoderDecoderConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.EncoderDecoderConfig
:members:
``EncoderDecoderModel``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
EncoderDecoderModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.EncoderDecoderModel
:members:
:members: forward, from_encoder_decoder_pretrained

View File

@@ -1,9 +1,12 @@
FlauBERT
----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
The FlauBERT model was proposed in the paper
`FlauBERT: Unsupervised Language Model Pre-training for French <https://arxiv.org/abs/1912.05372>`__ by Hang Le et al.
It's a transformer pre-trained using a masked language modeling (MLM) objective (BERT-like).
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The FlauBERT model was proposed in the paper `FlauBERT: Unsupervised Language Model Pre-training for French
<https://arxiv.org/abs/1912.05372>`__ by Hang Le et al. It's a transformer model pretrained using a masked language
modeling (MLM) objective (like BERT).
The abstract from the paper is the following:
@@ -20,55 +23,109 @@ of the time they outperform other pre-training approaches. Different versions of
evaluation protocol for the downstream tasks, called FLUE (French Language Understanding Evaluation), are shared
to the research community for further reproducible experiments in French NLP.*
The original code can be found `here <https://github.com/getalp/Flaubert>`_.
The original code can be found `here <https://github.com/getalp/Flaubert>`__.
FlaubertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaubertConfig
:members:
FlaubertTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaubertTokenizer
:members:
FlaubertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaubertModel
:members:
:members: forward
FlaubertWithLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaubertWithLMHeadModel
:members:
:members: forward
FlaubertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaubertForSequenceClassification
:members:
:members: forward
FlaubertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaubertForMultipleChoice
:members: forward
FlaubertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaubertForTokenClassification
:members: forward
FlaubertForQuestionAnsweringSimple
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaubertForQuestionAnsweringSimple
:members:
:members: forward
FlaubertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaubertForQuestionAnswering
:members:
:members: forward
TFFlaubertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFlaubertModel
:members: call
TFFlaubertWithLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFlaubertWithLMHeadModel
:members: call
TFFlaubertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFlaubertForSequenceClassification
:members: call
TFFlaubertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFlaubertForMultipleChoice
:members: call
TFFlaubertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFlaubertForTokenClassification
:members: call
TFFlaubertForQuestionAnsweringSimple
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFlaubertForQuestionAnsweringSimple
:members: call

View File

@@ -0,0 +1,61 @@
FSMT
-----------------------------------------------------------------------------------------------------------------------
**DISCLAIMER:** If you see something strange, file a `Github Issue
<https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`__ and assign
@stas00.
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
FSMT (FairSeq MachineTranslation) models were introduced in `Facebook FAIR's WMT19 News Translation Task Submission
<https://arxiv.org/abs/1907.06616>`__ by Nathan Ng, Kyra Yee, Alexei Baevski, Myle Ott, Michael Auli, Sergey Edunov.
The abstract of the paper is the following:
*This paper describes Facebook FAIR's submission to the WMT19 shared news translation task. We participate in two
language pairs and four language directions, English <-> German and English <-> Russian. Following our submission from
last year, our baseline systems are large BPE-based transformer models trained with the Fairseq sequence modeling
toolkit which rely on sampled back-translations. This year we experiment with different bitext data filtering schemes,
as well as with adding filtered back-translated data. We also ensemble and fine-tune our models on domain-specific
data, then decode using noisy channel model reranking. Our submissions are ranked first in all four directions of the
human evaluation campaign. On En->De, our system significantly outperforms other systems as well as human translations.
This system improves upon our WMT'18 submission by 4.5 BLEU points.*
The original code can be found here <https://github.com/pytorch/fairseq/tree/master/examples/wmt19>__.
Implementation Notes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- FSMT uses source and target vocabulary pairs that aren't combined into one. It doesn't share embeddings tokens
either. Its tokenizer is very similar to :class:`~transformers.XLMTokenizer` and the main model is derived from
:class:`~transformers.BartModel`.
FSMTConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FSMTConfig
:members:
FSMTTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FSMTTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, prepare_seq2seq_batch, save_vocabulary
FSMTModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FSMTModel
:members: forward
FSMTForConditionalGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FSMTForConditionalGeneration
:members: forward

View File

@@ -0,0 +1,184 @@
Funnel Transformer
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The Funnel Transformer model was proposed in the paper `Funnel-Transformer: Filtering out Sequential Redundancy for
Efficient Language Processing <https://arxiv.org/abs/2006.03236>`__. It is a bidirectional transformer model, like
BERT, but with a pooling operation after each block of layers, a bit like in traditional convolutional neural networks
(CNN) in computer vision.
The abstract from the paper is the following:
*With the success of language pretraining, it is highly desirable to develop more efficient architectures of good
scalability that can exploit the abundant unlabeled data at a lower cost. To improve the efficiency, we examine the
much-overlooked redundancy in maintaining a full-length token-level presentation, especially for tasks that only
require a single-vector presentation of the sequence. With this intuition, we propose Funnel-Transformer which
gradually compresses the sequence of hidden states to a shorter one and hence reduces the computation cost. More
importantly, by re-investing the saved FLOPs from length reduction in constructing a deeper or wider model, we further
improve the model capacity. In addition, to perform token-level predictions as required by common pretraining
objectives, Funnel-Transformer is able to recover a deep representation for each token from the reduced hidden sequence
via a decoder. Empirically, with comparable or fewer FLOPs, Funnel-Transformer outperforms the standard Transformer on
a wide variety of sequence-level prediction tasks, including text classification, language understanding, and reading
comprehension.*
Tips:
- Since Funnel Transformer uses pooling, the sequence length of the hidden states changes after each block of layers.
The base model therefore has a final sequence length that is a quarter of the original one. This model can be used
directly for tasks that just require a sentence summary (like sequence classification or multiple choice). For other
tasks, the full model is used; this full model has a decoder that upsamples the final hidden states to the same
sequence length as the input.
- The Funnel Transformer checkpoints are all available with a full version and a base version. The first ones should
be used for :class:`~transformers.FunnelModel`, :class:`~transformers.FunnelForPreTraining`,
:class:`~transformers.FunnelForMaskedLM`, :class:`~transformers.FunnelForTokenClassification` and
class:`~transformers.FunnelForQuestionAnswering`. The second ones should be used for
:class:`~transformers.FunnelBaseModel`, :class:`~transformers.FunnelForSequenceClassification` and
:class:`~transformers.FunnelForMultipleChoice`.
The original code can be found `here <https://github.com/laiguokun/Funnel-Transformer>`__.
FunnelConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FunnelConfig
:members:
FunnelTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FunnelTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary
FunnelTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FunnelTokenizerFast
:members:
Funnel specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_funnel.FunnelForPreTrainingOutput
:members:
.. autoclass:: transformers.modeling_tf_funnel.TFFunnelForPreTrainingOutput
:members:
FunnelBaseModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FunnelBaseModel
:members: forward
FunnelModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FunnelModel
:members: forward
FunnelModelForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FunnelForPreTraining
:members: forward
FunnelForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FunnelForMaskedLM
:members: forward
FunnelForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FunnelForSequenceClassification
:members: forward
FunnelForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FunnelForMultipleChoice
:members: forward
FunnelForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FunnelForTokenClassification
:members: forward
FunnelForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FunnelForQuestionAnswering
:members: forward
TFFunnelBaseModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFunnelBaseModel
:members: call
TFFunnelModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFunnelModel
:members: call
TFFunnelModelForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFunnelForPreTraining
:members: call
TFFunnelForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFunnelForMaskedLM
:members: call
TFFunnelForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFunnelForSequenceClassification
:members: call
TFFunnelForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFunnelForMultipleChoice
:members: call
TFFunnelForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFunnelForTokenClassification
:members: call
TFFunnelForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFunnelForQuestionAnswering
:members: call

View File

@@ -1,12 +1,14 @@
OpenAI GPT
----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
OpenAI GPT model was proposed in `Improving Language Understanding by Generative Pre-Training <https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf>`__
OpenAI GPT model was proposed in `Improving Language Understanding by Generative Pre-Training
<https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf>`__
by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever. It's a causal (unidirectional)
transformer pre-trained using language modeling on a large corpus will long range dependencies, the Toronto Book Corpus.
transformer pre-trained using language modeling on a large corpus will long range dependencies, the Toronto Book
Corpus.
The abstract from the paper is the following:
@@ -36,67 +38,95 @@ Tips:
`Write With Transformer <https://transformer.huggingface.co/doc/gpt>`__ is a webapp created and hosted by
Hugging Face showcasing the generative capabilities of several models. GPT is one of them.
The original code can be found `here <https://github.com/openai/finetune-transformer-lm>`_.
The original code can be found `here <https://github.com/openai/finetune-transformer-lm>`__.
Note:
If you want to reproduce the original tokenization process of the `OpenAI GPT` paper, you will need to install
``ftfy`` and ``SpaCy``::
pip install spacy ftfy==4.4.3
python -m spacy download en
If you don't install ``ftfy`` and ``SpaCy``, the :class:`~transformers.OpenAIGPTTokenizer` will default to tokenize
using BERT's :obj:`BasicTokenizer` followed by Byte-Pair Encoding (which should be fine for most usage, don't
worry).
OpenAIGPTConfig
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.OpenAIGPTConfig
:members:
OpenAIGPTTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.OpenAIGPTTokenizer
:members: save_vocabulary
OpenAIGPTTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.OpenAIGPTTokenizerFast
:members:
OpenAI specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_openai.OpenAIGPTDoubleHeadsModelOutput
:members:
.. autoclass:: transformers.modeling_tf_openai.TFOpenAIGPTDoubleHeadsModelOutput
:members:
OpenAIGPTModel
~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.OpenAIGPTModel
:members:
:members: forward
OpenAIGPTLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.OpenAIGPTLMHeadModel
:members:
:members: forward
OpenAIGPTDoubleHeadsModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.OpenAIGPTDoubleHeadsModel
:members:
:members: forward
OpenAIGPTForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.OpenAIGPTForSequenceClassification
:members: forward
TFOpenAIGPTModel
~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFOpenAIGPTModel
:members:
:members: call
TFOpenAIGPTLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFOpenAIGPTLMHeadModel
:members:
:members: call
TFOpenAIGPTDoubleHeadsModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFOpenAIGPTDoubleHeadsModel
:members:
:members: call

View File

@@ -1,14 +1,13 @@
OpenAI GPT2
----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
OpenAI GPT-2 model was proposed in
`Language Models are Unsupervised Multitask Learners <https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf>`_
by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
It's a causal (unidirectional) transformer pre-trained using language modeling on a very large
corpus of ~40 GB of text data.
OpenAI GPT-2 model was proposed in `Language Models are Unsupervised Multitask Learners
<https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf>`_
by Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei and Ilya Sutskever. It's a causal (unidirectional)
transformer pretrained using language modeling on a very large corpus of ~40 GB of text data.
The abstract from the paper is the following:
@@ -27,74 +26,91 @@ Tips:
it can be observed in the `run_generation.py` example script.
- The PyTorch models can take the `past` as input, which is the previously computed key/value attention pairs. Using
this `past` value prevents the model from re-computing pre-computed values in the context of text generation.
See `reusing the past in generative models <../quickstart.html#using-the-past>`_ for more information on the usage
See `reusing the past in generative models <../quickstart.html#using-the-past>`__ for more information on the usage
of this argument.
`Write With Transformer <https://transformer.huggingface.co/doc/gpt2-large>`__ is a webapp created and hosted by
Hugging Face showcasing the generative capabilities of several models. GPT-2 is one of them and is available in five
different sizes: small, medium, large, xl and a distilled version of the small checkpoint: distilgpt-2.
different sizes: small, medium, large, xl and a distilled version of the small checkpoint: `distilgpt-2`.
The original code can be found `here <https://openai.com/blog/better-language-models/>`_.
The original code can be found `here <https://openai.com/blog/better-language-models/>`__.
GPT2Config
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.GPT2Config
:members:
GPT2Tokenizer
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.GPT2Tokenizer
:members: save_vocabulary
GPT2TokenizerFast
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.GPT2TokenizerFast
:members:
GPT2 specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_gpt2.GPT2DoubleHeadsModelOutput
:members:
.. autoclass:: transformers.modeling_tf_gpt2.TFGPT2DoubleHeadsModelOutput
:members:
GPT2Model
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.GPT2Model
:members:
:members: forward
GPT2LMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.GPT2LMHeadModel
:members:
:members: forward
GPT2DoubleHeadsModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.GPT2DoubleHeadsModel
:members:
:members: forward
GPT2ForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.GPT2ForSequenceClassification
:members: forward
TFGPT2Model
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFGPT2Model
:members:
:members: call
TFGPT2LMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFGPT2LMHeadModel
:members:
:members: call
TFGPT2DoubleHeadsModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFGPT2DoubleHeadsModel
:members:
:members: call

View File

@@ -0,0 +1,55 @@
LayoutLM
----------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
The LayoutLM model was proposed in the paper `LayoutLM: Pre-training of Text and Layout for Document Image Understanding <https://arxiv.org/abs/1912.13318>`__
by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, and Ming Zhou. It's a simple but effective pre-training method
of text and layout for document image understanding and information extraction tasks, such as form understanding and receipt understanding.
The abstract from the paper is the following:
*Pre-training techniques have been verified successfully in a variety of NLP tasks in recent years. Despite the widespread use of pre-training models for NLP applications, they almost exclusively focus on text-level manipulation, while neglecting layout and style information that is vital for document image understanding. In this paper, we propose the \textbf{LayoutLM} to jointly model interactions between text and layout information across scanned document images, which is beneficial for a great number of real-world document image understanding tasks such as information extraction from scanned documents. Furthermore, we also leverage image features to incorporate words' visual information into LayoutLM. To the best of our knowledge, this is the first time that text and layout are jointly learned in a single framework for document-level pre-training. It achieves new state-of-the-art results in several downstream tasks, including form understanding (from 70.72 to 79.27), receipt understanding (from 94.02 to 95.24) and document image classification (from 93.07 to 94.42).*
Tips:
- LayoutLM has an extra input called :obj:`bbox`, which is the bounding boxes of the input tokens.
- The :obj:`bbox` requires the data that on 0-1000 scale, which means you should normalize the bounding box before passing them into model.
The original code can be found `here <https://github.com/microsoft/unilm/tree/master/layoutlm>`_.
LayoutLMConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LayoutLMConfig
:members:
LayoutLMTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LayoutLMTokenizer
:members:
LayoutLMModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LayoutLMModel
:members:
LayoutLMForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LayoutLMForMaskedLM
:members:
LayoutLMForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LayoutLMForTokenClassification
:members:

View File

@@ -0,0 +1,155 @@
Longformer
-----------------------------------------------------------------------------------------------------------------------
**DISCLAIMER:** This model is still a work in progress, if you see something strange, file a `Github Issue
<https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`__.
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The Longformer model was presented in `Longformer: The Long-Document Transformer
<https://arxiv.org/pdf/2004.05150.pdf>`__ by Iz Beltagy, Matthew E. Peters, Arman Cohan.
The abstract from the paper is the following:
*Transformer-based models are unable to process long sequences due to their self-attention operation, which scales
quadratically with the sequence length. To address this limitation, we introduce the Longformer with an attention
mechanism that scales linearly with sequence length, making it easy to process documents of thousands of tokens or
longer. Longformer's attention mechanism is a drop-in replacement for the standard self-attention and combines a local
windowed attention with a task motivated global attention. Following prior work on long-sequence transformers, we
evaluate Longformer on character-level language modeling and achieve state-of-the-art results on text8 and enwik8. In
contrast to most prior work, we also pretrain Longformer and finetune it on a variety of downstream tasks. Our
pretrained Longformer consistently outperforms RoBERTa on long document tasks and sets new state-of-the-art results on
WikiHop and TriviaQA.*
The Authors' code can be found `here <https://github.com/allenai/longformer>`__.
Longformer Self Attention
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Longformer self attention employs self attention on both a "local" context and a "global" context.
Most tokens only attend "locally" to each other meaning that each token attends to its :math:`\frac{1}{2} w` previous
tokens and :math:`\frac{1}{2} w` succeding tokens with :math:`w` being the window length as defined in
:obj:`config.attention_window`. Note that :obj:`config.attention_window` can be of type :obj:`List` to define a
different :math:`w` for each layer. A selected few tokens attend "globally" to all other tokens, as it is
conventionally done for all tokens in :obj:`BertSelfAttention`.
Note that "locally" and "globally" attending tokens are projected by different query, key and value matrices.
Also note that every "locally" attending token not only attends to tokens within its window :math:`w`, but also to all
"globally" attending tokens so that global attention is *symmetric*.
The user can define which tokens attend "locally" and which tokens attend "globally" by setting the tensor
:obj:`global_attention_mask` at run-time appropriately. All Longformer models employ the following logic for
:obj:`global_attention_mask`:
- 0: the token attends "locally",
- 1: the token attends "globally".
For more information please also refer to :meth:`~transformers.LongformerModel.forward` method.
Using Longformer self attention, the memory and time complexity of the query-key matmul operation, which usually
represents the memory and time bottleneck, can be reduced from :math:`\mathcal{O}(n_s \times n_s)` to
:math:`\mathcal{O}(n_s \times w)`, with :math:`n_s` being the sequence length and :math:`w` being the average window
size. It is assumed that the number of "globally" attending tokens is insignificant as compared to the number of
"locally" attending tokens.
For more information, please refer to the official `paper <https://arxiv.org/pdf/2004.05150.pdf>`__.
Training
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
:class:`~transformers.LongformerForMaskedLM` is trained the exact same way :class:`~transformers.RobertaForMaskedLM` is
trained and should be used as follows:
.. code-block::
input_ids = tokenizer.encode('This is a sentence from [MASK] training data', return_tensors='pt')
mlm_labels = tokenizer.encode('This is a sentence from the training data', return_tensors='pt')
loss = model(input_ids, labels=input_ids, masked_lm_labels=mlm_labels)[0]
LongformerConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LongformerConfig
:members:
LongformerTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LongformerTokenizer
:members:
LongformerTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LongformerTokenizerFast
:members:
LongformerModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LongformerModel
:members: forward
LongformerForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LongformerForMaskedLM
:members: forward
LongformerForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LongformerForSequenceClassification
:members: forward
LongformerForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LongformerForMultipleChoice
:members: forward
LongformerForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LongformerForTokenClassification
:members: forward
LongformerForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LongformerForQuestionAnswering
:members: forward
TFLongformerModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFLongformerModel
:members: call
TFLongformerForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFLongformerForMaskedLM
:members: call
TFLongformerForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFLongformerForQuestionAnswering
:members: call

View File

@@ -0,0 +1,116 @@
LXMERT
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The LXMERT model was proposed in `LXMERT: Learning Cross-Modality Encoder Representations from Transformers
<https://arxiv.org/abs/1908.07490>`__ by Hao Tan & Mohit Bansal. It is a series of bidirectional transformer encoders
(one for the vision modality, one for the language modality, and then one to fuse both modalities) pretrained using a
combination of masked language modeling, visual-language text alignment, ROI-feature regression, masked
visual-attribute modeling, masked visual-object modeling, and visual-question answering objectives.
The pretraining consists of multiple multi-modal datasets: MSCOCO, Visual-Genome + Visual-Genome Question Answering,
VQA 2.0, and GQA.
The abstract from the paper is the following:
*Vision-and-language reasoning requires an understanding of visual concepts, language semantics, and, most importantly,
the alignment and relationships between these two modalities. We thus propose the LXMERT (Learning Cross-Modality
Encoder Representations from Transformers) framework to learn these vision-and-language connections. In LXMERT, we
build a large-scale Transformer model that consists of three encoders: an object relationship encoder, a language
encoder, and a cross-modality encoder. Next, to endow our model with the capability of connecting vision and language
semantics, we pre-train the model with large amounts of image-and-sentence pairs, via five diverse representative
pre-training tasks: masked language modeling, masked object prediction (feature regression and label classification),
cross-modality matching, and image question answering. These tasks help in learning both intra-modality and
cross-modality relationships. After fine-tuning from our pretrained parameters, our model achieves the state-of-the-art
results on two visual question answering datasets (i.e., VQA and GQA). We also show the generalizability of our
pretrained cross-modality model by adapting it to a challenging visual-reasoning task, NLVR, and improve the previous
best result by 22% absolute (54% to 76%). Lastly, we demonstrate detailed ablation studies to prove that both our novel
model components and pretraining strategies significantly contribute to our strong results; and also present several
attention visualizations for the different encoders*
Tips:
- Bounding boxes are not necessary to be used in the visual feature embeddings, any kind of visual-spacial features
will work.
- Both the language hidden states and the visual hidden states that LXMERT outputs are passed through the
cross-modality layer, so they contain information from both modalities. To access a modality that only attends to
itself, select the vision/language hidden states from the first input in the tuple.
- The bidirectional cross-modality encoder attention only returns attention values when the language modality is used
as the input and the vision modality is used as the context vector. Further, while the cross-modality encoder
contains self-attention for each respective modality and cross-attention, only the cross attention is returned and
both self attention outputs are disregarded.
The original code can be found `here <https://github.com/airsplay/lxmert>`__.
LxmertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LxmertConfig
:members:
LxmertTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LxmertTokenizer
:members:
LxmertTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LxmertTokenizerFast
:members:
Lxmert specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_lxmert.LxmertModelOutput
:members:
.. autoclass:: transformers.modeling_lxmert.LxmertForPreTrainingOutput
:members:
.. autoclass:: transformers.modeling_lxmert.LxmertForQuestionAnsweringOutput
:members:
.. autoclass:: transformers.modeling_tf_lxmert.TFLxmertModelOutput
:members:
.. autoclass:: transformers.modeling_tf_lxmert.TFLxmertForPreTrainingOutput
:members:
LxmertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LxmertModel
:members: forward
LxmertForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LxmertForPreTraining
:members: forward
LxmertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LxmertForQuestionAnswering
:members: forward
TFLxmertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFLxmertModel
:members: call
TFLxmertForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFLxmertForPreTraining
:members: call

View File

@@ -1,36 +1,51 @@
MarianMT
----------------------------------------------------
**DISCLAIMER:** If you see something strange,
file a `Github Issue <https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`__ and assign
@sshleifer. Translations should be similar, but not identical to, output in the test set linked to in each model card.
-----------------------------------------------------------------------------------------------------------------------
**Bugs:** If you see something strange, file a `Github Issue
<https://github.com/huggingface/transformers/issues/new?assignees=sshleifer&labels=&template=bug-report.md&title>`__
and assign @sshleifer.
Translations should be similar, but not identical to, output in the test set linked to in each model card.
Implementation Notes
~~~~~~~~~~~~~~~~~~~~
- each model is about 298 MB on disk, there are 1,000+ models.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Each model is about 298 MB on disk, there are more than 1,000 models.
- The list of supported language pairs can be found `here <https://huggingface.co/Helsinki-NLP>`__.
- The 1,000+ models were originally trained by `Jörg Tiedemann <https://researchportal.helsinki.fi/en/persons/j%C3%B6rg-tiedemann>`__ using the `Marian <https://marian-nmt.github.io/>`_ C++ library, which supports fast training and translation.
- All models are transformer encoder-decoders with 6 layers in each component. Each model's performance is documented in a model card.
- the 80 opus models that require BPE preprocessing are not supported.
- The modeling code is the same as ``BartForConditionalGeneration`` with a few minor modifications:
- static (sinusoid) positional embeddings (``MarianConfig.static_position_embeddings=True``)
- a new final_logits_bias (``MarianConfig.add_bias_logits=True``)
- no layernorm_embedding (``MarianConfig.normalize_embedding=False``)
- the model starts generating with pad_token_id (which has 0 token_embedding) as the prefix. (Bart uses <s/>)
- Code to bulk convert models can be found in ``convert_marian_to_pytorch.py``
- Models were originally trained by
`Jörg Tiedemann <https://researchportal.helsinki.fi/en/persons/j%C3%B6rg-tiedemann>`__ using the
`Marian <https://marian-nmt.github.io/>`__ C++ library, which supports fast training and translation.
- All models are transformer encoder-decoders with 6 layers in each component. Each model's performance is documented
in a model card.
- The 80 opus models that require BPE preprocessing are not supported.
- The modeling code is the same as :class:`~transformers.BartForConditionalGeneration` with a few minor modifications:
- static (sinusoid) positional embeddings (:obj:`MarianConfig.static_position_embeddings=True`)
- a new final_logits_bias (:obj:`MarianConfig.add_bias_logits=True`)
- no layernorm_embedding (:obj:`MarianConfig.normalize_embedding=False`)
- the model starts generating with :obj:`pad_token_id` (which has 0 as a token_embedding) as the prefix (Bart uses
:obj:`<s/>`),
- Code to bulk convert models can be found in ``convert_marian_to_pytorch.py``.
Naming
~~~~~~
- All model names use the following format: ``Helsinki-NLP/opus-mt-{src}-{tgt}``
- The language codes used to name models are inconsistent. Two digit codes can usually be found `here <https://developers.google.com/admin-sdk/directory/v1/languages>`_, three digit codes require googling "language code {code}".
- Codes formatted like ``es_AR`` are usually ``code_{region}``. That one is spanish documents from Argentina.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- All model names use the following format: :obj:`Helsinki-NLP/opus-mt-{src}-{tgt}`
- The language codes used to name models are inconsistent. Two digit codes can usually be found `here
<https://developers.google.com/admin-sdk/directory/v1/languages>`__, three digit codes require googling
"language code {code}".
- Codes formatted like :obj:`es_AR` are usually :obj:`code_{region}`. That one is Spanish from Argentina.
Multilingual Models
~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All model names use the following format: ``Helsinki-NLP/opus-mt-{src}-{tgt}``:
- if ``src`` is in all caps, the model supports multiple input languages, you can figure out which ones by looking at the model card, or the Group Members `mapping <https://gist.github.com/sshleifer/6d20e7761931b08e73c3219027b97b8a>`_ .
- if ``tgt`` is in all caps, the model can output multiple languages, and you should specify a language code by prepending the desired output language to the src_text
All model names use the following format: :obj:`Helsinki-NLP/opus-mt-{src}-{tgt}`:
- If :obj:`src` is in all caps, the model supports multiple input languages, you can figure out which ones by
looking at the model card, or the Group Members `mapping
<https://gist.github.com/sshleifer/6d20e7761931b08e73c3219027b97b8a>`_ .
- If :obj:`tgt` is in all caps, the model can output multiple languages, and you should specify a language code by
prepending the desired output language to the :obj:`src_text`.
- You can see a tokenizer's supported language codes in ``tokenizer.supported_language_codes``
Example of translating english to many romance languages, using language codes:
@@ -48,18 +63,26 @@ Example of translating english to many romance languages, using language codes:
tokenizer = MarianTokenizer.from_pretrained(model_name)
print(tokenizer.supported_language_codes)
model = MarianMTModel.from_pretrained(model_name)
translated = model.generate(**tokenizer.prepare_translation_batch(src_text))
translated = model.generate(**tokenizer.prepare_seq2seq_batch(src_text))
tgt_text = [tokenizer.decode(t, skip_special_tokens=True) for t in translated]
# ["c'est une phrase en anglais que nous voulons traduire en français",
# 'Isto deve ir para o português.',
# 'Y esto al español']
Sometimes, models were trained on collections of languages that do not resolve to a group. In this case, _ is used as a separator for src or tgt, as in ``'Helsinki-NLP/opus-mt-en_el_es_fi-en_el_es_fi'``. These still require language codes.
There are many supported regional language codes, like ``>>es_ES<<`` (Spain) and ``>>es_AR<<`` (Argentina), that do not seem to change translations. I have not found these to provide different results than just using ``>>es<<``.
Sometimes, models were trained on collections of languages that do not resolve to a group. In this case, _ is used as a
separator for src or tgt, as in :obj:`Helsinki-NLP/opus-mt-en_el_es_fi-en_el_es_fi`. These still require language
codes.
For Example:
- ``Helsinki-NLP/opus-mt-NORTH_EU-NORTH_EU``: translates from all NORTH_EU languages (see `mapping <https://gist.github.com/sshleifer/6d20e7761931b08e73c3219027b97b8a>`_) to all NORTH_EU languages. Use a special language code like ``>>de<<`` to specify output language.
- ``Helsinki-NLP/opus-mt-ROMANCE-en``: translates from many romance languages to english, no codes needed since there is only 1 tgt language.
There are many supported regional language codes, like :obj:`>>es_ES<<` (Spain) and :obj:`>>es_AR<<` (Argentina), that
do not seem to change translations. I have not found these to provide different results than just using :obj:`>>es<<`.
For example:
- `Helsinki-NLP/opus-mt-NORTH_EU-NORTH_EU`: translates from all NORTH_EU languages (see `mapping
<https://gist.github.com/sshleifer/6d20e7761931b08e73c3219027b97b8a>`_) to all NORTH_EU languages. Use a special
language code like :obj:`>>de<<` to specify output language.
- `Helsinki-NLP/opus-mt-ROMANCE-en`: translates from many romance languages to english, no codes needed since there
is only one target language.
@@ -86,20 +109,21 @@ Code to see available pretrained models:
suffix = [x.split('/')[1] for x in model_ids]
multi_models = [f'{org}/{s}' for s in suffix if s != s.lower()]
MarianMTModel
~~~~~~~~~~~~~
Pytorch version of marian-nmt's transformer.h (c++). Designed for the OPUS-NMT translation checkpoints.
Model API is identical to BartForConditionalGeneration.
Available models are listed at `Model List <https://huggingface.co/models?search=Helsinki-NLP>`__
This class inherits all functionality from ``BartForConditionalGeneration``, see that page for method signatures.
.. autoclass:: transformers.MarianMTModel
MarianConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MarianConfig
:members:
MarianTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MarianTokenizer
:members: prepare_translation_batch
:members: prepare_seq2seq_batch
MarianMTModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MarianMTModel

View File

@@ -0,0 +1,80 @@
MBart
-----------------------------------------------------------------------------------------------------------------------
**DISCLAIMER:** If you see something strange, file a `Github Issue
<https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`__ and assign
@sshleifer
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The MBart model was presented in `Multilingual Denoising Pre-training for Neural Machine Translation
<https://arxiv.org/abs/2001.08210>`_ by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov
Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
According to the abstract, MBART is a sequence-to-sequence denoising auto-encoder pretrained on large-scale monolingual
corpora in many languages using the BART objective. mBART is one of the first methods for pre-training a complete
sequence-to-sequence model by denoising full texts in multiple languages, while previous approaches have focused only
on the encoder, decoder, or reconstructing parts of the text.
The Authors' code can be found `here <https://github.com/pytorch/fairseq/tree/master/examples/mbart>`__
Training
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
MBart is a multilingual encoder-decoder (seq-to-seq) model primarily intended for translation task.
As the model is multilingual it expects the sequences in a different format. A special language id token
is added in both the source and target text. The source text format is :obj:`X [eos, src_lang_code]`
where :obj:`X` is the source text. The target text format is :obj:`[tgt_lang_code] X [eos]`. :obj:`bos` is never used.
The :meth:`~transformers.MBartTokenizer.prepare_seq2seq_batch` handles this automatically and should be used to encode
the sequences for sequence-to-sequence fine-tuning.
- Supervised training
.. code-block::
example_english_phrase = "UN Chief Says There Is No Military Solution in Syria"
expected_translation_romanian = "Şeful ONU declară că nu există o soluţie militară în Siria"
batch = tokenizer.prepare_seq2seq_batch(example_english_phrase, src_lang="en_XX", tgt_lang="ro_RO", tgt_texts=expected_translation_romanian)
input_ids = batch["input_ids"]
target_ids = batch["decoder_input_ids"]
decoder_input_ids = target_ids[:, :-1].contiguous()
labels = target_ids[:, 1:].clone()
model(input_ids=input_ids, decoder_input_ids=decoder_input_ids, labels=labels) #forward
- Generation
While generating the target text set the :obj:`decoder_start_token_id` to the target language id.
The following example shows how to translate English to Romanian using the `facebook/mbart-large-en-ro` model.
.. code-block::
from transformers import MBartForConditionalGeneration, MBartTokenizer
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-en-ro")
tokenizer = MBartTokenizer.from_pretrained("facebook/mbart-large-en-ro")
article = "UN Chief Says There Is No Military Solution in Syria"
batch = tokenizer.prepare_seq2seq_batch(src_texts=[article], src_lang="en_XX")
translated_tokens = model.generate(**batch, decoder_start_token_id=tokenizer.lang_code_to_id["ro_RO"])
translation = tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)[0]
assert translation == "Şeful ONU declară că nu există o soluţie militară în Siria"
MBartConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MBartConfig
:members:
MBartTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MBartTokenizer
:members: build_inputs_with_special_tokens, prepare_seq2seq_batch
MBartForConditionalGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MBartForConditionalGeneration
:members: forward

View File

@@ -0,0 +1,177 @@
MobileBERT
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The MobileBERT model was proposed in `MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices
<https://arxiv.org/abs/2004.02984>`__ by Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny
Zhou. It's a bidirectional transformer based on the BERT model, which is compressed and accelerated using several
approaches.
The abstract from the paper is the following:
*Natural Language Processing (NLP) has recently achieved great success by using huge pre-trained models with hundreds
of millions of parameters. However, these models suffer from heavy model sizes and high latency such that they cannot
be deployed to resource-limited mobile devices. In this paper, we propose MobileBERT for compressing and accelerating
the popular BERT model. Like the original BERT, MobileBERT is task-agnostic, that is, it can be generically applied
to various downstream NLP tasks via simple fine-tuning. Basically, MobileBERT is a thin version of BERT_LARGE, while
equipped with bottleneck structures and a carefully designed balance between self-attentions and feed-forward
networks. To train MobileBERT, we first train a specially designed teacher model, an inverted-bottleneck incorporated
BERT_LARGE model. Then, we conduct knowledge transfer from this teacher to MobileBERT. Empirical studies show that
MobileBERT is 4.3x smaller and 5.5x faster than BERT_BASE while achieving competitive results on well-known
benchmarks. On the natural language inference tasks of GLUE, MobileBERT achieves a GLUEscore o 77.7
(0.6 lower than BERT_BASE), and 62 ms latency on a Pixel 4 phone. On the SQuAD v1.1/v2.0 question answering task,
MobileBERT achieves a dev F1 score of 90.0/79.2 (1.5/2.1 higher than BERT_BASE).*
Tips:
- MobileBERT is a model with absolute position embeddings so it's usually advised to pad the inputs on
the right rather than the left.
- MobileBERT is similar to BERT and therefore relies on the masked language modeling (MLM) objective.
It is therefore efficient at predicting masked tokens and at NLU in general, but is not optimal for
text generation. Models trained with a causal language modeling (CLM) objective are better in that regard.
The original code can be found `here <https://github.com/google-research/mobilebert>`__.
MobileBertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MobileBertConfig
:members:
MobileBertTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MobileBertTokenizer
:members:
MobileBertTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MobileBertTokenizerFast
:members:
MobileBert specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_mobilebert.MobileBertForPreTrainingOutput
:members:
.. autoclass:: transformers.modeling_tf_mobilebert.TFMobileBertForPreTrainingOutput
:members:
MobileBertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MobileBertModel
:members: forward
MobileBertForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MobileBertForPreTraining
:members: forward
MobileBertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MobileBertForMaskedLM
:members: forward
MobileBertForNextSentencePrediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MobileBertForNextSentencePrediction
:members: forward
MobileBertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MobileBertForSequenceClassification
:members: forward
MobileBertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MobileBertForMultipleChoice
:members: forward
MobileBertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MobileBertForTokenClassification
:members: forward
MobileBertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MobileBertForQuestionAnswering
:members: forward
TFMobileBertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFMobileBertModel
:members: call
TFMobileBertForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFMobileBertForPreTraining
:members: call
TFMobileBertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFMobileBertForMaskedLM
:members: call
TFMobileBertForNextSentencePrediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFMobileBertForNextSentencePrediction
:members: call
TFMobileBertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFMobileBertForSequenceClassification
:members: call
TFMobileBertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFMobileBertForMultipleChoice
:members: call
TFMobileBertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFMobileBertForTokenClassification
:members: call
TFMobileBertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFMobileBertForQuestionAnswering
:members: call

View File

@@ -0,0 +1,96 @@
Pegasus
-----------------------------------------------------------------------------------------------------------------------
**DISCLAIMER:** If you see something strange, file a `Github Issue
<https://github.com/huggingface/transformers/issues/new?assignees=sshleifer&labels=&template=bug-report.md&title>`__
and assign @sshleifer.
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The Pegasus model was proposed in `PEGASUS: Pre-training with Extracted Gap-sentences for
Abstractive Summarization <https://arxiv.org/pdf/1912.08777.pdf>`__ by Jingqing Zhang, Yao Zhao, Mohammad Saleh and
Peter J. Liu on Dec 18, 2019.
According to the abstract,
- Pegasus' pretraining task is intentionally similar to summarization: important sentences are removed/masked from an
input document and are generated together as one output sequence from the remaining sentences, similar to an
extractive summary.
- Pegasus achieves SOTA summarization performance on all 12 downstream tasks, as measured by ROUGE and human eval.
The Authors' code can be found `here <https://github.com/google-research/pegasus>`__.
Checkpoints
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All the `checkpoints <https://huggingface.co/models?search=pegasus>`__ are fine-tuned for summarization, besides
`pegasus-large`, whence the other checkpoints are fine-tuned:
- Each checkpoint is 2.2 GB on disk and 568M parameters.
- FP16 is not supported (help/ideas on this appreciated!).
- Summarizing xsum in fp32 takes about 400ms/sample, with default parameters on a v100 GPU.
- For XSUM, The paper reports rouge1,rouge2, rougeL of paper: 47.21/24.56/39.25. As of Aug 9, this port scores
46.91/24.34/39.1.
The gap is likely because of different alpha/length_penalty implementations in beam search.
Implementation Notes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- All models are transformer encoder-decoders with 16 layers in each component.
- The implementation is completely inherited from :class:`~transformers.BartForConditionalGeneration`
- Some key configuration differences:
- static, sinusoidal position embeddings
- no :obj:`layernorm_embedding` (:obj`PegasusConfig.normalize_embedding=False`)
- the model starts generating with pad_token_id (which has 0 token_embedding) as the prefix.
- more beams are used (:obj:`num_beams=8`)
- All pretrained pegasus checkpoints are the same besides three attributes: :obj:`tokenizer.model_max_length` (maximum
input size), :obj:`max_length` (the maximum number of tokens to generate) and :obj:`length_penalty`.
- The code to convert checkpoints trained in the author's `repo <https://github.com/google-research/pegasus>`_ can be
found in ``convert_pegasus_tf_to_pytorch.py``.
Usage Example
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code-block:: python
from transformers import PegasusForConditionalGeneration, PegasusTokenizer
import torch
src_text = [
""" PG&E stated it scheduled the blackouts in response to forecasts for high winds amid dry conditions. The aim is to reduce the risk of wildfires. Nearly 800 thousand customers were scheduled to be affected by the shutoffs which were expected to last through at least midday tomorrow."""
]
model_name = 'google/pegasus-xsum'
torch_device = 'cuda' if torch.cuda.is_available() else 'cpu'
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name).to(torch_device)
batch = tokenizer.prepare_seq2seq_batch(src_text, truncation=True, padding='longest').to(torch_device)
translated = model.generate(**batch)
tgt_text = tokenizer.batch_decode(translated, skip_special_tokens=True)
assert tgt_text[0] == "California's largest electricity provider has turned off power to hundreds of thousands of customers."
PegasusConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.PegasusConfig
PegasusTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
warning: ``add_tokens`` does not work at the moment.
.. autoclass:: transformers.PegasusTokenizer
:members: __call__, prepare_seq2seq_batch
PegasusForConditionalGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.PegasusForConditionalGeneration

View File

@@ -0,0 +1,83 @@
ProphetNet
-----------------------------------------------------------------------------------------------------------------------
**DISCLAIMER:** If you see something strange, file a `Github Issue
<https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`__ and assign
@patrickvonplaten
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The ProphetNet model was proposed in `ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training, <https://arxiv.org/abs/2001.04063>`__ by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang, Ming Zhou on 13 Jan, 2020.
ProphetNet is an encoder-decoder model and can predict n-future tokens for "ngram" language modeling instead of just the next token.
The abstract from the paper is the following:
*In this paper, we present a new sequence-to-sequence pre-training model called ProphetNet, which introduces a novel self-supervised objective named future n-gram prediction and the proposed n-stream self-attention mechanism. Instead of the optimization of one-step ahead prediction in traditional sequence-to-sequence model, the ProphetNet is optimized by n-step ahead prediction which predicts the next n tokens simultaneously based on previous context tokens at each time step. The future n-gram prediction explicitly encourages the model to plan for the future tokens and prevent overfitting on strong local correlations. We pre-train ProphetNet using a base scale dataset (16GB) and a large scale dataset (160GB) respectively. Then we conduct experiments on CNN/DailyMail, Gigaword, and SQuAD 1.1 benchmarks for abstractive summarization and question generation tasks. Experimental results show that ProphetNet achieves new state-of-the-art results on all these datasets compared to the models using the same scale pre-training corpus.*
The Authors' code can be found `here <https://github.com/microsoft/ProphetNet>`__.
ProphetNetConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ProphetNetConfig
:members:
ProphetNetTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ProphetNetTokenizer
:members:
ProphetNet specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_prophetnet.ProphetNetSeq2SeqLMOutput
:members:
.. autoclass:: transformers.modeling_prophetnet.ProphetNetSeq2SeqModelOutput
:members:
.. autoclass:: transformers.modeling_prophetnet.ProphetNetDecoderModelOutput
:members:
.. autoclass:: transformers.modeling_prophetnet.ProphetNetDecoderLMOutput
:members:
ProphetNetModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ProphetNetModel
:members: forward
ProphetNetEncoder
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ProphetNetEncoder
:members: forward
ProphetNetDecoder
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ProphetNetDecoder
:members: forward
ProphetNetForConditionalGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ProphetNetForConditionalGeneration
:members: forward
ProphetNetForCausalLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ProphetNetForCausalLM
:members: forward

View File

@@ -0,0 +1,90 @@
RAG
----------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
Retrieval-augmented generation ("RAG") models combine the powers of pretrained dense retrieval (DPR) and
sequence-to-sequence models. RAG models retrieve documents, pass them to a seq2seq model, then marginalize to generate
outputs. The retriever and seq2seq modules are initialized from pretrained models, and fine-tuned jointly, allowing
both retrieval and generation to adapt to downstream tasks.
It is based on the paper `Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
<https://arxiv.org/abs/2005.11401>`__ by Patrick Lewis, Ethan Perez, Aleksandara Piktus, Fabio Petroni, Vladimir
Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela.
The abstract from the paper is the following:
*Large pre-trained language models have been shown to store factual knowledge
in their parameters, and achieve state-of-the-art results when fine-tuned on
downstream NLP tasks. However, their ability to access and precisely manipulate
knowledge is still limited, and hence on knowledge-intensive tasks, their
performance lags behind task-specific architectures. Additionally, providing
provenance for their decisions and updating their world knowledge remain open
research problems. Pre-trained models with a differentiable access mechanism to
explicit nonparametric memory can overcome this issue, but have so far been only
investigated for extractive downstream tasks. We explore a general-purpose
fine-tuning recipe for retrieval-augmented generation (RAG) — models which combine
pre-trained parametric and non-parametric memory for language generation. We
introduce RAG models where the parametric memory is a pre-trained seq2seq model and
the non-parametric memory is a dense vector index of Wikipedia, accessed with
a pre-trained neural retriever. We compare two RAG formulations, one which
conditions on the same retrieved passages across the whole generated sequence, the
other can use different passages per token. We fine-tune and evaluate our models
on a wide range of knowledge-intensive NLP tasks and set the state-of-the-art
on three open domain QA tasks, outperforming parametric seq2seq models and
task-specific retrieve-and-extract architectures. For language generation tasks, we
find that RAG models generate more specific, diverse and factual language than a
state-of-the-art parametric-only seq2seq baseline.*
RagConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RagConfig
:members:
RagTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RagTokenizer
:members: prepare_seq2seq_batch
Rag specific outputs
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_rag.RetrievAugLMMarginOutput
:members:
.. autoclass:: transformers.modeling_rag.RetrievAugLMOutput
:members:
RagRetriever
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RagRetriever
:members:
RagModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RagModel
:members: forward
RagSequenceForGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RagSequenceForGeneration
:members: forward, generate
RagTokenForGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RagTokenForGeneration
:members: forward, generate

View File

@@ -1,20 +1,37 @@
Reformer
----------------------------------------------------
**DISCLAIMER:** This model is still a work in progress, if you see something strange,
file a `Github Issue <https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`_
-----------------------------------------------------------------------------------------------------------------------
**DISCLAIMER:** This model is still a work in progress, if you see something strange, file a `Github Issue
<https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`__.
Overview
~~~~~
The Reformer model was presented in `Reformer: The Efficient Transformer <https://https://arxiv.org/abs/2001.04451.pdf>`_ by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
Here the abstract:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*Large Transformer models routinely achieve state-of-the-art results on a number of tasks but training these models can be prohibitively costly, especially on long sequences. We introduce two techniques to improve the efficiency of Transformers. For one, we replace dot-product attention by one that uses locality-sensitive hashing, changing its complexity from O(L^2) to O(Llog(L)), where L is the length of the sequence. Furthermore, we use reversible residual layers instead of the standard residuals, which allows storing activations only once in the training process instead of N times, where N is the number of layers. The resulting model, the Reformer, performs on par with Transformer models while being much more memory-efficient and much faster on long sequences.*
The Reformer model was proposed in the paper `Reformer: The Efficient Transformer
<https://arxiv.org/abs/2001.04451.pdf>`__ by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
The Authors' code can be found `here <https://github.com/google/trax/tree/master/trax/models/reformer>`_ .
The abstract from the paper is the following:
*Large Transformer models routinely achieve state-of-the-art results on a number of tasks but training these models can
be prohibitively costly, especially on long sequences. We introduce two techniques to improve the efficiency of
Transformers. For one, we replace dot-product attention by one that uses locality-sensitive hashing, changing its
complexity from O(L^2) to O(Llog(L)), where L is the length of the sequence. Furthermore, we use reversible residual
layers instead of the standard residuals, which allows storing activations only once in the training process instead of
N times, where N is the number of layers. The resulting model, the Reformer, performs on par with Transformer models
while being much more memory-efficient and much faster on long sequences.*
The Authors' code can be found `here <https://github.com/google/trax/tree/master/trax/models/reformer>`__.
Axial Positional Encodings
~~~~~~~~~~~~~~~~~~~~
Axial Positional Encodings were first implemented in Google's `trax library <https://github.com/google/trax/blob/4d99ad4965bab1deba227539758d59f0df0fef48/trax/layers/research/position_encodings.py#L29>`_ and developed by the authors of this model's paper. In models that are treating very long input sequences, the conventional position id encodings store an embedings vector of size :math:`d` being the ``config.hidden_size`` for every position :math:`i, \ldots, n_s`, with :math:`n_s` being ``config.max_embedding_size``. *E.g.*, having a sequence length of :math:`n_s = 2^{19} \approx 0.5M` and a ``config.hidden_size`` of :math:`d = 2^{10} \approx 1000` would result in a position encoding matrix:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Axial Positional Encodings were first implemented in Google's `trax library
<https://github.com/google/trax/blob/4d99ad4965bab1deba227539758d59f0df0fef48/trax/layers/research/position_encodings.py#L29>`__
and developed by the authors of this model's paper. In models that are treating very long input sequences, the
conventional position id encodings store an embedings vector of size :math:`d` being the :obj:`config.hidden_size` for
every position :math:`i, \ldots, n_s`, with :math:`n_s` being :obj:`config.max_embedding_size`. This means that having
a sequence length of :math:`n_s = 2^{19} \approx 0.5M` and a ``config.hidden_size`` of :math:`d = 2^{10} \approx 1000`
would result in a position encoding matrix:
.. math::
X_{i,j}, \text{ with } i \in \left[1,\ldots, d\right] \text{ and } j \in \left[1,\ldots, n_s\right]
@@ -42,73 +59,127 @@ Therefore the following holds:
X^{2}_{i - d^1, l}, & \text{if } i \ge d^1 \text{ with } l = \lfloor\frac{j}{n_s^1}\rfloor
\end{cases}
Intuitively, this means that a position embedding vector :math:`x_j \in \mathbb{R}^{d}` is now the composition of two factorized embedding vectors: :math:`x^1_{k, l} + x^2_{l, k}`, where as the ``config.max_embedding_size`` dimension :math:`j` is factorized into :math:`k \text{ and } l`.
This design ensures that each position embedding vector :math:`x_j` is unique.
Intuitively, this means that a position embedding vector :math:`x_j \in \mathbb{R}^{d}` is now the composition of two
factorized embedding vectors: :math:`x^1_{k, l} + x^2_{l, k}`, where as the :obj:`config.max_embedding_size` dimension
:math:`j` is factorized into :math:`k \text{ and } l`. This design ensures that each position embedding vector
:math:`x_j` is unique.
Using the above example again, axial position encoding with :math:`d^1 = 2^5, d^2 = 2^5, n_s^1 = 2^9, n_s^2 = 2^{10}` can drastically reduced the number of parameters to :math:`2^{14} + 2^{15} \approx 49000` parameters.
In practice, the parameter ``config.axial_pos_embds_dim`` is set to ``list``:math:`(d^1, d^2)` which sum has to be equal to ``config.hidden_size`` and ``config.axial_pos_shape`` is set to ``list``:math:`(n_s^1, n_s^2)` and which product has to be equal to ``config.max_embedding_size`` which during training has to be equal to the ``sequence length`` of the ``input_ids``.
Using the above example again, axial position encoding with :math:`d^1 = 2^5, d^2 = 2^5, n_s^1 = 2^9, n_s^2 = 2^{10}`
can drastically reduced the number of parameters to :math:`2^{14} + 2^{15} \approx 49000` parameters.
In practice, the parameter :obj:`config.axial_pos_embds_dim` is set to a tuple :math:`(d^1, d^2)` which sum has to
be equal to :obj:`config.hidden_size` and :obj:`config.axial_pos_shape` is set to a tuple :math:`(n_s^1, n_s^2)` which
product has to be equal to :obj:`config.max_embedding_size`, which during training has to be equal to the
`sequence length` of the :obj:`input_ids`.
LSH Self Attention
~~~~~~~~~~~~~~~~~~~~
In Locality sensitive hashing (LSH) self attention the key and query projection weights are tied. Therefore, the key query embedding vectors are also tied.
LSH self attention uses the locality sensitive
hashing mechanism proposed in `Practical and Optimal LSH for Angular Distance <https://arxiv.org/abs/1509.02897>`_ to assign each of the tied key query embedding vectors to one of ``config.num_buckets`` possible buckets. The premise is that the more "similar" key query embedding vectors (in terms of *cosine similarity*) are to each other, the more likely they are assigned to the same bucket.
The accuracy of the LSH mechanism can be improved by increasing ``config.num_hashes`` or directly the argument ``num_hashes`` of the forward function so that the output of the LSH self attention better approximates the output of the "normal" full self attention.
The buckets are then sorted and chunked into query key embedding vector chunks each of length ``config.lsh_chunk_length``. For each chunk, the query embedding vectors attend to its key vectors (which are tied to themselves) and to the key embedding vectors of ``config.lsh_num_chunks_before`` previous neighboring chunks and ``config.lsh_num_chunks_after`` following neighboring chunks.
For more information, see the `original Paper <https://arxiv.org/abs/2001.04451>`_ or this great `blog post <https://www.pragmatic.ml/reformer-deep-dive/>`_.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In Locality sensitive hashing (LSH) self attention the key and query projection weights are tied. Therefore, the key
query embedding vectors are also tied. LSH self attention uses the locality sensitive hashing mechanism proposed in
`Practical and Optimal LSH for Angular Distance <https://arxiv.org/abs/1509.02897>`__ to assign each of the tied key
query embedding vectors to one of :obj:`config.num_buckets` possible buckets. The premise is that the more "similar"
key query embedding vectors (in terms of *cosine similarity*) are to each other, the more likely they are assigned to
the same bucket.
Note that ``config.num_buckets`` can also be factorized into a ``list``:math:`(n_{\text{buckets}}^1, n_{\text{buckets}}^2)`. This way instead of assigning the query key embedding vectors to one of :math:`(1,\ldots, n_{\text{buckets}})` they are assigned to one of :math:`(1-1,\ldots, n_{\text{buckets}}^1-1, \ldots, 1-n_{\text{buckets}}^2, \ldots, n_{\text{buckets}}^1-n_{\text{buckets}}^2)`. This is crucial for very long sequences to save memory.
The accuracy of the LSH mechanism can be improved by increasing :obj:`config.num_hashes` or directly the argument
:obj:`num_hashes` of the forward function so that the output of the LSH self attention better approximates the output
of the "normal" full self attention. The buckets are then sorted and chunked into query key embedding vector chunks
each of length :obj:`config.lsh_chunk_length`. For each chunk, the query embedding vectors attend to its key vectors
(which are tied to themselves) and to the key embedding vectors of :obj:`config.lsh_num_chunks_before` previous
neighboring chunks and :obj:`config.lsh_num_chunks_after` following neighboring chunks.
It is recommended to leave ``config.num_buckets=None``, so that depending on the sequence length, a good value for ``num_buckets`` are calculated on the fly.
For more information, see the `original Paper <https://arxiv.org/abs/2001.04451>`__ or this great `blog post
<https://www.pragmatic.ml/reformer-deep-dive/>`__.
Using LSH self attention, the memory and time complexity of the query-key matmul operation can be reduced from :math:`\mathcal{O}(n_s \times n_s)` to :math:`\mathcal{O}(n_s \times \log(n_s))`, which usually represents the memory and time bottleneck in a transformer model, with :math:`n_s` being the sequence length.
Note that :obj:`config.num_buckets` can also be factorized into a list
:math:`(n_{\text{buckets}}^1, n_{\text{buckets}}^2)`. This way instead of assigning the query key embedding vectors to
one of :math:`(1,\ldots, n_{\text{buckets}})` they are assigned to one of
:math:`(1-1,\ldots, n_{\text{buckets}}^1-1, \ldots, 1-n_{\text{buckets}}^2, \ldots, n_{\text{buckets}}^1-n_{\text{buckets}}^2)`.
This is crucial for very long sequences to save memory.
When training a model from scratch, it is recommended to leave :obj:`config.num_buckets=None`, so that depending on the
sequence length a good value for :obj:`num_buckets` is calculated on the fly. This value will then automatically be
saved in the config and should be reused for inference.
Using LSH self attention, the memory and time complexity of the query-key matmul operation can be reduced from
:math:`\mathcal{O}(n_s \times n_s)` to :math:`\mathcal{O}(n_s \times \log(n_s))`, which usually represents the memory
and time bottleneck in a transformer model, with :math:`n_s` being the sequence length.
Local Self Attention
~~~~~~~~~~~~~~~~~~~~
Local self attention is essentially a "normal" self attention layer with
key, query and value projections, but is chunked so that in each chunk of length ``config.local_chunk_length`` the query embedding vectors only attends to the key embedding vectors in its chunk and to the key embedding vectors of ``config.local_num_chunks_before`` previous neighboring chunks and ``config.local_num_chunks_after`` following neighboring chunks.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Using Local self attention, the memory and time complexity of the query-key matmul operation can be reduced from :math:`\mathcal{O}(n_s \times n_s)` to :math:`\mathcal{O}(n_s \times \log(n_s))`, which usually represents the memory and time bottleneck in a transformer model, with :math:`n_s` being the sequence length.
Local self attention is essentially a "normal" self attention layer with key, query and value projections, but is
chunked so that in each chunk of length :obj:`config.local_chunk_length` the query embedding vectors only attends to
the key embedding vectors in its chunk and to the key embedding vectors of :obj:`config.local_num_chunks_before`
previous neighboring chunks and :obj:`config.local_num_chunks_after` following neighboring chunks.
Using Local self attention, the memory and time complexity of the query-key matmul operation can be reduced from
:math:`\mathcal{O}(n_s \times n_s)` to :math:`\mathcal{O}(n_s \times \log(n_s))`, which usually represents the memory
and time bottleneck in a transformer model, with :math:`n_s` being the sequence length.
Training
~~~~~~~~~~~~~~~~~~~~
During training, we must ensure that the sequence length is set to a value that can be divided by the least common multiple of ``config.lsh_chunk_length`` and ``config.local_chunk_length`` and that the parameters of the Axial Positional Encodings are correctly set as described above. Reformer is very memory efficient so that the model can easily be trained on sequences as long as 64000 tokens.
For training, the ``ReformerModelWithLMHead`` should be used as follows:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
::
During training, we must ensure that the sequence length is set to a value that can be divided by the least common
multiple of :obj:`config.lsh_chunk_length` and :obj:`config.local_chunk_length` and that the parameters of the Axial
Positional Encodings are correctly set as described above. Reformer is very memory efficient so that the model can
easily be trained on sequences as long as 64000 tokens.
For training, the :class:`~transformers.ReformerModelWithLMHead` should be used as follows:
.. code-block::
input_ids = tokenizer.encode('This is a sentence from the training data', return_tensors='pt')
loss = model(input_ids, labels=input_ids)[0]
ReformerConfig
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ReformerConfig
:members:
ReformerTokenizer
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ReformerTokenizer
:members:
:members: save_vocabulary
ReformerModel
~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ReformerModel
:members:
:members: forward
ReformerModelWithLMHead
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ReformerModelWithLMHead
:members:
:members: forward
ReformerForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ReformerForMaskedLM
:members: forward
ReformerForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ReformerForSequenceClassification
:members: forward
ReformerForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ReformerForQuestionAnswering
:members: forward

View File

@@ -0,0 +1,40 @@
RetriBERT
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The RetriBERT model was proposed in the blog post `Explain Anything Like I'm Five: A Model for Open Domain Long Form
Question Answering <https://yjernite.github.io/lfqa.html>`__. RetriBERT is a small model that uses either a single or
pair of BERT encoders with lower-dimension projection for dense semantic indexing of text.
Code to train and use the model can be found `here
<https://github.com/huggingface/transformers/tree/master/examples/distillation>`__.
RetriBertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RetriBertConfig
:members:
RetriBertTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RetriBertTokenizer
:members:
RetriBertTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RetriBertTokenizerFast
:members:
RetriBertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RetriBertModel
:members: forward

View File

@@ -1,9 +1,12 @@
RoBERTa
----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
The RoBERTa model was proposed in `RoBERTa: A Robustly Optimized BERT Pretraining Approach <https://arxiv.org/abs/1907.11692>`_
by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer,
Veselin Stoyanov. It is based on Google's BERT model released in 2018.
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The RoBERTa model was proposed in `RoBERTa: A Robustly Optimized BERT Pretraining Approach
<https://arxiv.org/abs/1907.11692>`_ by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer
Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov. It is based on Google's BERT model released in 2018.
It builds on BERT and modifies key hyperparameters, removing the next-sentence pretraining
objective and training with much larger mini-batches and learning rates.
@@ -24,22 +27,23 @@ Tips:
- This implementation is the same as :class:`~transformers.BertModel` with a tiny embeddings tweak as well as a
setup for Roberta pretrained models.
- RoBERTa has the same architecture as BERT, but uses a byte-level BPE as a tokenizer (same as GPT-2) and uses a
different pre-training scheme.
- RoBERTa doesn't have `token_type_ids`, you don't need to indicate which token belongs to which segment. Just separate your segments with the separation token `tokenizer.sep_token` (or `</s>`)
- `Camembert <./camembert.html>`__ is a wrapper around RoBERTa. Refer to this page for usage examples.
different pretraining scheme.
- RoBERTa doesn't have :obj:`token_type_ids`, you don't need to indicate which token belongs to which segment. Just
separate your segments with the separation token :obj:`tokenizer.sep_token` (or :obj:`</s>`)
- :doc:`CamemBERT <camembert>` is a wrapper around RoBERTa. Refer to this page for usage examples.
The original code can be found `here <https://github.com/pytorch/fairseq/tree/master/examples/roberta>`_.
RobertaConfig
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RobertaConfig
:members:
RobertaTokenizer
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RobertaTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
@@ -47,62 +51,98 @@ RobertaTokenizer
RobertaTokenizerFast
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RobertaTokenizerFast
:members: build_inputs_with_special_tokens
RobertaModel
~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RobertaModel
:members:
:members: forward
RobertaForCausalLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RobertaForCausalLM
:members: forward
RobertaForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RobertaForMaskedLM
:members:
:members: forward
RobertaForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RobertaForSequenceClassification
:members:
:members: forward
RobertaForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RobertaForMultipleChoice
:members: forward
RobertaForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RobertaForTokenClassification
:members:
:members: forward
RobertaForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RobertaForQuestionAnswering
:members: forward
TFRobertaModel
~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFRobertaModel
:members:
:members: call
TFRobertaForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFRobertaForMaskedLM
:members:
:members: call
TFRobertaForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFRobertaForSequenceClassification
:members:
:members: call
TFRobertaForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFRobertaForMultipleChoice
:members: call
TFRobertaForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFRobertaForTokenClassification
:members:
:members: call
TFRobertaForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFRobertaForQuestionAnswering
:members: call

View File

@@ -0,0 +1,103 @@
SqueezeBERT
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The SqueezeBERT model was proposed in
`SqueezeBERT: What can computer vision teach NLP about efficient neural networks?
<https://arxiv.org/abs/2006.11316>`__
by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, Kurt W. Keutzer.
It's a bidirectional transformer similar to the BERT model.
The key difference between the BERT architecture and the SqueezeBERT architecture
is that SqueezeBERT uses `grouped convolutions <https://blog.yani.io/filter-group-tutorial>`__
instead of fully-connected layers for the Q, K, V and FFN layers.
The abstract from the paper is the following:
*Humans read and write hundreds of billions of messages every day. Further, due to the availability of
large datasets, large computing systems, and better neural network models, natural language processing (NLP)
technology has made significant strides in understanding, proofreading, and organizing these messages.
Thus, there is a significant opportunity to deploy NLP in myriad applications to help web users,
social networks, and businesses. In particular, we consider smartphones and other mobile devices as
crucial platforms for deploying NLP models at scale. However, today's highly-accurate NLP neural network
models such as BERT and RoBERTa are extremely computationally expensive, with BERT-base taking 1.7 seconds
to classify a text snippet on a Pixel 3 smartphone. In this work, we observe that methods such as grouped
convolutions have yielded significant speedups for computer vision networks, but many of these techniques
have not been adopted by NLP neural network designers. We demonstrate how to replace several operations in
self-attention layers with grouped convolutions, and we use this technique in a novel network architecture
called SqueezeBERT, which runs 4.3x faster than BERT-base on the Pixel 3 while achieving competitive
accuracy on the GLUE test set. The SqueezeBERT code will be released.*
Tips:
- SqueezeBERT is a model with absolute position embeddings so it's usually advised to pad the inputs on
the right rather than the left.
- SqueezeBERT is similar to BERT and therefore relies on the masked language modeling (MLM) objective.
It is therefore efficient at predicting masked tokens and at NLU in general, but is not optimal for
text generation. Models trained with a causal language modeling (CLM) objective are better in that regard.
- For best results when finetuning on sequence classification tasks, it is recommended to start with the
`squeezebert/squeezebert-mnli-headless` checkpoint.
SqueezeBertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.SqueezeBertConfig
:members:
SqueezeBertTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.SqueezeBertTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary
SqueezeBertTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.SqueezeBertTokenizerFast
:members:
SqueezeBertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.SqueezeBertModel
:members:
SqueezeBertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.SqueezeBertForMaskedLM
:members:
SqueezeBertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.SqueezeBertForSequenceClassification
:members:
SqueezeBertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.SqueezeBertForMultipleChoice
:members:
SqueezeBertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.SqueezeBertForTokenClassification
:members:
SqueezeBertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.SqueezeBertForQuestionAnswering
:members:

View File

@@ -1,105 +1,124 @@
T5
----------------------------------------------------
**DISCLAIMER:** This model is still a work in progress, if you see something strange,
file a `Github Issue <https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`_
-----------------------------------------------------------------------------------------------------------------------
**DISCLAIMER:** This model is still a work in progress, if you see something strange, file a `Github Issue
<https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`__.
Overview
~~~~~
The T5 model was presented in `Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer <https://arxiv.org/pdf/1910.10683.pdf>`_ by Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu in
Here the abstract:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice.
In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format.
Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks.
By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.
To facilitate future work on transfer learning for NLP, we release our dataset, pre-trained models, and code.*
The T5 model was presented in `Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
<https://arxiv.org/pdf/1910.10683.pdf>`_ by Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang,
Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu.
The Authors' code can be found `here <https://github.com/google-research/text-to-text-transfer-transformer>`_ .
The abstract from the paper is the following:
*Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream
task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning
has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of
transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a
text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer
approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration
with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering
summarization, question answering, text classification, and more. To facilitate future work on transfer learning for
NLP, we release our dataset, pre-trained models, and code.*
Tips:
- T5 is an encoder-decoder model pre-trained on a multi-task mixture of unsupervised and supervised tasks and for which
each task is converted into a text-to-text format. T5 works well on a variety of tasks out-of-the-box by prepending a
different prefix to the input corresponding to each task, e.g., for translation: *translate English to German: ...*,
for summarization: *summarize: ...*.
For more information about which prefix to use, it is easiest to look into Appendix D of the `paper
<https://arxiv.org/pdf/1910.10683.pdf>`__.
- For sequence-to-sequence generation, it is recommended to use :obj:`T5ForConditionalGeneration.generate()``. This
method takes care of feeding the encoded input via cross-attention layers to the decoder and auto-regressively
generates the decoder output.
- T5 uses relative scalar embeddings. Encoder input padding can be done on the left and on the right.
The original code can be found `here <https://github.com/google-research/text-to-text-transfer-transformer>`__.
Training
~~~~~~~~~~~~~~~~~~~~
T5 is an encoder-decoder model and converts all NLP problems into a text-to-text format. It is trained using teacher forcing.
This means that for training we always need an input sequence and a target sequence.
The input sequence is fed to the model using ``input_ids``. The target sequence is shifted to the right, *i.e.* prepended by a start-sequence token and fed to the decoder using the `decoder_input_ids`. In teacher-forcing style, the target sequence is then appended by the EOS token and corresponds to the ``lm_labels``. The PAD token is hereby used as the start-sequence token.
T5 can be trained / fine-tuned both in a supervised and unsupervised fashion.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
T5 is an encoder-decoder model and converts all NLP problems into a text-to-text format. It is trained using teacher
forcing. This means that for training we always need an input sequence and a target sequence. The input sequence is fed
to the model using :obj:`input_ids``. The target sequence is shifted to the right, i.e., prepended by a start-sequence
token and fed to the decoder using the :obj:`decoder_input_ids`. In teacher-forcing style, the target sequence is then
appended by the EOS token and corresponds to the :obj:`labels`. The PAD token is hereby used as the start-sequence
token. T5 can be trained / fine-tuned both in a supervised and unsupervised fashion.
- Unsupervised denoising training
In this setup spans of the input sequence are masked by so-called sentinel tokens (*a.k.a* unique mask tokens)
and the output sequence is formed as a concatenation of the same sentinel tokens and the *real* masked tokens.
Each sentinel token represents a unique mask token for this sentence and should start with ``<extra_id_1>``, ``<extra_id_2>``, ... up to ``<extra_id_100>``. As a default 100 sentinel tokens are available in ``T5Tokenizer``.
*E.g.* the sentence "The cute dog walks in the park" with the masks put on "cute dog" and "the" should be processed as follows:
Each sentinel token represents a unique mask token for this sentence and should start with :obj:`<extra_id_0>`,
:obj:`<extra_id_1>`, ... up to :obj:`<extra_id_99>`. As a default, 100 sentinel tokens are available in
:class:`~transformers.T5Tokenizer`.
For instance, the sentence "The cute dog walks in the park" with the masks put on "cute dog" and "the" should be
processed as follows:
::
.. code-block::
input_ids = tokenizer.encode('The <extra_id_1> walks in <extra_id_2> park', return_tensors='pt')
lm_labels = tokenizer.encode('<extra_id_1> cute dog <extra_id_2> the <extra_id_3> </s>', return_tensors='pt')
input_ids = tokenizer('The <extra_id_0> walks in <extra_id_1> park', return_tensors='pt').input_ids
labels = tokenizer('<extra_id_0> cute dog <extra_id_1> the <extra_id_2>', return_tensors='pt').input_ids
# the forward function automatically creates the correct decoder_input_ids
model(input_ids=input_ids, lm_labels=lm_labels)
loss = model(input_ids=input_ids, labels=labels, return_dict=True).loss
- Supervised training
In this setup the input sequence and output sequence are standard sequence to sequence input output mapping.
In translation, *e.g.* the input sequence "The house is wonderful." and output sequence "Das Haus ist wunderbar." should
be processed as follows:
In this setup the input sequence and output sequence are standard sequence-to-sequence input output mapping.
In translation, for instance with the input sequence "The house is wonderful." and output sequence "Das Haus ist
wunderbar.", the sentences should be processed as follows:
::
.. code-block::
input_ids = tokenizer.encode('translate English to German: The house is wonderful. </s>', return_tensors='pt')
lm_labels = tokenizer.encode('Das Haus ist wunderbar. </s>', return_tensors='pt')
input_ids = tokenizer('translate English to German: The house is wonderful.', return_tensors='pt').input_ids
labels = tokenizer('Das Haus ist wunderbar.', return_tensors='pt').input_ids
# the forward function automatically creates the correct decoder_input_ids
model(input_ids=input_ids, lm_labels=lm_labels)
Tips
~~~~~~~~~~~~~~~~~~~~
- T5 is an encoder-decoder model pre-trained on a multi-task mixture of unsupervised
and supervised tasks and for which each task is converted into a text-to-text format.
T5 works well on a variety of tasks out-of-the-box by prepending a different prefix to the input corresponding to each task, e.g.: for translation: *translate English to German: ..., summarize: ...*.
For more information about which prefix to use, it is easiest to look into Appendix D of the `paper <https://arxiv.org/pdf/1910.10683.pdf>`_ .
- For sequence to sequence generation, it is recommended to use ``T5ForConditionalGeneration.generate()``. The method takes care of feeding the encoded input via cross-attention layers to the decoder and auto-regressively generates the decoder output.
- T5 uses relative scalar embeddings. Encoder input padding can be done on the left and on the right.
The original code can be found `here <https://github.com/google-research/text-to-text-transfer-transformer>`_.
loss = model(input_ids=input_ids, labels=labels, return_dict=True).loss
T5Config
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.T5Config
:members:
T5Tokenizer
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.T5Tokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary
create_token_type_ids_from_sequences, prepare_seq2seq_batch, save_vocabulary
T5Model
~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.T5Model
:members:
:members: forward
T5ForConditionalGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.T5ForConditionalGeneration
:members:
:members: forward
TFT5Model
~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFT5Model
:members:
:members: call
TFT5ForConditionalGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFT5ForConditionalGeneration
:members:
:members: call

View File

@@ -1,15 +1,14 @@
Transformer XL
----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The Transformer-XL model was proposed in
`Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context <https://arxiv.org/abs/1901.02860>`__
by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
It's a causal (uni-directional) transformer with relative positioning (sinusoïdal) embeddings which can reuse
previously computed hidden-states to attend to longer context (memory).
This model also uses adaptive softmax inputs and outputs (tied).
The Transformer-XL model was proposed in `Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
<https://arxiv.org/abs/1901.02860>`__ by Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan
Salakhutdinov. It's a causal (uni-directional) transformer with relative positioning (sinusoïdal) embeddings which can
reuse previously computed hidden-states to attend to longer context (memory). This model also uses adaptive softmax
inputs and outputs (tied).
The abstract from the paper is the following:
@@ -30,53 +29,62 @@ Tips:
The original implementation trains on SQuAD with padding on the left, therefore the padding defaults are set to left.
- Transformer-XL is one of the few models that has no sequence length limit.
The original code can be found `here <https://github.com/kimiyoung/transformer-xl>`_.
The original code can be found `here <https://github.com/kimiyoung/transformer-xl>`__.
TransfoXLConfig
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TransfoXLConfig
:members:
TransfoXLTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TransfoXLTokenizer
:members: save_vocabulary
TransfoXLTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~
TransfoXL specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TransfoXLTokenizerFast
.. autoclass:: transformers.modeling_transfo_xl.TransfoXLModelOutput
:members:
.. autoclass:: transformers.modeling_transfo_xl.TransfoXLLMHeadModelOutput
:members:
.. autoclass:: transformers.modeling_tf_transfo_xl.TFTransfoXLModelOutput
:members:
.. autoclass:: transformers.modeling_tf_transfo_xl.TFTransfoXLLMHeadModelOutput
:members:
TransfoXLModel
~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TransfoXLModel
:members:
:members: forward
TransfoXLLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TransfoXLLMHeadModel
:members:
:members: forward
TFTransfoXLModel
~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFTransfoXLModel
:members:
:members: call
TFTransfoXLLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFTransfoXLLMHeadModel
:members:
:members: call

View File

@@ -1,15 +1,15 @@
XLM
----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The XLM model was proposed in `Cross-lingual Language Model Pretraining <https://arxiv.org/abs/1901.07291>`_
by Guillaume Lample*, Alexis Conneau*. It's a transformer pre-trained using one of the following objectives:
The XLM model was proposed in `Cross-lingual Language Model Pretraining <https://arxiv.org/abs/1901.07291>`__ by
Guillaume Lample, Alexis Conneau. It's a transformer pretrained using one of the following objectives:
- a causal language modeling (CLM) objective (next token prediction),
- a masked language modeling (MLM) objective (Bert-like), or
- a Translation Language Modeling (TLM) object (extension of Bert's MLM to multiple language inputs)
- a masked language modeling (MLM) objective (BERT-like), or
- a Translation Language Modeling (TLM) object (extension of BERT's MLM to multiple language inputs)
The abstract from the paper is the following:
@@ -27,83 +27,120 @@ Tips:
- XLM has many different checkpoints, which were trained using different objectives: CLM, MLM or TLM. Make sure to
select the correct objective for your task (e.g. MLM checkpoints are not suitable for generation).
- XLM has multilingual checkpoints which leverage a specific `lang` parameter. Check out the
`multi-lingual <../multilingual.html>`__ page for more information.
- XLM has multilingual checkpoints which leverage a specific :obj:`lang` parameter. Check out the
:doc:`multi-lingual <../multilingual>` page for more information.
The original code can be found `here <https://github.com/facebookresearch/XLM/>`_.
The original code can be found `here <https://github.com/facebookresearch/XLM/>`__.
XLMConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMConfig
:members:
XLMTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary
XLM specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_xlm.XLMForQuestionAnsweringOutput
:members:
XLMModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMModel
:members:
:members: forward
XLMWithLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMWithLMHeadModel
:members:
:members: forward
XLMForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMForSequenceClassification
:members:
:members: forward
XLMForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMForMultipleChoice
:members: forward
XLMForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMForTokenClassification
:members: forward
XLMForQuestionAnsweringSimple
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMForQuestionAnsweringSimple
:members:
:members: forward
XLMForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMForQuestionAnswering
:members:
:members: forward
TFXLMModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLMModel
:members:
:members: call
TFXLMWithLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLMWithLMHeadModel
:members:
:members: call
TFXLMForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLMForSequenceClassification
:members:
:members: call
TFXLMForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLMForMultipleChoice
:members: call
TFXLMForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLMForTokenClassification
:members: call
TFXLMForQuestionAnsweringSimple
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLMForQuestionAnsweringSimple
:members:
:members: call

View File

@@ -0,0 +1,63 @@
XLM-ProphetNet
-----------------------------------------------------------------------------------------------------------------------
**DISCLAIMER:** If you see something strange, file a `Github Issue
<https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`__ and assign
@patrickvonplaten
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The XLM-ProphetNet model was proposed in `ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training, <https://arxiv.org/abs/2001.04063>`__ by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang, Ming Zhou on 13 Jan, 2020.
XLM-ProphetNet is an encoder-decoder model and can predict n-future tokens for "ngram" language modeling instead of just the next token. Its architecture is identical to ProhpetNet, but the model was trained on the multi-lingual "wiki100" Wikipedia dump.
The abstract from the paper is the following:
*In this paper, we present a new sequence-to-sequence pre-training model called ProphetNet, which introduces a novel self-supervised objective named future n-gram prediction and the proposed n-stream self-attention mechanism. Instead of the optimization of one-step ahead prediction in traditional sequence-to-sequence model, the ProphetNet is optimized by n-step ahead prediction which predicts the next n tokens simultaneously based on previous context tokens at each time step. The future n-gram prediction explicitly encourages the model to plan for the future tokens and prevent overfitting on strong local correlations. We pre-train ProphetNet using a base scale dataset (16GB) and a large scale dataset (160GB) respectively. Then we conduct experiments on CNN/DailyMail, Gigaword, and SQuAD 1.1 benchmarks for abstractive summarization and question generation tasks. Experimental results show that ProphetNet achieves new state-of-the-art results on all these datasets compared to the models using the same scale pre-training corpus.*
The Authors' code can be found `here <https://github.com/microsoft/ProphetNet>`__.
XLMProphetNetConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMProphetNetConfig
:members:
XLMProphetNetTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMProphetNetTokenizer
:members:
XLMProphetNetModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMProphetNetModel
XLMProphetNetEncoder
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMProphetNetEncoder
XLMProphetNetDecoder
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMProphetNetDecoder
XLMProphetNetForConditionalGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMProphetNetForConditionalGeneration
XLMProphetNetForCausalLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMProphetNetForCausalLM

View File

@@ -1,10 +1,14 @@
XLM-RoBERTa
------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
The XLM-RoBERTa model was proposed in `Unsupervised Cross-lingual Representation Learning at Scale <https://arxiv.org/abs/1911.02116>`__
by Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán,
Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov. It is based on Facebook's RoBERTa model released in 2019.
It is a large multi-lingual language model, trained on 2.5TB of filtered CommonCrawl data.
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The XLM-RoBERTa model was proposed in `Unsupervised Cross-lingual Representation Learning at Scale
<https://arxiv.org/abs/1911.02116>`__ by Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume
Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov. It is based on Facebook's
RoBERTa model released in 2019. It is a large multi-lingual language model, trained on 2.5TB of filtered CommonCrawl
data.
The abstract from the paper is the following:
@@ -22,24 +26,24 @@ and XNLI benchmarks. We will make XLM-R code, data, and models publicly availabl
Tips:
- XLM-R is a multilingual model trained on 100 different languages. Unlike some XLM multilingual models, it does
not require `lang` tensors to understand which language is used, and should be able to determine the correct
- XLM-RoBERTa is a multilingual model trained on 100 different languages. Unlike some XLM multilingual models, it does
not require :obj:`lang` tensors to understand which language is used, and should be able to determine the correct
language from the input ids.
- This implementation is the same as RoBERTa. Refer to the `documentation of RoBERTa <./roberta.html>`__ for usage
- This implementation is the same as RoBERTa. Refer to the :doc:`documentation of RoBERTa <roberta>` for usage
examples as well as the information relative to the inputs and outputs.
The original code can be found `here <https://github.com/pytorch/fairseq/tree/master/examples/xlmr>`_.
The original code can be found `here <https://github.com/pytorch/fairseq/tree/master/examples/xlmr>`__.
XLMRobertaConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMRobertaConfig
:members:
XLMRobertaTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMRobertaTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
@@ -47,63 +51,91 @@ XLMRobertaTokenizer
XLMRobertaModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMRobertaModel
:members:
:members: forward
XLMRobertaForCausalLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMRobertaForCausalLM
:members: forward
XLMRobertaForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMRobertaForMaskedLM
:members:
:members: forward
XLMRobertaForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMRobertaForSequenceClassification
:members:
:members: forward
XLMRobertaForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMRobertaForMultipleChoice
:members:
:members: forward
XLMRobertaForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMRobertaForTokenClassification
:members:
:members: forward
XLMRobertaForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMRobertaForQuestionAnswering
:members: forward
TFXLMRobertaModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLMRobertaModel
:members:
:members: call
TFXLMRobertaForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLMRobertaForMaskedLM
:members:
:members: call
TFXLMRobertaForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLMRobertaForSequenceClassification
:members:
:members: call
TFXLMRobertaForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLMRobertaForMultipleChoice
:members: call
TFXLMRobertaForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLMRobertaForTokenClassification
:members:
:members: call
TFXLMRobertaForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLMRobertaForQuestionAnswering
:members: call

View File

@@ -1,14 +1,14 @@
XLNet
----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The XLNet model was proposed in `XLNet: Generalized Autoregressive Pretraining for Language Understanding <https://arxiv.org/abs/1906.08237>`_
by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
XLnet is an extension of the Transformer-XL model pre-trained using an autoregressive method
to learn bidirectional contexts by maximizing the expected likelihood over all permutations
of the input sequence factorization order.
The XLNet model was proposed in `XLNet: Generalized Autoregressive Pretraining for Language Understanding
<https://arxiv.org/abs/1906.08237>`_ by Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov,
Quoc V. Le. XLnet is an extension of the Transformer-XL model pre-trained using an autoregressive method to learn
bidirectional contexts by maximizing the expected likelihood over all permutations of the input sequence factorization
order.
The abstract from the paper is the following:
@@ -24,104 +24,161 @@ a large margin, including question answering, natural language inference, sentim
Tips:
- The specific attention pattern can be controlled at training and test time using the `perm_mask` input.
- Due to the difficulty of training a fully auto-regressive model over various factorization order,
XLNet is pretrained using only a sub-set of the output tokens as target which are selected
with the `target_mapping` input.
- To use XLNet for sequential decoding (i.e. not in fully bi-directional setting), use the `perm_mask` and
`target_mapping` inputs to control the attention span and outputs (see examples in `examples/text-generation/run_generation.py`)
- The specific attention pattern can be controlled at training and test time using the :obj:`perm_mask` input.
- Due to the difficulty of training a fully auto-regressive model over various factorization order, XLNet is pretrained
using only a sub-set of the output tokens as target which are selected with the :obj:`target_mapping` input.
- To use XLNet for sequential decoding (i.e. not in fully bi-directional setting), use the :obj:`perm_mask` and
:obj:`target_mapping` inputs to control the attention span and outputs (see examples in
`examples/text-generation/run_generation.py`)
- XLNet is one of the few models that has no sequence length limit.
The original code can be found `here <https://github.com/zihangdai/xlnet/>`_.
The original code can be found `here <https://github.com/zihangdai/xlnet/>`__.
XLNetConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLNetConfig
:members:
XLNetTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLNetTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary
XLNet specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_xlnet.XLNetModelOutput
:members:
.. autoclass:: transformers.modeling_xlnet.XLNetLMHeadModelOutput
:members:
.. autoclass:: transformers.modeling_xlnet.XLNetForSequenceClassificationOutput
:members:
.. autoclass:: transformers.modeling_xlnet.XLNetForMultipleChoiceOutput
:members:
.. autoclass:: transformers.modeling_xlnet.XLNetForTokenClassificationOutput
:members:
.. autoclass:: transformers.modeling_xlnet.XLNetForQuestionAnsweringSimpleOutput
:members:
.. autoclass:: transformers.modeling_xlnet.XLNetForQuestionAnsweringOutput
:members:
.. autoclass:: transformers.modeling_tf_xlnet.TFXLNetModelOutput
:members:
.. autoclass:: transformers.modeling_tf_xlnet.TFXLNetLMHeadModelOutput
:members:
.. autoclass:: transformers.modeling_tf_xlnet.TFXLNetForSequenceClassificationOutput
:members:
.. autoclass:: transformers.modeling_tf_xlnet.TFXLNetForMultipleChoiceOutput
:members:
.. autoclass:: transformers.modeling_tf_xlnet.TFXLNetForTokenClassificationOutput
:members:
.. autoclass:: transformers.modeling_tf_xlnet.TFXLNetForQuestionAnsweringSimpleOutput
:members:
XLNetModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLNetModel
:members:
:members: forward
XLNetLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLNetLMHeadModel
:members:
:members: forward
XLNetForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLNetForSequenceClassification
:members:
XLNetForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLNetForTokenClassification
:members:
:members: forward
XLNetForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLNetForMultipleChoice
:members:
:members: forward
XLNetForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLNetForTokenClassification
:members: forward
XLNetForQuestionAnsweringSimple
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLNetForQuestionAnsweringSimple
:members:
:members: forward
XLNetForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLNetForQuestionAnswering
:members:
:members: forward
TFXLNetModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLNetModel
:members:
:members: call
TFXLNetLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLNetLMHeadModel
:members:
:members: call
TFXLNetForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLNetForSequenceClassification
:members:
:members: call
TFLNetForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLNetForMultipleChoice
:members: call
TFXLNetForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLNetForTokenClassification
:members: call
TFXLNetForQuestionAnsweringSimple
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLNetForQuestionAnsweringSimple
:members:
:members: call

View File

@@ -1,55 +0,0 @@
# Model upload and sharing
Starting with `v2.2.2`, you can now upload and share your fine-tuned models with the community, using the <abbr title="Command-line interface">CLI</abbr> that's built-in to the library.
**First, create an account on [https://huggingface.co/join](https://huggingface.co/join)**. Optionally, join an existing organization or create a new one. Then:
```shell
transformers-cli login
# log in using the same credentials as on huggingface.co
```
Upload your model:
```shell
transformers-cli upload ./path/to/pretrained_model/
# ^^ Upload folder containing weights/tokenizer/config
# saved via `.save_pretrained()`
transformers-cli upload ./config.json [--filename folder/foobar.json]
# ^^ Upload a single file
# (you can optionally override its filename, which can be nested inside a folder)
```
If you want your model to be namespaced by your organization name rather than your username, add the following flag to any command:
```shell
--organization organization_name
```
Your model will then be accessible through its identifier, a concatenation of your username (or organization name) and the folder name above:
```python
"username/pretrained_model"
# or if an org:
"organization_name/pretrained_model"
```
**Please add a README.md model card** to the repo under `model_cards/` with: model description, training params (dataset, preprocessing, hardware used, hyperparameters), evaluation results, intended uses & limitations, etc.
Your model now has a page on huggingface.co/models 🔥
Anyone can load it from code:
```python
tokenizer = AutoTokenizer.from_pretrained("namespace/pretrained_model")
model = AutoModel.from_pretrained("namespace/pretrained_model")
```
List all your files on S3:
```shell
transformers-cli s3 ls
```
You can also delete unneeded files:
```shell
transformers-cli s3 rm …
```

View File

@@ -0,0 +1,222 @@
Model sharing and uploading
=======================================================================================================================
In this page, we will show you how to share a model you have trained or fine-tuned on new data with the community on
the `model hub <https://huggingface.co/models>`__.
.. note::
You will need to create an account on `huggingface.co <https://huggingface.co/join>`__ for this.
Optionally, you can join an existing organization or create a new one.
Prepare your model for uploading
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We have seen in the :doc:`training tutorial <training>`: how to fine-tune a model on a given task. You have probably
done something similar on your task, either using the model directly in your own training loop or using the
:class:`~.transformers.Trainer`/:class:`~.transformers.TFTrainer` class. Let's see how you can share the result on
the `model hub <https://huggingface.co/models>`__.
Basic steps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
..
When #5258 is merged, we can remove the need to create the directory.
First, pick a directory with the name you want your model to have on the model hub (its full name will then be
`username/awesome-name-you-picked` or `organization/awesome-name-you-picked`) and create it with either
.. code-block::
mkdir path/to/awesome-name-you-picked
or in python
.. code-block::
import os
os.makedirs("path/to/awesome-name-you-picked")
then you can save your model and tokenizer with:
.. code-block::
model.save_pretrained("path/to/awesome-name-you-picked")
tokenizer.save_pretrained("path/to/awesome-name-you-picked")
Or, if you're using the Trainer API
.. code-block::
trainer.save_model("path/to/awesome-name-you-picked")
tokenizer.save_pretrained("path/to/awesome-name-you-picked")
Make your model work on all frameworks
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
..
TODO Sylvain: make this automatic during the upload
You probably have your favorite framework, but so will other users! That's why it's best to upload your model with both
PyTorch `and` TensorFlow checkpoints to make it easier to use (if you skip this step, users will still be able to load
your model in another framework, but it will be slower, as it will have to be converted on the fly). Don't worry, it's super easy to do (and in a future version,
it will all be automatic). You will need to install both PyTorch and TensorFlow for this step, but you don't need to
worry about the GPU, so it should be very easy. Check the
`TensorFlow installation page <https://www.tensorflow.org/install/pip#tensorflow-2.0-rc-is-available>`__
and/or the `PyTorch installation page <https://pytorch.org/get-started/locally/#start-locally>`__ to see how.
First check that your model class exists in the other framework, that is try to import the same model by either adding
or removing TF. For instance, if you trained a :class:`~transformers.DistilBertForSequenceClassification`, try to
type
.. code-block::
from transformers import TFDistilBertForSequenceClassification
and if you trained a :class:`~transformers.TFDistilBertForSequenceClassification`, try to
type
.. code-block::
from transformers import DistilBertForSequenceClassification
This will give back an error if your model does not exist in the other framework (something that should be pretty rare
since we're aiming for full parity between the two frameworks). In this case, skip this and go to the next step.
Now, if you trained your model in PyTorch and have to create a TensorFlow version, adapt the following code to your
model class:
.. code-block::
tf_model = TFDistilBertForSequenceClassification.from_pretrained("path/to/awesome-name-you-picked", from_pt=True)
tf_model.save_pretrained("path/to/awesome-name-you-picked")
and if you trained your model in TensorFlow and have to create a PyTorch version, adapt the following code to your
model class:
.. code-block::
pt_model = DistilBertForSequenceClassification.from_pretrained("path/to/awesome-name-you-picked", from_tf=True)
pt_model.save_pretrained("path/to/awesome-name-you-picked")
That's all there is to it!
Check the directory before uploading
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Make sure there are no garbage files in the directory you'll upload. It should only have:
- a `config.json` file, which saves the :doc:`configuration <main_classes/configuration>` of your model ;
- a `pytorch_model.bin` file, which is the PyTorch checkpoint (unless you can't have it for some reason) ;
- a `tf_model.h5` file, which is the TensorFlow checkpoint (unless you can't have it for some reason) ;
- a `special_tokens_map.json`, which is part of your :doc:`tokenizer <main_classes/tokenizer>` save;
- a `tokenizer_config.json`, which is part of your :doc:`tokenizer <main_classes/tokenizer>` save;
- files named `vocab.json`, `vocab.txt`, `merges.txt`, or similar, which contain the vocabulary of your tokenizer, part of your :doc:`tokenizer <main_classes/tokenizer>` save;
- maybe a `added_tokens.json`, which is part of your :doc:`tokenizer <main_classes/tokenizer>` save.
Other files can safely be deleted.
Upload your model with the CLI
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Now go in a terminal and run the following command. It should be in the virtual enviromnent where you installed 🤗
Transformers, since that command :obj:`transformers-cli` comes from the library.
.. code-block::
transformers-cli login
Then log in using the same credentials as on huggingface.co. To upload your model, just type
.. code-block::
transformers-cli upload path/to/awesome-name-you-picked/
This will upload the folder containing the weights, tokenizer and configuration we prepared in the previous section.
By default you will be prompted to confirm that you want these files to be uploaded. If you are uploading multiple models and need to script that process, you can add `-y` to bypass the prompt. For example:
.. code-block::
transformers-cli upload -y path/to/awesome-name-you-picked/
If you want to upload a single file (a new version of your model, or the other framework checkpoint you want to add),
just type:
.. code-block::
transformers-cli upload path/to/awesome-name-you-picked/that-file
or
.. code-block::
transformers-cli upload path/to/awesome-name-you-picked/that-file --filename awesome-name-you-picked/new_name
if you want to change its filename.
This uploads the model to your personal account. If you want your model to be namespaced by your organization name
rather than your username, add the following flag to any command:
.. code-block::
--organization organization_name
so for instance:
.. code-block::
transformers-cli upload path/to/awesome-name-you-picked/ --organization organization_name
Your model will then be accessible through its identifier, which is, as we saw above,
`username/awesome-name-you-picked` or `organization/awesome-name-you-picked`.
Add a model card
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
To make sure everyone knows what your model can do, what its limitations and potential bias or ethetical
considerations, please add a README.md model card to the 🤗 Transformers repo under `model_cards/`. It should then be
placed in a subfolder with your username or organization, then another subfolder named like your model
(`awesome-name-you-picked`). Or just click on the "Create a model card on GitHub" button on the model page, it will
get you directly to the right location. If you need one, `here <https://github.com/huggingface/model_card>`__ is a
model card template (meta-suggestions are welcome).
If your model is fine-tuned from another model coming from the model hub (all 🤗 Transformers pretrained models do),
don't forget to link to its model card so that people can fully trace how your model was built.
If you have never made a pull request to the 🤗 Transformers repo, look at the
:doc:`contributing guide <contributing>` to see the steps to follow.
.. Note::
You can also send your model card in the folder you uploaded with the CLI by placing it in a `README.md` file
inside `path/to/awesome-name-you-picked/`.
Using your model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Your model now has a page on huggingface.co/models 🔥
Anyone can load it from code:
.. code-block::
tokenizer = AutoTokenizer.from_pretrained("namespace/awesome-name-you-picked")
model = AutoModel.from_pretrained("namespace/awesome-name-you-picked")
Additional commands
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You can list all the files you uploaded on the hub like this:
.. code-block::
transformers-cli s3 ls
You can also delete unneeded files with
.. code-block::
transformers-cli s3 rm awesome-name-you-picked/filename

View File

@@ -0,0 +1,787 @@
Summary of the models
=======================================================================================================================
This is a summary of the models available in 🤗 Transformers. It assumes youre familiar with the original
`transformer model <https://arxiv.org/abs/1706.03762>`_. For a gentle introduction check the `annotated transformer
<http://nlp.seas.harvard.edu/2018/04/03/attention.html>`_. Here we focus on the high-level differences between the
models. You can check them more in detail in their respective documentation. Also checkout the
:doc:`pretrained model page </pretrained_models>` to see the checkpoints available for each type of model and all `the
community models <https://huggingface.co/models>`_.
Each one of the models in the library falls into one of the following categories:
* :ref:`autoregressive-models`
* :ref:`autoencoding-models`
* :ref:`seq-to-seq-models`
* :ref:`multimodal-models`
* :ref:`retrieval-based-models`
Autoregressive models are pretrained on the classic language modeling task: guess the next token having read all the
previous ones. They correspond to the decoder of the original transformer model, and a mask is used on top of the full
sentence so that the attention heads can only see what was before in the next, and not whats after. Although those
models can be fine-tuned and achieve great results on many tasks, the most natural application is text generation.
A typical example of such models is GPT.
Autoencoding models are pretrained by corrupting the input tokens in some way and trying to reconstruct the original
sentence. They correspond to the encoder of the original transformer model in the sense that they get access to the
full inputs without any mask. Those models usually build a bidirectional representation of the whole sentence. They can
be fine-tuned and achieve great results on many tasks such as text generation, but their most natural application is
sentence classification or token classification. A typical example of such models is BERT.
Note that the only difference between autoregressive models and autoencoding models is in the way the model is
pretrained. Therefore, the same architecture can be used for both autoregressive and autoencoding models. When a given
model has been used for both types of pretraining, we have put it in the category corresponding to the article where it was first
introduced.
Sequence-to-sequence models use both the encoder and the decoder of the original transformer, either for translation
tasks or by transforming other tasks to sequence-to-sequence problems. They can be fine-tuned to many tasks but their
most natural applications are translation, summarization and question answering. The original transformer model is an
example of such a model (only for translation), T5 is an example that can be fine-tuned on other tasks.
Multimodal models mix text inputs with other kinds (e.g. images) and are more specific to a given task.
.. _autoregressive-models:
Autoregressive models
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
As mentioned before, these models rely on the decoder part of the original transformer and use an attention mask so
that at each position, the model can only look at the tokens before the attention heads.
Original GPT
-----------------------------------------------------------------------------------------------------------------------
.. raw:: html
<a href="https://huggingface.co/models?filter=openai-gpt">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-openai--gpt-blueviolet">
</a>
<a href="model_doc/gpt.html">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-openai--gpt-blueviolet">
</a>
`Improving Language Understanding by Generative Pre-Training <https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf>`_,
Alec Radford et al.
The first autoregressive model based on the transformer architecture, pretrained on the Book Corpus dataset.
The library provides versions of the model for language modeling and multitask language modeling/multiple choice
classification.
GPT-2
-----------------------------------------------------------------------------------------------------------------------
.. raw:: html
<a href="https://huggingface.co/models?filter=gpt2">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-gpt2-blueviolet">
</a>
<a href="model_doc/gpt2.html">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-gpt2-blueviolet">
</a>
`Language Models are Unsupervised Multitask Learners <https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf>`_,
Alec Radford et al.
A bigger and better version of GPT, pretrained on WebText (web pages from outgoing links in Reddit with 3 karmas or
more).
The library provides versions of the model for language modeling and multitask language modeling/multiple choice
classification.
CTRL
-----------------------------------------------------------------------------------------------------------------------
.. raw:: html
<a href="https://huggingface.co/models?filter=ctrl">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-ctrl-blueviolet">
</a>
<a href="model_doc/ctrl.html">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-ctrl-blueviolet">
</a>
`CTRL: A Conditional Transformer Language Model for Controllable Generation <https://arxiv.org/abs/1909.05858>`_,
Nitish Shirish Keskar et al.
Same as the GPT model but adds the idea of control codes. Text is generated from a prompt (can be empty) and one (or
several) of those control codes which are then used to influence the text generation: generate with the style of
wikipedia article, a book or a movie review.
The library provides a version of the model for language modeling only.
Transformer-XL
-----------------------------------------------------------------------------------------------------------------------
.. raw:: html
<a href="https://huggingface.co/models?filter=transfo-xl">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-transfo--xl-blueviolet">
</a>
<a href="model_doc/transformerxl.html">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-transfo--xl-blueviolet">
</a>
`Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context <https://arxiv.org/abs/1901.02860>`_,
Zihang Dai et al.
Same as a regular GPT model, but introduces a recurrence mechanism for two consecutive segments (similar to a regular
RNNs with two consecutive inputs). In this context, a segment is a number of consecutive tokens (for instance 512) that
may span across multiple documents, and segments are fed in order to the model.
Basically, the hidden states of the previous segment are concatenated to the current input to compute the attention
scores. This allows the model to pay attention to information that was in the previous segment as well as the current
one. By stacking multiple attention layers, the receptive field can be increased to multiple previous segments.
This changes the positional embeddings to positional relative embeddings (as the regular positional embeddings would
give the same results in the current input and the current hidden state at a given position) and needs to make some
adjustments in the way attention scores are computed.
The library provides a version of the model for language modeling only.
.. _reformer:
Reformer
-----------------------------------------------------------------------------------------------------------------------
.. raw:: html
<a href="https://huggingface.co/models?filter=reformer">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-reformer-blueviolet">
</a>
<a href="model_doc/reformer.html">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-reformer-blueviolet">
</a>
`Reformer: The Efficient Transformer <https://arxiv.org/abs/2001.04451>`_,
Nikita Kitaev et al .
An autoregressive transformer model with lots of tricks to reduce memory footprint and compute time. Those tricks
include:
* Use :ref:`Axial position encoding <axial-pos-encoding>` (see below for more details). Its a mechanism to avoid
having a huge positional encoding matrix (when the sequence length is very big) by factorizing it into smaller
matrices.
* Replace traditional attention by :ref:`LSH (local-sensitive hashing) attention <lsh-attention>` (see below for more
details). It's a technique to avoid computing the full product query-key in the attention layers.
* Avoid storing the intermediate results of each layer by using reversible transformer layers to obtain them during
the backward pass (subtracting the residuals from the input of the next layer gives them back) or recomputing them
for results inside a given layer (less efficient than storing them but saves memory).
* Compute the feedforward operations by chunks and not on the whole batch.
With those tricks, the model can be fed much larger sentences than traditional transformer autoregressive models.
**Note:** This model could be very well be used in an autoencoding setting, there is no checkpoint for such a
pretraining yet, though.
The library provides a version of the model for language modeling only.
XLNet
-----------------------------------------------------------------------------------------------------------------------
.. raw:: html
<a href="https://huggingface.co/models?filter=xlnet">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-xlnet-blueviolet">
</a>
<a href="model_doc/xlnet.html">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-xlnet-blueviolet">
</a>
`XLNet: Generalized Autoregressive Pretraining for Language Understanding <https://arxiv.org/abs/1906.08237>`_,
Zhilin Yang et al.
XLNet is not a traditional autoregressive model but uses a training strategy that builds on that. It permutes the
tokens in the sentence, then allows the model to use the last n tokens to predict the token n+1. Since this is all done
with a mask, the sentence is actually fed in the model in the right order, but instead of masking the first n tokens
for n+1, XLNet uses a mask that hides the previous tokens in some given permutation of 1,...,sequence length.
XLNet also uses the same recurrence mechanism as Transformer-XL to build long-term dependencies.
The library provides a version of the model for language modeling, token classification, sentence classification,
multiple choice classification and question answering.
.. _autoencoding-models:
Autoencoding models
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
As mentioned before, these models rely on the encoder part of the original transformer and use no mask so the model can
look at all the tokens in the attention heads. For pretraining, targets are the original sentences and inputs are their corrupted versions.
BERT
-----------------------------------------------------------------------------------------------------------------------
.. raw:: html
<a href="https://huggingface.co/models?filter=bert">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-bert-blueviolet">
</a>
<a href="model_doc/bert.html">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-bert-blueviolet">
</a>
`BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding <https://arxiv.org/abs/1810.04805>`_,
Jacob Devlin et al.
Corrupts the inputs by using random masking, more precisely, during pretraining, a given percentage of tokens (usually
15%) is masked by:
* a special mask token with probability 0.8
* a random token different from the one masked with probability 0.1
* the same token with probability 0.1
The model must predict the original sentence, but has a second objective: inputs are two sentences A and B (with a
separation token in between). With probability 50%, the sentences are consecutive in the corpus, in the remaining 50%
they are not related. The model has to predict if the sentences are consecutive or not.
The library provides a version of the model for language modeling (traditional or masked), next sentence prediction,
token classification, sentence classification, multiple choice classification and question answering.
ALBERT
-----------------------------------------------------------------------------------------------------------------------
.. raw:: html
<a href="https://huggingface.co/models?filter=albert">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-albert-blueviolet">
</a>
<a href="model_doc/albert.html">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-albert-blueviolet">
</a>
`ALBERT: A Lite BERT for Self-supervised Learning of Language Representations <https://arxiv.org/abs/1909.11942>`_,
Zhenzhong Lan et al.
Same as BERT but with a few tweaks:
* Embedding size E is different from hidden size H justified because the embeddings are context independent (one
embedding vector represents one token), whereas hidden states are context dependent (one hidden state represents a
sequence of tokens) so it's more logical to have H >> E. Also, the embedding matrix is large since it's V x E (V
being the vocab size). If E < H, it has less parameters.
* Layers are split in groups that share parameters (to save memory).
* Next sentence prediction is replaced by a sentence ordering prediction: in the inputs, we have two sentences A and B
(that are consecutive) and we either feed A followed by B or B followed by A. The model must predict if they have
been swapped or not.
The library provides a version of the model for masked language modeling, token classification, sentence
classification, multiple choice classification and question answering.
RoBERTa
-----------------------------------------------------------------------------------------------------------------------
.. raw:: html
<a href="https://huggingface.co/models?filter=roberta">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-roberta-blueviolet">
</a>
<a href="model_doc/roberta.html">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-roberta-blueviolet">
</a>
`RoBERTa: A Robustly Optimized BERT Pretraining Approach <https://arxiv.org/abs/1907.11692>`_,
Yinhan Liu et al.
Same as BERT with better pretraining tricks:
* dynamic masking: tokens are masked differently at each epoch, whereas BERT does it once and for all
* no NSP (next sentence prediction) loss and instead of putting just two sentences together, put a chunk of
contiguous texts together to reach 512 tokens (so the sentences are in an order than may span several documents)
* train with larger batches
* use BPE with bytes as a subunit and not characters (because of unicode characters)
The library provides a version of the model for masked language modeling, token classification, sentence
classification, multiple choice classification and question answering.
DistilBERT
-----------------------------------------------------------------------------------------------------------------------
.. raw:: html
<a href="https://huggingface.co/models?filter=distilbert">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-distilbert-blueviolet">
</a>
<a href="model_doc/distilbert.html">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-distilbert-blueviolet">
</a>
`DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter <https://arxiv.org/abs/1910.01108>`_,
Victor Sanh et al.
Same as BERT but smaller. Trained by distillation of the pretrained BERT model, meaning it's been trained to predict
the same probabilities as the larger model. The actual objective is a combination of:
* finding the same probabilities as the teacher model
* predicting the masked tokens correctly (but no next-sentence objective)
* a cosine similarity between the hidden states of the student and the teacher model
The library provides a version of the model for masked language modeling, token classification, sentence classification
and question answering.
XLM
-----------------------------------------------------------------------------------------------------------------------
.. raw:: html
<a href="https://huggingface.co/models?filter=xlm">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-xlm-blueviolet">
</a>
<a href="model_doc/xlm.html">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-xlm-blueviolet">
</a>
`Cross-lingual Language Model Pretraining <https://arxiv.org/abs/1901.07291>`_, Guillaume Lample and Alexis Conneau
A transformer model trained on several languages. There are three different type of training for this model and the
library provides checkpoints for all of them:
* Causal language modeling (CLM) which is the traditional autoregressive training (so this model could be in the
previous section as well). One of the languages is selected for each training sample, and the model input is a
sentence of 256 tokens, that may span over several documents in one of those languages.
* Masked language modeling (MLM) which is like RoBERTa. One of the languages is selected for each training sample,
and the model input is a sentence of 256 tokens, that may span over several documents in one of those languages, with
dynamic masking of the tokens.
* A combination of MLM and translation language modeling (TLM). This consists of concatenating a sentence in two
different languages, with random masking. To predict one of the masked tokens, the model can use both, the
surrounding context in language 1 and the context given by language 2.
Checkpoints refer to which method was used for pretraining by having `clm`, `mlm` or `mlm-tlm` in their names. On top
of positional embeddings, the model has language embeddings. When training using MLM/CLM, this gives the model an
indication of the language used, and when training using MLM+TLM, an indication of the language used for each part.
The library provides a version of the model for language modeling, token classification, sentence classification and
question answering.
XLM-RoBERTa
-----------------------------------------------------------------------------------------------------------------------
.. raw:: html
<a href="https://huggingface.co/models?filter=xlm-roberta">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-xlm--roberta-blueviolet">
</a>
<a href="model_doc/xlmroberta.html">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-xlm--roberta-blueviolet">
</a>
`Unsupervised Cross-lingual Representation Learning at Scale <https://arxiv.org/abs/1911.02116>`_, Alexis Conneau et
al.
Uses RoBERTa tricks on the XLM approach, but does not use the translation language modeling objective. It only uses
masked language modeling on sentences coming from one language. However, the model is trained on many more languages
(100) and doesn't use the language embeddings, so it's capable of detecting the input language by itself.
The library provides a version of the model for masked language modeling, token classification, sentence
classification, multiple choice classification and question answering.
FlauBERT
-----------------------------------------------------------------------------------------------------------------------
.. raw:: html
<a href="https://huggingface.co/models?filter=flaubert">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-flaubert-blueviolet">
</a>
<a href="model_doc/flaubert.html">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-flaubert-blueviolet">
</a>
`FlauBERT: Unsupervised Language Model Pre-training for French <https://arxiv.org/abs/1912.05372>`_, Hang Le et al.
Like RoBERTa, without the sentence ordering prediction (so just trained on the MLM objective).
The library provides a version of the model for language modeling and sentence classification.
ELECTRA
-----------------------------------------------------------------------------------------------------------------------
.. raw:: html
<a href="https://huggingface.co/models?filter=electra">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-electra-blueviolet">
</a>
<a href="model_doc/electra.html">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-electra-blueviolet">
</a>
`ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators <https://arxiv.org/abs/2003.10555>`_,
Kevin Clark et al.
ELECTRA is a transformer model pretrained with the use of another (small) masked language model. The inputs are
corrupted by that language model, which takes an input text that is randomly masked and outputs a text in which ELECTRA
has to predict which token is an original and which one has been replaced. Like for GAN training, the small language
model is trained for a few steps (but with the original texts as objective, not to fool the ELECTRA model like in a
traditional GAN setting) then the ELECTRA model is trained for a few steps.
The library provides a version of the model for masked language modeling, token classification and sentence
classification.
Funnel Transformer
-----------------------------------------------------------------------------------------------------------------------
.. raw:: html
<a href="https://huggingface.co/models?filter=funnel">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-funnel-blueviolet">
</a>
<a href="model_doc/funnel.html">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-funnel-blueviolet">
</a>
`Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing
<https://arxiv.org/abs/2006.03236>`_, Zihang Dai et al.
Funnel Transformer is a transformer model using pooling, a bit like a ResNet model: layers are grouped in blocks, and
at the beginning of each block (except the first one), the hidden states are pooled among the sequence dimension. This
way, their length is divided by 2, which speeds up the computation of the next hidden states. All pretrained models
have three blocks, which means the final hidden state has a sequence length that is one fourth of the original sequence
length.
For tasks such as classification, this is not a problem, but for tasks like masked language modeling or token
classification, we need a hidden state with the same sequence length as the original input. In those cases, the final
hidden states are upsampled to the input sequence length and go through two additional layers. That's why there are two
versions of each checkpoint. The version suffixed with "-base" contains only the three blocks, while the version
without that suffix contains the three blocks and the upsampling head with its additional layers.
The pretrained models available use the same pretraining objective as ELECTRA.
The library provides a version of the model for masked language modeling, token classification, sentence
classification, multiple choice classification and question answering.
.. _longformer:
Longformer
-----------------------------------------------------------------------------------------------------------------------
.. raw:: html
<a href="https://huggingface.co/models?filter=longformer">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-longformer-blueviolet">
</a>
<a href="model_doc/longformer.html">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-longformer-blueviolet">
</a>
`Longformer: The Long-Document Transformer <https://arxiv.org/abs/2004.05150>`_, Iz Beltagy et al.
A transformer model replacing the attention matrices by sparse matrices to go faster. Often, the local context (e.g.,
what are the two tokens left and right?) is enough to take action for a given token. Some preselected input tokens are
still given global attention, but the attention matrix has way less parameters, resulting in a speed-up. See the
:ref:`local attention section <local-attention>` for more information.
It is pretrained the same way a RoBERTa otherwise.
**Note:** This model could be very well be used in an autoregressive setting, there is no checkpoint for such a
pretraining yet, though.
The library provides a version of the model for masked language modeling, token classification, sentence
classification, multiple choice classification and question answering.
.. _seq-to-seq-models:
Sequence-to-sequence models
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
As mentioned before, these models keep both the encoder and the decoder of the original transformer.
BART
-----------------------------------------------------------------------------------------------------------------------
.. raw:: html
<a href="https://huggingface.co/models?filter=bart">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-bart-blueviolet">
</a>
<a href="model_doc/bart.html">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-bart-blueviolet">
</a>
`BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
<https://arxiv.org/abs/1910.13461>`_, Mike Lewis et al.
Sequence-to-sequence model with an encoder and a decoder. Encoder is fed a corrupted version of the tokens, decoder is
fed the original tokens (but has a mask to hide the future words like a regular transformers decoder). For the encoder, on the
pretraining tasks, a composition of the following transformations are applied:
* mask random tokens (like in BERT)
* delete random tokens
* mask a span of k tokens with a single mask token (a span of 0 tokens is an insertion of a mask token)
* permute sentences
* rotate the document to make it start at a specific token
The library provides a version of this model for conditional generation and sequence classification.
Pegasus
-----------------------------------------------------------------------------------------------------------------------
.. raw:: html
<a href="https://huggingface.co/models?filter=pegasus">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-pegasus-blueviolet">
</a>
<a href="model_doc/pegasus.html">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-pegasus-blueviolet">
</a>
`PEGASUS: Pre-training with Extracted Gap-sentences forAbstractive Summarization
<https://arxiv.org/pdf/1912.08777.pdf>`_, Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu on Dec 18, 2019.
Sequence-to-sequence model with the same encoder-decoder model architecture as BART. Pegasus is pre-trained jointly on two self-supervised objective functions: Masked Language Modeling (MLM) and a novel summarization specific pre-training objective, called Gap Sentence Generation (GSG).
* MLM: encoder input tokens are randomely replaced by a mask tokens and have to be predicted by the encoder (like in BERT)
* GSG: whole encoder input sentences are replaced by a second mask token and fed to the decoder, but which has a causal mask to hide the future words like a regular auto-regressive transformer decoder.
In contrast to BART, Pegasus' pretraining task is intentionally similar to summarization: important sentences are masked and are generated together as one output sequence from the remaining sentences, similar to an extractive summary.
The library provides a version of this model for conditional generation, which should be used for summarization.
MarianMT
-----------------------------------------------------------------------------------------------------------------------
.. raw:: html
<a href="https://huggingface.co/models?filter=marian">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-marian-blueviolet">
</a>
<a href="model_doc/marian.html">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-marian-blueviolet">
</a>
`Marian: Fast Neural Machine Translation in C++ <https://arxiv.org/abs/1804.00344>`_, Marcin Junczys-Dowmunt et al.
A framework for translation models, using the same models as BART
The library provides a version of this model for conditional generation.
T5
-----------------------------------------------------------------------------------------------------------------------
.. raw:: html
<a href="https://huggingface.co/models?filter=t5">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-t5-blueviolet">
</a>
<a href="model_doc/t5.html">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-t5-blueviolet">
</a>
`Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer <https://arxiv.org/abs/1910.10683>`_,
Colin Raffel et al.
Uses the traditional transformer model (with a slight change in the positional embeddings, which are learned at
each layer). To be able to operate on all NLP tasks, it transforms them into text-to-text problems by using specific
prefixes: “summarize: ”, “question: ”, “translate English to German: ” and so forth.
The pretraining includes both supervised and self-supervised training. Supervised training is conducted on downstream
tasks provided by the GLUE and SuperGLUE benchmarks (converting them into text-to-text tasks as explained above).
Self-supervised training uses corrupted tokens, by randomly removing 15% of the tokens and
replacing them with individual sentinel tokens (if several consecutive tokens are marked for removal, the whole group is replaced with a single sentinel token). The input of the encoder is the corrupted sentence, the input of the decoder is the
original sentence and the target is then the dropped out tokens delimited by their sentinel tokens.
For instance, if we have the sentence “My dog is very cute .”, and we decide to remove the tokens: "dog", "is" and "cute", the encoder
input becomes “My <x> very <y> .” and the target input becomes “<x> dog is <y> cute .<z>”
The library provides a version of this model for conditional generation.
MBart
-----------------------------------------------------------------------------------------------------------------------
.. raw:: html
<a href="https://huggingface.co/models?filter=mbart">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-mbart-blueviolet">
</a>
<a href="model_doc/mbart.html">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-mbart-blueviolet">
</a>
`Multilingual Denoising Pre-training for Neural Machine Translation <https://arxiv.org/abs/2001.08210>`_ by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov
Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
The model architecture and pre-training objective is same as BART, but MBart is trained on 25 languages
and is intended for supervised and unsupervised machine translation. MBart is one of the first methods
for pre-training a complete sequence-to-sequence model by denoising full texts in multiple languages,
The library provides a version of this model for conditional generation.
The `mbart-large-en-ro checkpoint <https://huggingface.co/facebook/mbart-large-en-ro>`_ can be used for english -> romanian translation.
The `mbart-large-cc25 <https://huggingface.co/facebook/mbart-large-cc25>`_ checkpoint can be finetuned for other translation and summarization tasks, using code in ```examples/seq2seq/``` , but is not very useful without finetuning.
.. _multimodal-models:
ProphetNet
-----------------------------------------------------------------------------------------------------------------------
.. raw:: html
<a href="https://huggingface.co/models?filter=prophetnet">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-prophetnet-blueviolet">
</a>
<a href="model_doc/prophetnet.html">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-prophetnet-blueviolet">
</a>
`ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training, <https://arxiv.org/abs/2001.04063>`__ by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang, Ming Zhou.
ProphetNet introduces a novel *sequence-to-sequence* pre-training objective, called *future n-gram prediction*. In future n-gram prediction, the model predicts the next n tokens simultaneously based on previous context tokens at each time step instead instead of just the single next token. The future n-gram prediction explicitly encourages the model to plan for the future tokens and prevent overfitting on strong local correlations.
The model architecture is based on the original Transformer, but replaces the "standard" self-attention mechanism in the decoder by a a main self-attention mechanism and a self and n-stream (predict) self-attention mechanism.
The library provides a pre-trained version of this model for conditional generation and a fine-tuned version for summarization.
XLM-ProphetNet
-----------------------------------------------------------------------------------------------------------------------
.. raw:: html
<a href="https://huggingface.co/models?filter=xprophetnet">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-xprophetnet-blueviolet">
</a>
<a href="model_doc/xlmprophetnet.html">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-xprophetnet-blueviolet">
</a>
`ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training, <https://arxiv.org/abs/2001.04063>`__ by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang, Ming Zhou.
XLM-ProphetNet's model architecture and pre-training objective is same as ProphetNet, but XLM-ProphetNet was pre-trained on the cross-lingual dataset `XGLUE <https://arxiv.org/abs/2004.01401>`__.
The library provides a pre-trained version of this model for multi-lingual conditional generation and fine-tuned versions for headline generation and question generation, respectively.
Multimodal models
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
There is one multimodal model in the library which has not been pretrained in the self-supervised fashion like the
others.
MMBT
-----------------------------------------------------------------------------------------------------------------------
`Supervised Multimodal Bitransformers for Classifying Images and Text <https://arxiv.org/abs/1909.02950>`_, Douwe Kiela
et al.
A transformers model used in multimodal settings, combining a text and an image to make predictions. The transformer
model takes as inputs the embeddings of the tokenized text and the final activations of a pretrained on images resnet
(after the pooling layer) that goes through a linear layer (to go from number of features at the end of the
resnet to the hidden state dimension of the transformer).
The different inputs are concatenated, and on top of the positional embeddings, a segment embedding is added to let the
model know which part of the input vector corresponds to the text and which to the image.
The pretrained model only works for classification.
..
More information in this :doc:`model documentation </model_doc/mmbt.html>`.
TODO: write this page
.. _retrieval-based-models:
Retrieval-based models
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Some models use documents retrieval during (pre)training and inference for open-domain question answering, for example.
DPR
-----------------------------------------------------------------------------------------------------------------------
.. raw:: html
<a href="https://huggingface.co/models?filter=dpr">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-dpr-blueviolet">
</a>
<a href="model_doc/dpr.html">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-dpr-blueviolet">
</a>
`Dense Passage Retrieval for Open-Domain Question Answering <https://arxiv.org/abs/2004.04906>`_,
Vladimir Karpukhin et al.
Dense Passage Retrieval (DPR) - is a set of tools and models for state-of-the-art open-domain question-answering research.
DPR consists in three models:
* Question encoder: encode questions as vectors
* Context encoder: encode contexts as vectors
* Reader: extract the answer of the questions inside retrieved contexts, along with a relevance score (high if the inferred span actually answers the question).
DPR's pipeline (not implemented yet) uses a retrieval step to find the top k contexts given a certain question, and then it calls the reader with the question and the retrieved documents to get the answer.
RAG
-----------------------------------------------------------------------------------------------------------------------
.. raw:: html
<a href="https://huggingface.co/models?filter=rag">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-rag-blueviolet">
</a>
<a href="model_doc/rag.html">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-rag-blueviolet">
</a>
`Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks <https://arxiv.org/abs/2005.11401>`_,
Patrick Lewis, Ethan Perez, Aleksandara Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela
Retrieval-augmented generation ("RAG") models combine the powers of pretrained dense retrieval (DPR) and Seq2Seq models.
RAG models retrieve docs, pass them to a seq2seq model, then marginalize to generate outputs.
The retriever and seq2seq modules are initialized from pretrained models, and fine-tuned jointly, allowing both retrieval and generation to adapt to downstream tasks.
The two models RAG-Token and RAG-Sequence are available for generation.
More technical aspects
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Full vs sparse attention
-----------------------------------------------------------------------------------------------------------------------
Most transformer models use full attention in the sense that the attention matrix is square. It can be a big
computational bottleneck when you have long texts. Longformer and reformer are models that try to be more efficient and
use a sparse version of the attention matrix to speed up training.
.. _lsh-attention:
**LSH attention**
:ref:`Reformer <reformer>` uses LSH attention. In the softmax(QK^t), only the biggest elements (in the softmax
dimension) of the matrix QK^t are going to give useful contributions. So for each query q in Q, we can consider only
the keys k in K that are close to q. A hash function is used to determine if q and k are close. The attention mask is
modified to mask the current token (except at the first position), because it will give a query and a key equal (so very
similar to each other). Since the hash can be a bit random, several hash functions are used in practice (determined by
a n_rounds parameter) and then are averaged together.
.. _local-attention:
**Local attention**
:ref:`Longformer <longformer>` uses local attention: often, the local context (e.g., what are the two tokens to the left and
right?) is enough to take action for a given token. Also, by stacking attention layers that have a small window, the
last layer will have a receptive field of more than just the tokens in the window, allowing them to build a
representation of the whole sentence.
Some preselected input tokens are also given global attention: for those few tokens, the attention matrix can access
all tokens and this process is symmetric: all other tokens have access to those specific tokens (on top of the ones in
their local window). This is shown in Figure 2d of the paper, see below for a sample attention mask:
.. image:: imgs/local_attention_mask.png
:scale: 50 %
:align: center
Using those attention matrices with less parameters then allows the model to have inputs having a bigger sequence
length.
Other tricks
-----------------------------------------------------------------------------------------------------------------------
.. _axial-pos-encoding:
**Axial positional encodings**
:ref:`Reformer <reformer>` uses axial positional encodings: in traditional transformer models, the positional encoding
E is a matrix of size :math:`l` by :math:`d`, :math:`l` being the sequence length and :math:`d` the dimension of the
hidden state. If you have very long texts, this matrix can be huge and take way too much space on the GPU. To alleviate that, axial positional encodings consist of factorizing that big matrix E in two smaller matrices E1 and
E2, with dimensions :math:`l_{1} \times d_{1}` and :math:`l_{2} \times d_{2}`, such that :math:`l_{1} \times l_{2} = l`
and :math:`d_{1} + d_{2} = d` (with the product for the lengths, this ends up being way smaller). The embedding for
time step :math:`j` in E is obtained by concatenating the embeddings for timestep :math:`j \% l1` in E1 and
:math:`j // l1` in E2.

View File

@@ -1,5 +1,5 @@
Multi-lingual models
================================================
=======================================================================================================================
Most of the models available in this library are mono-lingual models (English, Chinese and German). A few
multi-lingual models are available and have a different mechanisms than mono-lingual models.
@@ -8,13 +8,13 @@ This page details the usage of these models.
The two models that currently support multiple languages are BERT and XLM.
XLM
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
XLM has a total of 10 different checkpoints, only one of which is mono-lingual. The 9 remaining model checkpoints can
be split in two categories: the checkpoints that make use of language embeddings, and those that don't
XLM & Language Embeddings
------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
This section concerns the following checkpoints:
@@ -36,10 +36,11 @@ Here is an example using the ``xlm-clm-enfr-1024`` checkpoint (Causal language m
.. code-block::
import torch
from transformers import XLMTokenizer, XLMWithLMHeadModel
>>> import torch
>>> from transformers import XLMTokenizer, XLMWithLMHeadModel
tokenizer = XLMTokenizer.from_pretrained("xlm-clm-1024-enfr")
>>> tokenizer = XLMTokenizer.from_pretrained("xlm-clm-enfr-1024")
>>> model = XLMWithLMHeadModel.from_pretrained("xlm-clm-enfr-1024")
The different languages this model/tokenizer handles, as well as the ids of these languages are visible using the
@@ -47,16 +48,15 @@ The different languages this model/tokenizer handles, as well as the ids of thes
.. code-block::
# Continuation of the previous script
print(tokenizer.lang2id) # {'en': 0, 'fr': 1}
>>> print(tokenizer.lang2id)
{'en': 0, 'fr': 1}
These ids should be used when passing a language parameter during a model pass. Let's define our inputs:
.. code-block::
# Continuation of the previous script
input_ids = torch.tensor([tokenizer.encode("Wikipedia was used to")]) # batch size of 1
>>> input_ids = torch.tensor([tokenizer.encode("Wikipedia was used to")]) # batch size of 1
We should now define the language embedding by using the previously defined language id. We want to create a tensor
@@ -64,27 +64,25 @@ filled with the appropriate language ids, of the same size as input_ids. For eng
.. code-block::
# Continuation of the previous script
language_id = tokenizer.lang2id['en'] # 0
langs = torch.tensor([language_id] * input_ids.shape[1]) # torch.tensor([0, 0, 0, ..., 0])
>>> language_id = tokenizer.lang2id['en'] # 0
>>> langs = torch.tensor([language_id] * input_ids.shape[1]) # torch.tensor([0, 0, 0, ..., 0])
# We reshape it to be of size (batch_size, sequence_length)
langs = langs.view(1, -1) # is now of shape [1, sequence_length] (we have a batch size of 1)
>>> # We reshape it to be of size (batch_size, sequence_length)
>>> langs = langs.view(1, -1) # is now of shape [1, sequence_length] (we have a batch size of 1)
You can then feed it all as input to your model:
.. code-block::
# Continuation of the previous script
outputs = model(input_ids, langs=langs)
>>> outputs = model(input_ids, langs=langs)
The example `run_generation.py <https://github.com/huggingface/transformers/blob/master/examples/text-generation/run_generation.py>`__
can generate text using the CLM checkpoints from XLM, using the language embeddings.
XLM without Language Embeddings
------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
This section concerns the following checkpoints:
@@ -96,7 +94,7 @@ sentence representations, differently from previously-mentioned XLM checkpoints.
BERT
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
BERT has two checkpoints that can be used for multi-lingual tasks:
@@ -107,7 +105,7 @@ These checkpoints do not require language embeddings at inference time. They sho
used in the context and infer accordingly.
XLM-RoBERTa
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
XLM-RoBERTa was trained on 2.5TB of newly created clean CommonCrawl data in 100 languages. It provides strong
gains over previously released multi-lingual models like mBERT or XLM on downstream taks like classification,

151
docs/source/perplexity.rst Normal file
View File

@@ -0,0 +1,151 @@
Perplexity of fixed-length models
=======================================================================================================================
Perplexity (PPL) is one of the most common metrics for evaluating language
models. Before diving in, we should note that the metric applies specifically
to classical language models (sometimes called autoregressive or causal
language models) and is not well defined for masked language models like BERT
(see :doc:`summary of the models <model_summary>`).
Perplexity is defined as the exponentiated average log-likelihood of a
sequence. If we have a tokenized sequence :math:`X = (x_0, x_1, \dots, x_t)`,
then the perplexity of :math:`X` is,
.. math::
\text{PPL}(X)
= \exp \left\{ {-\frac{1}{t}\sum_i^t \log p_\theta (x_i|x_{<i}) } \right\}
where :math:`\log p_\theta (x_i|x_{<i})` is the log-likelihood of the ith
token conditioned on the preceding tokens :math:`x_{<i}` according to our
model. Intuitively, it can be thought of as an evaluation of the model's
ability to predict uniformly among the set of specified tokens in a corpus.
Importantly, this means that the tokenization procedure has a direct impact
on a model's perplexity which should always be taken into consideration when
comparing different models.
This is also equivalent to the exponentiation of the cross-entropy between
the data and model predictions. For more intuition about perplexity and its
relationship to Bits Per Character (BPC) and data compression, check out this
`fantastic blog post on The Gradient
<https://thegradient.pub/understanding-evaluation-metrics-for-language-models/>`_.
Calculating PPL with fixed-length models
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If we weren't limited by a model's context size, we would evaluate the
model's perplexity by autoregressively factorizing a sequence and
conditioning on the entire preceding subsequence at each step, as shown
below.
.. image:: imgs/ppl_full.gif
:width: 600
:alt: Full decomposition of a sequence with unlimited context length
When working with approximate models, however, we typically have a constraint
on the number of tokens the model can process. The largest version
of :doc:`GPT-2 <model_doc/gpt2>`, for example, has a fixed length of 1024
tokens, so we cannot calculate :math:`p_\theta(x_t|x_{<t})` directly when
:math:`t` is greater than 1024.
Instead, the sequence is typically broken into subsequences equal to the
model's maximum input size. If a model's max input size is :math:`k`, we
then approximate the likelihood of a token :math:`x_t` by conditioning only
on the :math:`k-1` tokens that precede it rather than the entire context.
When evaluating the model's perplexity of a sequence, a tempting but
suboptimal approach is to break the sequence into disjoint chunks and
add up the decomposed log-likelihoods of each segment independently.
.. image:: imgs/ppl_chunked.gif
:width: 600
:alt: Suboptimal PPL not taking advantage of full available context
This is quick to compute since the perplexity of each segment can be computed
in one forward pass, but serves as a poor approximation of the
fully-factorized perplexity and will typically yield a higher (worse) PPL
because the model will have less context at most of the prediction steps.
Instead, the PPL of fixed-length models should be evaluated with a
sliding-window strategy. This involves repeatedly sliding the
context window so that the model has more context when making each
prediction.
.. image:: imgs/ppl_sliding.gif
:width: 600
:alt: Sliding window PPL taking advantage of all available context
This is a closer approximation to the true decomposition of the
sequence probability and will typically yield a more favorable score.
The downside is that it requires a separate forward pass for each token in
the corpus. A good practical compromise is to employ a strided sliding
window, moving the context by larger strides rather than sliding by 1 token a
time. This allows computation to procede much faster while still giving the
model a large context to make predictions at each step.
Example: Calculating perplexity with GPT-2 in 🤗 Transformers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Let's demonstrate this process with GPT-2.
.. code-block:: python
from transformers import GPT2LMHeadModel, GPT2TokenizerFast
device = 'cuda'
model_id = 'gpt2-large'
model = GPT2LMHeadModel.from_pretrained(model_id).to(device)
tokenizer = GPT2TokenizerFast.from_pretrained(model_id)
We'll load in the WikiText-2 dataset and evaluate the perplexity using a few
different sliding-window strategies. Since this dataset is small and we're
just doing one forward pass over the set, we can just load and encode the
entire dataset in memory.
.. code-block:: python
from nlp import load_dataset
test = load_dataset('wikitext', 'wikitext-2-raw-v1', split='test')
encodings = tokenizer('\n\n'.join(test['text']), return_tensors='pt')
With 🤗 Transformers, we can simply pass the ``input_ids`` as the ``labels``
to our model, and the average log-likelihood for each token is returned as
the loss. With our sliding window approach, however, there is overlap in the
tokens we pass to the model at each iteration. We don't want the
log-likelihood for the tokens we're just treating as context to be included
in our loss, so we can set these targets to ``-100`` so that they are
ignored. The following is an example of how we could do this with a stride of
``512``. This means that the model will have at least 512 tokens for context
when calculating the conditional likelihood of any one token (provided there
are 512 preceding tokens available to condition on).
.. code-block:: python
max_length = model.config.n_positions
stride = 512
lls = []
for i in tqdm(range(0, encodings.input_ids.size(1), stride)):
begin_loc = max(i + stride - max_length, 0)
end_loc = i + stride
input_ids = encodings.input_ids[:,begin_loc:end_loc].to(device)
target_ids = input_ids.clone()
target_ids[:,:-stride] = -100
with torch.no_grad():
outputs = model(input_ids, labels=target_ids)
log_likelihood = outputs[0] * stride
lls.append(log_likelihood)
ppl = torch.exp(torch.stack(lls).sum() / i)
Running this with the stride length equal to the max input length is
equivalent to the suboptimal, non-sliding-window strategy we discussed above.
The smaller the stride, the more context the model will have in making each
prediction, and the better the reported perplexity will typically be.
When we run the above with ``stride = 1024``, i.e. no overlap, the resulting
PPL is ``19.64``, which is about the same as the ``19.93`` reported in the
GPT-2 paper. By using ``stride = 512`` and thereby employing our striding
window strategy, this jumps down to ``16.53``. This is not only a more
favorable score, but is calculated in a way that is closer to the true
autoregressive decomposition of a sequence likelihood.

Some files were not shown because too many files have changed in this diff Show More