Compare commits

...

926 Commits

Author SHA1 Message Date
Sylvain Gugger
1ba08dc221 Release: v3.3.1 2020-09-29 14:17:34 -04:00
Sylvain Gugger
8546dc55c2 Fix Trainer tests in a multiGPU env (#7458) 2020-09-29 14:06:41 -04:00
Sylvain Gugger
d0fd7154c5 Catch import datasets common errors (#7456) 2020-09-29 13:42:09 -04:00
Sylvain Gugger
f1220c5fe2 Add a code of conduct (#7433) 2020-09-29 13:38:47 -04:00
Teven
9e9a1fb8c7 Adding gradient checkpointing to GPT2 (#7446)
* GPT2 gradient checkpointing

* find_unused_parameters removed if checkpointing

* find_unused_parameters removed if checkpointing

* Update src/transformers/configuration_gpt2.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Added a test for generation with checkpointing

* Update src/transformers/configuration_gpt2.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-09-29 12:26:26 -04:00
Sylvain Gugger
52e8392b7e Add automatic best model loading to Trainer (#7431)
* Add automatic best model loading to Trainer

* Some small fixes

* Formatting
2020-09-29 10:41:18 -04:00
Sylvain Gugger
1fc4de69ed Document new features of make fixup (#7434) 2020-09-29 03:56:57 -04:00
GmailB
205bf0b7ea Update README.md (#7444)
Hi, just corrected the example code, add 2 links and fixed some typos
2020-09-29 03:18:01 -04:00
Sam Shleifer
74d8d69bd4 [s2s] consistent output format across eval scripts (#7435) 2020-09-28 23:20:03 -04:00
Typicasoft
671b278e25 Create README.md (#7436)
* Create README.md

MagBERT-NER : Added widget (Text)

* Rename model_cards/README.md to model_cards/TypicaAI/magbert-ner/README.md
2020-09-28 18:25:25 -04:00
Manuel Romero
a1a8ffa512 Update README.md (#7429)
Add links to models fine-tuned on a downstream task
2020-09-28 13:40:09 -04:00
Stas Bekman
f62f2ffdcc [makefile] 10x speed up checking/fixing (#7403)
* [makefile] check/fix only modified since branching files

* fix phonies

* parametrize dirs

* have only one source for dirs to check

* look ma, no autoformatters here
2020-09-28 10:45:42 -04:00
Lysandre
16c213820e Update docs to version v3.3.0 2020-09-28 16:32:00 +02:00
Lysandre
0613f05226 Release: v3.3.0 2020-09-28 16:24:43 +02:00
Sylvain Gugger
ca3fc36de3 Reorganize documentation navbar (#7423)
* Reorganize documentation navbar

* Update css to have clear sections
2020-09-28 16:22:58 +02:00
Lysandre Debut
7f4115c099 Pull request template (#7392)
co-authored-by: sgugger <sylvain.gugger@gmail.com>

Co-authored-by: sgugger <sylvain.gugger@gmail.com>
2020-09-28 09:51:49 -04:00
Sylvain Gugger
0611eab5e3 Document RAG again (#7377)
Do not merge before Monday
2020-09-28 08:31:46 -04:00
Sylvain Gugger
7563d5a3cf Catch PyTorch warning when saving/loading scheduler (#7401) 2020-09-28 08:20:10 -04:00
Boris Dayma
1749ca317e docs: fix model sharing file names (#5855)
* docs: fix model sharing file names

* Update docs/source/model_sharing.rst

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* docs(model_sharing.rst): fix new line

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-09-28 08:17:30 -04:00
Patrick von Platen
8279471506 correct RAG model cards (#7420) 2020-09-28 11:08:39 +02:00
Marcin Zabłocki
4083a55ab0 Flos fix (#7384) 2020-09-28 04:09:26 -04:00
Ola Piktus
ae3e84f3ba [RAG] Clean Rag readme in examples (#7413)
* Improve README + consolidation script

* Reformat README

* Reformat README

Co-authored-by: Your Name <you@example.com>
2020-09-28 10:06:39 +02:00
Sam Shleifer
748425d47d [T5] allow config.decoder_layers to control decoder size (#7409)
* Working assymmetrical T5

* rename decoder_layers -> num_decoder_layers

* Fix docstring

* Allow creation of asymmetric t5 students
2020-09-28 03:08:04 -04:00
Sam Shleifer
7296fea1d6 [s2s] rougeLSum expects \n between sentences (#7410)
Co-authored-by: Swetha Mandava <smandava@nvidia.com>
2020-09-27 16:27:19 -04:00
Suraj Patil
eab5f59682 [s2s] add create student script (#7290)
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-09-27 15:10:46 -04:00
Patrick von Platen
e50a931c11 [Longformer, Bert, Roberta, ...] Fix multi gpu training (#7272)
* fix multi-gpu

* fix longformer

* force to delete unnecessary layers

* fix notifications

* fix warning

* fix roberta

* fix tests

* remove hasattr

* fix tests

* fix roberta

* merge and clean authorized keys
2020-09-25 20:33:21 +02:00
Patrick von Platen
2c8ecdf8a8 fix rag retriever save pretrained (#7399) 2020-09-25 19:47:12 +02:00
Patrick von Platen
1a14687e6f Update README.md 2020-09-25 19:43:48 +02:00
Patrick von Platen
3327c2b0f6 Update README.md 2020-09-25 19:43:36 +02:00
Ola Piktus
fe326bd5cf Remove dependency on examples/seq2seq from rag (#7395)
Co-authored-by: Your Name <you@example.com>
2020-09-25 18:20:49 +02:00
Sylvain Gugger
ad39271ae8 Fix FP16 and attention masks in FunnelTransformer (#7374)
* Fix #7371

* Fix training

* Fix test values

* Apply the fix to TF as well
2020-09-25 12:20:39 -04:00
Patrick von Platen
4e5b036bdd Update README.md 2020-09-25 18:16:46 +02:00
Patrick von Platen
55eccfbb49 Update README.md 2020-09-25 18:16:44 +02:00
Sylvain Gugger
e2e77f02c2 Fix BartModel output documentation (#7390) 2020-09-25 11:48:13 -04:00
Sylvain Gugger
bbb07830ff Speedup check_copies script (#7394) 2020-09-25 11:47:22 -04:00
Stas Bekman
8859c4f841 [code quality] new make target that combines style and quality targets (#7310)
* [code quality] merge style and quality targets

Any reason why we don't run `flake8` in `make style`? I find myself needing to run `make style` and `make quality` all the time, but I need the latter just for the last 2 checks. Since we have no control over the source code why bother with separating checking and fixing - let's just have one target that fixes and then performs the remaining checks, as we know the first two have been done already.

This PR suggests to merge the 2 targets into one efficient target.

I will edit the docs if this change resonates with the team.

* move checks into style, re-use target

* better name

* add fixup target

* document new target
2020-09-25 11:37:40 -04:00
Sam Shleifer
38a1b03f4d Remove unhelpful bart warning (#7391) 2020-09-25 11:01:07 -04:00
Patrick von Platen
5ff0d6d7d0 Update README.md 2020-09-25 16:58:29 +02:00
Quentin Lhoest
cf1c88e092 [RAG] Fix retrieval offset in RAG's HfIndex and better integration tests (#7372)
* Fix retrieval offset in RAG's HfIndex

* update slow tests

* style

* fix new test

* style

* add better tests

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-09-25 16:12:46 +02:00
Patrick von Platen
571c7a11c1 [Rag] Fix wrong usage of num_beams and bos_token_id in Rag Sequence generation (#7386)
* fix_rag_sequence

* add second bug fix
2020-09-25 14:35:49 +02:00
Suraj Patil
415071b4c2 doc changes (#7385) 2020-09-25 08:00:36 -04:00
Patrick von Platen
2dd652d757 [RAG] Add missing doc and attention_mask to rag (#7382)
* add docs

* add missing docs and attention_mask in fine-tune
2020-09-25 11:23:55 +02:00
Lysandre Debut
7cdd9da5bf Check config type using type instead of isinstance (#7363)
* Check config type instead of instance


Bad merge

* Remove for loops

* Style
2020-09-25 05:09:09 -04:00
Sam Shleifer
3c6bf8998f modeling_bart: 3 small cleanups that dont change outputs (#7381)
* Mbart passing

* boom boom

* cleaner assert

* add assert

* Fix tests
2020-09-25 04:24:14 -04:00
Suraj Patil
9e68d075a4 Seq2SeqTrainer (#6769)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-09-24 18:46:58 -04:00
Sam Shleifer
d9d0f1140b [s2s] distributed eval allows num_return_sequences > 1 (#7254) 2020-09-24 17:30:09 -04:00
Patrick von Platen
0804d077c6 correct attention mask (#7373) 2020-09-24 23:22:04 +02:00
Stas Bekman
a8cbc4269c [fsmt] build/test scripts (#7257)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-09-24 17:10:26 -04:00
Sylvain Gugger
a8e7982f84 Remove mentions of RAG from the docs (#7376)
* Remove mentions of  RAG from the docs

* Deactivate check
2020-09-24 17:07:14 -04:00
Stas Bekman
eadd870b2f [seq2seq] make it easier to run the scripts (#7274) 2020-09-24 15:23:48 -04:00
Lysandre Debut
8d3bb781ee Formatter (#7368)
* Formatter

* Docs
2020-09-24 10:59:21 -04:00
Teven
7dfdf793bb Fixing case in which Trainer hung while saving model in distributed training (#7365)
* remote debugging

* remote debugging

* moved _store_flos call

* moved _store_flos call

* moved _store_flos call

* removed debugging artefacts
2020-09-24 09:56:40 -04:00
Sylvain Gugger
0ccb6f5c6d Clean RAG docs and template docs (#7348)
* Clean RAG docs and template docs

* Fix typo

* Better doc
2020-09-24 09:24:41 -04:00
Sylvain Gugger
27174bd4fe Make PyTorch model files independent from each other (#7352) 2020-09-24 08:53:54 -04:00
Julien Plu
d161ed1682 Update the TF models to remove their interdependencies (#7238)
* Refacto the models to remove their interdependencies

* Fix Flaubert model

* Fix Flaubert

* Fix XLM

* Fix Albert

* Fix Roberta

* Fix Albert

* Fix Flaubert

* Apply style + remove unused imports

* Fix Distilbert

* remove unused import

* fix Distilbert

* Fix Flaubert

* Apply style

* Fix Flaubert

* Add the copy comments for the check_copies script

* Fix MobileBert model name

* Address Morgan's comments

* Fix typo

* Oops typo
2020-09-24 08:30:59 -04:00
Jabin Huang
0cffa424f8 Updata tokenization_auto.py (#6870)
Updata tokenization_auto.py to handle Inherited tokenizer
2020-09-24 06:52:10 -04:00
Daquan Lin
03fb8e79c6 Update modeling_tf_longformer.py (#7359)
correct a very small mistake
2020-09-24 11:37:29 +02:00
Sylvain Gugger
1ff5bd38a3 Check decorator order (#7326)
* Check decorator order

* Adapt for parametrized decorators

* Fix typos
2020-09-24 04:54:37 -04:00
Sylvain Gugger
0be5f4a00c Expand a bit the documentation doc (#7350) 2020-09-24 04:34:18 -04:00
Sam Shleifer
38f1703795 wip: Code to add lang tags to marian model cards (#6586) 2020-09-23 18:11:06 -04:00
Theo Linnemann
129fdae040 Remove reference to args in XLA check (#7344)
Previously, the TFTrainingArguments object did a check to see if XLA was enabled, but did this by referencing `self.args.xla`, when it should be `self.xla`, because it is the args object. This can be verified a few lines above, where the XLA field is set.
2020-09-23 13:56:21 -04:00
Felipe Curti
d266613635 [Benchmarks] Change all args to from no_... to their positive form (#7075)
* Changed name to all no_... arguments and all references to them, inverting the boolean condition

* Change benchmark tests to use new Benchmark Args

* Update src/transformers/benchmark/benchmark_args_utils.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/benchmark/benchmark.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Fix Style. Add --no options in help

* fix some part of tests

* Update src/transformers/benchmark/benchmark_args_utils.py

* Update src/transformers/benchmark/benchmark_args_utils.py

* Update src/transformers/benchmark/benchmark_args_utils.py

* fix all tests

* make style

* add backwards compability

* make backwards compatible

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: fmcurti <fcurti@DESKTOP-RRQURBM.localdomain>
2020-09-23 13:25:24 -04:00
Doug Blank
8c697d58ef Ensure that integrations are imported before transformers or ml libs (#7330)
* Ensure that intergrations are imported before transformers or ml libs

* Black reformatter wanted a newline

* isort requests

* black requests

* flake8 requests
2020-09-23 13:23:45 -04:00
Sylvain Gugger
3323146e90 Models doc (#7345)
* Clean up model documentation

* Formatting

* Preparation work

* Long lines

* Main work on rst files

* Cleanup all config files

* Syntax fix

* Clean all tokenizers

* Work on first models

* Models beginning

* FaluBERT

* All PyTorch models

* All models

* Long lines again

* Fixes

* More fixes

* Update docs/source/model_doc/bert.rst

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update docs/source/model_doc/electra.rst

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Last fixes

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-09-23 13:20:45 -04:00
Wissam Antoun
58405a527b Fixed evaluation_strategy on epoch end bug (#7340)
* Fixed evaluation_strategy on epoch end bug

move the evaluation script outside the the iteration loop

* black formatting
2020-09-23 13:17:00 -04:00
Stas Bekman
28cf873036 [testing] skip decorators: docs, tests, bugs (#7334)
* skip decorators: docs, tests, bugs

* another important note

* style

* bloody style

* add @pytest.mark.parametrize

* add note

* no idea what it wants :(
2020-09-23 05:16:19 -04:00
Stas Bekman
df53643807 [code quality] fix confused flake8 (#7309)
* fix confused flake

We run `black  --target-version py35 ...` but flake8 doesn't know that, so currently with py38 flake8 fails suggesting that black should have reformatted 63 files. Indeed if I run:

```
black --line-length 119 --target-version py38 examples templates tests src utils
```
it indeed reformats 63 files.

The only solution I found is to create a black config file as explained at https://github.com/psf/black#configuration-format, which is what this PR adds.

Now flake8 knows that py35 is the standard and no longer gets confused regardless of the user's python version.

* adjust the other files that will now rely on black's config file
2020-09-22 22:12:36 -04:00
Sam Shleifer
78387cc63e [s2s] only save metrics.json from rank zero (#7331) 2020-09-22 18:27:28 -04:00
Sam Shleifer
e53138a1b9 [s2s] add src_lang kwarg for distributed eval (#7300) 2020-09-22 18:26:37 -04:00
blinovpd
a9c7849cfa [model_cards] blinoff/roberta-base-russian-v0 (#7317) 2020-09-22 18:26:13 -04:00
Sylvain Gugger
f5518e5631 Formatting 2020-09-22 14:55:12 -04:00
Chady Kamar
17099ebd58 Add num workers cli arg (#7322)
* Add dataloader_num_workers to TrainingArguments

This argument is meant to be used to set the
number of workers for the PyTorch DataLoader.

* Pass num_workers argument on DataLoader init
2020-09-22 14:44:42 -04:00
Sam Shleifer
25b0463d0b [s2s] add supported architecures to MD (#7252) 2020-09-22 13:09:35 -04:00
Pavel Soriano
d6bc72c469 Fixed results of SQuAD-FR evaluation (#7313)
The score for the F1 metric was reported as the Exact Match and vice-versa.
2020-09-22 12:39:07 -04:00
Huang Lianzhe
6303b5a718 [Bug Fix] The actual batch_size is inconsistent with the settings. (#7235)
* [bug fix] fixed the bug that the actual batch_size is inconsistent with the parameter settings

* reformat

* reformat

* reformat

* add support for dict and BatchEncoding

* add support for dict and BatchEncoding

* add documentation for DataCollatorForNextSentencePrediction

* Some more nits for the docstring

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Some more nits for the docstring

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Some more nits for the docstring

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Some more nits for the docstring

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Some more nits for the docstring

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* rename variables

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-09-22 12:31:21 -04:00
Ola Piktus
c754c41c61 RAG (#6813)
* added rag WIP

* path fix

* Formatting / renaming prior to actual work

* added rag WIP

* path fix

* Formatting / renaming prior to actual work

* added rag WIP

* path fix

* Formatting / renaming prior to actual work

* added rag WIP

* Formatting / renaming prior to actual work

* First commit

* improve comments

* Retrieval evaluation scripts

* refactor to include modeling outputs + MPI retriever

* Fix rag-token model + refactor

* Various fixes + finetuning logic

* use_bos fix

* Retrieval refactor

* Finetuning refactoring and cleanup

* Add documentation and cleanup

* Remove set_up_rag_env.sh file

* Fix retrieval wit HF index

* Fix import errors

* Fix quality errors

* Refactor as per suggestions in https://github.com/huggingface/transformers/pull/6813#issuecomment-687208867

* fix quality

* Fix RAG Sequence generation

* minor cleanup plus initial tests

* fix test

* fix tests 2

* Comments fix

* post-merge fixes

* Improve readme + post-rebase refactor

* Extra dependencied for tests

* Fix tests

* Fix tests 2

* Refactor test requirements

* Fix tests 3

* Post-rebase refactor

* rename nlp->datasets

* RAG integration tests

* add tokenizer to slow integration test and allow retriever to run on cpu

* add tests; fix position ids warning

* change structure

* change structure

* add from encoder generator

* save working solution

* make all integration tests pass

* add RagTokenizer.save/from_pretrained and RagRetriever.save/from_pretrained

* don't save paths

* delete unnecessary imports

* pass config to AutoTokenizer.from_pretrained for Rag tokenizers

* init wiki_dpr only once

* hardcode legacy index and passages paths (todo: add the right urls)

* finalize config

* finalize retriver api and config api

* LegacyIndex index download refactor

* add dpr to autotokenizer

* make from pretrained more flexible

* fix ragfortokengeneration

* small name changes in tokenizer

* add labels to models

* change default index name

* add retrieval tests

* finish token generate

* align test with previous version and make all tests pass

* add tests

* finalize tests

* implement thoms suggestions

* add first version of test

* make first tests work

* make retriever platform agnostic

* naming

* style

* add legacy index URL

* docstrings + simple retrieval test for distributed

* clean model api

* add doc_ids to retriever's outputs

* fix retrieval tests

* finish model outputs

* finalize model api

* fix generate problem for rag

* fix generate for other modles

* fix some tests

* save intermediate

* set generate to default

* big refactor generate

* delete rag_api

* correct pip faiss install

* fix auto tokenization test

* fix faiss install

* fix test

* move the distributed logic to examples

* model page

* docs

* finish tests

* fix dependencies

* fix import in __init__

* Refactor eval_rag and finetune scripts

* start docstring

* add psutil to test

* fix tf test

* move require torch to top

* fix retrieval test

* align naming

* finish automodel

* fix repo consistency

* test ragtokenizer save/load

* add rag model output docs

* fix ragtokenizer save/load from pretrained

* fix tokenizer dir

* remove torch in retrieval

* fix docs

* fixe finetune scripts

* finish model docs

* finish docs

* remove auto model for now

* add require torch

* remove solved todos

* integrate sylvains suggestions

* sams comments

* correct mistake on purpose

* improve README

* Add generation test cases

* fix rag token

* clean token generate

* fix test

* add note to test

* fix attention mask

* add t5 test for rag

* Fix handling prefix in finetune.py

* don't overwrite index_name

Co-authored-by: Patrick Lewis <plewis@fb.com>
Co-authored-by: Aleksandra Piktus <piktus@devfair0141.h2.fair>
Co-authored-by: Aleksandra Piktus <piktus@learnfair5102.h2.fair>
Co-authored-by: Aleksandra Piktus <piktus@learnfair5067.h2.fair>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Quentin Lhoest <lhoest.q@gmail.com>
2020-09-22 18:29:58 +02:00
Sylvain Gugger
1ee2194fb6 Mark big downloads slow (#7325)
* Make big downloads as slow

* Add import

* Right order for slow decorator

* More slow tests
2020-09-22 12:21:52 -04:00
Julien Plu
585217c87f Add generic text classification example in TF (#5716)
* Add new example with nlp

* Update README

* replace nlp by datasets

* Update examples/text-classification/README.md

Add Lysandre's suggestion.

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-09-22 12:05:05 -04:00
Lysandre
6e21f24220 Documentation version 2020-09-22 18:04:39 +02:00
Lysandre
3ebb1b3a2b Release: v3.2.0 2020-09-22 17:36:51 +02:00
Sylvain Gugger
01f0fd0bab Fixes for LayoutLM (#7318) 2020-09-22 10:37:11 -04:00
Julien Plu
702a76ff92 Create an XLA parameter and fix the mixed precision (#7311)
* Create an XLA parameter and fix mixed precision creation

* Fix issue brought by intellisense

* Complete docstring
2020-09-22 10:19:34 -04:00
Sylvain Gugger
596342c2b9 Support for Windows in check_copies (#7316) 2020-09-22 10:17:48 -04:00
Sylvain Gugger
89edf504bf Add possibility to evaluate every epoch (#7302)
* Add possibility to evaluate every epoch

* Remove multitype arg

* Remove needless import

* Use a proper enum

* Apply suggestions from @LysandreJik

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* One else and formatting

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-09-22 09:52:29 -04:00
Sylvain Gugger
21ca148090 is_pretokenized -> is_split_into_words (#7236)
* is_pretokenized -> is_split_into_words

* Fix tests
2020-09-22 09:34:35 -04:00
Julien Plu
324f361e91 Fix saving TF custom models (#7291)
* Fix #7277

* Apply style

* Add a full training pipeline test

* Apply style
2020-09-22 09:31:13 -04:00
Minghao Li
cd9a0585ea Add LayoutLM Model (#7064)
* first version

* finish test docs readme model/config/tokenization class

* apply make style and make quality

* fix layoutlm GitHub link

* fix conflict in index.rst and add layoutlm to pretrained_models.rst

* fix bug in test_parents_and_children_in_mappings

* reformat modeling_auto.py and tokenization_auto.py

* fix bug in test_modeling_layoutlm.py

* Update docs/source/model_doc/layoutlm.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/model_doc/layoutlm.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* remove inh, add tokenizer fast, and update some doc

* copy and rename necessary class from modeling_bert to modeling_layoutlm

* Update src/transformers/configuration_layoutlm.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/configuration_layoutlm.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/configuration_layoutlm.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/configuration_layoutlm.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/modeling_layoutlm.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/modeling_layoutlm.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/modeling_layoutlm.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* add mish to activations.py, import ACT2FN and import logging from utils

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-09-22 09:28:02 -04:00
Sylvain Gugger
244e1b5ba3 Fix #7304 (#7305) 2020-09-22 09:20:03 -04:00
Lysandre Debut
e46108817e Adds FSMT to LM head AutoModel (#7312) 2020-09-22 06:35:51 -04:00
Stas Bekman
e2964b8a19 [fsmt] no need to pass device (#7292) 2020-09-22 05:39:06 -04:00
Sylvain Gugger
e4b94d8e58 Copy code from Bert to Roberta and add safeguard script (#7219)
* Copy code from Bert to Roberta and add safeguard script

* Fix docstring

* Comment code

* Formatting

* Update src/transformers/modeling_roberta.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Add test and fix bugs

* Fix style and make new comand

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-09-22 05:02:27 -04:00
Sam Shleifer
656c27c3a3 [s2s] save hostname with repo info (#7301)
* save hostname
2020-09-21 17:26:24 -04:00
Thomas Winters
34a1b75f01 Added RobBERT-v2 model card (#7286)
* Added RobBERT-v2 model card

* minor Tweaks

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-09-21 16:17:28 -04:00
jjacampos
6513d16a48 IXAmBERT model card (#7283)
This PR includes the model card for the IXAmBERT model which has been recently uploaded to the huggingface repository.
2020-09-21 16:15:31 -04:00
Stas Bekman
af4b98ed97 [s2s] adjust finetune + test to work with fsmt (#7263) 2020-09-21 15:13:19 -04:00
Stas Bekman
8d562a2d1a [s2s] s/alpha_loss_encoder/alpha_encoder_loss/ (#7298)
fix to match `distillation.py:        self.alpha_encoder_loss`
2020-09-21 14:14:26 -04:00
Stas Bekman
cbb2f75a16 [s2s tests] fix test_run_eval_search (#7297) 2020-09-21 14:00:40 -04:00
Suraj Patil
7a88ed6c2a [model card] distlbart-mnli model cards (#7278) 2020-09-21 12:26:18 -04:00
Sylvain Gugger
63276b76d4 Fix #7284 (#7289) 2020-09-21 10:31:26 -04:00
Raphaël Bournhonesque
8d464374ba Disable missing weight warning (#7282) 2020-09-21 09:14:48 -04:00
Stas Bekman
8ff88d25e9 [fsmt] rewrite SinusoidalPositionalEmbedding + USE_CUDA test fixes + new TranslationPipeline test (#7224)
* fix USE_CUDA, add pipeline

* USE_CUDA fix

* recode SinusoidalPositionalEmbedding into nn.Embedding subclass

was needed for torchscript to work - this is now part of the state_dict, so will have to remove these keys during save_pretrained

* back out (ci debug)

* restore

* slow last?

* facilitate not saving certain keys and test

* remove no longer used keys

* style

* fix logging import

* cleanup

* Update src/transformers/modeling_utils.py

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* fix bug in max_positional_embeddings

* rename keys to keys_to_never_save per suggestion, improve the setup

* Update src/transformers/modeling_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-09-21 09:13:35 -04:00
Dat Quoc Nguyen
67c4b0c517 Add model cards for new pre-trained BERTweet-COVID19 models (#7269)
Two new pre-trained models "vinai/bertweet-covid19-base-cased" and "vinai/bertweet-covid19-base-uncased" are resulted by further pre-training the pre-trained model "vinai/bertweet-base" on a  corpus of 23M COVID-19 English Tweets for 40 epochs.
2020-09-21 06:12:51 -04:00
Patrick von Platen
0cbe1139b1 Update README.md 2020-09-21 11:53:08 +02:00
Lysandre
aae4edb5f0 Addressing review comment 2020-09-21 11:37:00 +02:00
Suraj Patil
43b9d93875 [example/glue] fix compute_metrics_fn for bart like models (#7248)
* fix compute_metrics_fn

* p.predictions -> preds

* apply suggestions
2020-09-21 05:34:20 -04:00
guillaume-be
39062d05f0 Fixed target_mapping preparation for XLNet when batch size > 1 (incl. beam search) (#7267) 2020-09-21 04:53:52 -04:00
Nadir El Manouzi
4b3e55bdcc Add "Fine-tune ALBERT for sentence-pair classification" notebook to the community notebooks (#7255) 2020-09-21 04:25:22 -04:00
Stas Bekman
7cbf0f722d examples/seq2seq/__init__.py mutates sys.path (#7194) 2020-09-20 16:54:42 -04:00
Manuel Romero
a4faeceaed Fix typo in model name (#7268) 2020-09-20 19:12:30 +02:00
Stas Bekman
47ab3e8262 @slow has to be last (#7251)
Found an issue when `@slow` isn't the last decorator (gets ignored!), so documenting this significance.
2020-09-20 09:17:29 -04:00
Stas Bekman
4f6e525742 model card improvements (#7221) 2020-09-19 17:02:05 -04:00
Stas Bekman
eb074af75e fsmt tiny model card + script (#7244) 2020-09-19 14:37:12 -04:00
Manuel Romero
1d90d0f386 Add title to model card (#7240) 2020-09-19 02:10:45 -04:00
Manuel Romero
c9b7ef042f Create README.md (#7239) 2020-09-19 02:09:29 -04:00
Sam Shleifer
83dba10b8f [s2s] distributed_eval.py saves better speed info (#7242) 2020-09-18 15:46:01 -04:00
Dat Quoc Nguyen
af2322c7a0 Add new pre-trained models BERTweet and PhoBERT (#6129)
* Add BERTweet and PhoBERT models

* Update modeling_auto.py

Re-add `bart` to LM_MAPPING

* Update tokenization_auto.py

Re-add `from .configuration_mobilebert import MobileBertConfig`
not sure why it's replaced by `from transformers.configuration_mobilebert import MobileBertConfig`

* Add BERTweet and PhoBERT to pretrained_models.rst

* Update tokenization_auto.py

Remove BertweetTokenizer and PhobertTokenizer out of tokenization_auto.py (they are currently not supported by AutoTokenizer.

* Update BertweetTokenizer - without nltk

* Update model card for BERTweet

* PhoBERT - with Auto mode - without import fastBPE

* PhoBERT - with Auto mode - without import fastBPE

* BERTweet - with Auto mode - without import fastBPE

* Add PhoBERT and BERTweet to TF modeling auto

* Improve Docstrings for PhobertTokenizer and BertweetTokenizer

* Update PhoBERT and BERTweet model cards

* Fixed a merge conflict in tokenization_auto

* Used black to reformat BERTweet- and PhoBERT-related files

* Used isort to reformat BERTweet- and PhoBERT-related files

* Reformatted BERTweet- and PhoBERT-related files based on flake8

* Updated test files

* Updated test files

* Updated tf test files

* Updated tf test files

* Updated tf test files

* Updated tf test files

* Update commits from huggingface

* Delete unnecessary files

* Add tokenizers to auto and init files

* Add test files for tokenizers

* Revised model cards

* Update save_vocabulary function in BertweetTokenizer and PhobertTokenizer and test files

* Revised test files

* Update orders of Phobert and Bertweet tokenizers in auto tokenization file
2020-09-18 13:16:43 -04:00
Patrick von Platen
9397436ea5 Create README.md 2020-09-18 16:52:00 +02:00
Patrick von Platen
7eeca4d399 Create README.md 2020-09-18 16:44:02 +02:00
Patrick von Platen
31516c776a Update README.md 2020-09-18 16:37:14 +02:00
Patrick von Platen
4c14669a78 Update README.md 2020-09-18 16:35:11 +02:00
Yih-Dar
3a03bab9db Fix a few countings (steps / epochs) in trainer_tf.py (#7175) 2020-09-18 09:28:56 -04:00
Stefan Schweter
ee9eae4e06 token-classification: update url of GermEval 2014 dataset (#6571) 2020-09-18 06:18:06 -04:00
Julien Chaumond
eef8d94d19 [model_cards]
We use ISO 639-1 cc @gentaiscool
2020-09-18 12:09:24 +02:00
Patrick von Platen
afd6a9f827 Create README.md 2020-09-18 11:41:12 +02:00
Patrick von Platen
9f1544b9e0 Create README.md 2020-09-18 11:37:20 +02:00
Sameer Zahid
5c1d5ea667 Fixed typo in README (#7233) 2020-09-18 04:52:43 -04:00
Yuta Hayashibe
7719ecd19f Fix a typo (#7225) 2020-09-18 04:23:33 -04:00
Manuel Romero
4a26e8ac5f Create README.md (#7205) 2020-09-18 03:24:30 -04:00
Manuel Romero
94320c5b81 Add customized text to widget (#7204) 2020-09-18 03:24:23 -04:00
Manuel Romero
3aefb24b20 Create README.md (#7209) 2020-09-18 03:24:10 -04:00
Manuel Romero
a22e7a8dd4 Create README.md (#7210) 2020-09-18 03:23:58 -04:00
Manuel Romero
c028b26481 Create README.md (#7212) 2020-09-18 03:23:49 -04:00
Genta Indra Winata
c7cdd7b4fd Create README.md for indobert-lite-base-p1 (#7182) 2020-09-18 03:22:32 -04:00
Genta Indra Winata
bfb9150b8f Create README.md for indobert-lite-large-p1 (#7184)
* Create README.md

* Update README.md
2020-09-18 03:22:11 -04:00
Genta Indra Winata
d193593403 Create README.md (#7183) 2020-09-18 03:21:54 -04:00
Genta Indra Winata
e65d846674 Create README.md (#7185) 2020-09-18 03:21:39 -04:00
Genta Indra Winata
e27d86d48d Create README.md for indobert-large-p2 model card (#7181) 2020-09-18 03:21:28 -04:00
Genta Indra Winata
881c0783e9 Create README.md for indobert-large-p1 model card (#7180) 2020-09-18 03:21:16 -04:00
Genta Indra Winata
e0d58a5c87 Create README.md (#7179) 2020-09-18 03:20:59 -04:00
Genta Indra Winata
1313a1d2a8 Create README.md for indobert-base-p2 (#7178) 2020-09-18 03:20:29 -04:00
tuner007
cf24f43e76 Create README.md (#7095)
Create model card for Pegasus QA
2020-09-18 03:19:45 -04:00
Sam Shleifer
67d9fc50d9 [s2s] remove double assert (#7223) 2020-09-17 18:32:31 -04:00
Stas Bekman
edbaad2c5c [model cards] fix metadata - 3rd attempt (#7218) 2020-09-17 16:57:06 -04:00
Stas Bekman
999a1c957a skip failing FSMT CUDA tests until investigated (#7220) 2020-09-17 16:53:14 -04:00
Stas Bekman
51c4adf54c [model cards] fix dataset yaml (#7216) 2020-09-17 15:29:39 -04:00
Sam Shleifer
a5638b2b3a [s2s] dynamic batch size with --max_tokens_per_batch (#7030) 2020-09-17 15:19:34 -04:00
Stas Bekman
efeab6a3f1 [s2s] run_eval/run_eval_search tweaks (#7192)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-09-17 14:26:38 -04:00
Stas Bekman
9c5bcab5b0 [model cards] fix yaml in cards (#7207) 2020-09-17 14:11:17 -04:00
Sohee Yang
e643a29722 Change to use relative imports in some files & Add python prompt symbols to example codes (#7202)
* Move 'from transformers' statements to relative imports in some files

* Add python prompt symbols in front of the example codes

* Reformat the code

* Add one missing space

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-09-17 12:30:45 -04:00
Stas Bekman
0fe6e435b6 [model cards] ported allenai Deep Encoder, Shallow Decoder models (#7153)
* [model cards] ported allenai Deep Encoder, Shallow Decoder models

* typo

* fix references

* add allenai/wmt19-de-en-6-6 model cards

* fill-in the missing info for the build script as provided by the searcher.
2020-09-17 17:58:49 +02:00
Stas Bekman
1eeb206bef [ported model] FSMT (FairSeq MachineTranslation) (#6940)
* ready for PR

* cleanup

* correct FSMT_PRETRAINED_MODEL_ARCHIVE_LIST

* fix

* perfectionism

* revert change from another PR

* odd, already committed this one

* non-interactive upload workaround

* backup the failed experiment

* store langs in config

* workaround for localizing model path

* doc clean up as in https://github.com/huggingface/transformers/pull/6956

* style

* back out debug mode

* document: run_eval.py --num_beams 10

* remove unneeded constant

* typo

* re-use bart's Attention

* re-use EncoderLayer, DecoderLayer from bart

* refactor

* send to cuda and fp16

* cleanup

* revert (moved to another PR)

* better error message

* document run_eval --num_beams

* solve the problem of tokenizer finding the right files when model is local

* polish, remove hardcoded config

* add a note that the file is autogenerated to avoid losing changes

* prep for org change, remove unneeded code

* switch to model4.pt, update scores

* s/python/bash/

* missing init (but doesn't impact the finetuned model)

* cleanup

* major refactor (reuse-bart)

* new model, new expected weights

* cleanup

* cleanup

* full link

* fix model type

* merge porting notes

* style

* cleanup

* have to create a DecoderConfig object to handle vocab_size properly

* doc fix

* add note (not a public class)

* parametrize

* - add bleu scores integration tests

* skip test if sacrebleu is not installed

* cache heavy models/tokenizers

* some tweaks

* remove tokens that aren't used

* more purging

* simplify code

* switch to using decoder_start_token_id

* add doc

* Revert "major refactor (reuse-bart)"

This reverts commit 226dad15ca6a9ef4e26178526e878e8fc5c85874.

* decouple from bart

* remove unused code #1

* remove unused code #2

* remove unused code #3

* update instructions

* clean up

* move bleu eval to examples

* check import only once

* move data+gen script into files

* reuse via import

* take less space

* add prepare_seq2seq_batch (auto-tested)

* cleanup

* recode test to use json instead of yaml

* ignore keys not needed

* use the new -y in transformers-cli upload -y

* [xlm tok] config dict: fix str into int to match definition (#7034)

* [s2s] --eval_max_generate_length (#7018)

* Fix CI with change of name of nlp (#7054)

* nlp -> datasets

* More nlp -> datasets

* Woopsie

* More nlp -> datasets

* One last

* extending to support allen_nlp wmt models

- allow a specific checkpoint file to be passed
- more arg settings
- scripts for allen_nlp models

* sync with changes

* s/fsmt-wmt/wmt/ in model names

* s/fsmt-wmt/wmt/ in model names (p2)

* s/fsmt-wmt/wmt/ in model names (p3)

* switch to a better checkpoint

* typo

* make non-optional args such - adjust tests where possible or skip when there is no other choice

* consistency

* style

* adjust header

* cards moved (model rename)

* use best custom hparams

* update info

* remove old cards

* cleanup

* s/stas/facebook/

* update scores

* s/allen_nlp/allenai/

* url maps aren't needed

* typo

* move all the doc / build /eval generators to their own scripts

* cleanup

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* fix indent

* duplicated line

* style

* use the correct add_start_docstrings

* oops

* resizing can't be done with the core approach, due to 2 dicts

* check that the arg is a list

* style

* style

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-09-17 11:31:29 -04:00
Sylvain Gugger
492bb6aa48 Trainer multi label (#7191)
* Trainer accep multiple labels

* Missing import

* Fix dosctrings
2020-09-17 08:15:37 -04:00
RafaelWO
709745927b Transformer-XL: Remove unused parameters (#7087)
* Removed 'tgt_len' and 'ext_len' from Transfomer-XL

 * Some changes are still to be done

* Removed 'tgt_len' and 'ext_len' from Transfomer-XL (2)

 * Removed comments
 * Fixed quality

* Changed warning to info
2020-09-17 06:10:34 -04:00
Dhaval Taunk
c183d81e27 added multilabel text classification notebook using distilbert to community notebooks (#7201)
* added multilabel classification using distilbert notebook to community notebooks

* added multilabel classification using distilbert notebook to community notebooks
2020-09-17 05:58:57 -04:00
Stas Bekman
79111b77d2 remove deprecated flag (#7171)
```
/home/circleci/.local/lib/python3.6/site-packages/isort/main.py:915: UserWarning: W0501: The following deprecated CLI flags were used and ignored: --recursive!
  "W0501: The following deprecated CLI flags were used and ignored: "
```
2020-09-17 05:52:12 -04:00
Stas Bekman
0cdafbf7ec remove duplicated code (#7173) 2020-09-17 05:51:40 -04:00
Sam Shleifer
45b0b1ff2f [s2s] fix kwarg typo (#7196) 2020-09-16 21:58:57 -04:00
Sam Shleifer
0203ad43bc [s2s] distributed eval cleanup (#7186) 2020-09-16 15:38:37 -04:00
sgugger
3babef815c Formatting 2020-09-16 14:57:09 -04:00
Stas Bekman
42049b8e12 use the correct add_start_docstrings (#7174) 2020-09-16 14:40:35 -04:00
Stas Bekman
fdaf8ab349 [s2s run_eval] new features (#7109)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-09-16 13:59:57 -04:00
Antoine Louis
df165065c3 [model_cards] antoiloui/belgpt2 🇧🇪 (#7166)
* Create README.md

* Update README.md
2020-09-16 12:16:01 -04:00
Sylvain Gugger
108c9aefcc Update README (#7133)
* Rewrite and update README

* Typo and migration guide

* Apply suggestions from code review

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>

* Address Clem's comments

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
2020-09-16 12:12:12 -04:00
Donna Choi
9e376e156a Add condition (#7161) 2020-09-16 09:15:10 -04:00
Stas Bekman
f8590c56e6 [doc] improve/expand the Parametrization section (#7156) 2020-09-16 08:45:50 -04:00
Stas Bekman
d3391c87fe build/eval/gen-card scripts for fsmt (#7155)
* build/eval/gen-card scripts for fsmt

* adjust for model renames
2020-09-16 08:41:26 -04:00
Xi Ye
08bfc1718a fix the warning message of overflowed sequence (#7151) 2020-09-16 07:40:57 -04:00
Julien Plu
af8425b749 Refactoring the TF activations functions (#7150)
* Refactoring the activations functions into a common file

* Apply style

* remove unused import

* fix tests

* Fix tests.
2020-09-16 07:03:47 -04:00
Stas Bekman
b00cafbde5 [docs] add testing documentation (#7101)
* [docs] add testing documentation

* Update docs/source/testing.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* tweaks as suggested

* Update docs/source/testing.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/testing.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/testing.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/testing.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/testing.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/testing.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/testing.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/testing.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/testing.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/testing.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/testing.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* tweaks

* Update docs/source/testing.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/testing.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* more tweaks

* suggestions from @LysandreJik

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-09-15 19:25:25 -04:00
Patrick von Platen
85ffda96fc fix encoder decoder kwargs (#7131) 2020-09-15 21:10:07 +02:00
Yih-Dar
4c62c6021a fix ZeroDivisionError and epoch counting (#7125)
* fix ZeroDivisionError and epoch counting

* Add test for num_train_epochs calculation in trainer.py

* Remove @require_non_multigpu for test_num_train_epochs_in_training
2020-09-15 11:51:50 -04:00
Patrick von Platen
7af2791d77 Create README.md 2020-09-15 16:47:36 +02:00
Sylvain Gugger
153ec2f154 Funnel model cards (#7147) 2020-09-15 10:40:57 -04:00
Sylvain Gugger
7186ca6240 Multi predictions trainer (#7126)
* Allow multiple outputs

* Formatting

* Move the unwrapping before metrics

* Fix typo

* Add test for non-supported config options
2020-09-15 10:27:24 -04:00
Pedro Lima
52d250f6aa [model_cards] pvl/labse_bert model card
From **Language-Agnostic BERT Sentence Embedding**

https://ai.googleblog.com/2020/08/language-agnostic-bert-sentence.html
2020-09-15 08:54:12 -04:00
tuner007
84d64805b0 Create README.md (#7097)
Model card for PEGASUS finetuned for paraphrasing task
2020-09-15 08:48:25 -04:00
Philip May
52bb7ccce5 German electra model card v3 update (#7089)
* changed eval table model order

* Update install

* update mc
2020-09-15 08:48:13 -04:00
Siddharth Jain
1a85299a5e Tiny typo fix (#7143) 2020-09-15 08:18:42 -04:00
Paul O'Leary McCann
e29c3f1b11 Add quotes to paths in MeCab arguments (#7142)
Without quotes directories with spaces in them will fail to be processed
correctly.
2020-09-15 19:04:50 +08:00
Yih-Dar
cb061e78e1 Fix TF Trainer loss calculation (#6998)
* create branch for issue #6968

* First attempt to fix incorrect tf trainer loss calculation

* Fix training loss in metric

* fix tf trainer evaluation loss

* apply count_instances_in_batch() for eval and test datasets

* prototype of using a new argument in trainer_tf.py to fix loss issue

* some renaming and fix, in particular for evaluation methods

* fix bugs to have a running version

* change to @staticmethod

* apply style
2020-09-15 05:41:00 -04:00
Stas Bekman
b0cbcdb05b [logging] remove no longer needed verbosity override (#7100) 2020-09-15 04:01:14 -04:00
Sylvain Gugger
2bf70e2150 Fix reproducible tests in Trainer (#7119)
* Fix reproducible tests in Trainer

* Deal with multiple GPUs
2020-09-15 03:32:44 -04:00
Sam Shleifer
9e89390ce1 [QOL] add signature for prepare_seq2seq_batch (#7108) 2020-09-14 20:33:08 -04:00
Sam Shleifer
33d479d2b2 [s2s] distributed eval in one command (#7124) 2020-09-14 15:57:56 -04:00
sgugger
206b78d485 Pin version of TF and torch 2020-09-14 14:08:51 -04:00
Kevin Canwen Xu
90cde2e938 Add Mirror Option for Downloads (#6679)
* Add Tuna Mirror for Downloads from China

* format fix

* Use preset instead of hardcoding URL

* Fix

* make style

* update the mirror option doc

* update the mirror
2020-09-14 23:50:22 +08:00
Antonio V Mendoza
e0e0675ac7 Demoing LXMERT with raw images by incorporating the FRCNN model for roi-pooled extraction and bounding-box predction on the GQA answer set. (#6986)
* adding demo

* Update examples/lxmert/requirements.txt

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update examples/lxmert/checkpoint.sh

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* added user input for .py demo

* updated model loading, data extrtaction, checkpoints, and lots of other automation

* adding normalizing for bounding boxes

* Update requirements.txt

* some optimizations for extracting data

* added data extracting file

* added data extraction file

* minor fixes to reqs and readme

* Style

* remove options

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-09-14 10:07:04 -04:00
sgugger
5636cbb25d Extra ) 2020-09-14 09:37:55 -04:00
Sylvain Gugger
ccc8e30c8a Clean up autoclass doc (#7081) 2020-09-14 09:26:41 -04:00
Stas Bekman
3ca1874ca4 [examples testing] restore code (#7099)
For some reason https://github.com/huggingface/transformers/pull/5512 re-added temp dir creation code that was removed by
https://github.com/huggingface/transformers/pull/6494 defeating the purpose of that PR for those tests.
2020-09-14 08:54:23 -04:00
Stas Bekman
4d39148419 fix deprecation warnings (#7033)
* fix deprecation warnings

* remove tests/test_tokenization_common.py's test_padding_to_max_length

* revert test_padding_to_max_length
2020-09-14 07:51:19 -04:00
Stas Bekman
576eec98e0 ignore FutureWarning in tests (#7079) 2020-09-14 07:50:51 -04:00
Bartosz Telenczuk
15d18e0307 fix link to paper (#7116) 2020-09-14 07:43:40 -04:00
Lysandre Debut
bb3106f741 Temporarily skip failing tests due to dependency change (#7118)
* Temporarily skip failing tests due to dependency change

* Remove trace
2020-09-14 07:42:13 -04:00
Sam Shleifer
0fab39695a [s2s distill] allow pegasus-12-12 (#7104) 2020-09-14 00:03:59 -04:00
Sam Shleifer
de9e297964 [s2s] distributed eval cleanup (#7110) 2020-09-13 23:40:38 -04:00
Sam Shleifer
54395d87a6 Update xsum length penalty to better values (#7107) 2020-09-13 20:48:47 -04:00
Sam Shleifer
e7f8d2ab64 [s2s] two stage run_distributed_eval.py (#7105) 2020-09-13 17:28:18 -04:00
Sam Shleifer
0ec63afec2 fix bug in pegasus converter (#7094) 2020-09-13 15:11:47 -04:00
Sam Shleifer
b76cb1c3df [s2s] run_eval supports --prefix clarg. (#6953) 2020-09-12 01:08:21 -04:00
李明浩
563ffb3dc3 Create README.md (#7066) 2020-09-11 15:21:05 -04:00
李明浩
1ad49cde3a Create README.md (#7067) 2020-09-11 15:20:54 -04:00
Sagor Sarker
4753816e39 added bangla-bert-base model card and also modified other model cards (#7071)
* added bangla-bert-base

* Apply suggestions from code review

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-09-11 15:17:25 -04:00
Suraj Patil
0a8c17d53c [T5Tokenizer] remove prefix_tokens (#7078) 2020-09-11 14:18:45 -04:00
Sylvain Gugger
4cbd50e611 Compute loss method (#7074) 2020-09-11 12:06:31 -04:00
Sylvain Gugger
ae736163d0 Add tests and fix various bugs in ModelOutput (#7073)
* Add tests and fix various bugs in ModelOutput

* Update tests/test_model_output.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-09-11 12:01:33 -04:00
Sylvain Gugger
e841b75dec Automate the lists in auto-xxx docs (#7061)
* More readable dict

* More nlp -> datasets

* Revert "More nlp -> datasets"

This reverts commit 3cd1883d226c63c4a686fc1fed35f2cd586ebe45.

* Automate the lists in auto-xxx docs

* More readable dict

* Revert "More nlp -> datasets"

This reverts commit 3cd1883d226c63c4a686fc1fed35f2cd586ebe45.

* Automate the lists in auto-xxx docs

* nlp -> datasets

* Fix new key
2020-09-11 10:42:09 -04:00
Sylvain Gugger
0054a48cdd Add dep on datasets (#7058) 2020-09-11 04:43:19 -04:00
Patrick von Platen
221d4c63a3 clean naming (#7068) 2020-09-11 09:57:53 +02:00
Stas Bekman
8fcbe486e1 these tests require non-multigpu env (#7059)
* these tests require non-multigpu env

* cleanup

* clarify
2020-09-10 18:52:55 -04:00
Sam Shleifer
77950c485a [wip/s2s] DistributedSortishSampler (#7056) 2020-09-10 15:23:44 -04:00
Sylvain Gugger
514486739c Fix CI with change of name of nlp (#7054)
* nlp -> datasets

* More nlp -> datasets

* Woopsie

* More nlp -> datasets

* One last
2020-09-10 14:51:08 -04:00
Sam Shleifer
e9a2f772bc [s2s] --eval_max_generate_length (#7018) 2020-09-10 14:11:34 -04:00
Stas Bekman
df4594a9da [xlm tok] config dict: fix str into int to match definition (#7034) 2020-09-10 19:31:01 +02:00
Julien Chaumond
d6c08b07a0 [AutoTokenizer] Correct error message 2020-09-10 17:19:01 +02:00
Patrick von Platen
db38f7ce29 [BertGeneration, Docs] Fix another old name in docs (#7050)
* correct docs for bert generation

* upload
2020-09-10 17:12:33 +02:00
Patrick von Platen
3bd95b0faf correct docs for bert generation (#7048) 2020-09-10 17:08:40 +02:00
Patrick von Platen
eb2feb5d90 Create README.md 2020-09-10 17:05:50 +02:00
Ashwin Geet Dsa
66a5a6fda8 fix to ensure that returned tensors after the tokenization is Long (#7039)
* fix to ensure that returned tensors after the tokenization is Long

* fix to ensure that returned tensors after the tokenization is Long

Co-authored-by: Ashwin Geet Dsa <adsa@grvingt-6.nancy.grid5000.fr>
2020-09-10 11:04:03 -04:00
Patrick von Platen
9ccdb1d517 Update README.md 2020-09-10 17:01:19 +02:00
Patrick von Platen
60698936fc Create README.md 2020-09-10 17:00:10 +02:00
Patrick von Platen
e0c3bc8ee0 Create README.md 2020-09-10 16:51:15 +02:00
Patrick von Platen
c356b9878d Create README.md 2020-09-10 16:45:44 +02:00
Patrick von Platen
5afd3f6196 Create README.md 2020-09-10 16:44:47 +02:00
Sylvain Gugger
15a189049e Add TF Funnel Transformer (#7029)
* Add TF Funnel Transformer

* Proper dummy input

* Formatting

* Update src/transformers/modeling_tf_funnel.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address review comments

* One review comment forgotten

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-09-10 10:41:56 -04:00
Patrick von Platen
7fd1febf38 Add "Leveraging Pretrained Checkpoints for Generation" Seq2Seq models. (#6594)
* add conversion script

* improve conversion script

* make style

* add tryout files

* fix

* update

* add causal bert

* better names

* add tokenizer file as well

* finish causal_bert

* fix small bugs

* improve generate

* change naming

* renaming

* renaming

* renaming

* remove leftover files

* clean files

* add fix tokenizer

* finalize

* correct slow test

* update docs

* small fixes

* fix link

* adapt check repo

* apply sams and sylvains recommendations

* fix import

* implement Lysandres recommendations

* fix logger warn
2020-09-10 16:40:51 +02:00
Sylvain Gugger
d1691d90e5 Samell fixed in tf template (#7044) 2020-09-10 10:36:02 -04:00
Patrick von Platen
63e539459d Update README.md 2020-09-10 16:34:28 +02:00
Patrick von Platen
054db06b1b Create README.md 2020-09-10 16:30:46 +02:00
Lysandre Debut
b482ad474a Fix template (#7040) 2020-09-10 08:45:52 -04:00
Yu Liu
762cba3bda Albert pretrain datasets/ datacollator (#6168)
* add dataset for albert pretrain

* datacollator for albert pretrain

* naming, comprehension, file reading change

* data cleaning is no needed after this modification

* delete prints

* fix a bug

* file structure change

* add tests for albert datacollator

* remove random seed

* add back len and get item function

* sample file for testing and test code added

* format change for black

* more format change

* Style

* var assignment issue resolve

* add back wrongly deleted DataCollatorWithPadding in init file

* Style

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-09-10 07:56:29 -04:00
Johann C. Rocholl
49e9be0639 Fix confusing warnings during TF2 import from PyTorch (#6623)
1. Swapped missing_keys and unexpected_keys.

2. Copy&paste error caused these warnings to say "from TF 2.0" when it's actually "from PyTorch".
2020-09-10 05:31:59 -04:00
Stas Bekman
4ee1053dcf add -y to bypass prompt for transformers-cli upload (#7035) 2020-09-10 04:58:29 -04:00
Patrick von Platen
76818cc4c6 Create README.md 2020-09-09 16:26:35 +02:00
Lysandre Debut
15478c1287 Batch encore plus and overflowing tokens fails when non existing overflowing tokens for a sequence (#6677)
* Patch and test

* Fix tests
2020-09-09 06:55:17 -04:00
Henry Dashwood
9fd11bf1a8 replace torch.triu with onnx compatible code (#6929) 2020-09-09 04:56:40 -04:00
Julien Chaumond
ed71c21d6a [from_pretrained] Allow tokenizer_type ≠ model_type (#6995) 2020-09-09 04:22:59 -04:00
Stas Bekman
03e363f9ae [generation] consistently add eos tokens (#6982)
Currently beam search returns inconsistent outputs - if hypos have different lengths we get eos, if they are the same - we don't.

This PR makes the output consistent.

Also why not also replace:

```
            if sent_lengths[i] < max_length:
                decoded[i, sent_lengths[i]] = eos_token_id
```
with:
```
            decoded[i, sent_lengths[i]] = eos_token_id
```
Shouldn't eos always be there? If the data gets truncated, the caller needs to user a larger `max_length`.

Please correct me if my logic is flawed.
2020-09-09 04:08:36 -04:00
Stas Bekman
d0963486c1 adding TRANSFORMERS_VERBOSITY env var (#6961)
* introduce TRANSFORMERS_VERBOSITY env var + test + test helpers

* cleanup

* remove helper function
2020-09-09 04:08:01 -04:00
Sam Shleifer
f0fc0aea6b pegasus.rst: fix expected output (#7017) 2020-09-08 13:29:16 -04:00
Patrick von Platen
120176ea29 [Longformer] Fix longformer documentation (#7016)
* fix longformer

* allow position ids to not be initialized
2020-09-08 18:51:28 +02:00
Lysandre Debut
5c4eb4b1ac Fixing FLOPS merge by checking if torch is available (#7013)
* Should check if `torch` is available

* fixed samples_count error, distributed_concat arguments

* style

* Import torch at beginning of file

Co-authored-by: TevenLeScao <teven.lescao@gmail.com>
2020-09-08 10:51:58 -04:00
Teven
01d340adfa Floating-point operations logging in trainer (#6768)
* neFLOs calculation, logging, and reloading (#1)

* testing distributed consecutive batches

* fixed AttributeError from DataParallel

* removed verbosity

* rotate with use_mtime=True

* removed print

* fixed interaction with gradient accumulation

* indent formatting

* distributed neflo counting

* fixed typo

* fixed typo

* mean distributed losses

* exporting log history

* moved a few functions

* floating_point_ops clarification for transformers with parameter-reuse

* code quality

* double import

* made flo estimation more task-agnostic

* only logging flos if computed

* code quality

* unused import

* Update src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Sylvain review

* Update src/transformers/modeling_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* black

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-09-08 10:00:56 -04:00
Sylvain Gugger
d155b38d6e Funnel transformer (#6908)
* Initial model

* Fix upsampling

* Add special cls token id and test

* Formatting

* Test and fist FunnelTokenizerFast

* Common tests

* Fix the check_repo script and document Funnel

* Doc fixes

* Add all models

* Write doc

* Fix test

* Initial model

* Fix upsampling

* Add special cls token id and test

* Formatting

* Test and fist FunnelTokenizerFast

* Common tests

* Fix the check_repo script and document Funnel

* Doc fixes

* Add all models

* Write doc

* Fix test

* Fix copyright

* Forgot some layers can be repeated

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/modeling_funnel.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address review comments

* Update src/transformers/modeling_funnel.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Address review comments

* Update src/transformers/modeling_funnel.py

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* Slow integration test

* Make small integration test

* Formatting

* Add checkpoint and separate classification head

* Formatting

* Expand list, fix link and add in pretrained models

* Styling

* Add the model in all summaries

* Typo fixes

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-09-08 08:08:08 -04:00
Stuart Mesham
25afb4ea50 fixed trainer tr_loss memory leak (#6999)
* fixed trainer tr_loss memory leak

* detached returned training loss from computation graph in the Trainer class' training_step() method

* Revert "fixed trainer tr_loss memory leak"

This reverts commit 47226e4e
2020-09-08 08:07:33 -04:00
Manuel Romero
1b76936d1a Fix typo (#6994) 2020-09-08 04:22:57 -04:00
Philipp Schmid
8235426ee8 New Community NB "Fine tune GPT-2 with Trainer class" (#7005) 2020-09-08 03:42:20 -04:00
Stas Bekman
c18f5916a0 typo (#7001)
apologies for the tiny PRs, just sending those as I find them.
2020-09-08 01:22:20 -04:00
Mehrdad Farahani
60fc03290b README for HooshvareLab/bert-fa-base-uncased (#6990)
ParsBERT v2.0 is a fine-tuned and vocab-reconstructed version of ParsBERT, and it's able to be used in other scopes!

It includes these features:
- We added some unused-vocab for use in summarization and other scopes.
- We fine-tuned the model on vast styles of writing in the Persian language.
2020-09-07 16:43:50 -04:00
Jangwon Park
90ec78b514 Add missing arguments for BertWordPieceTokenizer (#5810) 2020-09-07 08:35:41 -04:00
Lysandre Debut
77cd0e13d2 Conversion scripts shouldn't have relative imports (#6991) 2020-09-07 08:31:06 -04:00
Lysandre
1650130b0f Remove misleading docstring 2020-09-07 14:16:59 +02:00
Stas Bekman
159ef07e4c match CI's version of flake8 (#6941)
my flake8 wasn't up-to-date enough `make quality` wasn't reporting the same things CI did - this PR adds the actual required version.

Thinking more about some of these minimal versions - CI will always install afresh and thus will always run the latest version. Is there a way to tell pip to always install the latest versions of certain dependencies on `pip install -i ".[dev]"`, rather than hardcoding the minimals which quickly become outdated?
2020-09-07 08:12:25 -04:00
Abed khooli
e9d0d4c75c Create README.md (#6974) 2020-09-07 07:31:22 -04:00
Stas Bekman
848fbe1e35 [gen utils] missing else case (#6980)
* [gen utils] missing else case

1. `else` is missing - I hit that case while porting a model. Probably needs to assert there?
2. also the comment on top seems to be outdated (just vocab_size is being set there)

* typo
2020-09-07 07:28:06 -04:00
tznurmin
f7e80721eb Fixed the default number of attention heads in Reformer Configuration (#6973) 2020-09-07 12:12:22 +02:00
Richard Bownes
e20d8895bd Create README.md model card (#6964)
* Create README.md

* Add some custom prompts

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-09-07 06:01:40 -04:00
Stas Bekman
b4a9c95f1b [testing] add dependency: parametrize (#6958)
unittest doesn't support pytest's super-handy `@pytest.mark.parametrize`, I researched and there are many proposed workarounds, most tedious at best. If we include https://pypi.org/project/parameterized/ in dev dependencies - it will provide a very easy to write parameterization in tests. Same as pytest's fixture, plus quite a few other ways. 

Example:
```
from parameterized import parameterized
@parameterized([
    (2, 2, 4),
    (2, 3, 8),
    (1, 9, 1),
    (0, 9, 0),
])
def test_pow(base, exponent, expected):
   assert_equal(math.pow(base, exponent), expected)
```
(extra `self`var if inside a test class)

To remind the pytest style is slightly different:
```
    @pytest.mark.parametrize("test_input,expected", [("3+5", 8), ("2+4", 6), ("6*9", 42)])
    def test_eval(test_input, expected):
```
More examples here: https://pypi.org/project/parameterized

May I suggest that it will make it much easier to write some types of tests?
2020-09-07 05:50:18 -04:00
Stas Bekman
acfaad74ab [docstring] missing arg (#6933)
* [docstring] missing arg

add the missing `tie_word_embeddings` entry

* cleanup

* Update src/transformers/configuration_reformer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-09-07 05:36:16 -04:00
Stas Bekman
c3317e1f80 typo (#6959)
there is no var `decoder_input_ids`, but there is `input_ids` for decoder :)
2020-09-07 05:16:24 -04:00
Julien Chaumond
10c6f94adc [model_card] register jplu/tf-xlm-r-ner-40-lang as multilingual 2020-09-07 05:03:40 -04:00
Lysandre Debut
9ef9c39728 Cannot index None (#6984) 2020-09-07 04:56:08 -04:00
Sylvain Gugger
08de989a0a Trainer with grad accum (#6930)
* Add warning for gradient accumulation

* Formatting
2020-09-07 04:54:00 -04:00
Julien Chaumond
d4aa7284c8 [model_card] jplu/tf-xlm-r-ner-40-lang: Fix link
cc @jplu
2020-09-07 04:33:15 -04:00
Boris Dayma
995a958dd1 feat: allow prefix for any generative model (#5885)
* feat: allow padding_text for any generative model

* docs(pipelines.py): correct typo

* Update src/transformers/pipelines.py

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* feat: rename padding_text to prefix

* fix: cannot tokenize empty text

* fix: pass prefix arg to pipeline

* test: add prefix to text-generetation pipeline

* style: fix style

* style: clean code and variable name more explicit

* set arg docstring to optional

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-09-07 03:03:45 -04:00
Sam Shleifer
ce37be9d94 [s2s] warn if --fp16 for torch 1.6 (#6977) 2020-09-06 20:41:29 -04:00
Patrick von Platen
f72fe1f31a Correct wrong spacing in README 2020-09-06 13:26:56 +02:00
Steven Liu
d31031f603 create model card for astroGPT (#6960)
* create model card for astroGPT

* Hotlink to actual image file

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-09-05 12:50:19 -04:00
Naveenkhasyap
56742e9f61 Create Readme.MD for KanBERTo (#6942)
* Create Readme.MD for KanBERTo

KanBERTo language model readme for Kannada language.

* Update model_cards/Naveen-k/KanBERTo/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-09-04 18:24:32 -04:00
Stas Bekman
48ff6d5109 [doc] remove the implied defaults to :obj:None, s/True/ :obj:`True/, etc. (#6956)
* remove the implied defaults to :obj:`None`

* fix bug in the original

* replace to :obj:`True`, :obj:`False`
2020-09-04 18:22:25 -04:00
Stas Bekman
eff274d629 typo (#6952) 2020-09-04 16:14:37 -04:00
Sam Shleifer
a4fc0c80b1 [s2s] run_eval.py parses generate_kwargs (#6948) 2020-09-04 14:19:31 -04:00
Sam Shleifer
6078b12098 [s2s] distill: --normalize_hidden --supervise_forward (#6834) 2020-09-04 14:05:56 -04:00
Stas Bekman
c5d43a872f [docstring] misc arg doc corrections (#6932)
* correct bool types

fix docstring s/int/bool/

* fix description

* fix num_labels to match reality
2020-09-04 10:09:42 -04:00
Patrick von Platen
e3990d137a fix (#6946) 2020-09-04 16:08:54 +02:00
Yih-Dar
a75e319819 Fix mixed precision issue in TF DistilBert (#6915)
* Remove hard-coded uses of float32 to fix mixed precision use in TF Distilbert

* fix style

* fix gelu dtype issue in TF Distilbert

* fix numeric overflow while using half precision
2020-09-04 14:29:57 +02:00
Sam Shleifer
e95d262f25 [s2s] support early stopping based on loss, rather than rouge (#6927) 2020-09-03 17:31:35 -04:00
Sam Shleifer
207ed8cb78 [s2s] use --eval_beams command line arg (#6926) 2020-09-03 12:42:09 -04:00
krfricke
0f360d3d1c move wandb/comet logger init to train() to allow parallel logging (#6850)
* move wandb/comet logger init to train() to allow parallel logging

* Setup wandb/comet loggers on first call to log()
2020-09-03 11:49:14 -04:00
Sam Shleifer
39ed68d597 [s2s] allow task_specific_params=summarization_xsum (#6923) 2020-09-03 11:11:40 -04:00
Sam Shleifer
5a318f075a [s2s]: script to convert pl checkpoints to hf checkpoints (#6911)
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-09-03 09:47:00 -04:00
brett koonce
b8e4906c97 tweak tar command in readme (#6919) 2020-09-03 09:29:01 -04:00
Stefan Engl
a66db7d828 Corrected link to paper (#6905) 2020-09-03 09:23:42 -04:00
David Mark Nemeskey
55d61ce8d6 Added a link to the thesis. (#6906) 2020-09-03 09:20:03 -04:00
abdullaholuk-loodos
653a79ccad Loodos model cards had errors on "Usage" section. It is fixed. Also "electra-base-turkish-uncased" model removed from s3 and re-uploaded as "electra-base-turkish-uncased-discriminator". Its README added. (#6921)
Co-authored-by: Abdullah Oluk <abdullaholuk123@gmail.com>
2020-09-03 09:13:43 -04:00
Julien Chaumond
5a3aec90a9 [model_card] link to correctly cased piaf dataset
cc @psorianom @rachelker
2020-09-03 08:57:32 -04:00
Sylvain Gugger
722b5807d8 Template updates (#6914) 2020-09-03 04:14:58 -04:00
Antonio V Mendoza
ea2c6f1afc Adding the LXMERT pretraining model (MultiModal languageXvision) to HuggingFace's suite of models (#5793)
* added template files for LXMERT and competed the configuration_lxmert.py

* added modeling, tokization, testing, and finishing touched for lxmert [yet to be tested]

* added model card for lxmert

* cleaning up lxmert code

* Update src/transformers/modeling_lxmert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/modeling_tf_lxmert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/modeling_tf_lxmert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/modeling_lxmert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* tested torch lxmert, changed documtention, updated outputs, and other small fixes

* Update src/transformers/convert_pytorch_checkpoint_to_tf2.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/convert_pytorch_checkpoint_to_tf2.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/convert_pytorch_checkpoint_to_tf2.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* renaming, other small issues, did not change TF code in this commit

* added lxmert question answering model in pytorch

* added capability to edit number of qa labels for lxmert

* made answer optional for lxmert question answering

* add option to return hidden_states for lxmert

* changed default qa labels for lxmert

* changed config archive path

* squshing 3 commits: merged UI + testing improvments + more UI and testing

* changed some variable names for lxmert

* TF LXMERT

* Various fixes to LXMERT

* Final touches to LXMERT

* AutoTokenizer order

* Add LXMERT to index.rst and README.md

* Merge commit test fixes + Style update

* TensorFlow 2.3.0 sequential model changes variable names

Remove inherited test

* Update src/transformers/modeling_tf_pytorch_utils.py

* Update docs/source/model_doc/lxmert.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/model_doc/lxmert.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_lxmert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* added suggestions

* Fixes

* Final fixes for TF model

* Fix docs

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-09-03 04:02:25 -04:00
Puneetha Pai
4ebb52afdb test_tf_common: remove un_used mixin class parameters (#6866) 2020-09-02 10:54:40 -04:00
Stas Bekman
e71f32c0ef [testing] fix ambiguous test (#6898)
Since `generate()` does:
```
        num_beams = num_beams if num_beams is not None else self.config.num_beams
```
This test fails if `model.config.num_beams > 1` (which is the case in the model I'm porting).

This fix makes the test setup unambiguous by passing an explicit `num_beams=1` to `generate()`.

Thanks.
2020-09-02 16:18:17 +02:00
Sylvain Gugger
8f2723caf0 Output attention takes an s (#6903)
* Fix output_attention -> output_attentions

* Formatting

* One unsaved file
2020-09-02 08:11:45 -04:00
Yohei Tamura
485da7222f fix error class instantiation (#6634) 2020-09-02 07:36:32 -04:00
Suraj Patil
4230d30f77 [pipelines] Text2TextGenerationPipeline (#6744)
* add Text2TextGenerationPipeline

* remove max length warning

* remove comments

* remove input_length

* fix typo

* add tests

* use TFAutoModelForSeq2SeqLM

* doc

* typo

* add the doc below TextGenerationPipeline

* doc nit

* style

* delete comment
2020-09-02 07:34:35 -04:00
Prajjwal Bhargava
6b24281229 fix typo in comments (#6838) 2020-09-02 06:55:37 -04:00
Stas Bekman
7351ef83c1 [doc] typos (#6867)
* [doc] typos

fixed typos

* Update README.md
2020-09-02 06:51:51 -04:00
Harry Wang
ee1bff06f8 minor docs grammar fixes (#6889) 2020-09-02 06:45:19 -04:00
Patrick von Platen
8abd7f69fc fix warning for position ids (#6884) 2020-09-02 06:44:51 -04:00
Parthe Pandit
7cb0572c64 Update modeling_bert.py (#6897)
outptus -> outputs in example of BertForPreTraining
2020-09-02 06:39:01 -04:00
David Mark Nemeskey
e3c55ceb8d Model card for huBERT (#6893)
* Create README.md

Model card for huBERT.

* Update README.md

lowercase h

* Update model_cards/SZTAKI-HLT/hubert-base-cc/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-09-02 04:50:10 -04:00
Patrick von Platen
1889e96c8c fix QA example for PT (#6890) 2020-09-02 09:53:09 +02:00
Julien Chaumond
d822ab636b [model_cards] Fix file path for flexudy/t5-base-multi-sentence-doctor 2020-09-02 00:02:40 +02:00
Rohan Rajpal
ad5fb33c9a Create README.md (#6598) 2020-09-01 17:59:15 -04:00
Rohan Rajpal
f9dadcd85b Create README.md (#6602) 2020-09-01 17:58:43 -04:00
Igli Manaj
f5d69c75f7 Update multilingual passage rereanking model card (#6788)
Fix range of possible score, add inference .
2020-09-01 17:56:19 -04:00
Tom Grek
5d820f3ca6 Model card for primer/BART-Squad2 (#6801) 2020-09-01 17:52:32 -04:00
zolekode
8b884dadc6 added model card for flexudys t5 model (#6759)
Co-authored-by: zolekode <pascal.zoleko@fau.de>
2020-09-01 17:38:55 -04:00
hakan
bff6d517cd loodos turkish model cards added (#6840) 2020-09-01 17:35:24 -04:00
Manuel Romero
502d194b95 Create README.md (#6887)
Add language meta attribute
2020-09-01 17:09:10 -04:00
Manuel Romero
d082edf216 Create README.md (#6888)
Add language meta attribute
2020-09-01 17:09:02 -04:00
Abed khooli
dacbee9a50 Create README.md (#6886)
* Create README.md

model card for  akhooli/xlm-r-large-arabic-sent

* Update model_cards/akhooli/xlm-r-large-arabic-sent/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-09-01 17:06:15 -04:00
Abed khooli
e2971e61bd Create README.md (#6885) 2020-09-01 16:57:48 -04:00
Patrick von Platen
4d1a3ffde8 [EncoderDecoder] Add xlm-roberta to encoder decoder (#6878)
* finish xlm-roberta

* finish docs

* expose XLMRobertaForCausalLM
2020-09-01 21:56:39 +02:00
Patrick von Platen
311992630c Create README.md (#6883)
* Create README.md

* Update README.md
2020-09-01 19:24:45 +02:00
Jin Young (Daniel) Sohn
21d719238c Add cache_dir to save features TextDataset (#6879)
* Add cache_dir to save features TextDataset

This is in case the dataset is in a RO filesystem, for which is the case
in tests (GKE TPU tests).

* style
2020-09-01 11:42:17 -04:00
Lysandre Debut
1461aac8d7 Update docs stable version 2020-09-01 11:02:24 -04:00
Lysandre
3726754a6c v3.1.0 documentation 2020-09-01 14:39:07 +02:00
Lysandre
4b3ee9cbc5 Release: v3.1.0 2020-09-01 14:27:52 +02:00
Patrick von Platen
afc4ece462 [Generate] Facilitate PyTorch generate using ModelOutputs (#6735)
* fix generate for GPT2 Double Head

* fix gpt2 double head model

* fix  bart / t5

* also add for no beam search

* fix no beam search

* fix encoder decoder

* simplify t5

* simplify t5

* fix t5 tests

* fix BART

* fix transfo-xl

* fix conflict

* integrating sylvains and sams comments

* fix tf past_decoder_key_values

* fix enc dec test
2020-09-01 12:38:25 +02:00
Funtowicz Morgan
397f819615 Restore PaddingStrategy.MAX_LENGTH on QAPipeline while no v2. (#6875)
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
2020-09-01 05:35:35 -04:00
Sam Shleifer
a32d85f0d4 delete reinit (#6862) 2020-09-01 03:43:27 -04:00
Sylvain Gugger
d5f1ffa0d8 Logging doc (#6852)
* Add logging doc

* Foamtting

* Update docs/source/main_classes/logging.rst

* Update src/transformers/utils/logging.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-09-01 03:16:34 -04:00
Stas Bekman
59a6a32a61 add a final report to all pytest jobs (#6861)
we had it added for one job, please add it to all pytest jobs - we need the output of what tests were run to debug the codecov issue. thank you!
2020-08-31 22:47:23 -04:00
Sam Shleifer
431ab19d7a [fix] typo in available in helper function (#6859) 2020-08-31 17:59:34 -04:00
Sam Shleifer
367235ee52 Bart can make decoder_input_ids from labels (#6758) 2020-08-31 16:16:47 -04:00
Sam Shleifer
b9772897ec [s2s] command line args for faster val steps (#6833) 2020-08-31 16:16:10 -04:00
Sam Shleifer
8af1970e45 Fix marian slow test (#6854) 2020-08-31 16:10:43 -04:00
Funtowicz Morgan
bbdba0a76d Update ONNX notebook to include section on quantization. (#6831)
* Update ONNX notebook to include section on quantization.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Addressing ONNX team comments
2020-08-31 21:28:00 +02:00
Sylvain Gugger
a59bcefbb1 Split hp search methods (#6857)
* Split the run_hp_search by backend

* Unused import
2020-08-31 15:16:39 -04:00
krfricke
23f9611c16 Add checkpointing to Ray Tune HPO (#6747)
* Introduce HPO checkpointing for PBT

* Moved checkpoint saving

* Fixed checkpoint subdir pass

* Fixed style

* Enable/disable checkpointing, check conditions for various tune schedulers incl. PBT

* Adjust number of GPUs to number of jobs

* Avoid mode pickling in ray

* Move hp search to integrations
2020-08-31 14:38:46 -04:00
Sam Shleifer
61b7ba93f5 Marian distill scripts + integration test (#6799) 2020-08-31 13:48:26 -04:00
Jin Young (Daniel) Sohn
02d09c8fcc Only access loss tensor every logging_steps (#6802)
* Only access loss tensor every logging_steps

* tensor.item() was being called every step. This must not be done
for XLA:TPU tensors as it's terrible for performance causing TPU<>CPU
communication at each step. On RoBERTa MLM for example, it reduces step
time by 30%, should be larger for smaller step time models/tasks.
* Train batch size was not correct in case a user uses the
`per_gpu_train_batch_size` flag
* Avg reduce loss accross eval shards

* Fix style (#6803)

* t5 model should make decoder_attention_mask (#6800)

* [s2s] Test hub configs in self-scheduled CI (#6809)

* [s2s] round runtime in run_eval (#6798)

* Pegasus finetune script: add --adafactor (#6811)

* [bart] rename self-attention -> attention (#6708)

* [tests] fix typos in inputs (#6818)

* Fixed open in colab link (#6825)

* Add model card for singbert lite. Update widget for singbert and singbert-large. (#6827)

* BR_BERTo model card (#6793)

* clearly indicate shuffle=False (#6312)

* Clarify shuffle

* clarify shuffle

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>

* [s2s README] Add more dataset download instructions (#6737)

* Style

* Patch logging issue

* Set default logging level to `WARNING` instead of `INFO`

* TF Flaubert w/ pre-norm (#6841)

* Dataset and DataCollator for BERT Next Sentence Prediction (NSP) task (#6644)

* add datacollator and dataset for next sentence prediction task

* bug fix (numbers of special tokens & truncate sequences)

* bug fix (+ dict inputs support for data collator)

* add padding for nsp data collator; renamed cached files to avoid conflict.

* add test for nsp data collator

* Style

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>

* Fix in Adafactor docstrings (#6845)

* Fix resuming training for Windows (#6847)

* Only access loss tensor every logging_steps

* tensor.item() was being called every step. This must not be done
for XLA:TPU tensors as it's terrible for performance causing TPU<>CPU
communication at each step. On RoBERTa MLM for example, it reduces step
time by 30%, should be larger for smaller step time models/tasks.
* Train batch size was not correct in case a user uses the
`per_gpu_train_batch_size` flag
* Avg reduce loss accross eval shards

* comments

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Thomas Ashish Cherian <6967017+PandaWhoCodes@users.noreply.github.com>
Co-authored-by: Zane Lim <zyuanlim@gmail.com>
Co-authored-by: Rodolfo De Nadai <rdenadai@gmail.com>
Co-authored-by: xujiaze13 <37360975+xujiaze13@users.noreply.github.com>
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Huang Lianzhe <hlz@pku.edu.cn>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-08-31 11:35:51 -04:00
Sylvain Gugger
c48546c7f7 Fix resuming training for Windows (#6847) 2020-08-31 11:02:30 -04:00
Sylvain Gugger
d2f9cb838e Fix in Adafactor docstrings (#6845) 2020-08-31 10:52:47 -04:00
Huang Lianzhe
2de7ee0385 Dataset and DataCollator for BERT Next Sentence Prediction (NSP) task (#6644)
* add datacollator and dataset for next sentence prediction task

* bug fix (numbers of special tokens & truncate sequences)

* bug fix (+ dict inputs support for data collator)

* add padding for nsp data collator; renamed cached files to avoid conflict.

* add test for nsp data collator

* Style

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-08-31 08:25:00 -04:00
Lysandre Debut
895d394669 TF Flaubert w/ pre-norm (#6841) 2020-08-31 04:53:20 -04:00
Lysandre
4561f05c5f Set default logging level to WARNING instead of INFO 2020-08-31 09:56:25 +02:00
Lysandre
05c3214153 Patch logging issue 2020-08-31 09:37:08 +02:00
Sam Shleifer
dfa10a41ba [s2s README] Add more dataset download instructions (#6737) 2020-08-30 16:29:24 -04:00
xujiaze13
32fe44086c clearly indicate shuffle=False (#6312)
* Clarify shuffle

* clarify shuffle

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
2020-08-30 19:26:10 +08:00
Rodolfo De Nadai
0eecaceac7 BR_BERTo model card (#6793) 2020-08-30 19:02:46 +08:00
Zane Lim
d176aaad7f Add model card for singbert lite. Update widget for singbert and singbert-large. (#6827) 2020-08-30 18:21:49 +08:00
Thomas Ashish Cherian
a5847619e3 Fixed open in colab link (#6825) 2020-08-30 18:21:00 +08:00
Stas Bekman
563485bf95 [tests] fix typos in inputs (#6818) 2020-08-30 18:19:57 +08:00
Sam Shleifer
22933e661f [bart] rename self-attention -> attention (#6708) 2020-08-29 18:03:08 -04:00
Sam Shleifer
0f58903bb6 Pegasus finetune script: add --adafactor (#6811) 2020-08-29 17:43:32 -04:00
Sam Shleifer
ac47458a02 [s2s] round runtime in run_eval (#6798) 2020-08-29 17:36:31 -04:00
Sam Shleifer
5ab21b072f [s2s] Test hub configs in self-scheduled CI (#6809) 2020-08-28 17:05:52 -04:00
Sam Shleifer
3cac867fac t5 model should make decoder_attention_mask (#6800) 2020-08-28 15:22:33 -04:00
Sam Shleifer
20f7786453 Fix style (#6803) 2020-08-28 15:02:25 -04:00
Sam Shleifer
9336086ab5 prepare_seq2seq_batch makes labels/ decoder_input_ids made later. (#6654)
* broken test

* batch parity

* tests pass

* boom boom

* boom boom

* split out bart tokenizer tests

* fix tests

* boom boom

* Fixed dataset bug

* Fix marian

* Undo extra

* Get marian working

* Fix t5 tok tests

* Test passing

* Cleanup

* better assert msg

* require torch

* Fix mbart tests

* undo extra decoder_attn_mask change

* Fix import

* pegasus tokenizer can ignore src_lang kwargs

* unused kwarg test cov

* boom boom

* add todo for pegasus issue

* cover one word translation edge case

* Cleanup

* doc
2020-08-28 11:15:17 -04:00
RafaelWO
cb276b41de Transformer-XL: Improved tokenization with sacremoses (#6322)
* Improved tokenization with sacremoses

 * The TransfoXLTokenizer is now using sacremoses for tokenization
 * Added tokenization of comma-separated and floating point numbers.
 * Removed prepare_for_tokenization() from tokenization_transfo_xl.py because punctuation is handled by sacremoses
 * Added corresponding tests
 * Removed test comapring TransfoXLTokenizer and TransfoXLTokenizerFast
 * Added deprecation warning to TransfoXLTokenizerFast

* isort change

Co-authored-by: Teven <teven.lescao@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-08-28 09:56:17 -04:00
Ahmed Elnaggar
930153e7d2 Add ProtBert model card (#6764) 2020-08-28 12:12:28 +08:00
Stas Bekman
743d131d76 [style] set the minimal required version for black (#6784)
`make style` with `black` < 20.8b1 is a no go (in case some other package forced a lower version) - so make it explicit to avoid confusion
2020-08-28 11:38:09 +08:00
Sam Shleifer
fb78a90d6a PL: --adafactor option (#6776) 2020-08-27 22:19:46 -04:00
Stas Bekman
92ac2fa7d1 [transformers-cli] fix logger getter (#6777) 2020-08-27 20:01:17 -04:00
Lysandre
42fddacd1c Format 2020-08-27 18:31:51 +02:00
Stas Bekman
70fccc5cf3 new Makefile target: docs (#6510)
* [doc] multiple corrections to "Summary of the tasks"

* add a new "docs" target to validate docs and document it

* fix mixup
2020-08-27 12:25:16 -04:00
Stas Bekman
dbfe34f2f5 [test schedulers] adjust to test the first step's reading (#6429)
* [test schedulers] small improvement

* cleanup
2020-08-27 12:23:28 -04:00
Stas Bekman
e6b811f0a7 [testing] replace hardcoded paths to allow running tests from anywhere (#6523)
* [testing] replace hardcoded paths to allow running tests from anywhere

* fix the merge conflict
2020-08-27 12:22:18 -04:00
Sam Shleifer
9d1b4db2aa add nlp install (#6767) 2020-08-27 11:08:14 -04:00
Tom Grek
c225e872ed Fix it to work with BART (#6756) 2020-08-27 09:04:50 -04:00
Lysandre
0d2c111a0c Format 2020-08-27 14:56:47 +02:00
Julien Plu
6f289dc97a Fix the TF Trainer gradient accumulation and the TF NER example (#6713)
* Align TF NER example over the PT one

* Fix Dataset call

* Fix gradient accumulation training

* Apply style

* Address Sylvain's comments

* Address Sylvain's comments

* Apply style
2020-08-27 08:45:34 -04:00
Lysandre Debut
41aa2b4ef1 Adafactor docs (#6765) 2020-08-27 05:16:50 -04:00
Nikolai Yakovenko
971d1802d0 Add AdaFactor optimizer from fairseq (#6722)
* AdaFactor optimizer ported from fairseq. Tested for T5 finetuning and MLM -- reduced memory consumption compared to ADAM.

* update PR fixes, add basic test

* bug -- incorrect params in test

* bugfix -- import Adafactor into test

* bugfix -- removed accidental T5 include

* resetting T5 to master

* bugfix -- include Adafactor in __init__

* longer loop for adafactor test

* remove double error class declare

* lint

* black

* isort

* Update src/transformers/optimization.py

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* single docstring

* Cleanup docstring

Co-authored-by: Nikolai Y <nikolai.yakovenko@point72.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-08-27 04:58:13 -04:00
Sam Shleifer
4bd7be9a42 s2s distillation uses AutoModelForSeqToSeqLM (#6761) 2020-08-26 23:25:11 -04:00
Ahmed Elnaggar
05e7150a53 create ProtBert-BFD model card. (#6724) 2020-08-27 02:19:19 +02:00
Sam Shleifer
61518e2df3 [s2s] run_eval.py QOL improvements and cleanup(#6746) 2020-08-26 18:59:20 -04:00
Igli Manaj
434936f34a Model Card for Multilingual Passage Reranking BERT (#6755) 2020-08-26 18:00:27 -04:00
Joe Davison
10a34501f1 add __init__.py to utils (#6754) 2020-08-26 23:51:10 +02:00
Ali Safaya
61b9ed8074 Model card for kuisailab/albert-large-arabic (#6730)
* Create README.md

* Update README.md
2020-08-26 17:27:56 -04:00
Ali Safaya
8e0d51e4f2 Model card for kuisailab/albert-xlarge-arabic (#6731)
* Create README.md

* Update README.md
2020-08-26 17:27:42 -04:00
Ali Safaya
70c96a10e9 Model card for kuisailab/albert-base-arabic (#6729)
* Create README.md

* Update README.md
2020-08-26 17:27:34 -04:00
Sagor Sarker
cc4ba79f68 added model card for codeswitch-spaeng-sentiment-analysis-lince (#6727)
* added model card for codeswitch-spaeng-sentiment-analysis-lince model also update other model card

* fixed typo

* fixed typo

* fixed typo

* fixed typo

* fixed typo

* fixed typo

* fixed typo

* Update README.md
2020-08-26 17:26:32 -04:00
Tanmay Thakur
e10fb9cbe6 Create model card for lordtt13/COVID-SciBERT (#6718) 2020-08-26 17:22:25 -04:00
Adam Montgomerie
baeba53e88 Adding model cards for 5 models (#6703)
* Added model cards for 4 models

Added model cards for:
- roberta-base-bulgarian
- roberta-base-bulgarian-pos
- roberta-small-bulgarian
- roberta-small-bulgarian-pos

* fixed link text

* Update README.md

* Create README.md

* removed trailing bracket

* Add language metadata

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-08-26 17:20:55 -04:00
Julien Chaumond
3242e4d942 [model_cards] Fix tiny typos 2020-08-26 23:16:06 +02:00
Joe Davison
99407f9d1e add xlm-roberta-large-xnli model card (#6723)
* add xlm-roberta-large-xnli model card

* update pt example

* typo
2020-08-26 16:05:59 -04:00
Patrick von Platen
858b7d5873 [TF Longformer] Improve Speed for TF Longformer (#6447)
* add tf graph compile tests

* fix conflict

* remove more tf transpose statements

* fix conflicts

* fix comment typos

* move function to class function

* fix black

* fix black

* make style
2020-08-26 14:55:41 -04:00
Lysandre
a75c64d80c Black 20 release 2020-08-26 17:20:22 +02:00
Lysandre
e78c110338 isort 5 2020-08-26 17:13:49 +02:00
Julien Plu
02e8cd5584 Fix optimizer (#6717) 2020-08-26 11:12:44 -04:00
Lysandre Debut
77abd1e79f Centralize logging (#6434)
* Logging

* Style

* hf_logging > utils.logging

* Address @thomwolf's comments

* Update test

* Update src/transformers/benchmark/benchmark_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Revert bad change

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-08-26 11:10:36 -04:00
Jay Yip
461ae86812 Fix tf boolean mask in graph mode (#6741) 2020-08-26 05:15:35 -04:00
Patrick von Platen
925f34bbbd Add "tie_word_embeddings" config param (#6692)
* add tie_word_embeddings

* correct word embeddings in modeling utils

* make style

* make config param only relevant for torch

* make style

* correct typo

* delete deprecated arg in transo-xl
2020-08-26 04:58:21 -04:00
Patrick von Platen
fa8ee8e855 fix torchscript docs (#6740) 2020-08-26 04:51:56 -04:00
Sylvain Gugger
64c7c2bc15 Install nlp for github actions test (#6728) 2020-08-25 14:58:38 -04:00
Sam Shleifer
624495706c T5Tokenizer adds EOS token if not already added (#5866)
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-08-25 14:56:08 -04:00
Sam Shleifer
e11d923bfc Fix pegasus-xsum integration test (#6726) 2020-08-25 14:06:28 -04:00
Tomo Lazovich
7e6397a7d8 [squad] make examples and dataset accessible from SquadDataset object (#6710)
* [squad] make examples and dataset accessible from SquadDataset object

* [squad] add support for legacy cache files
2020-08-25 13:32:56 -04:00
Funtowicz Morgan
ac9702c284 Fix ONNX test_quantize unittest (#6716) 2020-08-25 13:24:40 -04:00
Zane Lim
074340339a Create README.md (#6721)
add model card for singbert large
2020-08-26 00:11:24 +08:00
Patrick von Platen
d17cce2270 add missing keys (#6719) 2020-08-25 11:38:51 -04:00
Arnav Sharma
a25c9fc8e1 Selected typo fix (#6687) 2020-08-25 15:39:02 +02:00
Funtowicz Morgan
625318f525 tensor.nonzero() is deprecated in PyTorch 1.6 (#6715)
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
2020-08-25 08:12:54 -04:00
Sylvain Gugger
124c3d6adc Add tokenizer to Trainer (#6689) 2020-08-25 07:47:09 -04:00
Sylvain Gugger
abc0202194 More tests to Trainer (#6699)
* More tests to Trainer

* Add warning in the doc
2020-08-25 07:07:36 -04:00
Sylvain Gugger
f5bad031bc Use generators tqdm progressbars (#6696) 2020-08-25 07:06:58 -04:00
Sam Shleifer
a99d09c6f9 add new line to make examples run (#6706) 2020-08-25 06:26:29 -04:00
Joel Hanson
4db2fa77d7 Allow tests in examples to use cuda or fp16,if they are available (#5512)
* Allow tests in examples to use cuda or fp16,if they are available

The tests in examples didn't use the cuda or fp16 even if they where available.
- The text classification example (`run_glue.py`) didn't use the fp16 even if it was available but
  the device was take based on the availablity(cuda/cpu).
- The language-modeling example (`run_language_modeling.py`) was having `--no_cuda` argument
  which made the test to work without cuda. This example is having issue when running with fp16
  thus it not enabled (got an assertion error for perplexity due to it higher value).
- The cuda and fp16 is not enabled for question-answering example (`run_squad.py`) as it is having a
  difference in the f1 score.
- The text-generation example (`run_generation.py`) will take the cuda or fp16 whenever it is available.

Resolves some of: #5057

* Unwanted import of is_apex_available was removed

* Made changes to test examples file to have the pass --fp16 only if cuda and apex is avaliable
- run_glue.py: Removed the check for cuda and fp16.
- run_generation.py: Removed the check for cuda and fp16 also removed unwanted flag creation.

* Incorrectly sorted imports fixed

* The model needs to be converted to half precision

* Formatted single line if condition statement to multiline

* The torch_device also needed to be checked before running the test on examples
- The tests in examples which uses cuda should also depend from the USE_CUDA flag,
  similarly to the rest of the test suite. Even if we decide to set USE_CUDA to
  True by default, setting USE_CUDA to False should result in the examples not using CUDA

* Format some of the code in test_examples file

* The improper import of is_apex_available was sorted

* Formatted the code to keep the style standards

* The comma at the end of list giving a flake8 issue was fixed

* Import sort was fixed

* Removed the clean_test_dir function as its not used right now
2020-08-25 06:02:07 -04:00
Yohei Tamura
841f071569 Add typing.overload for convert_ids_tokens (#6637)
* add overload for type checker

* black
2020-08-25 04:57:08 -04:00
Quentin Lhoest
0f16dd0ac2 Add DPR to models summary (#6690)
* add dpr to models summary

* minor

* minor

* Update docs/source/model_summary.rst

qa -> question answering

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/model_summary.rst

qa -> question ansering (cont'd)

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-08-25 09:57:28 +02:00
Jay
4fca874ea9 Remove hard-coded uses of float32 to fix mixed precision use (#6648) 2020-08-25 15:42:32 +08:00
Sam Shleifer
0344428f79 [s2s] round bleu, rouge to 4 digits (#6704) 2020-08-25 00:33:11 -04:00
Zane Lim
b6512d2357 Add model card for singbert. (#6674)
* Add model card for singbert.

Adding a model card for singbert- bert for singlish and manglish.

* Update README.md

Add additional tags and model name.

* Update README.md

Fix tag for malay.

* Update model_cards/zanelim/singbert/README.md

Fix language

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>

* Add examples and custom widget input.

Add examples and custom widget input.

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
2020-08-25 10:09:13 +08:00
Sylvain Gugger
d20cbb886b Fix hyperparameter_search doc (#6695) 2020-08-24 21:04:08 -04:00
Sam Shleifer
0ebc9699fa [fixdoc] Add import to pegasus usage doc (#6698) 2020-08-24 15:54:57 -04:00
Sylvain Gugger
6b4c617666 Move unused args to kwargs (#6694) 2020-08-24 13:20:03 -04:00
Stas Bekman
912a21ec78 remove BartForConditionalGeneration.generate (#6659)
As suggested here: https://github.com/huggingface/transformers/issues/6651#issuecomment-678594233
this removes generic `generate` doc with examples not-relevant to bart.
2020-08-25 00:42:34 +08:00
Stas Bekman
a8d6716ecb Create PULL_REQUEST_TEMPLATE.md (#6660)
* Create PULL_REQUEST_TEMPLATE.md

Proposing to copy this neat feature from pytorch. This is a small template that let's a PR submitter tell which issue that PR closes.

* Update .github/PULL_REQUEST_TEMPLATE.md

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
2020-08-25 00:30:38 +08:00
Sylvain Gugger
8f98faf934 Lat fix for Ray HP search (#6691) 2020-08-24 12:15:00 -04:00
Sylvain Gugger
3a7fdd3f52 Add hyperparameter search to Trainer (#6576)
* Add optuna hyperparameter search to Trainer

* @julien-c suggestions

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Make compute_objective an arg function

* Formatting

* Rework to make it easier to add ray

* Formatting

* Initial support for Ray

* Formatting

* Polish and finalize

* Add trial id to checkpoint with Ray

* Smaller default

* Use GPU in ray if available

* Formatting

* Fix test

* Update install instruction

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* Address review comments

* Formatting post-merge

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-08-24 11:48:45 -04:00
vblagoje
dd522da004 Fix PL token classification examples (#6682) 2020-08-24 11:30:06 -04:00
Sylvain Gugger
a573777901 Update repo to isort v5 (#6686)
* Run new isort

* More changes

* Update CI, CONTRIBUTING and benchmarks
2020-08-24 11:03:01 -04:00
Teven
d329c9b05d Fixed DataCollatorForLanguageModeling not accepting lists of lists (#6685)
* Fixed DataCollatorForLanguageModeling + PermutationLanguageModeling not accepting lists of lists

* Update data_collator.py

* black was grumpy
2020-08-24 15:31:44 +02:00
sgugger
0a850d210e Missing commit 2020-08-24 09:23:06 -04:00
Sylvain Gugger
b30879fe0c Don't reset the dataset type + plug for rm unused columns (#6683)
* Don't reset the type of the dataset

* Formatting

* Update trainer.py

Co-authored-by: Teven <teven.lescao@gmail.com>
2020-08-24 09:22:03 -04:00
Jared T Nielsen
1a779ad7ec Specify config filename (#6626) 2020-08-24 07:27:58 -04:00
Sagor Sarker
a622705ef3 added multiple model_cards for below models (#6666)
* Create README.md

* Update README.md

* Create README.md

* Update README.md

* added multiple codeswitch model
2020-08-24 05:08:32 -04:00
Patrick von Platen
16e38940bd Add Roberta2Roberta shared 2020-08-23 17:02:22 +02:00
Sam Shleifer
f230a64094 new paper bibtex (#6656) 2020-08-23 10:03:41 -04:00
Patrick von Platen
f235ee2164 Add Roberta2Roberta model card 2020-08-23 10:01:58 +02:00
Sagor Sarker
068df740bd added model_card for model codeswitch-hineng-lid-lince and codeswitch-spaeng-lid-lince (#6663)
* Create README.md

* Update README.md

* Create README.md

* Update README.md
2020-08-22 12:13:21 -04:00
Patrick von Platen
97bb2497ab Correct bug in bert2bert-cnn_dailymail
Model was trained with the wrong tokenizer. Retrained with correct tokenizer - thanks for spotting @lhoestq !
2020-08-22 13:44:20 +02:00
Manuel Romero
0f94151dc7 Add model card for electricidad-base-generator (#6650)
I works like a charm!
Look at the output of the example code!
2020-08-21 14:18:15 -04:00
Suraj Patil
cbda72932c [Doc model summary] add MBart model summary (#6649) 2020-08-21 13:42:59 -04:00
Patrick von Platen
9e8c494da7 Add T5-11B disclaimer
@julien-c
2020-08-21 18:11:18 +02:00
Patrick von Platen
a4db4e3032 [Docs model summaries] Add pegasus to docs (#6640)
* add pegasus to docs

* Update docs/source/model_summary.rst
2020-08-21 16:22:10 +02:00
Suraj Patil
d0e42a7bed CamembertForCausalLM (#6577)
* added CamembertForCausalLM

* add in __init__ and auto model

* style

* doc
2020-08-21 13:52:54 +02:00
josephrocca
bdf7e5de92 Remove accidental comment (#6629) 2020-08-21 05:07:32 -04:00
Manuel Romero
efc7460553 model card for Spanish electra base (#6633) 2020-08-21 05:04:29 -04:00
Morgan Funtowicz
b105f2c6b3 Update ONNX doc to match the removal of --optimize argument.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
2020-08-21 10:37:09 +02:00
Sylvain Gugger
e5f452275b Trainer automatically drops unused columns in nlp datasets (#6449)
* Add a classmethod to easily build a Trainer from nlp dataset and metric

* Fix docstrings

* Split train/eval

* Formatting

* Log dropped columns + docs

* Authorize callable activations

* Poc for auto activation

* Be framework-agnostic

* Formatting

* Remove class method

* Remove unnecessary code
2020-08-20 16:29:14 -04:00
Sam Shleifer
5bf4465e6c Regression test for pegasus bugfix (#6606) 2020-08-20 15:34:43 -04:00
sgugger
86c07e634f One last threshold to raise 2020-08-20 14:23:09 -04:00
Sylvain Gugger
e8af90c052 Move threshold up for flaky test with Electra (#6622)
* Move threshold up for flaky test with Electra

* Update above as well
2020-08-20 13:59:40 -04:00
Ivan Dolgov
953958372a XLNet Bug when training with apex 16-bit precision (#6567)
* xlnet fp16 bug fix

* comment cast added

* Update modeling_xlnet.py

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
2020-08-21 01:34:23 +08:00
Patrick von Platen
505f2d749e [Tests] fix attention masks in Tests (#6621)
* fix distilbert

* fix typo
2020-08-20 13:23:47 -04:00
Denisa Roberts
c9454507cf Add tests for Reformer tokenizer (#6485) 2020-08-20 18:58:44 +02:00
Joe Davison
f9d280a959 TFTrainer dataset doc & fix evaluation bug (#6618)
* TFTrainer dataset doc & fix evaluation bug

discussed in #6551

* add docstring to test/eval datasets
2020-08-20 12:11:36 -04:00
Sylvain Gugger
573bdb0a5d Add tests to Trainer (#6605)
* Add tests to Trainer

* Test if removing long breaks everything

* Remove ugly hack

* Fix distributed test

* Use float for number of epochs
2020-08-20 11:13:50 -04:00
Joe Davison
039d8d65fc add intro to nlp lib & dataset links to custom datasets tutorial (#6583)
* add intro to nlp lib + links

* unique links...
2020-08-20 10:32:51 -04:00
sgugger
b3e54698dd Fix CI 2020-08-20 08:34:02 -04:00
Prajjwal Bhargava
33bf426498 removed redundant arg in prepare_inputs (#6614)
* removed redundant arg in prepare_inputs

* made same change in prediction_loop
2020-08-20 08:23:35 -04:00
Romain Rigaux
cabfdfafc0 Docs copy button misses ... prefixed code (#6518)
Tested in a local build of the docs.

e.g. Just above https://huggingface.co/transformers/task_summary.html#causal-language-modeling

Copy will copy the full code, e.g.

for token in top_5_tokens:
     print(sequence.replace(tokenizer.mask_token, tokenizer.decode([token])))

Instead of currently only:

for token in top_5_tokens:


>>> for token in top_5_tokens:
...     print(sequence.replace(tokenizer.mask_token, tokenizer.decode([token])))
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help reduce our carbon footprint.
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help increase our carbon footprint.
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help decrease our carbon footprint.
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help offset our carbon footprint.
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help improve our carbon footprint.

Docs for the option fix:
https://sphinx-copybutton.readthedocs.io/en/latest/
2020-08-20 17:35:06 +08:00
Stas Bekman
61b5ee11e3 lighter 'make test' (#6512) 2020-08-20 17:24:25 +08:00
Siddharth Jain
3c3c46f563 Typo fix in 04-onnx-export (#6595) 2020-08-20 16:17:16 +08:00
Oren Amsalem
93c5c9a528 [cleanup] remove confusing newline (#6603) 2020-08-20 00:33:36 -04:00
Sylvain Gugger
18ca0e9140 Fix #6575 (#6596) 2020-08-19 13:04:33 -04:00
Suraj Patil
7581884dee [BartTokenizerFast] add prepare_seq2seq_batch (#6543) 2020-08-19 10:37:48 -04:00
Patrick von Platen
8bcceaceff fix model outputs test (#6593) 2020-08-19 16:18:51 +02:00
Sam Shleifer
9a86321b11 tf generation utils: remove unused kwargs (#6591) 2020-08-19 09:37:45 -04:00
Pradhy729
2a7402cbd3 Feed forward chunking others (#6365)
* Feed forward chunking for Distilbert & Albert

* Added ff chunking for many other models

* Change model signature

* Added chunking for XLM

* Cleaned up by removing some variables.

* remove test_chunking flag

Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
2020-08-19 14:31:10 +02:00
Patrick von Platen
fe0b85e77a [EncoderDecoder] Add functionality to tie encoder decoder weights (#6538)
* start adding tie encoder to decoder functionality

* finish model tying

* make style

* Apply suggestions from code review

* fix t5 list including cross attention

* apply sams suggestions

* Update src/transformers/modeling_encoder_decoder.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* add max depth break point

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-08-19 14:23:45 +02:00
Sam Shleifer
ab42d74850 Fix bart base test (#6587) 2020-08-18 21:28:10 -04:00
Sam Shleifer
1529bf9680 add BartConfig.force_bos_token_to_be_generated (#6526)
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-08-18 19:15:50 -04:00
Patrick von Platen
974bb4af26 [Model card] Bert2GPT2 EncoderDecoder model (#6569)
* Bert2GPT2 EncoderDecoder model

* Update README.md
2020-08-18 19:28:17 +02:00
Suraj Patil
6f972e1423 update xnli-mt url (#6580) 2020-08-18 13:10:47 -04:00
Suraj Patil
fb6844aff5 [Pegasus Doc] minor typo (#6579)
Minor typo correction
@sshleifer
2020-08-18 12:47:47 -04:00
Manuel Romero
aaab9ab187 Create README.md (#6556) 2020-08-18 12:43:20 -04:00
Manuel Romero
1dfce0f08a Create README.md (#6557) 2020-08-18 12:42:14 -04:00
Romain Rigaux
7516bcf273 [docs] Fix number of 'ug' occurrences in tokenizer_summary (#6574) 2020-08-18 10:23:25 -04:00
Romain Rigaux
5a5af22ed5 [docs] Fix wrong newline in the middle of a paragraph (#6573) 2020-08-18 10:22:43 -04:00
Stas Bekman
7659a8eb37 fix incorrect codecov reports (#6553)
As discussed at https://github.com/huggingface/transformers/issues/6317 codecov currently sends an invalid report when it fails to find a code coverage report for the base it checks against, so this gets fixed by:

-  require_base: yes        # don't report if there is no base coverage report

let's add this for clarity, this supposedly is already the default.

-  require_head: yes        # don't report if there is no head coverage report 

and perhaps no point reporting on doc changes as they don't make any difference and it just generates noise:

-  require_changes: true    # only comment if there was change in coverage
2020-08-18 10:21:13 -04:00
Stefan Schweter
cfa26d2b41 github: add @stefan-it to bug-report template for all token-classification related bugs (#6489) 2020-08-18 08:38:54 -04:00
Philip May
1fdf372f8c Small typo fixes for model card: electra-base-german-uncased (#6555)
* Update README.md

* Update model_cards/german-nlp-group/electra-base-german-uncased/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-08-18 08:21:52 -04:00
Ali Modarressi
5a81195ea9 Fixed label datatype for STS-B (#6492)
* fixed label datatype for sts-b

* naming update

* make style

* make style
2020-08-18 08:09:39 -04:00
Sam Shleifer
12d7624199 [marian] converter supports models from new Tatoeba project (#6342) 2020-08-17 23:55:42 -04:00
Jim Regan
fb7330b30e update with #s of sentences/tokens (#6546) 2020-08-17 16:48:05 -04:00
onepointconsulting
63144701ed Added first model card (#6530)
* Added first model card

* Add metadata

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-08-17 16:24:10 -04:00
Ikram Ali
98ee802023 [model_cards] Add model cards for Urduhack model (roberta-urdu-small) (#6536)
* [model_cards] roberta-urdu-small added.

* [model_cards] typo fixed.

* Tweak license format (yaml expects a simple string)

Co-authored-by: Ikram Ali <mrikram1989>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-08-17 16:04:29 -04:00
Jim Regan
3a302904cb [model_cards] Add a new model for Irish (#6544) 2020-08-17 15:56:56 -04:00
Julien Chaumond
07971d8b18 [model_cards] Fix yaml for cedpsam/chatbot_fr 2020-08-17 21:33:32 +02:00
Suraj Patil
407da12ef1 [T5Tokenizer] add prepare_seq2seq_batch method (#6122)
* tests
2020-08-17 13:57:19 -04:00
Suraj Patil
c9564f5343 [Doc] add more MBart and other doc (#6490)
* add mbart example

* add Pegasus and MBart in readme

* typo

* add MBart in Pretrained models

* add pre-proc doc

* add DPR in readme

* fix indent

* doc fix
2020-08-17 12:30:26 -04:00
Stas Bekman
f68c873100 replace _ with __ rst links (#6541) 2020-08-17 12:27:02 -04:00
sgugger
7ca6ab67fc Fix CI 2020-08-17 12:20:40 -04:00
Stas Bekman
b732e7e111 [doc] multiple corrections to "Summary of the tasks" (#6509)
* [doc] multiple corrections to "Summary of the tasks"

* fix indentation

* correction

* fix links, add links to examples/seq2seq/README.md instead of non-existing script
2020-08-17 11:49:16 -04:00
Suraj Patil
2a77813d53 [BartTokenizer] add prepare s2s batch (#6212)
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
2020-08-17 11:44:46 -04:00
Stas Bekman
84d33317ae [doc] make the text more readable, fix some typos, add some disambiguation (#6508)
* [doc] make the text more readable, fix some typos, add some disambiguation

* Update docs/source/glossary.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-08-17 11:07:58 -04:00
Joe Davison
d0c2389f48 add custom datasets tutorial (#6466)
* add custom datasets tutorial

* python -> bash code blocks

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* minor review feedback changes

* add working native QA snippet

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-08-17 09:15:34 -04:00
Sam Shleifer
d2da2cb232 allow spaces in bash args with "$@" (#6521) 2020-08-17 09:06:35 -04:00
Funtowicz Morgan
b41cc0b86a Fix flaky ONNX tests (#6531) 2020-08-17 09:04:35 -04:00
Stas Bekman
39c3b1d9de [sched] polynomial_decay_schedule use default power=1.0 (#6473) 2020-08-17 08:33:12 -04:00
Stas Bekman
9dbe4094f2 [testing] a new TestCasePlus subclass + get_auto_remove_tmp_dir() (#6494)
* [testing] switch to a new TestCasePlus + get_auto_remove_tmp_dir() for auto-removal of tmp dirs

* respect after=True for tempfile, simplify code

* comments

* comment fix

* put `before` last in args, so can make debug even faster
2020-08-17 08:12:19 -04:00
Patrick von Platen
36010cb1e2 fix pegasus doc (#6533) 2020-08-17 12:24:43 +02:00
Kevin Canwen Xu
37709b5909 Remove deprecated assertEquals (#6532)
`assertEquals` is deprecated: https://stackoverflow.com/questions/930995/assertequals-vs-assertequal-in-python/931011
This PR replaces these deprecated methods.
2020-08-17 17:13:58 +08:00
Stas Bekman
49d8076fa2 [doc] Summary of the models fixes (#6511)
* [doc] Summary of the models fixes

* correction
2020-08-17 16:04:53 +08:00
Cahya Wirawan
72911c893a Create model cards for indonesian models (#6522)
* added model cards for indonesian gpt2-small, bert-base and roberta-base models

* removed bibtex entries
2020-08-17 15:42:25 +08:00
Masatoshi Suzuki
48c6c6139f Support additional dictionaries for BERT Japanese tokenizers (#6515)
* Update BERT Japanese tokenizers

* Update CircleCI config to download unidic

* Specify to use the latest dictionary packages
2020-08-17 12:00:23 +08:00
Stas Bekman
423eb5b1d7 [doc] fix invalid env vars (#6504)
- remove invalid `ENV_` prefix.
- add a few ':' while at it
2020-08-17 11:11:40 +08:00
Philip May
3c72f5584b Add Model Card for electra-base-german-uncased (#6496)
* Add Model Card for electra-base-german-uncased

* Update README.md

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
2020-08-17 11:02:32 +08:00
Stas Bekman
df15c7c226 typos (#6505) 2020-08-17 10:57:36 +08:00
fabiocapsouza
6d38ab1cc3 Update bert-base-portuguese-cased and bert-large-portuguese-cased model cards (#6527)
Co-authored-by: Fabio Souza <fabiosouza@neuralmind.ai>
2020-08-17 10:49:49 +08:00
Sam Shleifer
84c265ffcc [lightning_base] fix s2s logging, only make train_loader once (#6404) 2020-08-16 22:49:41 -04:00
Sam Shleifer
72add6c98f [s2s] docs, document desired filenames nicely (#6525) 2020-08-16 20:31:22 -04:00
Kyle Piira
2060181126 Fixes paths with spaces in seq2seq example (#6493) 2020-08-16 13:36:38 -04:00
Kevin Canwen Xu
fe61c05b85 Add examples/bert-loses-patience who can help (#6499) 2020-08-16 16:30:16 +08:00
Jin Young (Daniel) Sohn
24107c2c83 Fix TPU Convergence bug introduced by PR#6151 (#6488)
Currently with the bug introduced we're taking two optimizer steps per
batch: one global one, where `xm.optimizer_step` injects a CRS between
all cores in training, and one without. This has been affecting training
accuracy (for example, XLNet GLUE on MNLI is not converging, etc.).
2020-08-14 12:47:37 -04:00
Sylvain Gugger
895ed8f451 Generation doc (#6470)
* Generation doc

* MBartForConditionalGeneration (#6441)

* add MBartForConditionalGeneration

* style

* rebase and fixes

* add mbart test in TEST_FILES_WITH_NO_COMMON_TESTS

* fix docs

* don't ignore mbart

* doc

* fix mbart fairseq link

* put mbart before bart

* apply doc suggestions

* Use hash to clean the test dirs (#6475)

* Use hash to clean the test dirs

* Use hash to clean the test dirs

* Use hash to clean the test dirs

* fix

* [EncoderDecoder] Add Cross Attention for GPT2 (#6415)

* add cross attention layers for gpt2

* make gpt2 cross attention work

* finish bert2gpt2

* add explicit comments

* remove attention mask since not yet supported

* revert attn mask in pipeline

* Update src/transformers/modeling_gpt2.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_encoder_decoder.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Sort unique_no_split_tokens to make it deterministic (#6461)

* change unique_no_split_tokens's type to set

* use sorted list instead of set

* style

* Import accuracy_score (#6480)

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address comments

* Styling

* Generation doc

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address comments

* Styling

Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
Co-authored-by: gijswijnholds <gijswijnholds@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-08-14 09:46:39 -04:00
gijswijnholds
b5ba758ba9 Import accuracy_score (#6480) 2020-08-14 08:16:16 -04:00
Quentin Lhoest
9a8c168f56 Sort unique_no_split_tokens to make it deterministic (#6461)
* change unique_no_split_tokens's type to set

* use sorted list instead of set

* style
2020-08-14 10:36:58 +02:00
Patrick von Platen
1d6e71e116 [EncoderDecoder] Add Cross Attention for GPT2 (#6415)
* add cross attention layers for gpt2

* make gpt2 cross attention work

* finish bert2gpt2

* add explicit comments

* remove attention mask since not yet supported

* revert attn mask in pipeline

* Update src/transformers/modeling_gpt2.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_encoder_decoder.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-08-14 09:43:29 +02:00
Kevin Canwen Xu
eb613b566a Use hash to clean the test dirs (#6475)
* Use hash to clean the test dirs

* Use hash to clean the test dirs

* Use hash to clean the test dirs

* fix
2020-08-14 15:34:39 +08:00
Suraj Patil
680f1337c3 MBartForConditionalGeneration (#6441)
* add MBartForConditionalGeneration

* style

* rebase and fixes

* add mbart test in TEST_FILES_WITH_NO_COMMON_TESTS

* fix docs

* don't ignore mbart

* doc

* fix mbart fairseq link

* put mbart before bart

* apply doc suggestions
2020-08-14 03:21:16 -04:00
Manuel Romero
05810cd80a Fix typo (#6469) 2020-08-13 15:01:08 -04:00
Kevin Canwen Xu
7bc00569df Clean directory after script testing (#6453)
* Clean Dir after testing

* remove pabee ignore
2020-08-14 00:34:03 +08:00
Sam Shleifer
e92efcf728 Mult rouge by 100: standard units (#6359) 2020-08-13 12:15:54 -04:00
vblagoje
eda07efaa5 Add POS tagging and Phrase chunking token classification examples (#6457)
* Add more token classification examples

* POS tagging example

* Phrase chunking example

* PR review fixes

* Add conllu to third party list (used in token classification examples)
2020-08-13 12:09:51 -04:00
Suraj Patil
f51161e230 add BartTokenizerFast in AutoTokenizer (#6464)
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-08-13 12:08:11 -04:00
Suraj Patil
a442f87adc add LongformerTokenizerFast in AutoTokenizer (#6463) 2020-08-13 12:06:43 -04:00
Lysandre Debut
f7cbc13db7 Test model outputs equivalence (#6445)
* Test model outputs equivalence

* Fix failing tests

* From dict to kwargs

* DistilBERT

* Addressing @sgugger and @patrickvonplaten's comments
2020-08-13 11:59:35 -04:00
Prajjwal Bhargava
54c687e97c typo fix (#6462) 2020-08-13 09:36:48 -04:00
Zhu Baohe
9d94aecd51 Fix docs and bad word tokens generation_utils.py (#6387)
* fix

* fix2

* fix3
2020-08-13 13:12:16 +02:00
cedspam
0ed7c00ba6 Update README.md (#6435)
* Update README.md

* Update README.md

* Update README.md
2020-08-13 11:01:17 +02:00
Stas Bekman
e983da0e7d cleanup tf unittests: part 2 (#6260)
* cleanup torch unittests: part 2

* remove trailing comma added by isort, and which breaks flake

* one more comma

* revert odd balls

* part 3: odd cases

* more ["key"] -> .key refactoring

* .numpy() is not needed

* more unncessary .numpy() removed

* more simplification
2020-08-13 04:29:06 -04:00
Joe Davison
bc820476a5 add targets arg to fill-mask pipeline (#6239)
* add targets arg to fill-mask pipeline

* add tests and more error handling

* quality

* update docstring
2020-08-12 12:48:29 -04:00
Patrick von Platen
0735def8e1 [EncoderDecoder] Add encoder-decoder for roberta/ vanilla longformer (#6411)
* add encoder-decoder for roberta

* fix headmask

* apply Sylvains suggestions

* fix typo

* Apply suggestions from code review
2020-08-12 18:23:30 +02:00
zcain117
fd3de2000f Get GKE logs via kubectl logs instead of gcloud logging read. (#6446) 2020-08-12 11:46:24 -04:00
Sam Shleifer
f94a52cd79 [s2s] add BartTranslationDistiller for distilling mBART (#6363) 2020-08-12 11:41:04 -04:00
Sylvain Gugger
d2370e1bd8 Adding PaddingDataCollator (#6442)
* Data collator with padding

* Add type annotation

* Support tensors as well

* Add comment

* Fix for labels wrong shape

* Data collator with padding

* Add type annotation

* Support tensors as well

* Add comment

* Fix for labels wrong shape

* Remove changes rendered unnecessary
2020-08-12 11:32:27 -04:00
Sylvain Gugger
96c3329f19 Fix #6428 (#6437) 2020-08-12 08:47:30 -04:00
Sylvain Gugger
a8db954cda Activate check on the CI (#6427)
* Activate check on the CI

* Fix repo inconsistencies

* Don't document too much
2020-08-12 08:42:14 -04:00
Sylvain Gugger
34fabe1697 Move prediction_loss_only to TrainingArguments (#6426) 2020-08-12 08:03:45 -04:00
Sylvain Gugger
e9c3031463 Fixes to make life easier with the nlp library (#6423)
* allow using tokenizer.pad as a collate_fn in pytorch

* allow using tokenizer.pad as a collate_fn in pytorch

* Add documentation and tests

* Make attention mask the right shape

* Better test

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
2020-08-12 08:00:56 -04:00
Stas Bekman
87b359439f [test] replace capsys with the more refined CaptureStderr/CaptureStdout (#6422)
* replace capsys with the more refined CaptureStderr/CaptureStdout

* Update examples/seq2seq/test_seq2seq_examples.py

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-08-12 07:54:28 -04:00
Jared T Nielsen
ac5bcf236e Fix FFN dropout in TFAlbertLayer, and split dropout in TFAlbertAttent… (#4323)
* Fix FFN dropout in TFAlbertLayer, and split dropout in TFAlbertAttention into two separate dropout layers.

* Same dropout fixes for PyTorch.
2020-08-12 07:52:42 -04:00
Lysandre Debut
4ffea5ce2f Disabled pabee test (#6431) 2020-08-12 02:52:50 -04:00
Rohan Rajpal
155288f04b [model_card] rohanrajpal/bert-base-codemixed-uncased-sentiment (#6324)
* Create README.md

* Update model_cards/rohanrajpal/bert-base-codemixed-uncased-sentiment/README.md

* Update model_cards/rohanrajpal/bert-base-codemixed-uncased-sentiment/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-08-11 18:38:18 -04:00
Manuel Romero
4e6245fc7e Create model card T5-base fine-tuned on event2Mind for Intent Prediction (#6412) 2020-08-11 18:35:27 -04:00
Manuel Romero
46e3a0a6ec Create README.md (#6381) 2020-08-11 18:34:11 -04:00
Manuel Romero
31dfde7429 Create README.md (#6378) 2020-08-11 18:32:37 -04:00
Manuel Romero
25e29150a2 Add metadata to be indexed properly (#6380) 2020-08-11 18:32:29 -04:00
Manuel Romero
471be5f279 Change metadata to be indexed correctly (#6379) 2020-08-11 18:32:18 -04:00
Rohan Rajpal
42ee0bc63d Create README.md (#6346)
* Create README.md

* add results on SAIL dataset

* Update model_cards/rohanrajpal/bert-base-multilingual-codemixed-cased-sentiment/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-08-11 18:31:34 -04:00
Sam Shleifer
3f071c4b6e [examples] add pytest dependency (#6425) 2020-08-11 17:58:09 -04:00
Stas Bekman
ece0903e11 lr_schedulers: add get_polynomial_decay_schedule_with_warmup (#6361)
* [wip] add get_polynomial_decay_schedule_with_warmup

* style

* add assert

* change lr_end to a much smaller default number

* check for exact equality

* [model_cards] electra-base-turkish-cased-ner (#6350)

* for electra-base-turkish-cased-ner

* Add metadata

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Temporarily de-activate TPU CI

* Update modeling_tf_utils.py (#6372)

fix typo: ckeckpoint->checkpoint

* the test now works again (#6371)

* correct pl link in readme (#6364)

* refactor almost identical tests (#6339)

* refactor almost identical tests

* important to add a clear assert error message

* make the assert error even more descriptive than the original bt

* Small docfile fixes (#6328)

* Patch models (#6326)

* TFAlbertFor{TokenClassification, MultipleChoice}

* Patch models

* BERT and TF BERT info


s

* Update check_repo

* Ci GitHub caching (#6382)

* Cache Github Actions CI

* Remove useless file

* Colab button (#6389)

* Add colab button

* Add colab link for tutorials

* Fix links for open in colab (#6391)

* Update src/transformers/optimization.py

consistently use lr_end=1e-7 default

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* [wip] add get_polynomial_decay_schedule_with_warmup

* style

* add assert

* change lr_end to a much smaller default number

* check for exact equality

* Update src/transformers/optimization.py

consistently use lr_end=1e-7 default

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* remove dup (leftover from merge)

* convert the test into the new refactored format

* stick to using the current_step as is, without ++

Co-authored-by: M. Yusuf Sarıgöz <yusufsarigoz@gmail.com>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Alexander Measure <ameasure@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-08-11 17:56:41 -04:00
cedspam
6c87b73d6b Create README.md (#6386)
* Create README.md

* Update README.md
2020-08-11 16:56:51 -04:00
Stas Bekman
0203d6517f [pl] restore lr logging behavior for glue, ner examples (#6314) 2020-08-11 16:27:11 -04:00
Sam Shleifer
be1520d3a3 rename prepare_translation_batch -> prepare_seq2seq_batch (#6103) 2020-08-11 15:57:07 -04:00
Sam Shleifer
66fa8ceaea PegasusForConditionalGeneration (torch version) (#6340)
Co-authored-by: Jingqing  Zhang <jingqing.zhang15@imperial.ac.uk>
2020-08-11 14:31:23 -04:00
Stas Bekman
f6cb0f806e [s2s] wmt download script use less ram (#6405) 2020-08-11 12:04:17 -04:00
Stas Bekman
7c6a085ebf pl version: examples/requirements.txt is single source of truth (#6309) 2020-08-11 10:58:54 -04:00
Pranav Vadrevu
1d1d5bec1b Create Model Card File (#6357) 2020-08-11 10:36:15 -04:00
Abed khooli
00ce881c07 Create README.md (#6413)
* Create README.md

Model card for https://huggingface.co/akhooli/gpt2-small-arabic

* Update model_cards/akhooli/gpt2-small-arabic/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-08-11 10:35:31 -04:00
Nick Doiron
3ae30787b5 switch Hindi-BERT to S3 README (#6396) 2020-08-11 10:34:22 -04:00
Abed khooli
824e651e17 Create README.md (#6397)
* Create README.md

* Update model_cards/akhooli/gpt2-small-arabic-poetry/README.md

* Update model_cards/akhooli/gpt2-small-arabic-poetry/README.md

* Update model_cards/akhooli/gpt2-small-arabic-poetry/README.md

* Update model_cards/akhooli/gpt2-small-arabic-poetry/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-08-11 09:03:23 -04:00
guillaume-be
404782912a [Performance improvement] "Bad tokens ids" optimization (#6064)
* Optimized banned token masking

* Avoid duplicate EOS masking if in bad_words_id

* Updated mask generation to handle empty banned token list

* Addition of unit tests for the updated bad_words_ids masking

* Updated timeout handling in `test_postprocess_next_token_scores_large_bad_words_list` unit test

* Updated timeout handling in `test_postprocess_next_token_scores_large_bad_words_list` unit test (timeout does not work on Windows)

* Moving Marian import to the test context to allow TF only environments to run

* Moving imports to torch_available test

* Updated operations device and test

* Updated operations device and test

* Added docstring and comment for in-place scores modification

* Moving test to own test_generation_utils, use of lighter models for testing

* removed unneded imports in test_modeling_common

* revert formatting change for ModelTesterMixin

* Updated caching, simplified eos token id test, removed unnecessary @require_torch

* formatting compliance
2020-08-11 05:56:40 -04:00
David LaPalomento
87e124c245 Warn if debug requested without TPU fixes (#6308) (#6390)
* Warn if debug requested without TPU fixes (#6308)
Check whether a PyTorch compatible TPU is available before attempting to print TPU metrics after training has completed. This way, users who apply `--debug` without reading the documentation aren't suprised by a stacktrace.

* Style

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-08-11 05:31:26 -04:00
Junyuan Zheng
cdf1f7edb2 Fix tokenizer saving and loading error (#6026)
* fix tokenizer saving and loading bugs when adding AddedToken to additional special tokens

* Add tokenizer test

* Style

* Style 2

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-08-11 04:49:16 -04:00
Stas Bekman
83984a61c6 testing utils: capturing std streams context manager (#6231)
* testing utils: capturing std streams context manager

* style

* missing import

* add the origin of this code
2020-08-11 03:56:47 -04:00
Stas Bekman
f6c0680d36 add pl_glue example test (#6034)
* add pl_glue example test

* for now just test that it runs, next validate results of eval or predict?

* complete the run_pl_glue test to validate the actual outcome

* worked on my machine, CI gets less accuracy - trying higher epochs

* match run_pl.sh hparms

* more epochs?

* trying higher lr

* for now just test that the script runs to a completion

* correct the comment

* if cuda is available, add --fp16 --gpus=1 to cover more bases

* style
2020-08-11 03:16:52 -04:00
Pradhy729
b25cec13c5 Feed forward chunking (#6024)
* Chunked feed forward for Bert

This is an initial implementation to test applying feed forward chunking for BERT.
Will need additional modifications based on output and benchmark results.

* Black and cleanup

* Feed forward chunking in BertLayer class.

* Isort

* add chunking for all models

* fix docs

* Fix typo

Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
2020-08-11 03:12:45 -04:00
Lysandre
8a3db6b303 Add TPU testing once again 2020-08-11 08:49:37 +02:00
zcain117
f65ac1faf2 Add missing docker arg for TPU CI. (#6393) 2020-08-11 02:48:49 -04:00
Sam Shleifer
b9ecd92ee4 [s2s] Script to save wmt data to disk (#6403) 2020-08-10 22:49:39 -04:00
Patrick von Platen
00bb0b25ed TF Longformer (#5764)
* improve names and tests longformer

* more and better tests for longformer

* add first tf test

* finalize tf basic op functions

* fix merge

* tf shape test passes

* narrow down discrepancies

* make longformer local attn tf work

* correct tf longformer

* add first global attn function

* add more global longformer func

* advance tf longformer

* finish global attn

* upload big model

* finish all tests

* correct false any statement

* fix common tests

* make all tests pass except keras save load

* fix some tests

* fix torch test import

* finish tests

* fix test

* fix torch tf tests

* add docs

* finish docs

* Update src/transformers/modeling_longformer.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/modeling_tf_longformer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* apply Lysandres suggestions

* reverse to assert statement because function will fail otherwise

* applying sylvains recommendations

* Update src/transformers/modeling_longformer.py

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* Update src/transformers/modeling_tf_longformer.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-08-10 23:25:06 +02:00
Patrick von Platen
3425936643 [EncoderDecoderModel] add a add_cross_attention boolean to config (#6377)
* correct encoder decoder model

* Apply suggestions from code review

* apply sylvains suggestions
2020-08-10 19:46:48 +02:00
Sylvain Gugger
06bc347c97 Fix links for open in colab (#6391) 2020-08-10 11:16:17 -04:00
Sylvain Gugger
3e0fe3cf5c Colab button (#6389)
* Add colab button

* Add colab link for tutorials
2020-08-10 11:12:29 -04:00
Lysandre Debut
79588e6fdb Ci GitHub caching (#6382)
* Cache Github Actions CI

* Remove useless file
2020-08-10 10:39:31 -04:00
Lysandre Debut
b99098abc7 Patch models (#6326)
* TFAlbertFor{TokenClassification, MultipleChoice}

* Patch models

* BERT and TF BERT info


s

* Update check_repo
2020-08-10 10:39:17 -04:00
Sylvain Gugger
6028ed92bd Small docfile fixes (#6328) 2020-08-10 05:37:12 -04:00
Stas Bekman
1429b920d4 refactor almost identical tests (#6339)
* refactor almost identical tests

* important to add a clear assert error message

* make the assert error even more descriptive than the original bt
2020-08-10 05:31:20 -04:00
Rohit Gupta
35eb96de4d correct pl link in readme (#6364) 2020-08-10 03:08:46 -04:00
Stas Bekman
0830e79512 the test now works again (#6371) 2020-08-10 02:55:52 -04:00
Alexander Measure
3a556b0fb7 Update modeling_tf_utils.py (#6372)
fix typo: ckeckpoint->checkpoint
2020-08-10 02:55:11 -04:00
Lysandre
1bbc54a87c Temporarily de-activate TPU CI 2020-08-10 08:11:40 +02:00
M. Yusuf Sarıgöz
6e8a38568e [model_cards] electra-base-turkish-cased-ner (#6350)
* for electra-base-turkish-cased-ner

* Add metadata

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-08-09 03:39:51 -04:00
Sam Shleifer
9a5ef83748 [s2s] fix --gpus clarg collision (#6358) 2020-08-08 21:51:37 -04:00
Patrick von Platen
1aec991643 [GPT2] Correct typo in docs (#6352) 2020-08-08 20:37:29 +02:00
elsanns
9f57e39f71 Add notebook on fine-tuning and interpreting Electra (#6321)
Co-authored-by: eliska <3648991+elisans@users.noreply.github.com>
2020-08-08 11:47:33 +02:00
Suraj Patil
9bed355449 [s2s] fix label_smoothed_nll_loss (#6344) 2020-08-08 04:21:12 -04:00
Sam Shleifer
99f73bcc71 [s2s] tiny QOL improvement: run_eval prints scores (#6341) 2020-08-08 02:45:55 -04:00
Stas Bekman
322dffc6c9 remove a TODO item to use a tiny model (#6338)
as discussed with @sshleifer, removing this TODO to switch to a tiny model, since it won't be able to test the results of the evaluation (i.e. the results are meaningless).
2020-08-07 21:30:39 -04:00
Sam Shleifer
1f8e826518 [CI] Self-scheduled runner also pins torch (#6332) 2020-08-07 18:40:21 -04:00
zcain117
1b8a7ffcfd Add setup for TPU CI to run every hour. (#6219)
* Add setup for TPU CI to run every hour.

* Re-organize config.yml

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-08-07 11:17:07 -04:00
Stas Bekman
6695450a23 [examples] consistently use --gpus, instead of --n_gpu (#6315) 2020-08-07 10:36:32 -04:00
Julien Plu
0e36e51515 Fix the tests for Electra (#6284)
* Fix the tests for Electra

* Apply style
2020-08-07 09:30:57 -04:00
Sylvain Gugger
6ba540b747 Add a script to check all models are tested and documented (#6298)
* Add a script to check all models are tested and documented

* Apply suggestions from code review

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>

* Address comments

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
2020-08-07 09:18:37 -04:00
Stas Bekman
e1638dce16 fix the slow tests doc (#6167)
remove unnecessary duplication wrt `RUN_SLOW=yes`
2020-08-07 09:17:32 -04:00
Binny Mathew
7e9861f7f4 dehate-bert Model Card (#6248)
Added citation and paper links.
2020-08-07 17:51:03 +08:00
Binny Mathew
f6df6d98dd dehate-bert Model Card (#6249)
Added citation and paper links.
2020-08-07 17:48:38 +08:00
Binny Mathew
26691ecba6 dehate-bert Model Card (#6250)
Added citation and paper links.
2020-08-07 17:48:09 +08:00
Binny Mathew
60657b295c dehate-bert Model Card (#6251)
Added citation and paper links.
2020-08-07 17:47:42 +08:00
Binny Mathew
7218261991 dehate-bert Model Card (#6252)
Added citation and paper links.
2020-08-07 17:47:26 +08:00
Binny Mathew
396d227cd4 dehate-bert Model Card (#6253)
Added citation and paper links.
2020-08-07 17:47:04 +08:00
Binny Mathew
8be260f18a dehate-bert Model Card (#6254)
Added citation and paper links.
2020-08-07 17:46:27 +08:00
Binny Mathew
dce7278cdf dehate-bert Model Card (#6255)
Added citation and paper links.
2020-08-07 17:45:52 +08:00
idoh
3be2d04884 fix consistency CrossEntropyLoss in modeling_bart (#6265) 2020-08-07 17:44:28 +08:00
Lysandre
c72f9c90a1 Remove --no-cache-dir from github CI 2020-08-07 09:07:22 +02:00
Lysandre Debut
0d9328f2ef Patch GPU failures (#6281)
* Pin to 1.5.0

* Patch XLM GPU test
2020-08-07 02:58:15 -04:00
Lysandre Debut
80a0676a51 CI dependency wheel caching (#6287)
* Single workflow cache test




Remove cache dir, re-trigger cache


Only pip archives


Not sudo when pip

* All workflow cache

Remove no-cache-dir instruction


Remove last sudo occurrences


v0.3
2020-08-07 02:48:59 -04:00
Stas Bekman
175cd45e13 fix the shuffle agrument usage and the default (#6307) 2020-08-06 20:32:28 -04:00
Bhashithe Abeysinghe
ffceef2042 [Fix] text-classification PL example (#6027)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-08-06 15:46:43 -04:00
xujiaze13
eb2bd8d6eb Remove redundant line in run_pl_glue.py (#6305) 2020-08-06 15:43:45 -04:00
Patrick von Platen
118ecfd427 fix for pytorch < 1.6 (#6300) 2020-08-06 21:14:46 +02:00
Sam Shleifer
2804fff839 [s2s]Use prepare_translation_batch for Marian finetuning (#6293)
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-08-06 14:58:38 -04:00
Teven
2f2aa0c89c added n_inner argument to gpt2 config (#6296) 2020-08-06 17:47:32 +02:00
Manuel Romero
0a0d53dcf8 Update model card (#6290)
Add links to RuPERTa models fine-tuned on Spanish SQUAD datasets
2020-08-06 11:42:43 -04:00
Doug Blank
b923871bb7 Adds comet_ml to the list of auto-experiment loggers (#6176)
* Support for Comet.ml

* Need to import comet first

* Log this model, not the one in the backprop step

* Log args as hyperparameters; use framework to allow fine control

* Log hyperparameters with context

* Apply black formatting

* isort fix integrations

* isort fix __init__

* Update src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/trainer_tf.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Address review comments

* Style + Quality, remove Tensorboard import test

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-08-06 11:31:30 -04:00
Philip May
d5bc32ce92 Add strip_accents to basic BertTokenizer. (#6280)
* Add strip_accents to basic tokenizer

* Add tests for strip_accents.

* fix style with black

* Fix strip_accents test

* empty commit to trigger CI

* Improved strip_accents check

* Add code quality with is not False
2020-08-06 18:52:28 +08:00
JME-P
31da35cc89 Create README.md (#6273)
I am adding a descriptive README.md file to my recently uploaded twitter classification model: shrugging-grace/tweetclassifier.
2020-08-05 12:36:24 -04:00
JME-P
a8bdba232f Create README.md for uploaded classifier (#6272)
I am adding a descriptive README.md file to my recently uploaded twitter classification model: shrugging-grace/tweetclassifier.
2020-08-05 12:27:46 -04:00
HUSEIN ZOLKEPLI
a23a535c10 added t5 bahasa summarization readme (#6269) 2020-08-05 12:27:27 -04:00
Sylvain Gugger
c67d1a0259 Tf model outputs (#6247)
* TF outputs and test on BERT

* Albert to DistilBert

* All remaining TF models except T5

* Documentation

* One file forgotten

* TF outputs and test on BERT

* Albert to DistilBert

* All remaining TF models except T5

* Documentation

* One file forgotten

* Add new models and fix issues

* Quality improvements

* Add T5

* A bit of cleanup

* Fix for slow tests

* Style
2020-08-05 11:34:39 -04:00
Teven
bd0eab351a Trainer + wandb quality of life logging tweaks (#6241)
* added `name` argument for wandb logging, also logging model config with trainer arguments

* Update src/transformers/training_args.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* added tf, post-review changes

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-08-05 09:05:52 -04:00
Julien Plu
33966811bd Add SequenceClassification and MultipleChoice TF models to Electra (#6227)
* Add SequenceClassification and MultipleChoice TF models to Electra

* Apply style

* Add summary_proj_to_labels to Electra config

* Finally mirroring the PT version of these models

* Apply style

* Fix Electra test
2020-08-05 09:04:27 -04:00
Stas Bekman
376c02e9a9 [WIP] lightning_base: support --lr_scheduler with multiple possibilities (#6232)
* support --lr_scheduler with multiple possibilities

* correct the error message

* add a note about supported schedulers

* cleanup

* cleanup2

* needs the argument default

* style

* add another assert in the test

* implement requested changes

* cleanups

* fix relative import

* cleanup
2020-08-05 09:01:17 -04:00
Zhu Baohe
d89acd07cc fix (#6257) 2020-08-05 07:37:57 -04:00
Ninnart Fuengfusin
24c5a6e351 Update optimization.py (#6261) 2020-08-05 07:34:57 -04:00
Lilian Bordeau
ed6b8f3128 Update to match renamed attributes in fairseq master (#5972)
* Update to match renamed attributes in fairseq master

RobertaModel no longer have model.encoder and args.num_classes attributes as of 5/28/20.

* Quality

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-08-05 07:23:55 -04:00
Ali Safaya
d9149f00d1 Update README.md (#6201) 2020-08-04 17:44:14 -04:00
Ali Safaya
ddfdbb86c1 Update README.md (#6200) 2020-08-04 17:44:05 -04:00
Ali Safaya
4f67955662 Update README.md (#6199) 2020-08-04 17:43:48 -04:00
Ali Safaya
869ec441c9 Update README.md (#6198) 2020-08-04 17:43:38 -04:00
Adam Montgomerie
5177dca634 Create README.md (#6123) 2020-08-04 17:42:53 -04:00
Manuel Romero
3f30ebe6ca Create README.md (#6075) 2020-08-04 17:41:23 -04:00
Binny Mathew
aa7c22a283 Update Model Card (#6246)
Added citation and paper links.
2020-08-04 17:40:47 -04:00
Joe Davison
972535ea74 fix zero shot pipeline docs (#6245) 2020-08-04 16:37:49 -04:00
Timo Moeller
5920a37a4c Add license info to German Bert models (#6242)
* Add xlm-r QA model card

* Add tags

* Add license info to german bert
2020-08-04 13:40:49 -04:00
Patrick von Platen
6c9ba1d8fc [Reformer] Make random seed generator available on random seed and not on model device (#6244)
* improve if else statement random seeds

* Apply suggestions from code review

* Update src/transformers/modeling_reformer.py
2020-08-04 13:22:43 -04:00
Sam Shleifer
d5b0a0e235 mBART Conversion script (#6230) 2020-08-04 09:53:51 -04:00
Stas Bekman
268bf34630 typo (#6225) 2020-08-04 09:31:49 -04:00
Patrick von Platen
7f65daa2e1 fix reformer fp16 (#6237) 2020-08-04 13:02:25 +02:00
Andrés Felipe Cruz
7ea9b2db37 Encoder decoder config docs (#6195)
* Adding docs for how to load encoder_decoder pretrained model with individual config objects

* Adding docs for loading encoder_decoder config from pretrained folder

* Fixing  W293 blank line contains whitespace

* Update src/transformers/modeling_encoder_decoder.py

* Update src/transformers/modeling_encoder_decoder.py

* Update src/transformers/modeling_encoder_decoder.py

* Apply suggestions from code review

model file should only show examples for how to load save model

* Update src/transformers/configuration_encoder_decoder.py

* Update src/transformers/configuration_encoder_decoder.py

* fix space

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-08-04 09:23:28 +02:00
Lysandre Debut
1d5c3a3d96 Test with --no-cache-dir (#6235) 2020-08-04 03:20:19 -04:00
Sam Shleifer
6730ecdd3c Remove redundant coverage (#6224) 2020-08-04 02:59:21 -04:00
Stas Bekman
5deed37f9f cleanup torch unittests (#6196)
* improve unit tests

this is a sample of one test according to the request in https://github.com/huggingface/transformers/issues/5973
before I apply it to the rest

* batch 1

* batch 2

* batch 3

* batch 4

* batch 5

* style

* non-tf template

* last deletion of check_loss_output
2020-08-04 02:42:56 -04:00
Gong Linyuan
b390a5672a Make the order of additional special tokens deterministic (#5704)
* Make the order of additional special tokens deterministic regardless of hash seeds

* Fix
2020-08-04 02:38:30 -04:00
Lysandre Debut
d740351f7d Upgrade pip when doing CI (#6234)
* Upgrade pip when doing CI

* Don't forget Github CI
2020-08-04 02:37:12 -04:00
Sam Shleifer
57eb1cb68d [s2s] Document better mbart finetuning command (#6229)
* Document better MT command

* improve multigpu command
2020-08-03 18:22:31 -04:00
Victor SANH
0513f8d275 correct label extraction + add note on discrepancies on trained MNLI model and HANS (#6221) 2020-08-03 15:02:51 -04:00
Kevin Canwen Xu
3c289fb38c Remove outdated BERT tips (#6217)
* Remove out-dated BERT tips

* Update modeling_outputs.py

* Update bert.rst

* Update bert.rst
2020-08-04 01:17:56 +08:00
Sylvain Gugger
e4920c92d6 Doc pipelines (#6175)
* Init work on pipelines doc

* Work in progress

* Work in progress

* Doc pipelines

* Rm unwanted default

* Apply suggestions from code review

Lysandre comments

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-08-03 11:44:46 -04:00
Sam Shleifer
b6b2f2270f s2s: fix LR logging, remove some dead code. (#6205) 2020-08-03 10:36:26 -04:00
Maurice Gonzenbach
06f1692b02 Fix _shift_right function in TFT5PreTrainedModel (#6214) 2020-08-03 16:21:23 +02:00
Suraj Patil
0b41867357 fix labels (#6213) 2020-08-03 10:19:35 -04:00
Jay Mody
cedc547e7e Adds train_batch_size, eval_batch_size, and n_gpu to to_sanitized_dict output for logging. (#5331)
* Adds train_batch_size, eval_batch_size, and n_gpu to to_sanitized_dict() output

* Update wandb config logging to use to_sanitized_dict

* removed n_gpu from sanitized dict

* fix quality check errors
2020-08-03 09:00:39 -04:00
Julien Plu
9996f697e3 Fix saved model creation (#5468)
* Fix TF Serving when output_hidden_states and output_attentions are True

* Add tests for saved model creation + bug fix for multiple choices models

* remove unused import

* Fix the input for several layers

* Fix test

* Fix conflict printing

* Apply style

* Fix XLM and Flaubert for TensorFlow

* Apply style

* Fix TF check version

* Apply style

* Trigger CI
2020-08-03 08:10:40 -04:00
Teven
5a0dac53bf Empty assert hunt (#6056)
* Fixed empty asserts

* black-reformatted stragglers in templates

* More code quality checks

* Update src/transformers/convert_marian_to_pytorch.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/convert_marian_to_pytorch.py

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* removed unused line as per @sshleifer

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-08-03 10:19:03 +02:00
Martin Müller
16c2240164 Add script to convert tf2.x checkpoint to PyTorch (#5791)
* Add script to convert tf2.x checkpoint to pytorch

The script converts the newer TF2.x checkpoints (as published on their official GitHub: https://github.com/tensorflow/models/tree/master/official/nlp/bert) to Pytorch.

* rename file in order to stay consistent with naming convention
2020-08-03 03:53:38 -04:00
Philip May
82a0e2b67e Fix docstring for BertTokenizerFast (#6185)
- remove duplicate doc-entry for tokenize_chinese_chars
- add doc for strip_accents and wordpieces_prefix
2020-08-02 15:58:26 +08:00
Stas Bekman
d8dbf3b75d [s2s] clean up + doc (#6184)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-08-01 14:51:07 -04:00
Faiaz Rahman
a39dfe4fb1 Fixed typo in Longformer (#6180) 2020-08-01 18:20:48 +08:00
Joe Davison
8edfaaa81b bart-large-mnli-yahoo-answers model card (#6133)
* Add bart-large-mnli-yahoo-answers model card

* Add examples

* Add widget example

* Rm bart tag

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-31 10:56:32 -04:00
Sylvain Gugger
d951c14ae4 Model output test (#6155)
* Use return_dict=True in all tests

* Formatting
2020-07-31 09:44:37 -04:00
Sylvain Gugger
86caab1e0b Harmonize both Trainers API (#6157)
* Harmonize both Trainers API

* Fix test

* main_prcess -> process_zero
2020-07-31 09:43:23 -04:00
Mehrdad Farahani
603cd81a01 readme m3hrdadfi/albert-fa-base-v2 (#6153)
* readme m3hrdadfi/albert-fa-base-v2

model_card readme for m3hrdadfi/albert-fa-base-v2

* Update model_cards/m3hrdadfi/albert-fa-base-v2/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-31 06:19:06 -04:00
Suraj Patil
838dc06ff5 parse arguments from dict (#4869)
* add parse_dict to parse arguments from dict

* add unit test for parse_dict
2020-07-31 04:44:23 -04:00
Paul O'Leary McCann
cf3cf304ca Replace mecab-python3 with fugashi for Japanese tokenization (#6086)
* Replace mecab-python3 with fugashi

This replaces mecab-python3 with fugashi for Japanese tokenization. I am
the maintainer of both projects.

Both projects are MeCab wrappers, so the underlying C++ code is the
same. fugashi is the newer wrapper and doesn't use SWIG, so for basic
use of the MeCab API it's easier to use.

This code insures the use of a version of ipadic installed via pip,
which should make versioning and tracking down issues easier.

fugashi has wheels for Windows, OSX, and Linux, which will help with
issues with installing old versions of mecab-python3 on Windows.
Compared to mecab-python3, because fugashi doesn't use SWIG, it doesn't
require a C++ runtime to be installed on Windows.

In adding this change I removed some code dealing with `cursor`,
`token_start`, and `token_end` variables. These variables didn't seem to
be used for anything, it is unclear to me why they were there.

I ran the tests and they passed, though I couldn't figure out how to run
the slow tests (`--runslow` gave an error) and didn't try testing with
Tensorflow.

* Style fix

* Remove unused variable

Forgot to delete this...

* Adapt doc with install instructions

* Fix typo

Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-07-31 04:41:14 -04:00
Stas Bekman
f250beb8aa enable easy checkout switch (#5645)
* enable easy checkout switch

allow having multiple repository checkouts and not needing to remember to rerun 'pip install -e .[dev]' when switching between checkouts and running tests.

* make isort happy

* examples needs one too
2020-07-31 04:34:46 -04:00
kolk
7d50af4b02 Create README.md (#6169) 2020-07-31 04:28:35 -04:00
Prajjwal Bhargava
0034a1d248 Add Pytorch Native AMP support in Trainer (#6151)
* fixed type; add Pytorch Native CUDA AMP support

* reverted commit on modeling_utils

* confirming to HF black formatting rule

* changed bool value of _use_apex

* scaler support for gradient clipping

* fix inplace operation of clip_grad_norm

* removed not while version comparison
2020-07-31 04:23:29 -04:00
Funtowicz Morgan
7231f7b503 Enable ONNX/ONNXRuntime optimizations through converter script (#6131)
* Add onnxruntime transformers optimization support

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added Optimization section in ONNX/ONNXRuntime documentation.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Improve note reference

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fixing imports order.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Add warning about different level of optimization between torch and tf export.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Address @LysandreJik wording suggestion

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address @LysandreJik wording suggestion

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Always optimize model before quantization for maximum performances.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Address comments on the documentation.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Improve TensorFlow optimization message as suggested by @yufenglee

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Removed --optimize parameter

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Warn the user about current quantization limitation when model is larger than 2GB.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Trigger CI for last check

* Small change in print for the optimization section.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-07-31 09:45:13 +02:00
Stas Bekman
c0b93a1c7a correct the correction (#6163) 2020-07-30 18:00:02 -04:00
Stas Bekman
a2f6d521c1 typos (#6162)
* 2 small typos

* more typos

* correct path
2020-07-30 17:18:27 -04:00
Sylvain Gugger
f3065abdb8 Doc tokenizer (#6110)
* Start doc tokenizers

* Tokenizer documentation

* Start doc tokenizers

* Tokenizer documentation

* Formatting after rebase

* Formatting after merge

* Update docs/source/main_classes/tokenizer.rst

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address comment

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>

* Address Thom's comments

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
2020-07-30 14:51:19 -04:00
guillaume-be
e642c78908 Addition of a DialoguePipeline (#5516)
* initial commit for pipeline implementation

Addition of input processing and history concatenation

* Conversation pipeline tested and working for single & multiple conversation inputs

* Added docstrings for dialogue pipeline

* Addition of dialogue pipeline integration tests

* Delete test_t5.py

* Fixed max code length

* Updated styling

* Fixed test broken by formatting tools

* Removed unused import

* Added unit test for DialoguePipeline

* Fixed Tensorflow compatibility

* Fixed multi-framework support using framework flag

* - Fixed docstring
- Added `min_length_for_response` as an initialization parameter
- Renamed `*args` to `conversations`, `conversations` being a `Conversation` or a `List[Conversation]`
- Updated truncation to truncate entire segments of conversations, instead of cutting in the middle of a user/bot input

* - renamed pipeline name from dialogue to conversational
- removed hardcoded default value of 1000 and use config.max_length instead
- added `append_response` and `set_history` method to the Conversation class to avoid direct fields mutation
- fixed bug in history truncation method

* - Updated ConversationalPipeline to accept only active conversations (otherwise a ValueError is raised)

* - Simplified input tensor conversion

* - Updated attention_mask value for Tensorflow compatibility

* - Updated last dialogue reference to conversational & fixed integration tests

* Fixed conflict with master

* Updates following review comments

* Updated formatting

* Added Conversation and ConversationalPipeline to the library __init__, addition of docstrings for Conversation, added both to the docs

* Update src/transformers/pipelines.py

Updated docsting following review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-07-30 14:11:39 -04:00
Lysandre Debut
ec0267475c Fix FlauBERT GPU test (#6142)
* Fix GPU test

* Remove legacy constructor
2020-07-30 11:11:48 -04:00
Sylvain Gugger
91cb95461e Switch from return_tuple to return_dict (#6138)
* Switch from return_tuple to return_dict

* Fix test

* [WIP] Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleC… (#5614)

* Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleChoice} models and tests

* AutoModels


Tiny tweaks

* Style

* Final changes before merge

* Re-order for simpler review

* Final fixes

* Addressing @sgugger's comments

* Test MultipleChoice

* Rework TF trainer (#6038)

* Fully rework training/prediction loops

* fix method name

* Fix variable name

* Fix property name

* Fix scope

* Fix method name

* Fix tuple index

* Fix tuple index

* Fix indentation

* Fix variable name

* fix eval before log

* Add drop remainder for test dataset

* Fix step number + fix logging datetime

* fix eval loss value

* use global step instead of step + fix logging at step 0

* Fix logging datetime

* Fix global_step usage

* Fix breaking loop + logging datetime

* Fix step in prediction loop

* Fix step breaking

* Fix train/test loops

* Force TF at least 2.2 for the trainer

* Use assert_cardinality to facilitate the dataset size computation

* Log steps per epoch

* Make tfds compliant with TPU

* Make tfds compliant with TPU

* Use TF dataset enumerate instead of the Python one

* revert previous commit

* Fix data_dir

* Apply style

* rebase on master

* Address Sylvain's comments

* Address Sylvain's and Lysandre comments

* Trigger CI

* Remove unused import

* Switch from return_tuple to return_dict

* Fix test

* Add recent model

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Julien Plu <plu.julien@gmail.com>
2020-07-30 09:17:00 -04:00
Sylvain Gugger
562b6369c4 Tf trainer cleanup (#6143)
* Clean up TFTrainer

* Add import

* Fix conflicts
2020-07-30 09:13:16 -04:00
Oren Amsalem
c127d055e6 add another e.g. to avoid confusion (#6055) 2020-07-30 08:53:35 -04:00
Oren Amsalem
d24ea708d7 Actually the extra_id are from 0-99 and not from 1-100 (#5967)
a = tokenizer.encode("we got a <extra_id_99>", return_tensors='pt',add_special_tokens=True)
print(a)
>tensor([[   62,   530,     3,     9, 32000]])
a = tokenizer.encode("we got a <extra_id_100>", return_tensors='pt',add_special_tokens=True)
print(a)
>tensor([[   62,   530,     3,     9,     3,     2, 25666,   834,    23,    26,
           834,  2915,  3155]])
2020-07-30 06:13:29 -04:00
Stas Bekman
3212b8850d [s2s] add support for overriding config params (#6149) 2020-07-30 01:09:46 -04:00
Julien Plu
54f9fbeff8 Rework TF trainer (#6038)
* Fully rework training/prediction loops

* fix method name

* Fix variable name

* Fix property name

* Fix scope

* Fix method name

* Fix tuple index

* Fix tuple index

* Fix indentation

* Fix variable name

* fix eval before log

* Add drop remainder for test dataset

* Fix step number + fix logging datetime

* fix eval loss value

* use global step instead of step + fix logging at step 0

* Fix logging datetime

* Fix global_step usage

* Fix breaking loop + logging datetime

* Fix step in prediction loop

* Fix step breaking

* Fix train/test loops

* Force TF at least 2.2 for the trainer

* Use assert_cardinality to facilitate the dataset size computation

* Log steps per epoch

* Make tfds compliant with TPU

* Make tfds compliant with TPU

* Use TF dataset enumerate instead of the Python one

* revert previous commit

* Fix data_dir

* Apply style

* rebase on master

* Address Sylvain's comments

* Address Sylvain's and Lysandre comments

* Trigger CI

* Remove unused import
2020-07-29 14:32:01 -04:00
Lysandre Debut
3f94170a10 [WIP] Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleC… (#5614)
* Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleChoice} models and tests

* AutoModels


Tiny tweaks

* Style

* Final changes before merge

* Re-order for simpler review

* Final fixes

* Addressing @sgugger's comments

* Test MultipleChoice
2020-07-29 14:26:26 -04:00
Sylvain Gugger
8a8ae27617 Use google style to document properties (#6130)
* Use google style to document properties

* Update src/transformers/configuration_utils.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-07-29 12:28:12 -04:00
Julien Plu
fc64559c45 Fix TF CTRL model naming (#6134) 2020-07-29 12:20:00 -04:00
Lysandre Debut
641b873c13 XLNet PLM Readme (#6121) 2020-07-29 11:38:15 -04:00
Timo Moeller
8d157c930b add deepset/xlm-roberta-large-squad2 model card (#6128)
* Add xlm-r QA model card

* Add tags
2020-07-29 17:34:16 +02:00
Funtowicz Morgan
6c002853a6 Added capability to quantize a model while exporting through ONNX. (#6089)
* Added capability to quantize a model while exporting through ONNX.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

We do not support multiple extensions

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Reformat files

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* More quality

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Ensure test_generate_identified_name compares the same object types

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added documentation everywhere on ONNX exporter

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Use pathlib.Path instead of plain-old string

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Use f-string everywhere

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Use the correct parameters for black formatting

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Use Python 3 super() style.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Use packaging.version to ensure installed onnxruntime version match requirements

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fixing imports sorting order.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Missing raise(s)

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added quantization documentation

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fix some spelling.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fix bad list header format

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-07-29 13:21:29 +02:00
Sylvain Gugger
25de74ccfe Use FutureWarning to deprecate (#6111) 2020-07-29 05:20:53 -04:00
Funtowicz Morgan
640550fc7a ONNX documentation (#5992)
* Move torchscript and add ONNX documentation under modle_export

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Let's follow guidelines by the gurus: Renamed torchscript.rst to serialization.rst

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Remove previously introduced tree element

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* WIP doc

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* ONNX documentation

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fix invalid link

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Improve spelling

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Final wording pass

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-07-29 11:02:35 +02:00
Sam Shleifer
92f8ce2ed6 Fix deebert tests (#6102) 2020-07-28 18:30:16 -04:00
Sam Shleifer
c49cd927f7 [Fix] position_ids tests again (#6100) 2020-07-28 18:29:35 -04:00
Sam Shleifer
40796c5801 [fix] add bart to LM_MAPPING (#6099) 2020-07-28 18:29:18 -04:00
Sam Shleifer
5abe50381a Fix #6096: MBartTokenizer's mask token (#6098) 2020-07-28 18:27:58 -04:00
Joe Davison
b1c8b76907 Fix zero-shot pipeline single seq output shape (#6104) 2020-07-28 14:46:03 -04:00
Lysandre Debut
06834bc332 Logs should not be hidden behind a logger.info (#6097) 2020-07-28 12:44:25 -04:00
Sam Shleifer
dafa296c95 [s2s] Delete useless method, log tokens_per_batch (#6081) 2020-07-28 11:24:23 -04:00
Tanmay Thakur
dc4755c6d5 create model-card for lordtt13/emo-mobilebert (#6030) 2020-07-28 10:00:23 -04:00
Sylvain Gugger
28931f81b7 Fix #6092 (#6093)
* Fix #6092

* Format
2020-07-28 09:48:39 -04:00
Manuel Romero
5e97c82940 Create README.md (#6076) 2020-07-28 09:36:00 -04:00
Clement
54f49af4ae Add inference widget examples (#5825) 2020-07-28 09:14:00 -04:00
Sylvain Gugger
0206efb4cf Make all data collators accept dict (#6065)
* Make all data collators accept dict

* Style
2020-07-28 09:08:20 -04:00
Sam Shleifer
31a5486e42 github issue template suggests who to tag (#5790)
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Teven <teven.lescao@gmail.com>
2020-07-28 08:41:27 -04:00
Stas Bekman
f0c70085c2 link to README.md (#6068)
* add a link to README.md

* Update README.md
2020-07-28 20:34:58 +08:00
Pavel Soriano
4f814fd587 [Model Card] camembert-base-squadFR-fquad-piaf (#6087) 2020-07-28 20:33:52 +08:00
Sam Shleifer
3c7fbf35a6 MBART: support summarization tasks where max_src_len > max_tgt_len (#6003)
* MBART: support summarization tasks

* fix test

* Style

* add tokenizer test
2020-07-28 08:18:11 -04:00
Tanmay Thakur
842eb45606 New Community NB Add (#5824)
Signed-off-by: lordtt13 <thakurtanmay72@yahoo.com>
2020-07-28 04:25:12 -04:00
Andrés Felipe Cruz
018d61fa24 Moving transformers package import statements to relative imports in some files (#5796)
* Moving rom transformers statements to relative imports in some files under src/

* Import order

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-07-28 04:19:17 -04:00
Lysandre Debut
7214954db4 Should return a tuple for serialization (#6061) 2020-07-28 03:14:31 -04:00
Sam Shleifer
7a68d40138 [s2s] Don't mention packed data in README (#6079) 2020-07-27 20:07:21 -04:00
Sam Shleifer
b7345d22d0 [fix] no warning for position_ids buffer (#6063) 2020-07-27 20:00:44 -04:00
Sam Shleifer
1e00ef681d [s2s] dont document packing because it hurts performance (#6077) 2020-07-27 18:26:00 -04:00
sgugger
9d0d3a6645 Pin TF while we wait for a fix 2020-07-27 18:03:09 -04:00
Ramsri Goutham Golla
769e6ba01f Create README.md (#6032)
Adding model card - readme
2020-07-27 16:25:37 -04:00
Sylvain Gugger
fd347e0da7 Add fire to setup.cfg to make isort happy (#6066) 2020-07-27 15:17:33 -04:00
Sam Shleifer
11792d7826 CL util to convert models to fp16 before upload (#5953) 2020-07-27 12:21:25 -04:00
Sam Shleifer
4302ace5bd [pack_dataset] don't sort before packing, only pack train (#5954) 2020-07-27 12:14:23 -04:00
Suraj Patil
c8bdf7f4ec Add new AutoModel classes in pipeline (#6062)
* use new AutoModel classed

* make style and quality
2020-07-27 11:50:08 -04:00
Cola
5779e5434d ✏️ Fix typo (#5734) 2020-07-27 10:55:15 -04:00
Suraj Patil
d1d15d6f2d [examples (seq2seq)] fix preparing decoder_input_ids for T5 (#5994) 2020-07-27 10:10:43 -04:00
Joe Davison
3deffc1d67 Zero shot classification pipeline (#5760)
* add initial zero-shot pipeline

* change default args

* update default template

* add label string splitting

* add str labels support, remove nli from name

* style

* add input validation and working tf defaults

* tests

* quality check

* add docstring to __call__

* add slow tests

* Change truncation to only_first

also lower precision on tests for readibility

* style
2020-07-27 09:42:58 -04:00
Sylvain Gugger
1246b20f6d Fix the return documentation rendering for all model outputs (#6022)
* Fix the return documentation rendering for all model outputs

* Formatting
2020-07-27 09:18:59 -04:00
Sylvain Gugger
3b64ad5d5c Remove unused file (#6023) 2020-07-27 08:31:24 -04:00
Xin Wen
b9b11795cf Update model_summary.rst (#5737)
Add '-' to make the reference of Transformer-XL more accurate and formal.
2020-07-27 05:34:02 -04:00
Gong Linyuan
b21993b362 Allow to set Adam beta1, beta2 in TrainingArgs (#5592)
* Add Adam beta1, beta2 to trainier

* Make style consistent
2020-07-27 05:31:37 -04:00
Pavel Soriano
7969e96f4a draft etalab QA model (#6040) 2020-07-27 05:15:08 -04:00
Vamsi995
a9585fd107 Model card for Vamsi/T5_Paraphrase_Paws (#6037)
* Model card for Vamsi/T5_Paraphrase_Paws

* Update model_cards/Vamsi/T5_Paraphrase_Paws/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-27 05:12:45 -04:00
Rodolfo De Nadai
f7f03b22dc Update README.md of my model (#6042) 2020-07-26 23:31:49 +02:00
Stas Bekman
fb0589a03d don't complain about missing W&B when WANDB_DISABLED=true (#6036)
* don't complain about missing W&B when WANDB_DISABLED=true

* reformat to elif

* typo
2020-07-26 14:29:54 -04:00
Stas Bekman
daa5dd1202 add a summary report flag for run_examples on CI (#6035)
Currently, it's hard to derive which example tests were run on CI, and which weren't. Adding `-rA` flag to `pytest`, will now include a summary like:

```
==================================================================== short test summary info =====================================================================
PASSED examples/test_examples.py::ExamplesTests::test_generation
PASSED examples/test_examples.py::ExamplesTests::test_run_glue
PASSED examples/test_examples.py::ExamplesTests::test_run_language_modeling
PASSED examples/test_examples.py::ExamplesTests::test_run_squad
FAILED examples/test_examples.py::ExamplesTests::test_run_pl_glue - AttributeError: 'Namespace' object has no attribute 'gpus'
============================================================ 1 failed, 4 passed, 8 warnings in 42.96s ============================================================
```
which makes it easier to validate whether some example is being covered by CI or not.
2020-07-26 14:09:14 -04:00
Sam Shleifer
c69ea5efc4 [CI] Don't test apex (#6021) 2020-07-24 15:34:16 -04:00
Sylvain Gugger
a884b7fa38 Update the new model template (#6019) 2020-07-24 14:15:37 -04:00
Julien Chaumond
295466aae6 [model_card] Sample input for rdenadai/BR_BERTo
cc @rdenadai
2020-07-24 14:14:10 -04:00
Manuel Romero
518361d69a Create model card for RuPERTa-base (#6016)
* Update README.md

* Update model_cards/mrm8488/RuPERTa-base/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-24 14:12:29 -04:00
Manuel Romero
bd51f0a7ab Create README.md (#5952) 2020-07-24 14:12:14 -04:00
Manuel Romero
87a779dfa8 Create README.md (#5951) 2020-07-24 14:12:09 -04:00
Rodolfo De Nadai
d115872b38 Create README.md (#6020)
* Create README.md

* Update README.md

* Update model_cards/rdenadai/BR_BERTo/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Update model_cards/rdenadai/BR_BERTo/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-24 14:11:14 -04:00
Sylvain Gugger
3996041d0a Fix question template (#6014) 2020-07-24 10:04:25 -04:00
Victor SANH
778e635fc9 [model_cards] roberta-base-finetuned-yelp-polarity (#6009)
* [model_cards] roberta-base-finetuned-yelp-polarity

* Update model_cards/VictorSanh/roberta-base-finetuned-yelp-polarity/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-24 09:45:21 -04:00
Funtowicz Morgan
614fef1691 Ensure OpenAI GPT position_ids is correctly initialized and registered at init. (#5773)
* Ensure OpenAI GPT position_ids is correctly initialized and registered as buffer at init.

This will make it compatible with TorchScript export.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fix missing slice operator on the tensor data accessor.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Style.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fixed BertEmbedding position_ids buffer created at forward.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Fixed MobileBertEmbedding position_ids buffer created at forward.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Fixed XLM position_ids buffer created at forward.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
2020-07-24 15:37:52 +02:00
Sylvain Gugger
3b44aa935a Model utils doc (#6005)
* Document TF modeling utils

* Document all model utils
2020-07-24 09:16:28 -04:00
sgugger
a540405213 Fix commit hash for stable doc 2020-07-24 09:07:40 -04:00
Qingqing Cao
fc0fe2a532 fix: model card readme clutter (#6008)
this removes the clutter line in the readme.md of model card `csarron/roberta-base-squad-v1`. It also fixes the result table.
2020-07-24 04:17:52 -04:00
Sylvain Gugger
f5b5c5bd7e Avoid unnecessary warnings when loading pretrained model (#5922)
* Avoid unnecessary warnings when loading pretrained model

* Fix test

* Add other keys to ignore

* keys_to_ignore_at_load -> authorized_missing_keys
2020-07-23 18:13:36 -04:00
Philip May
29afb5764f Bert german dbmdz uncased sentence stsb (#6000)
* Describe usage of sentence model

* fix typo usage

* add use and description to readme

* fix typo in readme

* readme formatting

* add training procedure to readme

* description name and company

* readme formatting

* dataset training readme

* typo

* readme
2020-07-23 17:56:45 -04:00
Qingqing Cao
2b5ef9706d Model cards: add roberta-base-squad-v1 and bert-base-uncased-squad-v1 (#6006)
* add: bert-base-uncased-squad-v1

* add: roberta-base-squad-v1
2020-07-23 17:53:47 -04:00
Sam Shleifer
9827d666eb MbartTokenizer: do not hardcode vocab size (#5998) 2020-07-23 15:41:14 -04:00
Sylvain Gugger
6e16195510 Fix #5974 (#5999) 2020-07-23 13:51:29 -04:00
Sylvain Gugger
e168488a74 Cleanup Trainer and expose customization points (#5982)
* Clean up Trainer and expose customization points

* Formatting

* eval_step -> prediction_step
2020-07-23 12:05:41 -04:00
Qingqing Cao
76f52324b1 add fine-tuned mobilebert squad v1 and squad v2 model cards (#5980)
* add mobilebert-uncased-squad-v2

* fix shell cmd, add creator info

* add mobilebert-uncased-squad-v1
2020-07-23 11:57:29 -04:00
GmailB
7e251ae039 Create README.md (#5989) 2020-07-23 11:41:33 -04:00
Sylvain Gugger
33d7506ea1 Update doc of the model page (#5985) 2020-07-22 18:14:57 -04:00
Sam Shleifer
c3206eef44 [test] partial coverage for train_mbart_enro_cc25.sh (#5976) 2020-07-22 14:34:49 -04:00
Stas Bekman
2c0da7803a minor doc fixes (#5831)
* minor doc fixes

correct superclass name and small grammar fixes

* correct the instance name in the error message

It appears to be `BaseTokenizer` from looking at:

`from tokenizers.implementations import BaseTokenizer as BaseTokenizerFast`

and not `Tokenizer` as it currently says.
2020-07-22 13:22:34 -04:00
Sam Shleifer
feeb956a19 [docs] Add integration test example to copy pasta template (#5961)
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-22 12:48:38 -04:00
Sam Shleifer
01116d3c5b T5 Model Cards (#5759)
* T5 Model Cards

* Fix paths

* Fix tags

* lang-en
2020-07-22 11:38:37 -04:00
Funtowicz Morgan
896300177b Expose padding_strategy on squad processor to fix QA pipeline performance regression (#5932)
* Attempt to fix the way squad_convert_examples_to_features pad the elements for the QA pipeline.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Quality

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make the code easier to read and avoid testing multiple test the same thing.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* missing enum value on truncation_strategy.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Rethinking for the easiest fix: expose the padding strategy on squad_convert_examples_to_features.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Remove unused imports.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
2020-07-22 16:11:57 +02:00
Sam Shleifer
ae67b2439f [CI] Install examples/requirements.txt (#5956) 2020-07-21 21:07:48 -04:00
Sylvain Gugger
e714412fe6 Update doc to new model outputs (#5946)
* Update doc to new model outputs

* Fix outputs in quicktour
2020-07-21 18:13:55 -04:00
Sam Shleifer
ddd40b3211 [CI] self-scheduled runner tests examples/ (#5927) 2020-07-21 17:01:07 -04:00
Sam Shleifer
9dab39feea seq2seq/run_eval.py can take decoder_start_token_id (#5949) 2020-07-21 16:58:45 -04:00
Sam Shleifer
5b193b39b0 [examples/seq2seq]: add --label_smoothing option (#5919) 2020-07-21 16:51:39 -04:00
Sam Shleifer
95d1962b9c [Doc] explaining romanian postprocessing for MBART BLEU hacking (#5943) 2020-07-21 14:12:48 -04:00
Jannes
604a2355dc Create README.md (#5876) 2020-07-21 13:28:22 -04:00
Jannes
77c718edef Create README.md (#5873) 2020-07-21 13:28:06 -04:00
Jannes
325b277db9 Create README.md (#5874) 2020-07-21 13:27:30 -04:00
Jannes
d15be2216c Create README.md (#5879) 2020-07-21 13:27:13 -04:00
Jannes
f3e23dd90a Create README.md (#5878) 2020-07-21 13:20:47 -04:00
Jannes
8b01d15c05 Create README.md (#5877) 2020-07-21 13:20:43 -04:00
Jannes
05bddf304e Create README.md (#5875) 2020-07-21 13:20:32 -04:00
Jannes
783a0c7ee9 Create README.md (#5872) 2020-07-21 13:20:21 -04:00
Jannes
e7844d60c2 Create README.md (#5871) 2020-07-21 13:19:48 -04:00
tuner007
b1ee69763c Create README.md (#5864) 2020-07-21 13:15:07 -04:00
Manuel Romero
5f809e4976 Update README.md (#5857)
Add nlp dataset used
2020-07-21 13:14:27 -04:00
Manuel Romero
4215f59c99 Update README.md (#5856)
Add dataset used as it is now part of nlp package
2020-07-21 13:11:08 -04:00
Ali Hamdi Ali Fadel
1d72460d55 Add ComVE model cards (#5884)
* Add ComVE model cards

* Apply suggestions from code review

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-21 12:54:29 -04:00
Aditya Soni
ccbf74a685 typos in seq2seq/readme (#5937) 2020-07-21 09:44:59 -04:00
BatJedi
d32279438a Created model card for my extreme summarization model (#5839)
* Created model card for my extreme summarization model

* Update model_cards/yuvraj/xSumm/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-21 03:54:57 -04:00
BatJedi
abf5c56e9d Created model card for my summarization model (#5838)
* Created model card for my summarization model

* Update model_cards/yuvraj/summarizer-cnndm/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-21 03:54:14 -04:00
Manuel Romero
d73baeebc5 Create README.md (#5921)
- Maybe the result of this query answers the question You did some days ago @julien-c ;-)
2020-07-21 03:52:52 -04:00
Manuel Romero
50acfc8717 Create README.md (#5924) 2020-07-21 03:41:37 -04:00
Manuel Romero
7249533404 Create README.md (#5920) 2020-07-21 03:31:42 -04:00
Sylvain Gugger
4781afd045 Clarify arg class (#5916) 2020-07-20 19:47:06 -04:00
Qingqing Cao
8e0bcb56ec DataParallel fix: multi gpu evaluation (#5926)
The DataParallel training was fixed in https://github.com/huggingface/transformers/pull/5733, this commit also fixes the evaluation. It's more convenient when the user enables both `do_train` and `do_eval`.
2020-07-20 17:54:08 -04:00
Sylvain Gugger
a20969170b Add AlbertForPretraining to doc (#5914) 2020-07-20 17:53:21 -04:00
Sam Shleifer
f1a4e06f1f [Fix] seq2seq pack_dataset.py actually packs (#5913)
Huge MT speedup!
2020-07-20 15:18:26 -04:00
Sylvain Gugger
32883b310b Improve doc of use_cache (#5912)
* Improve doc of use_cache

* Update src/transformers/configuration_xlnet.py

Co-authored-by: Teven <teven.lescao@gmail.com>

Co-authored-by: Teven <teven.lescao@gmail.com>
2020-07-20 11:50:41 -04:00
Clement
9ccb45a263 Update gpt2-README.md 2020-07-20 11:40:33 -04:00
Clement
f19751117d Create gpt2-medium-README.md 2020-07-20 10:47:42 -04:00
Clement
511523672b Create gpt2-large-README.md 2020-07-20 10:47:27 -04:00
Clement
182c611934 Update gpt2-README.md 2020-07-20 10:47:11 -04:00
Clement
a9ae27cd0f add link to write with transformers to model card 2020-07-20 10:46:10 -04:00
Sam Shleifer
01c40db4f8 [cleanup] squad processor (#5868) 2020-07-20 10:44:10 -04:00
Stas Bekman
35cb101eae DataParallel fixes (#5733)
* DataParallel fixes:

1. switched to a more precise check
-        if self.args.n_gpu > 1:
+        if isinstance(model, nn.DataParallel):

2. fix tests - require the same fixup under DataParallel as the training module

* another fix
2020-07-20 09:29:12 -04:00
Pradhy729
290b6e18ac Trainer support for iterabledataset (#5834)
* Don't pass sampler for iterable dataset

* Added check for test and eval dataloaders.

* Formatting

* Don't pass sampler for iterable dataset

* Added check for test and eval dataloaders.

* Formatting

* Cleaner if nesting.

* Added test for trainer and iterable dataset

* Formatting for test

* Fixed import when torch is available only.

* Added require torch decorator to helper class

* Moved dataset class inside unittest

* Removed nested if and changed model in test

* Checking torch availability for IterableDataset
2020-07-20 09:07:37 -04:00
Julien Chaumond
82dd96cae7 [model_cards] Dataset ids are case-sensitive
cc @lhoestq @thomwolf

Also cc'ing model author @nreimers => Model pages now properly link to the dataset pages (and in the future, eval results, etc.)
2020-07-20 12:47:28 +02:00
Manuel Romero
b01a8844a9 Create README.md (#5813) 2020-07-20 04:06:42 -04:00
Alan deLevie
223bad242d fix typo in (#5893) 2020-07-20 03:53:03 -04:00
Alan deLevie
d441f8d29d fix typo in training_args_tf.py (#5894) 2020-07-20 03:48:22 -04:00
Sam Shleifer
09a2f40684 Seq2SeqDataset uses linecache to save memory by @Pradhy729 (#5792)
Co-authored-by: Pradhy729 <49659913+Pradhy729@users.noreply.github.com>
2020-07-18 13:57:33 -04:00
Teven
4b506a37e3 Xlnet outputs (#5883)
Slightly breaking change, changes functionality for `use_cache` in XLNet: if use_cache is True and mem_len is 0 or None (which is the case in the base model config), the model behaves like GPT-2 and returns mems to be used as past in generation. At training time `use_cache` is overriden and always True.
2020-07-18 17:33:13 +02:00
Teven
a55809241f Revert "Xlnet outputs (#5881)" (#5882)
This reverts commit 13be487212.
2020-07-18 17:15:40 +02:00
Teven
13be487212 Xlnet outputs (#5881)
Slightly breaking change, changes functionality for `use_cache` in XLNet: if use_cache is True and mem_len is 0 or None (which is the case in the base model config), the model behaves like GPT-2 and returns mems to be used as past in generation. At training time `use_cache` is overriden and always True.
2020-07-18 16:53:29 +02:00
Sebastian
eae6d8d14f Update tokenizers to 0.8.1.rc to fix Mac OS X issues (#5867) 2020-07-18 08:20:11 -04:00
Sam Shleifer
dad5e12e54 [seq2seq] distillation.py accepts trainer arguments (#5865) 2020-07-18 07:43:57 -04:00
Sam Shleifer
ba2400189b [seq2seq] MAX_LEN env var for MT commands (#5837) 2020-07-17 22:51:31 -04:00
Nathan Raw
529850ae7b Lightning Updates for v0.8.5 (#5798)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-07-17 22:43:06 -04:00
Teven
615be03f9d Revert "XLNet use_cache refactor (#5770)" (#5854)
This reverts commit 0b2da0e592.
2020-07-17 20:33:44 +02:00
Teven
0b2da0e592 XLNet use_cache refactor (#5770)
Slightly breaking change, changes functionality for `use_cache` in XLNet: if use_cache is True and mem_len is 0 or None (which is the case in the base model config), the model behaves like GPT-2 and returns mems to be used as past in generation. At training time `use_cache` is overriden and always True.
2020-07-17 20:24:16 +02:00
Jannes
9750e1300c Create README.md (#5847) 2020-07-17 14:03:53 -04:00
Julien Chaumond
1bca4fbd39 [model_card] Fix metadata 2020-07-17 13:55:37 -04:00
Gianpaolo Di Pietro
a9d56a675a Added model card for neuraly/bert-base-italian-cased-sentiment (#5845)
* Added model card for neuraly/bert-base-italian-cased-sentiment

* Update model_cards/neuraly/bert-base-italian-cased-sentiment/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

Co-authored-by: Gianpy15 <g.dipietro@neuraly.ai>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-17 13:50:49 -04:00
Patrick von Platen
12f14710ce [Model card] Bert2Bert
Add Rouge2 results
2020-07-17 18:22:05 +02:00
Patrick von Platen
9d37c56bab [Reformer] - Cache hidden states and buckets to speed up inference (#5578)
* fix merge rebase

* add intermediate reformer code

* save intermediate caching results

* save intermediate

* save intermediate results

* save intermediate

* upload next step

* fix generate tests

* make tests work

* add named tuple output

* Apply suggestions from code review

* fix use_cache for False case

* fix tensor to gpu

* fix tensor to gpu

* refactor

* refactor and make style
2020-07-17 16:17:42 +02:00
Patrick von Platen
0b6c255a95 [Model card] Bert2Bert (#5841)
* Create README.md

* Update README.md

* Update README.md

* Update README.md
2020-07-17 11:41:56 +02:00
Sam Shleifer
3d9556a72b [cleanups] make Marian save as Marian (#5830) 2020-07-17 02:54:25 -04:00
Sam Shleifer
e238e3d55a [seq2seq] Don't copy self.source in sortishsampler (#5818) 2020-07-17 01:53:25 -04:00
Bayartsogt Yadamsuren
2e4624b415 language tag addition on albert-mongolian (#5828)
* language tag addition on albert-mongolian

* Update model_cards/bayartsogt/albert-mongolian/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-17 01:40:38 -04:00
Manuel Romero
d088d744ad Create README.md (#5821) 2020-07-16 15:18:31 -04:00
Nick Doiron
233072fc1e dv-wave (#5823) 2020-07-16 15:13:51 -04:00
Sam Shleifer
283500ff9f [seq2seq] pack_dataset.py rewrites dataset in max_tokens format (#5819) 2020-07-16 14:06:49 -04:00
Manuel Romero
c45d7a707d Update README.md (#5812)
Fix missig "-" in meta data
2020-07-16 10:25:50 -04:00
Patrick von Platen
057411c56a fix longformer slow down (#5811) 2020-07-16 16:19:37 +02:00
Patrick von Platen
89a78be51f fix benchmark for longformer (#5808) 2020-07-16 15:15:10 +02:00
Patrick von Platen
aefc0c0429 fix benchmark non standard model (#5801) 2020-07-16 12:13:10 +02:00
Martin Müller
8ce610bc96 Update README.md (#5789) 2020-07-16 05:26:17 -04:00
Julien Chaumond
6b6d035d8f [model_card] illuin/lepetit 2020-07-16 03:50:47 -04:00
HuYong
d1f74b9aff ADD ERNIE model (#5763)
* ERNIE model card

* Update Readme.md

* Update Readme.md

* Update Readme.md

* Rename Readme.md to README.md

* Update README.md

* Update Readme.md

* Update README.md

* Rename Readme.md to README.md

* Update Readme.md

* Update Readme.md

* Rename Readme.md to README.md

* Update and rename Readme.md to README.md

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
2020-07-16 11:03:05 +08:00
Clement
3b924fabee Create distilbert squad tags 2020-07-15 17:59:06 -04:00
Clement
067814102c fix readme 2020-07-15 17:50:46 -04:00
Clement
d179fd69ca test readme change 2020-07-15 17:48:22 -04:00
Manuel Romero
63761614eb Update README.md (#5776)
Add cherry picked example for the widget

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-15 16:19:21 -04:00
Manuel Romero
221e23c6c1 Create README.md (#5781)
* Create README.md

* Update model_cards/mrm8488/RoBasquERTa/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-15 16:17:25 -04:00
Manuel Romero
d4cda29af1 Create README.md (#5782)
* Create README.md

* Apply suggestions from code review

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-15 16:17:19 -04:00
Julien Chaumond
62ec28ce4f [model_cards] Fix pierreguillou/gpt2-small-portuguese 2020-07-15 22:14:52 +02:00
Pierre Guillou
a946724bbf metadata (#5758)
* metadata

* Update model_cards/pierreguillou/gpt2-small-portuguese/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-15 16:13:28 -04:00
Julien Chaumond
015dc51fe3 [model_card] bert-portuguese: add language meta
cc @rodrigonogueira4 @abiocapsouza @robertoalotufo

Also cc @piegu

Obrigado :)
2020-07-15 21:25:52 +02:00
Sam Shleifer
1a647abf0b [fix] check code quality (#5772) 2020-07-15 14:59:38 -04:00
Julien Chaumond
b23d3a5ad4 [model_cards] Switch all languages codes to ISO-639-{1,2,3} 2020-07-15 18:59:20 +02:00
Funtowicz Morgan
d533c7e9b9 [fix] T5 ONNX test: model.to(torch_device) (#5769)
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-07-15 10:11:22 -04:00
Sam Shleifer
d0486c8bc2 [cleanup] T5 test, warnings (#5761) 2020-07-15 08:23:22 -04:00
Patrick von Platen
ec0a945cf9 [AutoModels] Fix config params handling of all PT and TF AutoModels (#5665)
* fix auto model causal lm

* leverage given functionality

* apply unused kwargs to all auto models
2020-07-15 09:51:14 +02:00
Julien Chaumond
8ab565a4be [model_card] Fix syntax 2020-07-14 22:27:07 +02:00
Bashar Talafha
92dc959224 Update README.md (#5752) 2020-07-14 15:48:59 -04:00
Bashar Talafha
baf93b02c4 Update README.md (#5696) 2020-07-14 12:51:57 -04:00
Joe Davison
5d178954c9 tiny ppl doc typo fix (#5751) 2020-07-14 10:39:44 -06:00
Manuel Romero
ac921f0385 RuPERTa model card (#5743)
* Customize inference widget input

* Update model_cards/mrm8488/RuPERTa-base/README.md

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
2020-07-14 22:58:45 +08:00
dartrevan
21c1fe5290 RuDR-BERT model card (#5698) 2020-07-14 22:51:53 +08:00
Doron Adler
2db1cc807b Norod78/hewiki-articles-distilGPT2py-il model card (#5735)
Model card for hewiki-articles-distilGPT2py-il
A tiny GPT2 model for generating Hebrew text
2020-07-14 22:50:44 +08:00
Pierre Guillou
dae244ad89 GPorTuguese-2 model card (#5744) 2020-07-14 22:48:52 +08:00
Sam Shleifer
b2505f7db7 Cleanup bart caching logic (#5640) 2020-07-14 06:13:05 -04:00
Sam Shleifer
838950ee44 [fix] mbart_en_ro_generate test now identical to fairseq (#5731) 2020-07-14 06:12:24 -04:00
Boris Dayma
4d5a8d6557 docs(wandb): explain how to use W&B integration (#5607)
* docs(wandb): explain how to use W&B integration

fix #5262

* Also mention TensorBoard

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-14 05:12:33 -04:00
Gunnlaugur Thor Briem
cd30f98fd2 doc: fix apparent copy-paste error in docstring (#5626) 2020-07-14 09:47:41 +02:00
as-stevens
f867000f56 [Reformer classification head] Implement the reformer model classification head for text classification (#5198)
* Reformer model head classification implementation for text classification

* Reformat the reformer model classification code

* PR review comments, and test case implementation for reformer for classification head changes

* CI/CD reformer for classification head test import error fix

* CI/CD test case implementation  added ReformerForSequenceClassification to all_model_classes

* Code formatting- fixed

* Normal test cases added for reformer classification head

* Fix test cases implementation for the reformer classification head

* removed token_type_id parameter from the reformer classification head

* fixed the test case for reformer classification head

* merge conflict with master fixed

* merge conflict, changed reformer classification to accept the choice_label parameter added in latest code

* refactored the the reformer classification head test code

* reformer classification head, common transform test cases fixed

* final set of the review comment, rearranging the reformer classes and docstring add to classification forward method

* fixed the compilation error and text case fix for reformer classification head

* Apply suggestions from code review

Remove unnecessary dup

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-07-14 09:16:22 +02:00
Gaurav Mishra
f0bda06f43 Update tokenization_t5.py (#5717)
Minor doc fix.
2020-07-14 00:02:03 -04:00
Sam Shleifer
c3c61ea017 [Fix] github actions CI by reverting #5138 (#5686) 2020-07-13 17:12:18 -04:00
Stas Bekman
45addfe96d FlaubertForTokenClassification (#5644)
* implement FlaubertForTokenClassification as a subclass of XLMForTokenClassification

* fix mapping order

* add the doc

* add common tests
2020-07-13 14:59:53 -04:00
Patrick von Platen
7096e47513 [Longformer] fix longformer global attention output (#5659)
* fix longformer global attention output

* fix multi gpu problem

* replace -10000 with 0

* better comment

* make attention output equal local and global

* Update src/transformers/modeling_longformer.py
2020-07-13 17:23:22 +02:00
Sylvain Gugger
ce374ba877 Fix Trainer in DataParallel setting (#5685)
* Fix Trainer in DataParallel setting

* Fix typo

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-07-13 08:37:38 -04:00
Stas Bekman
0a19a49dfe doc improvements (#5688) 2020-07-13 18:10:17 +08:00
Stas Bekman
443b0cad96 rename the function to match the rest of the test convention (#5692) 2020-07-13 18:09:49 +08:00
onepointconsulting
74843695eb Added first description of the model (#5672)
Added general description, information about the tags and also some example usage code.
2020-07-13 02:53:48 -04:00
Kevin Canwen Xu
0befb51327 Pipeline model type check (#5679)
* Add model type check for pipelines

* Add model type check for pipelines

* rename func

* Fix the init parameters

* Fix format

* rollback unnecessary refactor
2020-07-12 12:34:21 +08:00
Kevin Canwen Xu
dc31a72f50 Add Microsoft's CodeBERT (#5683)
* Add Microsoft's CodeBERT

* link style

* single modal

* unused import
2020-07-11 21:37:30 +08:00
Sylvain Gugger
7fad617dc1 Document model outputs (#5673)
* Document model outputs

* Update docs/source/main_classes/output.rst

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-07-10 17:31:02 -04:00
Sylvain Gugger
df983b7483 Deprecate old past arguments (#5671) 2020-07-10 17:25:52 -04:00
Tomo Lazovich
cdf4cd7068 [squad] add version tag to squad cache (#5669) 2020-07-10 16:34:21 -04:00
Patrick von Platen
223084e42b Add Reformer to notebooks 2020-07-10 18:34:25 +02:00
Julien Chaumond
201d23f285 Update The Big Table of Tasks
Co-Authored-By: Suraj Patil <surajp815@gmail.com>
Co-Authored-By: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-07-10 18:07:29 +02:00
Bashar Talafha
82f7bbbd93 Update README.md (#5617)
* Update README.md

* Update README.md
2020-07-10 11:43:27 -04:00
Manuel Romero
bf497376ee Create README.md (#5572) 2020-07-10 11:42:49 -04:00
kolk
3653d01f2a Create README.md for electra-base-squad2 (#5574) 2020-07-10 11:39:44 -04:00
Txus
aa69c81f29 Add freshly trained base version (#5621) 2020-07-10 11:39:04 -04:00
Teven
227e0a406d Fixed use of memories in XLNet (caching for language generation + warning when loading improper memoryless model) (#5632)
* Pytorch gpu => cpu proper device

* Memoryless XLNet warning + fixed memories during generation

* Revert "Pytorch gpu => cpu proper device"

This reverts commit 93489b36

* made black happy

* TF generation with memories

* dim => axis

* added padding_text to TF XL models

* Added comment, added TF
2020-07-10 17:38:36 +02:00
Manuel Romero
3b7b646563 Create README.md (#5638) 2020-07-10 11:38:23 -04:00
Manuel Romero
0039b965db Create model card (#5655)
Create model card for T5-small fine-tuned on SQUAD v2
2020-07-10 11:38:11 -04:00
Nils Reimers
46982d612f Create README.md - Model card (#5657)
Model card for sentence-transformers/bert-base-nli-cls-token
2020-07-10 11:38:03 -04:00
Nils Reimers
c483803d1b Create README.md - Model card (#5658)
Model card for sentence-transformers/bert-base-nli-max-tokens
2020-07-10 11:37:56 -04:00
Sylvain Gugger
edfd82f5ff Change model outputs types to self-document outputs (#5438)
* [WIP] Proposal for model outputs

* All Bert models

* Make CI green maybe?

* Fix ONNX test

* Isolate ModelOutput from pt and tf

* Formatting

* Add Electra models

* Auto-generate docstrings from outputs

* Add TF outputs

* Add some BERT models

* Revert TF side

* Remove last traces of TF changes

* Fail with a clear error message

* Add Albert and work through Bart

* Add CTRL and DistilBert

* Formatting

* Progress on Bart

* Renames and finish Bart

* Formatting

* Fix last test

* Add DPR

* Finish Electra and add FlauBERT

* Add GPT2

* Add Longformer

* Add MMBT

* Add MobileBert

* Add GPT

* Formatting

* Add Reformer

* Add Roberta

* Add T5

* Add Transformer XL

* Fix test

* Add XLM + fix XLMForTokenClassification

* Style + XLMRoberta

* Add XLNet

* Formatting

* Add doc of return_tuple arg
2020-07-10 11:36:53 -04:00
Suraj Parmar
fa265230a2 Create Model card for RoBERTa-hindi-guj-san (#5661) 2020-07-10 11:34:23 -04:00
Sylvain Gugger
b2747af543 Improvements to PretrainedConfig documentation (#5642)
* Update PretrainedConfig doc

* Formatting

* Small fixes

* Forgotten args and more cleanup
2020-07-10 10:31:47 -04:00
Julien Chaumond
bfacb2e34f [model_card] BART for ELI5
cc @yjernite
2020-07-10 08:10:24 -04:00
Nils Reimers
2e6bb0e9c3 Create README.md (#5652) 2020-07-10 05:41:10 -04:00
Julien Chaumond
552e4591f5 [model_card] Add meta + fix link to image
(hotlinking to image works on GitHub but not on external sites)

cc @bashartalafha
2020-07-10 05:07:33 -04:00
Teven
02a0b43014 Fixed TextGenerationPipeline on torch + GPU (#5629)
* Pytorch gpu => cpu proper device

* Memoryless XLNet warning + fixed memories during generation

* Revert "Memoryless XLNet warning + fixed memories during generation"

This reverts commit 3d3251ff

* Took the operations on the generated_sequence out of the ensure_device scope
2020-07-09 16:29:32 -04:00
Sylvain Gugger
760f726e51 Add forum link in the docs (#5637) 2020-07-09 15:13:22 -04:00
Stas Bekman
bfeaae2235 fix 404 (#5616) 2020-07-09 15:12:29 -04:00
Lysandre Debut
b25f7802de Should check that torch TPU is available (#5636) 2020-07-09 13:54:32 -04:00
Lysandre Debut
3cc23eee06 More explicit error when failing to tensorize overflowing tokens (#5633) 2020-07-09 13:35:21 -04:00
Lysandre
b9d8af07e6 Update stable doc 2020-07-09 11:06:23 -04:00
Lysandre Debut
1158e56551 Correct extension (#5631) 2020-07-09 11:03:07 -04:00
Lysandre
5c82bf6831 Update stable doc 2020-07-09 10:16:13 -04:00
Lysandre Debut
0533cf4706 Test XLA examples (#5583)
* Test XLA examples

* Style

* Using `require_torch_tpu`

* Style

* No need for pytest
2020-07-09 09:19:19 -04:00
Funtowicz Morgan
3bd55199cd QA pipeline BART compatible (#5496)
* Ensure padding and question cannot have higher probs than context.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Add bart the the list of tokenizers adding two <sep> tokens for squad_convert_example_to_feature

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Format.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Addressing @patrickvonplaten comments.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Addressing @patrickvonplaten comments about masking non-context element when generating the answer.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Addressing @sshleifer comments.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make sure we mask CLS after handling impossible answers

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Mask in the correct vectors ...

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
2020-07-09 15:11:40 +02:00
Stas Bekman
fa5423b169 doc fixes (#5613) 2020-07-08 19:52:44 -04:00
Txus
7d0ef00420 Add newly trained calbert-tiny-uncased (#5599)
* Create README.md

Add newly trained `calbert-tiny-uncased` (complete rewrite with SentencePiece)

* Add Exbert link

* Apply suggestions from code review

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-08 17:54:51 -04:00
Lorenzo Ampil
0cc4eae0e6 Fix Inconsistent NER Grouping (Pipeline) (#4987)
* Add B I handling to grouping

* Add fix to include separate entity as last token

* move last_idx definition outside loop

* Use first entity in entity group as reference for entity type

* Add test cases

* Take out extra class accidentally added

* Return tf ner grouped test to original

* Take out redundant last entity

* Get last_idx safely

Co-authored-by: ColleterVi <36503688+ColleterVi@users.noreply.github.com>

* Fix first entity comment

* Create separate functions for group_sub_entities and group_entities (splitting call method to testable functions)

* Take out unnecessary last_idx

* Remove additional forward pass test

* Move token classification basic tests to separate class

* Move token classification basic tests back to monocolumninputtestcase

* Move base ner tests to nerpipelinetests

* Take out unused kwargs

* Add back mandatory_keys argument

* Add unitary tests for group_entities in _test_ner_pipeline

* Fix last entity handling

* Fix grouping fucntion used

* Add typing to group_sub_entities and group_entities

Co-authored-by: ColleterVi <36503688+ColleterVi@users.noreply.github.com>
2020-07-08 16:18:17 -04:00
Suraj Patil
82ce8488bb create model cards for qg models (#5610) 2020-07-08 16:08:56 -04:00
Bashar Talafha
d6b6ab11f0 Create README.md (#5601) 2020-07-08 16:07:48 -04:00
Patrick von Platen
40d98ebf50 Update benchmark notebook (#5603)
* Créé avec Colaboratory

* delete old file
2020-07-08 16:03:59 +02:00
Sylvain Gugger
281e394889 Update question template (#5585) 2020-07-08 08:46:35 -04:00
Patrick von Platen
f82a2a5e8e [Benchmark] Add benchmarks for TF Training (#5594)
* tf_train

* adapt timing for tpu

* fix timing

* fix timing

* fix timing

* fix timing

* update notebook

* add tests
2020-07-08 12:11:09 +02:00
Ji Xin
cfbb982974 Add DeeBERT (entropy-based early exiting for *BERT) (#5477)
* Add deebert code

* Add readme of deebert

* Add test for deebert

Update test for Deebert

* Update DeeBert (README, class names, function refactoring); remove requirements.txt

* Format update

* Update test

* Update readme and model init methods
2020-07-08 08:17:59 +08:00
Joe Davison
b4b33fdf25 Guide to fixed-length model perplexity evaluation (#5449)
* add first draft ppl guide

* upload imgs

* expand on strides

* ref typo

* rm superfluous past var

* add tokenization disclaimer
2020-07-07 16:04:15 -06:00
Patrick von Platen
fde217c679 readme for benchmark (#5363) 2020-07-07 23:21:23 +02:00
Sam Shleifer
d6eab53058 mbart.prepare_translation_batch: pass through kwargs (#5581) 2020-07-07 13:46:05 -04:00
Sam Shleifer
353b8f1e7a Add mbart-large-cc25, support translation finetuning (#5129)
improve unittests for finetuning, especially w.r.t testing frozen parameters
fix freeze_embeds for T5
add streamlit setup.cfg
2020-07-07 13:23:01 -04:00
Julien Chaumond
141492448b Create xlm-roberta-large-finetuned-conll03-german-README.md
cc @BramVanroy
2020-07-07 13:15:10 -04:00
Patrick von Platen
4dc65591b5 [Almost all TF models] TF clean up: add missing CLM / MLM loss; fix T5 naming and keras compile (#5395)
* add first version of clm tf

* make style

* add more tests for bert

* update tf clm loss

* fix tests

* correct tf ner script

* add mlm loss

* delete bogus file

* clean tf auto model + add tests

* finish adding clm loss everywhere

* fix training in distilbert

* fix flake8

* save intermediate

* fix tf t5 naming

* remove prints

* finish up

* up

* fix tf gpt2

* fix new test utils import

* fix flake8

* keep backward compatibility

* Update src/transformers/modeling_tf_albert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_auto.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_electra.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_roberta.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_mobilebert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_auto.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_distilbert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* apply sylvains suggestions

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-07-07 18:15:53 +02:00
Suraj Patil
33e43edddc [docs] fix model_doc links in model summary (#5566)
* fix model_doc links

* update model links
2020-07-07 11:06:12 -04:00
Quentin Lhoest
4fedc1256c Fix tests imports dpr (#5576)
* fix test imports

* fix max_length

* style

* fix tests
2020-07-07 16:35:12 +02:00
Sam Shleifer
d4886173b2 [Bart] enable test_torchscript, update test_tie_weights (#5457)
* Passing all but one torchscript test

* Style

* move comment

* remove unneeded assert
2020-07-07 10:06:48 -04:00
Suraj Patil
e49393c361 [examples] Add trainer support for question-answering (#4829)
* add SquadDataset

* add DataCollatorForQuestionAnswering

* update __init__

* add run_squad with  trainer

* add DataCollatorForQuestionAnswering in __init__

* pass data_collator to trainer

* doc tweak

* Update run_squad_trainer.py

* Update __init__.py

* Update __init__.py

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-07-07 08:57:08 -04:00
Quentin Lhoest
fbd8792195 Add DPR model (#5279)
* beginning of dpr modeling

* wip

* implement forward

* remove biencoder + better init weights

* export dpr model to embed model for nlp lib

* add new api

* remove old code

* make style

* fix dumb typo

* don't load bert weights

* docs

* docs

* style

* move the `k` parameter

* fix init_weights

* add pretrained configs

* minor

* update config names

* style

* better config

* style

* clean code based on PR comments

* change Dpr to DPR

* fix config

* switch encoder config to a dict

* style

* inheritance -> composition

* add messages in assert startements

* add dpr reader tokenizer

* one tokenizer per model

* fix base_model_prefix

* fix imports

* typo

* add convert script

* docs

* change tokenizers conf names

* style

* change tokenizers conf names

* minor

* minor

* fix wrong names

* minor

* remove unused convert functions

* rename convert script

* use return_tensors in tokenizers

* remove n_questions dim

* move generate logic to tokenizer

* style

* add docs

* docs

* quality

* docs

* add tests

* style

* add tokenization tests

* DPR full tests

* Stay true to the attention mask building

* update docs

* missing param in bert input docs

* docs

* style

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-07-07 08:56:12 -04:00
Savaş Yıldırım
d2a9399115 Update model card (#5491) 2020-07-07 18:43:49 +08:00
Savaş Yıldırım
2e653d89d7 Update model card (#5492) 2020-07-07 18:43:34 +08:00
Savaş Yıldırım
beaf60e589 bert-turkish-text-classification model card (#5493) 2020-07-07 18:43:09 +08:00
Manuel Romero
e6eba8419c electra-small-finetuned-squadv1 model card (#5430)
* Create model card

Create model card for electra-small-discriminator finetuned on SQUAD v1.1

* Set right model path in code example
2020-07-07 18:41:42 +08:00
Vitalii Radchenko
43b7ad5df5 ukr-roberta-base model card (#5514) 2020-07-07 18:40:23 +08:00
Manuel Romero
87aa857d7e roberta-base-1B-1-finetuned-squadv1 model card (#5515) 2020-07-07 18:39:09 +08:00
Moseli Motsoehli
c7d96b60e4 zuBERTa model card (#5536)
* Create README

* Update README.md

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
2020-07-07 18:38:15 +08:00
Manuel Romero
b95dfcf110 roberta-base-1B-1-finetuned-squadv2 model card (#5523) 2020-07-07 18:33:42 +08:00
Abel
6912265711 Make T5 compatible with ONNX (#5518)
* Default decoder inputs to encoder ones for T5 if neither are specified.

* Fixing typo, now all tests are passing.

* Changing einsum to operations supported by onnx

* Adding a test to ensure T5 can be exported to onnx op>9

* Modified test for onnx export to make it faster

* Styling changes.

* Styling changes.

* Changing notation for matrix multiplication

Co-authored-by: Abel Riboulot <tkai@protomail.com>
2020-07-07 11:32:29 +02:00
Patrick von Platen
989ae326b5 [Reformer] Adapt Reformer MaskedLM Attn mask (#5560)
* fix attention mask

* fix slow test

* refactor attn masks

* fix fp16 generate test
2020-07-07 10:48:06 +02:00
Shashank Gupta
3dcb748e31 Added data collator for permutation (XLNet) language modeling and related calls (#5522)
* Added data collator for XLNet language modeling and related calls

Added DataCollatorForXLNetLanguageModeling in data/data_collator.py
to generate necessary inputs for language modeling training with
XLNetLMHeadModel. Also added related arguments, logic and calls in
examples/language-modeling/run_language_modeling.py.

Resolves: #4739, #2008 (partially)

* Changed name to `DataCollatorForPermutationLanguageModeling`

Changed the name of `DataCollatorForXLNetLanguageModeling` to the more general `DataCollatorForPermutationLanguageModelling`.
Removed the `--mlm` flag requirement for the new collator and defined a separate `--plm_probability` flag for its use.
CTRL uses a CLM loss just like GPT and GPT-2, so should work out of the box with this script (provided `past` is taken care of
similar to `mems` for XLNet).
Changed calls and imports appropriately.

* Added detailed comments, changed variable names

Added more detailed comments to `DataCollatorForPermutationLanguageModeling` in `data/data_collator.py` to explain working. Also cleaned up variable names and made them more informative.

* Added tests for new data collator

Added tests in `tests/test_trainer.py` for DataCollatorForPermutationLanguageModeling based on those in DataCollatorForLanguageModeling. A specific test has been added to check for odd-length sequences.

* Fixed styling issues
2020-07-07 10:17:37 +02:00
Lysandre
1d2332861f Post v3.0.2 release commit 2020-07-06 18:56:47 -04:00
Lysandre
b0892fa0e8 Release: v3.0.2
Some checks failed
GitHub-hosted runner / check_code_quality (push) Has been cancelled
2020-07-06 18:49:44 -04:00
Sylvain Gugger
f1e2e423ab Fix fast tokenizers too (#5562) 2020-07-06 18:45:01 -04:00
Anthony MOI
5787e4c159 Various tokenizers fixes (#5558)
* BertTokenizerFast - Do not specify strip_accents by default

* Bump tokenizers to new version

* Add test for AddedToken serialization
2020-07-06 18:27:53 -04:00
Sylvain Gugger
21f28c34b7 Fix #5507 (#5559)
* Fix #5507

* Fix formatting
2020-07-06 17:26:48 -04:00
Lysandre Debut
9d9b872b66 The add_space_before_punct_symbol is only for TransfoXL (#5549) 2020-07-06 12:17:05 -04:00
Lysandre Debut
d6b0b9d451 GPT2 tokenizer should not output token type IDs (#5546)
* GPT2 tokenizer should not output token type IDs

* Same for OpenAIGPT
2020-07-06 11:33:57 -04:00
Sylvain Gugger
7833b21a5a Fix #5544 (#5551) 2020-07-06 11:22:24 -04:00
Thomas Wolf
c473484087 Fix the tokenization warning noted in #5505 (#5550)
* fix warning

* style and quality
2020-07-06 11:15:25 -04:00
Lysandre
1bbc28bee7 Imports organization 2020-07-06 10:27:10 -04:00
Mohamed Taher Alrefaie
1bc13697b1 Update convert_pytorch_checkpoint_to_tf2.py (#5531)
fixed ImportError: cannot import name 'hf_bucket_url'
2020-07-06 09:55:10 -04:00
Arnav Sharma
b2309cc6bf Typo fix in training doc (#5495) 2020-07-06 09:15:22 -04:00
ELanning
7ecff0ccbb Fix typo in training (#5510) 2020-07-06 09:14:57 -04:00
Sam Shleifer
58cca47c16 [cleanup] TF T5 tests only init t5-base once. (#5410) 2020-07-03 14:27:49 -04:00
Patrick von Platen
991172922f better error message (#5497) 2020-07-03 19:25:25 +02:00
Thomas Wolf
b58a15a31e unpining specific git versions in setup.py 2020-07-03 17:38:39 +02:00
998 changed files with 103201 additions and 25099 deletions

View File

@@ -1,4 +1,66 @@
version: 2
version: 2.1
orbs:
gcp-gke: circleci/gcp-gke@1.0.4
go: circleci/go@1.3.0
# TPU REFERENCES
references:
checkout_ml_testing: &checkout_ml_testing
run:
name: Checkout ml-testing-accelerators
command: |
git clone https://github.com/GoogleCloudPlatform/ml-testing-accelerators.git
cd ml-testing-accelerators
git fetch origin 5e88ac24f631c27045e62f0e8d5dfcf34e425e25:stable
git checkout stable
build_push_docker: &build_push_docker
run:
name: Configure Docker
command: |
gcloud --quiet auth configure-docker
cd docker/transformers-pytorch-tpu
if [ -z "$CIRCLE_PR_NUMBER" ]; then docker build --tag "$GCR_IMAGE_PATH:$CIRCLE_WORKFLOW_JOB_ID" -f Dockerfile --build-arg "TEST_IMAGE=1" . ; else docker build --tag "$GCR_IMAGE_PATH:$CIRCLE_WORKFLOW_JOB_ID" -f Dockerfile --build-arg "TEST_IMAGE=1" --build-arg "GITHUB_REF=pull/$CIRCLE_PR_NUMBER/head" . ; fi
docker push "$GCR_IMAGE_PATH:$CIRCLE_WORKFLOW_JOB_ID"
deploy_cluster: &deploy_cluster
run:
name: Deploy the job on the kubernetes cluster
command: |
go get github.com/google/go-jsonnet/cmd/jsonnet && \
export PATH=$PATH:$HOME/go/bin && \
kubectl create -f docker/transformers-pytorch-tpu/dataset.yaml || true && \
job_name=$(jsonnet -J ml-testing-accelerators/ docker/transformers-pytorch-tpu/bert-base-cased.jsonnet --ext-str image=$GCR_IMAGE_PATH --ext-str image-tag=$CIRCLE_WORKFLOW_JOB_ID | kubectl create -f -) && \
job_name=${job_name#job.batch/} && \
job_name=${job_name% created} && \
echo "Waiting on kubernetes job: $job_name" && \
i=0 && \
# 30 checks spaced 30s apart = 900s total.
max_checks=30 && \
status_code=2 && \
# Check on the job periodically. Set the status code depending on what
# happened to the job in Kubernetes. If we try max_checks times and
# still the job hasn't finished, give up and return the starting
# non-zero status code.
while [ $i -lt $max_checks ]; do ((i++)); if kubectl get jobs $job_name -o jsonpath='Failed:{.status.failed}' | grep "Failed:1"; then status_code=1 && break; elif kubectl get jobs $job_name -o jsonpath='Succeeded:{.status.succeeded}' | grep "Succeeded:1" ; then status_code=0 && break; else echo "Job not finished yet"; fi; sleep 30; done && \
echo "Done waiting. Job status code: $status_code" && \
pod_name=$(kubectl get po -l controller-uid=`kubectl get job $job_name -o "jsonpath={.metadata.labels.controller-uid}"` | awk 'match($0,!/NAME/) {print $1}') && \
echo "GKE pod name: $pod_name" && \
kubectl logs -f $pod_name --container=train
echo "Done with log retrieval attempt." && \
gcloud container images delete "$GCR_IMAGE_PATH:$CIRCLE_WORKFLOW_JOB_ID" --force-delete-tags && \
exit $status_code
delete_gke_jobs: &delete_gke_jobs
run:
name: Delete GKE Jobs
command: |
# Match jobs whose age matches patterns like '1h' or '1d', i.e. any job
# that has been around longer than 1hr. First print all columns for
# matches, then execute the delete.
kubectl get job | awk 'match($4,/[0-9]+[dh]/) {print $0}'
kubectl delete job $(kubectl get job | awk 'match($4,/[0-9]+[dh]/) {print $1}')
jobs:
run_tests_torch_and_tf:
working_directory: ~/transformers
@@ -10,9 +72,19 @@ jobs:
parallelism: 1
steps:
- checkout
- run: sudo pip install .[sklearn,tf-cpu,torch,testing]
- run: sudo pip install codecov pytest-cov
- run: python -m pytest -n 8 --dist=loadfile -s ./tests/ --cov | tee output.txt
- restore_cache:
keys:
- v0.3-torch_and_tf-{{ checksum "setup.py" }}
- v0.3-{{ checksum "setup.py" }}
- run: pip install --upgrade pip
- run: pip install git+https://github.com/huggingface/datasets
- run: pip install .[sklearn,tf-cpu,torch,testing]
- run: pip install codecov pytest-cov
- save_cache:
key: v0.3-{{ checksum "setup.py" }}
paths:
- '~/.cache/pip'
- run: python -m pytest -n 8 --dist=loadfile -rA -s ./tests/ --cov | tee output.txt
- run: codecov
- store_artifacts:
path: ~/transformers/output.txt
@@ -27,12 +99,21 @@ jobs:
parallelism: 1
steps:
- checkout
- run: sudo pip install .[sklearn,torch,testing]
- run: python -m pytest -n 8 --dist=loadfile -s ./tests/ | tee output.txt
- restore_cache:
keys:
- v0.3-torch-{{ checksum "setup.py" }}
- v0.3-{{ checksum "setup.py" }}
- run: pip install --upgrade pip
- run: pip install git+https://github.com/huggingface/datasets
- run: pip install .[sklearn,torch,testing]
- save_cache:
key: v0.3-torch-{{ checksum "setup.py" }}
paths:
- '~/.cache/pip'
- run: python -m pytest -n 8 --dist=loadfile -rA -s ./tests/ | tee output.txt
- store_artifacts:
path: ~/transformers/output.txt
destination: test_output.txt
run_tests_tf:
working_directory: ~/transformers
docker:
@@ -43,8 +124,18 @@ jobs:
parallelism: 1
steps:
- checkout
- run: sudo pip install .[sklearn,tf-cpu,testing]
- run: python -m pytest -n 8 --dist=loadfile -s ./tests/ | tee output.txt
- restore_cache:
keys:
- v0.3-tf-{{ checksum "setup.py" }}
- v0.3-{{ checksum "setup.py" }}
- run: pip install --upgrade pip
- run: pip install git+https://github.com/huggingface/datasets
- run: pip install .[sklearn,tf-cpu,testing]
- save_cache:
key: v0.3-tf-{{ checksum "setup.py" }}
paths:
- '~/.cache/pip'
- run: python -m pytest -n 8 --dist=loadfile -rA -s ./tests/ | tee output.txt
- store_artifacts:
path: ~/transformers/output.txt
destination: test_output.txt
@@ -56,7 +147,17 @@ jobs:
RUN_CUSTOM_TOKENIZERS: yes
steps:
- checkout
- run: sudo pip install .[mecab,testing]
- restore_cache:
keys:
- v0.3-custom_tokenizers-{{ checksum "setup.py" }}
- v0.3-{{ checksum "setup.py" }}
- run: pip install --upgrade pip
- run: pip install .[ja,testing]
- run: python -m unidic download
- save_cache:
key: v0.3-custom_tokenizers-{{ checksum "setup.py" }}
paths:
- '~/.cache/pip'
- run: python -m pytest -s ./tests/test_tokenization_bert_japanese.py | tee output.txt
- store_artifacts:
path: ~/transformers/output.txt
@@ -71,9 +172,18 @@ jobs:
parallelism: 1
steps:
- checkout
- run: sudo pip install .[sklearn,torch,testing]
- run: sudo pip install -r examples/requirements.txt
- run: python -m pytest -n 8 --dist=loadfile -s ./examples/ | tee output.txt
- restore_cache:
keys:
- v0.3-torch_examples-{{ checksum "setup.py" }}
- v0.3-{{ checksum "setup.py" }}
- run: pip install --upgrade pip
- run: pip install .[sklearn,torch,testing]
- run: pip install -r examples/requirements.txt
- save_cache:
key: v0.3-torch_examples-{{ checksum "setup.py" }}
paths:
- '~/.cache/pip'
- run: python -m pytest -n 8 --dist=loadfile -rA -s ./examples/ | tee output.txt
- store_artifacts:
path: ~/transformers/output.txt
destination: test_output.txt
@@ -83,7 +193,16 @@ jobs:
- image: circleci/python:3.6
steps:
- checkout
- run: sudo pip install .[tf,torch,docs]
- restore_cache:
keys:
- v0.3-build_doc-{{ checksum "setup.py" }}
- v0.3-{{ checksum "setup.py" }}
- run: pip install --upgrade pip
- run: pip install .[tf,torch,docs]
- save_cache:
key: v0.3-build_doc-{{ checksum "setup.py" }}
paths:
- '~/.cache/pip'
- run: cd docs && make html SPHINXOPTS="-W"
- store_artifacts:
path: ./docs/_build
@@ -96,7 +215,15 @@ jobs:
fingerprints:
- "5b:7a:95:18:07:8c:aa:76:4c:60:35:88:ad:60:56:71"
- checkout
- run: sudo pip install .[tf,torch,docs]
- restore_cache:
keys:
- v0.3-deploy_doc-{{ checksum "setup.py" }}
- v0.3-{{ checksum "setup.py" }}
- run: pip install .[tf,torch,docs]
- save_cache:
key: v0.3-deploy_doc-{{ checksum "setup.py" }}
paths:
- '~/.cache/pip'
- run: ./.circleci/deploy.sh
check_code_quality:
working_directory: ~/transformers
@@ -106,12 +233,22 @@ jobs:
parallelism: 1
steps:
- checkout
# we need a version of isort with https://github.com/timothycrosley/isort/pull/1000
- run: sudo pip install git+git://github.com/timothycrosley/isort.git@e63ae06ec7d70b06df9e528357650281a3d3ec22#egg=isort
- run: sudo pip install .[tf,torch,quality]
- run: black --check --line-length 119 --target-version py35 examples templates tests src utils
- run: isort --check-only --recursive examples templates tests src utils
- restore_cache:
keys:
- v0.3-code_quality-{{ checksum "setup.py" }}
- v0.3-{{ checksum "setup.py" }}
- run: pip install --upgrade pip
- run: pip install isort
- run: pip install .[tf,torch,quality]
- save_cache:
key: v0.3-code_quality-{{ checksum "setup.py" }}
paths:
- '~/.cache/pip'
- run: black --check examples templates tests src utils
- run: isort --check-only examples templates tests src utils
- run: flake8 examples templates tests src utils
- run: python utils/check_copies.py
- run: python utils/check_repo.py
check_repository_consistency:
working_directory: ~/transformers
docker:
@@ -120,8 +257,37 @@ jobs:
parallelism: 1
steps:
- checkout
- run: sudo pip install requests
- run: pip install requests
- run: python ./utils/link_tester.py
# TPU JOBS
run_examples_tpu:
docker:
- image: circleci/python:3.6
environment:
OMP_NUM_THREADS: 1
resource_class: xlarge
parallelism: 1
steps:
- checkout
- go/install
- *checkout_ml_testing
- gcp-gke/install
- gcp-gke/update-kubeconfig-with-credentials:
cluster: $GKE_CLUSTER
perform-login: true
- setup_remote_docker
- *build_push_docker
- *deploy_cluster
cleanup-gke-jobs:
docker:
- image: circleci/python:3.6
steps:
- gcp-gke/install
- gcp-gke/update-kubeconfig-with-credentials:
cluster: $GKE_CLUSTER
perform-login: true
- *delete_gke_jobs
workflow_filters: &workflow_filters
filters:
branches:
@@ -140,3 +306,15 @@ workflows:
- run_tests_tf
- build_doc
- deploy_doc: *workflow_filters
tpu_testing_jobs:
triggers:
- schedule:
# Set to run at the first minute of every hour.
cron: "0 8 * * *"
filters:
branches:
only:
- master
jobs:
- cleanup-gke-jobs
- run_examples_tpu

View File

@@ -47,4 +47,7 @@ deploy_doc "e7cfc1a" v2.9.0
deploy_doc "7cb203f" v2.9.1
deploy_doc "10d7239" v2.10.0
deploy_doc "b42586e" v2.11.0
deploy_doc "b62ca59" #v3.0.0 Latest stable release
deploy_doc "7fb8bdf" v3.0.2
deploy_doc "4b3ee9c" v3.1.0
deploy_doc "3ebb1b3" v3.2.0
deploy_doc "0613f05" # v3.3.0 Latest stable release

View File

@@ -7,14 +7,53 @@ assignees: ''
---
# 🐛 Bug
## Environment info
<!-- You can run the command `transformers-cli env` and copy-and-paste its output below.
Don't forget to fill out the missing fields in that output! -->
- `transformers` version:
- Platform:
- Python version:
- PyTorch version (GPU?):
- Tensorflow version (GPU?):
- Using GPU in script?:
- Using distributed or parallel set-up in script?:
### Who can help
<!-- Your issue will be replied to more quickly if you can figure out the right person to tag with @
If you know how to use git blame, that is the easiest way, otherwise, here is a rough guide of **who to tag**.
Please tag fewer than 3 people.
albert, bert, GPT2, XLM: @LysandreJik
tokenizers: @mfuntowicz
Trainer: @sgugger
Speed and Memory Benchmarks: @patrickvonplaten
Model Cards: @julien-c
Translation: @sshleifer
Summarization: @sshleifer
TextGeneration: @TevenLeScao
examples/distillation: @VictorSanh
nlp datasets: [different repo](https://github.com/huggingface/nlp)
rust tokenizers: [different repo](https://github.com/huggingface/tokenizers)
Text Generation: @TevenLeScao
blenderbot: @mariamabarham
Bart: @sshleifer
Marian: @sshleifer
T5: @patrickvonplaten
Longformer/Reformer: @patrickvonplaten
TransfoXL/XLNet: @TevenLeScao
examples/seq2seq: @sshleifer
examples/bert-loses-patience: @JetRunner
tensorflow: @jplu
examples/token-classification: @stefan-it
documentation: @sgugger
-->
## Information
Model I am using (Bert, XLNet ...):
Language I am using the model on (English, Chinese ...):
The problem arises when using:
* [ ] the official example scripts: (give details below)
* [ ] my own modified scripts: (give details below)
@@ -38,15 +77,3 @@ Steps to reproduce the behavior:
## Expected behavior
<!-- A clear and concise description of what you would expect to happen. -->
## Environment info
<!-- You can run the command `transformers-cli env` and copy-and-paste its output below.
Don't forget to fill out the missing fields in that output! -->
- `transformers` version:
- Platform:
- Python version:
- PyTorch version (GPU?):
- Tensorflow version (GPU?):
- Using GPU in script?:
- Using distributed or parallel set-up in script?:

View File

@@ -1,6 +1,6 @@
---
name: "❓ Questions & Help"
about: Post your general questions on Stack Overflow tagged huggingface-transformers
about: Post your general questions on the Hugging Face forum or Stack Overflow tagged huggingface-transformers
title: ''
labels: ''
assignees: ''
@@ -11,19 +11,17 @@ assignees: ''
<!-- The GitHub issue tracker is primarly intended for bugs, feature requests,
new models and benchmarks, and migration questions. For all other questions,
we direct you to Stack Overflow (SO) where a whole community of PyTorch and
Tensorflow enthusiast can help you out. Make sure to tag your question with the
right deep learning framework as well as the huggingface-transformers tag:
we direct you to the Hugging Face forum: https://discuss.huggingface.co/ .
You can also try Stack Overflow (SO) where a whole community of PyTorch and
Tensorflow enthusiast can help you out. In this case, make sure to tag your
question with the right deep learning framework as well as the
huggingface-transformers tag:
https://stackoverflow.com/questions/tagged/huggingface-transformers
If your question wasn't answered after a period of time on Stack Overflow, you
can always open a question on GitHub. You should then link to the SO question
that you posted.
-->
## Details
<!-- Description of your issue -->
<!-- You should first ask your question on SO, and only if
<!-- You should first ask your question on the forum or SO, and only if
you didn't get an answer ask it here on GitHub. -->
**A link to original question on Stack Overflow**:
**A link to original question on the forum/Stack Overflow**:

61
.github/PULL_REQUEST_TEMPLATE.md vendored Normal file
View File

@@ -0,0 +1,61 @@
# What does this PR do?
<!--
Congratulations! You've made it this far! You're not quite done yet though.
Once merged, your PR is going to appear in the release notes with the title you set, so make sure it's a great title that fully reflects the extent of your awesome contribution.
Then, please replace this with a description of the change and which issue is fixed (if applicable). Please also include relevant motivation and context. List any dependencies (if any) that are required for this change.
Once you're done, someone will review your PR shortly (see the section "Who can review?" below to tag some potential reviewers). They may suggest changes to make the code even better. If no one reviewed your PR after a week has passed, don't hesitate to post a new comment @-mentioning the same persons---sometimes notifications get lost.
-->
<!-- Remove if not applicable -->
Fixes # (issue)
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dimiss the other checks if that's the case).
- [ ] Did you read the [contributor guideline](https://github.com/huggingface/transformers/blob/master/CONTRIBUTING.md#start-contributing-pull-requests),
Pull Request section?
- [ ] Was this discussed/approved via a Github issue or the [forum](https://discuss.huggingface.co/)? Please add a link
to the it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the
[documentation guidelines](https://github.com/huggingface/transformers/tree/master/docs), and
[here are tips on formatting docstrings](https://github.com/huggingface/transformers/tree/master/docs#writing-source-documentation).
- [ ] Did you write any new necessary tests?
## Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors which may be interested in your PR.
<!-- Your PR will be replied to more quickly if you can figure out the right person to tag with @
If you know how to use git blame, that is the easiest way, otherwise, here is a rough guide of **who to tag**.
Please tag fewer than 3 people.
albert, bert, GPT2, XLM: @LysandreJik
tokenizers: @mfuntowicz
Trainer: @sgugger
Speed and Memory Benchmarks: @patrickvonplaten
Model Cards: @julien-c
Translation: @sshleifer
Summarization: @sshleifer
TextGeneration: @TevenLeScao
examples/distillation: @VictorSanh
nlp datasets: [different repo](https://github.com/huggingface/nlp)
rust tokenizers: [different repo](https://github.com/huggingface/tokenizers)
Text Generation: @TevenLeScao
Blenderbot, Bart, Marian, Pegasus: @sshleifer
T5: @patrickvonplaten
Longformer/Reformer: @patrickvonplaten
TransfoXL/XLNet: @TevenLeScao
examples/seq2seq: @sshleifer
examples/bert-loses-patience: @JetRunner
tensorflow: @jplu
examples/token-classification: @stefan-it
documentation: @sgugger
-->

View File

@@ -1,19 +0,0 @@
name: GitHub-hosted runner
on: push
jobs:
check_code_quality:
runs-on: ubuntu-18.04
steps:
- uses: actions/checkout@v2
- name: Set up Python
uses: actions/setup-python@v1
with:
python-version: 3.7
# - name: Install dependencies
# run: |
# pip install .[tf,torch,quality]

View File

@@ -18,8 +18,17 @@ jobs:
uses: actions/setup-python@v1
with:
python-version: 3.7
- name: Loading cache
uses: actions/cache@v2
id: cache
with:
path: ~/.cache/pip
key: v0-torch_hub-${{ hashFiles('setup.py') }}
- name: Install dependencies
run: |
pip install --upgrade pip
pip install torch
pip install numpy tokenizers filelock requests tqdm regex sentencepiece sacremoses packaging

View File

@@ -25,6 +25,14 @@ jobs:
- name: Current dir
run: pwd
- run: nvidia-smi
- name: Loading cache.
uses: actions/cache@v2
id: cache
with:
path: .env
key: v0-tests_tf_torch_gpu-${{ hashFiles('setup.py') }}
- name: Create new python env (on self-hosted runners we have to handle isolation ourselves)
run: |
python -m venv .env
@@ -35,8 +43,10 @@ jobs:
- name: Install dependencies
run: |
source .env/bin/activate
pip install torch
pip install .[sklearn,testing]
pip install --upgrade pip
pip install torch!=1.6.0
pip install .[sklearn,testing,onnxruntime]
pip install git+https://github.com/huggingface/datasets
- name: Are GPUs recognized by our DL frameworks
run: |
@@ -51,11 +61,4 @@ jobs:
USE_CUDA: yes
run: |
source .env/bin/activate
python -m pytest -n 2 --dist=loadfile -s ./tests/ | tee output.txt
- name: cat output.txt
run: cat output.txt
- name: Upload output.txt
uses: actions/upload-artifact@v1
with:
name: pytest_output
path: output.txt
python -m pytest -n 2 --dist=loadfile -s ./tests/

View File

@@ -13,6 +13,14 @@ jobs:
runs-on: self-hosted
steps:
- uses: actions/checkout@v2
- name: Loading cache.
uses: actions/cache@v2
id: cache
with:
path: .env
key: v0-slow_tests_tf_torch_gpu-${{ hashFiles('setup.py') }}
- name: Python version
run: |
which python
@@ -22,6 +30,7 @@ jobs:
run: pwd
- run: nvidia-smi
- name: Create new python env (on self-hosted runners we have to handle isolation ourselves)
if: steps.cache.outputs.cache-hit != 'true'
run: |
python -m venv .env
source .env/bin/activate
@@ -31,7 +40,10 @@ jobs:
- name: Install dependencies
run: |
source .env/bin/activate
pip install .[sklearn,torch,testing]
pip install --upgrade pip
pip install torch!=1.6.0
pip install .[sklearn,testing,onnxruntime]
pip install git+https://github.com/huggingface/datasets
- name: Are GPUs recognized by our DL frameworks
run: |
@@ -46,11 +58,15 @@ jobs:
USE_CUDA: yes
run: |
source .env/bin/activate
python -m pytest -n 1 --dist=loadfile -s ./tests/ | tee output.txt
- name: cat output.txt
run: cat output.txt
- name: Upload output.txt
uses: actions/upload-artifact@v1
with:
name: pytest_output
path: output.txt
python -m pytest -n 1 --dist=loadfile -s ./tests/
- name: Run examples tests on GPU
env:
TF_FORCE_GPU_ALLOW_GROWTH: "true"
OMP_NUM_THREADS: 1
RUN_SLOW: yes
USE_CUDA: yes
run: |
source .env/bin/activate
pip install -r examples/requirements.txt
python -m pytest -n 1 --dist=loadfile -s examples

2
.gitignore vendored
View File

@@ -11,6 +11,7 @@ __pycache__/
# tests and logs
tests/fixtures
logs/
lightning_logs/
# Distribution / packaging
.Python
@@ -139,6 +140,7 @@ runs
/wandb
/examples/runs
/examples/**/*.args
/examples/rag/sweep
# data
/data

129
CODE_OF_CONDUCT.md Normal file
View File

@@ -0,0 +1,129 @@
# Contributor Covenant Code of Conduct
## Our Pledge
We as members, contributors, and leaders pledge to make participation in our
community a harassment-free experience for everyone, regardless of age, body
size, visible or invisible disability, ethnicity, sex characteristics, gender
identity and expression, level of experience, education, socio-economic status,
nationality, personal appearance, race, religion, or sexual identity
and orientation.
We pledge to act and interact in ways that contribute to an open, welcoming,
diverse, inclusive, and healthy community.
## Our Standards
Examples of behavior that contributes to a positive environment for our
community include:
* Demonstrating empathy and kindness toward other people
* Being respectful of differing opinions, viewpoints, and experiences
* Giving and gracefully accepting constructive feedback
* Accepting responsibility and apologizing to those affected by our mistakes,
and learning from the experience
* Focusing on what is best not just for us as individuals, but for the
overall community
Examples of unacceptable behavior include:
* The use of sexualized language or imagery, and sexual attention or
advances of any kind
* Trolling, insulting or derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or email
address, without their explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting
## Enforcement Responsibilities
Community leaders are responsible for clarifying and enforcing our standards of
acceptable behavior and will take appropriate and fair corrective action in
response to any behavior that they deem inappropriate, threatening, offensive,
or harmful.
Community leaders have the right and responsibility to remove, edit, or reject
comments, commits, code, wiki edits, issues, and other contributions that are
not aligned to this Code of Conduct, and will communicate reasons for moderation
decisions when appropriate.
## Scope
This Code of Conduct applies within all community spaces, and also applies when
an individual is officially representing the community in public spaces.
Examples of representing our community include using an official e-mail address,
posting via an official social media account, or acting as an appointed
representative at an online or offline event.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported to the community leaders responsible for enforcement at
feedback@huggingface.co.
All complaints will be reviewed and investigated promptly and fairly.
All community leaders are obligated to respect the privacy and security of the
reporter of any incident.
## Enforcement Guidelines
Community leaders will follow these Community Impact Guidelines in determining
the consequences for any action they deem in violation of this Code of Conduct:
### 1. Correction
**Community Impact**: Use of inappropriate language or other behavior deemed
unprofessional or unwelcome in the community.
**Consequence**: A private, written warning from community leaders, providing
clarity around the nature of the violation and an explanation of why the
behavior was inappropriate. A public apology may be requested.
### 2. Warning
**Community Impact**: A violation through a single incident or series
of actions.
**Consequence**: A warning with consequences for continued behavior. No
interaction with the people involved, including unsolicited interaction with
those enforcing the Code of Conduct, for a specified period of time. This
includes avoiding interactions in community spaces as well as external channels
like social media. Violating these terms may lead to a temporary or
permanent ban.
### 3. Temporary Ban
**Community Impact**: A serious violation of community standards, including
sustained inappropriate behavior.
**Consequence**: A temporary ban from any sort of interaction or public
communication with the community for a specified period of time. No public or
private interaction with the people involved, including unsolicited interaction
with those enforcing the Code of Conduct, is allowed during this period.
Violating these terms may lead to a permanent ban.
### 4. Permanent Ban
**Community Impact**: Demonstrating a pattern of violation of community
standards, including sustained inappropriate behavior, harassment of an
individual, or aggression toward or disparagement of classes of individuals.
**Consequence**: A permanent ban from any sort of public interaction within
the community.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage],
version 2.0, available at
https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.
Community Impact Guidelines were inspired by [Mozilla's code of conduct
enforcement ladder](https://github.com/mozilla/diversity).
[homepage]: https://www.contributor-covenant.org
For answers to common questions about this code of conduct, see the FAQ at
https://www.contributor-covenant.org/faq. Translations are available at
https://www.contributor-covenant.org/translations.

View File

@@ -9,6 +9,9 @@ It also helps us if you spread the word: reference the library from blog posts
on the awesome projects it made possible, shout out on Twitter every time it has
helped you, or simply star the repo to say "thank you".
Whichever way you choose to contribute, please be mindful to respect our
[code of conduct](https://github.com/huggingface/transformers/blob/master/CODE_OF_CONDUCT.md).
## You can contribute in so many ways!
There are 4 ways you can contribute to transformers:
@@ -65,8 +68,8 @@ Awesome! Please provide the following information:
If you are willing to contribute the model yourself, let us know so we can best
guide you.
We have added a **detailed guide and templates** to guide you in the process of adding a new model. You can find them
in the [`templates`](https://github.com/huggingface/transformers/templates) folder.
We have added a **detailed guide and templates** to guide you in the process of adding a new model. You can find them
in the [`templates`](https://github.com/huggingface/transformers/tree/master/templates) folder.
### Do you want a new feature (that is not a model)?
@@ -87,8 +90,8 @@ A world-class feature request addresses the following points:
If your issue is well written we're already 80% of the way there by the time you
post it.
We have added **templates** to guide you in the process of adding a new example script for training or testing the
models in the library. You can find them in the [`templates`](https://github.com/huggingface/transformers/templates)
We have added **templates** to guide you in the process of adding a new example script for training or testing the
models in the library. You can find them in the [`templates`](https://github.com/huggingface/transformers/tree/master/templates)
folder.
## Start contributing! (Pull Requests)
@@ -134,12 +137,18 @@ Follow these steps to start contributing:
it with `pip uninstall transformers` before reinstalling it in editable
mode with the `-e` flag.)
Right now, we need an unreleased version of `isort` to avoid a
[bug](https://github.com/timothycrosley/isort/pull/1000):
To run the full test suite, you might need the additional dependency on `datasets` which requires a separate source
install:
```bash
$ pip install -U git+git://github.com/timothycrosley/isort.git@e63ae06ec7d70b06df9e528357650281a3d3ec22#egg=isort
$ git clone https://github.com/huggingface/datasets
$ cd datasets
$ pip install -e .
```
If you have already cloned that repo, you might need to `git pull` to get the most recent changes in the `datasets`
library.
5. Develop the features on your branch.
As you work on the features, you should make sure that the test suite
@@ -149,6 +158,14 @@ Follow these steps to start contributing:
$ make test
```
Note, that this command uses `-n auto` pytest flag, therefore, it will start as many parallel `pytest` processes as the number of your computer's CPU-cores, and if you have lots of those and a few GPUs and not a great amount of RAM, it's likely to overload your computer. Therefore, to run the test suite, you may want to consider using this command instead:
```bash
$ python -m pytest -n 3 --dist=loadfile -s -v ./tests/
```
Adjust the value of `-n` to fit the load your hardware can support.
`transformers` relies on `black` and `isort` to format its source code
consistently. After you make changes, format them with:
@@ -156,12 +173,29 @@ Follow these steps to start contributing:
$ make style
```
`transformers` also uses `flake8` to check for coding mistakes. Quality
`transformers` also uses `flake8` and a few custom scripts to check for coding mistakes. Quality
control runs in CI, however you can also run the same checks with:
```bash
$ make quality
```
You can do the automatic style corrections and code verifications that can't be automated in one go:
```bash
$ make fixup
```
This target is also optimized to only work with files modified by the PR you're working on.
If you're modifying documents under `docs/source`, make sure to validate that
they can still be built. This check also runs in CI. To run a local check
make sure you have installed the documentation builder requirements, by
running `pip install .[tf,torch,docs]` once from the root of this repository
and then run:
```bash
$ make docs
```
Once you're happy with your changes, add changed files using `git add` and
make a commit with `git commit` to record your changes locally:
@@ -208,21 +242,21 @@ Follow these steps to start contributing:
are useful to avoid duplicated work, and to differentiate it from PRs ready
to be merged;
4. Make sure existing tests pass;
5. Add high-coverage tests. No quality testing = no merge.
- If you are adding a new model, make sure that you use
5. Add high-coverage tests. No quality testing = no merge.
- If you are adding a new model, make sure that you use
`ModelTester.all_model_classes = (MyModel, MyModelWithLMHead,...)`, which triggers the common tests.
- If you are adding new `@slow` tests, make sure they pass using
`RUN_SLOW=1 python -m pytest tests/test_my_new_model.py`.
- If you are adding a new tokenizer, write tests, and make sure
- If you are adding new `@slow` tests, make sure they pass using
`RUN_SLOW=1 python -m pytest tests/test_my_new_model.py`.
- If you are adding a new tokenizer, write tests, and make sure
`RUN_SLOW=1 python -m pytest tests/test_tokenization_{your_model_name}.py` passes.
CircleCI does not run the slow tests.
6. All public methods must have informative docstrings that work nicely with sphinx. See `modeling_ctrl.py` for an
CircleCI does not run the slow tests, but github actions does every night!
6. All public methods must have informative docstrings that work nicely with sphinx. See `modeling_ctrl.py` for an
example.
### Tests
An extensive test suite is included to test the library behavior and several examples. Library tests can be found in
the [tests folder](https://github.com/huggingface/transformers/tree/master/tests) and examples tests in the
An extensive test suite is included to test the library behavior and several examples. Library tests can be found in
the [tests folder](https://github.com/huggingface/transformers/tree/master/tests) and examples tests in the
[examples folder](https://github.com/huggingface/transformers/tree/master/examples).
We like `pytest` and `pytest-xdist` because it's faster. From the root of the
@@ -238,8 +272,7 @@ and for the examples:
$ pip install -r examples/requirements.txt # only needed the first time
$ python -m pytest -n auto --dist=loadfile -s -v ./examples/
```
In fact, that's how `make test` and `make test-examples` are implemented!
In fact, that's how `make test` and `make test-examples` are implemented (sans the `pip install` line)!
You can specify a smaller set of tests in order to test only the feature
you're working on.

View File

@@ -1,17 +1,51 @@
.PHONY: quality style test test-examples
.PHONY: modified_only_fixup extra_quality_checks quality style fixup fix-copies test test-examples docs
check_dirs := examples templates tests src utils
# get modified files since the branch was made
fork_point_sha := $(shell git merge-base --fork-point master)
joined_dirs := $(shell echo $(check_dirs) | tr " " "|")
modified_files := $(shell git diff --name-only $(fork_point_sha) | egrep '^($(joined_dirs))')
#$(info modified files are: $(modified_files))
modified_only_fixup:
@if [ -n "$(modified_files)" ]; then \
echo "Checking/fixing $(modified_files)"; \
black $(modified_files); \
isort $(modified_files); \
flake8 $(modified_files); \
else \
echo "No relevant files were modified"; \
fi
# Check that source code meets quality standards
quality:
black --check --line-length 119 --target-version py35 examples templates tests src utils
isort --check-only --recursive examples templates tests src utils
flake8 examples templates tests src utils
extra_quality_checks:
python utils/check_copies.py
python utils/check_repo.py
# Format source code automatically
# this target runs checks on all files
quality:
black --check $(check_dirs)
isort --check-only $(check_dirs)
flake8 $(check_dirs)
${MAKE} extra_quality_checks
# Format source code automatically and check is there are any problems left that need manual fixing
style:
black --line-length 119 --target-version py35 examples templates tests src utils
isort --recursive examples templates tests src utils
black $(check_dirs)
isort $(check_dirs)
# Super fast fix and check target that only works on relevant modified files since the branch was made
fixup: modified_only_fixup extra_quality_checks
# Make marked copies of snippets of codes conform to the original
fix-copies:
python utils/check_copies.py --fix_and_overwrite
# Run tests for the library
@@ -22,3 +56,8 @@ test:
test-examples:
python -m pytest -n auto --dist=loadfile -s -v ./examples/
# Check that docs can build
docs:
cd docs && make html SPHINXOPTS="-W"

751
README.md
View File

@@ -16,65 +16,134 @@
<a href="https://github.com/huggingface/transformers/releases">
<img alt="GitHub release" src="https://img.shields.io/github/release/huggingface/transformers.svg">
</a>
<a href="https://github.com/huggingface/transformers/blob/master/CODE_OF_CONDUCT.md">
<img alt="Contributor Covenant" src="https://img.shields.io/badge/Contributor%20Covenant-v2.0%20adopted-ff69b4.svg">
</a>
</p>
<h3 align="center">
<p>State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0
</h3>
🤗 Transformers (formerly known as `pytorch-transformers` and `pytorch-pretrained-bert`) provides state-of-the-art general-purpose architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet, T5, CTRL...) for Natural Language Understanding (NLU) and Natural Language Generation (NLG) with over thousands of pretrained models in 100+ languages and deep interoperability between PyTorch & TensorFlow 2.0.
🤗 Transformers provides thousands of pretrained models to perform tasks on texts such as classification, information extraction, question answering, summarization, translation, text generation, etc in 100+ languages. Its aim is to make cutting-edge NLP easier to use for everyone.
🤗 Transformers provides APIs to quickly download and use those pretrained models on a given text, fine-tune them on your own datasets then share them with the community on our [model hub](https://huggingface.co/models). At the same time, each python module defining an architecture can be used as a standalone and modified to enable quick research experiments.
🤗 Transformers is backed by the two most popular deep learning libraries, [PyTorch](https://pytorch.org/) and [TensorFlow](https://www.tensorflow.org/), with a seamless integration between them, allowing you to train your models with one then load it for inference with the other.
### Recent contributors
[![](https://sourcerer.io/fame/clmnt/huggingface/transformers/images/0)](https://sourcerer.io/fame/clmnt/huggingface/transformers/links/0)[![](https://sourcerer.io/fame/clmnt/huggingface/transformers/images/1)](https://sourcerer.io/fame/clmnt/huggingface/transformers/links/1)[![](https://sourcerer.io/fame/clmnt/huggingface/transformers/images/2)](https://sourcerer.io/fame/clmnt/huggingface/transformers/links/2)[![](https://sourcerer.io/fame/clmnt/huggingface/transformers/images/3)](https://sourcerer.io/fame/clmnt/huggingface/transformers/links/3)[![](https://sourcerer.io/fame/clmnt/huggingface/transformers/images/4)](https://sourcerer.io/fame/clmnt/huggingface/transformers/links/4)[![](https://sourcerer.io/fame/clmnt/huggingface/transformers/images/5)](https://sourcerer.io/fame/clmnt/huggingface/transformers/links/5)[![](https://sourcerer.io/fame/clmnt/huggingface/transformers/images/6)](https://sourcerer.io/fame/clmnt/huggingface/transformers/links/6)[![](https://sourcerer.io/fame/clmnt/huggingface/transformers/images/7)](https://sourcerer.io/fame/clmnt/huggingface/transformers/links/7)
### Features
- High performance on NLU and NLG tasks
- Low barrier to entry for educators and practitioners
## Online demos
State-of-the-art NLP for everyone
- Deep learning researchers
- Hands-on practitioners
- AI/ML/NLP teachers and educators
You can test most of our models directly on their pages from the [model hub](https://huggingface.co/models). We also offer an [inference API](https://huggingface.co/pricing) to use those models.
Lower compute costs, smaller carbon footprint
- Researchers can share trained models instead of always retraining
- Practitioners can reduce compute time and production costs
- Dozens of architectures with over 1,000 pretrained models, some in more than 100 languages
Here are a few examples:
- [Masked word completion with BERT](https://huggingface.co/bert-base-uncased?text=Paris+is+the+%5BMASK%5D+of+France)
- [Name Entity Recognition with Electra](https://huggingface.co/dbmdz/electra-large-discriminator-finetuned-conll03-english?text=My+name+is+Sarah+and+I+live+in+London+city)
- [Text generation with GPT-2](https://huggingface.co/gpt2?text=A+long+time+ago%2C+)
- [Natural Langugage Inference with RoBERTa](https://huggingface.co/roberta-large-mnli?text=The+dog+was+lost.+Nobody+lost+any+animal)
- [Summarization with BART](https://huggingface.co/facebook/bart-large-cnn?text=The+tower+is+324+metres+%281%2C063+ft%29+tall%2C+about+the+same+height+as+an+81-storey+building%2C+and+the+tallest+structure+in+Paris.+Its+base+is+square%2C+measuring+125+metres+%28410+ft%29+on+each+side.+During+its+construction%2C+the+Eiffel+Tower+surpassed+the+Washington+Monument+to+become+the+tallest+man-made+structure+in+the+world%2C+a+title+it+held+for+41+years+until+the+Chrysler+Building+in+New+York+City+was+finished+in+1930.+It+was+the+first+structure+to+reach+a+height+of+300+metres.+Due+to+the+addition+of+a+broadcasting+aerial+at+the+top+of+the+tower+in+1957%2C+it+is+now+taller+than+the+Chrysler+Building+by+5.2+metres+%2817+ft%29.+Excluding+transmitters%2C+the+Eiffel+Tower+is+the+second+tallest+free-standing+structure+in+France+after+the+Millau+Viaduct)
- [Question answering with DistilBERT](https://huggingface.co/distilbert-base-uncased-distilled-squad?text=Which+name+is+also+used+to+describe+the+Amazon+rainforest+in+English%3F&context=The+Amazon+rainforest+%28Portuguese%3A+Floresta+Amaz%C3%B4nica+or+Amaz%C3%B4nia%3B+Spanish%3A+Selva+Amaz%C3%B3nica%2C+Amazon%C3%ADa+or+usually+Amazonia%3B+French%3A+For%C3%AAt+amazonienne%3B+Dutch%3A+Amazoneregenwoud%29%2C+also+known+in+English+as+Amazonia+or+the+Amazon+Jungle%2C+is+a+moist+broadleaf+forest+that+covers+most+of+the+Amazon+basin+of+South+America.+This+basin+encompasses+7%2C000%2C000+square+kilometres+%282%2C700%2C000+sq+mi%29%2C+of+which+5%2C500%2C000+square+kilometres+%282%2C100%2C000+sq+mi%29+are+covered+by+the+rainforest.+This+region+includes+territory+belonging+to+nine+nations.+The+majority+of+the+forest+is+contained+within+Brazil%2C+with+60%25+of+the+rainforest%2C+followed+by+Peru+with+13%25%2C+Colombia+with+10%25%2C+and+with+minor+amounts+in+Venezuela%2C+Ecuador%2C+Bolivia%2C+Guyana%2C+Suriname+and+French+Guiana.+States+or+departments+in+four+nations+contain+%22Amazonas%22+in+their+names.+The+Amazon+represents+over+half+of+the+planet%27s+remaining+rainforests%2C+and+comprises+the+largest+and+most+biodiverse+tract+of+tropical+rainforest+in+the+world%2C+with+an+estimated+390+billion+individual+trees+divided+into+16%2C000+species)
- [Translation with T5](https://huggingface.co/t5-base?text=My+name+is+Wolfgang+and+I+live+in+Berlin)
Choose the right framework for every part of a model's lifetime
- Train state-of-the-art models in 3 lines of code
- Deep interoperability between TensorFlow 2.0 and PyTorch models
- Move a single model between TF2.0/PyTorch frameworks at will
- Seamlessly pick the right framework for training, evaluation, production
**[Write With Transformer](https://transformer.huggingface.co)**, built by the Hugging Face team, is the official demo of this repos text generation capabilities.
## Quick tour
| Section | Description |
|-|-|
| [Installation](#installation) | How to install the package |
| [Model architectures](#model-architectures) | Architectures (with pretrained weights) |
| [Online demo](#online-demo) | Experimenting with this repos text generation capabilities |
| [Quick tour: Usage](#quick-tour) | Tokenizers & models usage: Bert and GPT-2 |
| [Quick tour: TF 2.0 and PyTorch ](#Quick-tour-TF-20-training-and-PyTorch-interoperability) | Train a TF 2.0 model in 10 lines of code, load it in PyTorch |
| [Quick tour: pipelines](#quick-tour-of-pipelines) | Using Pipelines: Wrapper around tokenizer and models to use finetuned models |
| [Quick tour: Fine-tuning/usage scripts](#quick-tour-of-the-fine-tuningusage-scripts) | Using provided scripts: GLUE, SQuAD and Text generation |
| [Quick tour: Share your models ](#Quick-tour-of-model-sharing) | Upload and share your fine-tuned models with the community |
| [Migrating from pytorch-transformers to transformers](#Migrating-from-pytorch-transformers-to-transformers) | Migrating your code from pytorch-transformers to transformers |
| [Migrating from pytorch-pretrained-bert to pytorch-transformers](#Migrating-from-pytorch-pretrained-bert-to-transformers) | Migrating your code from pytorch-pretrained-bert to transformers |
| [Documentation](https://huggingface.co/transformers/) | Full API documentation and more |
To immediately use a model on a given text, we provide the `pipeline` API. Pipelines group together a pretrained model with the preprocessing that was used during that model training. Here is how to quickly use a pipeline to classify positive versus negative texts
```python
>>> from transformers import pipeline
# Allocate a pipeline for sentiment-analysis
>>> classifier = pipeline('sentiment-analysis')
>>> classifier('We are very happy to include pipeline into the transformers repository.')
[{'label': 'POSITIVE', 'score': 0.9978193640708923}]
```
The second line of code downloads and caches the pretrained model used by the pipeline, the third line evaluates it on the given text. Here the answer is "positive" with a confidence of 99.8%.
This is another example of pipeline used for that can extract question answers from some context:
``` python
>>> from transformers import pipeline
# Allocate a pipeline for question-answering
>>> question_answerer = pipeline('question-answering')
>>> question_answerer({
... 'question': 'What is the name of the repository ?',
... 'context': 'Pipeline have been included in the huggingface/transformers repository'
... })
{'score': 0.5135612454720828, 'start': 35, 'end': 59, 'answer': 'huggingface/transformers'}
```
On top of the answer, the pretrained model used here returned its confidence score, along with the start position and its end position in the tokenized sentence. You can learn more about the tasks supported by the `pipeline` API in [this tutorial](https://huggingface.co/transformers/task_summary.html).
To download and use any of the pretrained models on your given task, you just need to use those three lines of codes (PyTorch verison):
```python
>>> from transformers import AutoTokenizer, AutoModel
>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
>>> model = AutoModel.from_pretrained("bert-base-uncased")
>>> inputs = tokenizer("Hello world!", return_tensors="pt")
>>> outputs = model(**inputs)
```
or for TensorFlow:
```python
>>> from transformers import AutoTokenizer, TFAutoModel
>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
>>> model = TFAutoModel.from_pretrained("bert-base-uncased")
>>> inputs = tokenizer("Hello world!", return_tensors="tf")
>>> outputs = model(**inputs)
```
The tokenizer is responsible for all the preprocessing the pretrained model expects, and can be called directly on one (or list) of texts (as we can see on the fourth line of both code examples). It will output a dictionary you can directly pass to your model (which is done on the fifth line).
The model itself is a regular [Pytorch `nn.Module`](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) or a [TensorFlow `tf.keras.Model`](https://www.tensorflow.org/api_docs/python/tf/keras/Model) (depending on your backend) which you can use normally. For instance, [this tutorial](https://huggingface.co/transformers/training.html) explains how to integrate such a model in classic PyTorch or TensorFlow training loop, or how to use our `Trainer` API to quickly fine-tune the on a new dataset.
## Why should I use transformers?
1. Easy-to-use state-of-the-art models:
- High performance on NLU and NLG tasks.
- Low barrier to entry for educators and practitioners.
- Few user-facing abastractions with just three classes to learn.
- A unified API for using all our pretrained models.
1. Lower compute costs, smaller carbon footprint:
- Researchers can share trained models instead of always retraining.
- Practitioners can reduce compute time and production costs.
- Dozens of architectures with over 2,000 pretrained models, some in more than 100 languages.
1. Choose the right framework for every part of a model's lifetime:
- Train state-of-the-art models in 3 lines of code.
- Move a single model between TF2.0/PyTorch frameworks at will.
- Seamlessly pick the right framework for training, evaluation, production.
1. Easily customize a model or an example to your needs:
- Examples for each architecture to reproduce the results by the official authors of said architecture.
- Expose the models internal as consistently as possible.
- Model files can be used independently of the library for quick experiments.
## Why shouldn't I use transformers?
- This library is not a modular toolbox of building blocks for neural nets. The code in the model files is not refactored with additional abstractions on purpose, so that researchers can quickly iterate on each of the models without diving in additional abstractions/files.
- The training API is not intended to work on any model but is optimized to work with the models provided by the library. For generic machine learning loops, you should use another library.
- While we strive to present as many use cases as possible, the scripts in our [examples folder](https://github.com/huggingface/transformers/tree/master/examples) are just that: examples. It is expected that they won't work out-of-the box on your specific problem and that you will be required to change a few lines of code to adapt them to your needs.
## Installation
This repo is tested on Python 3.6+, PyTorch 1.0.0+ (PyTorch 1.3.1+ for examples) and TensorFlow 2.0.
This repository is tested on Python 3.6+, PyTorch 1.0.0+ (PyTorch 1.3.1+ for [examples](https://github.com/huggingface/transformers/tree/master/examples)) and TensorFlow 2.0.
You should install 🤗 Transformers in a [virtual environment](https://docs.python.org/3/library/venv.html). If you're unfamiliar with Python virtual environments, check out the [user guide](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/).
Create a virtual environment with the version of Python you're going to use and activate it.
First, create a virtual environment with the version of Python you're going to use and activate it.
Now, if you want to use 🤗 Transformers, you can install it with pip. If you'd like to play with the examples, you must install it from source.
### With pip
First you need to install one of, or both, TensorFlow 2.0 and PyTorch.
Then, you will need to install one of, or both, TensorFlow 2.0 and PyTorch.
Please refer to [TensorFlow installation page](https://www.tensorflow.org/install/pip#tensorflow-2.0-rc-is-available) and/or [PyTorch installation page](https://pytorch.org/get-started/locally/#start-locally) regarding the specific install command for your platform.
When TensorFlow 2.0 and/or PyTorch has been installed, 🤗 Transformers can be installed using pip as follows:
@@ -83,68 +152,11 @@ When TensorFlow 2.0 and/or PyTorch has been installed, 🤗 Transformers can be
pip install transformers
```
### From source
If you'd like to play with the examples, you must [install the library from source](https://huggingface.co/transformers/installation.html#installing-from-source).
Here also, you first need to install one of, or both, TensorFlow 2.0 and PyTorch.
Please refer to [TensorFlow installation page](https://www.tensorflow.org/install/pip#tensorflow-2.0-rc-is-available) and/or [PyTorch installation page](https://pytorch.org/get-started/locally/#start-locally) regarding the specific install command for your platform.
## Models architectures
When TensorFlow 2.0 and/or PyTorch has been installed, you can install from source by cloning the repository and running:
```bash
git clone https://github.com/huggingface/transformers
cd transformers
pip install .
```
When you update the repository, you should upgrade the transformers installation and its dependencies as follows:
```bash
git pull
pip install --upgrade .
```
### Run the examples
Examples are included in the repository but are not shipped with the library.
Therefore, in order to run the latest versions of the examples, you need to install from source, as described above.
Look at the [README](https://github.com/huggingface/transformers/blob/master/examples/README.md) for how to run examples.
### Tests
A series of tests are included for the library and for some example scripts. Library tests can be found in the [tests folder](https://github.com/huggingface/transformers/tree/master/tests) and examples tests in the [examples folder](https://github.com/huggingface/transformers/tree/master/examples).
Depending on which framework is installed (TensorFlow 2.0 and/or PyTorch), the irrelevant tests will be skipped. Ensure that both frameworks are installed if you want to execute all tests.
Here's the easiest way to run tests for the library:
```bash
pip install -e ".[testing]"
make test
```
and for the examples:
```bash
pip install -e ".[testing]"
pip install -r examples/requirements.txt
make test-examples
```
For details, refer to the [contributing guide](https://github.com/huggingface/transformers/blob/master/CONTRIBUTING.md#tests).
### Do you want to run a Transformer model on a mobile device?
You should check out our [`swift-coreml-transformers`](https://github.com/huggingface/swift-coreml-transformers) repo.
It contains a set of tools to convert PyTorch or TensorFlow 2.0 trained Transformer models (currently contains `GPT-2`, `DistilGPT-2`, `BERT`, and `DistilBERT`) to CoreML models that run on iOS devices.
At some point in the future, you'll be able to seamlessly move from pre-training or fine-tuning models to productizing them in CoreML, or prototype a model or an app in CoreML then research its hyperparameters or architecture from TensorFlow 2.0 and/or PyTorch. Super exciting!
## Model architectures
🤗 Transformers currently provides the following NLU/NLG architectures:
🤗 Transformers currently provides the following architectures (see [here](https://huggingface.co/transformers/model_summary.html) for a high-level summary of each them):
1. **[BERT](https://huggingface.co/transformers/model_doc/bert.html)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
2. **[GPT](https://huggingface.co/transformers/model_doc/gpt.html)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
@@ -167,530 +179,39 @@ At some point in the future, you'll be able to seamlessly move from pre-training
19. **[Reformer](https://huggingface.co/transformers/model_doc/reformer.html)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
20. **[MarianMT](https://huggingface.co/transformers/model_doc/marian.html)** Machine translation models trained using [OPUS](http://opus.nlpl.eu/) data by Jörg Tiedemann. The [Marian Framework](https://marian-nmt.github.io/) is being developed by the Microsoft Translator Team.
21. **[Longformer](https://huggingface.co/transformers/model_doc/longformer.html)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
22. **[Other community models](https://huggingface.co/models)**, contributed by the [community](https://huggingface.co/users).
23. Want to contribute a new model? We have added a **detailed guide and templates** to guide you in the process of adding a new model. You can find them in the [`templates`](./templates) folder of the repository. Be sure to check the [contributing guidelines](./CONTRIBUTING.md) and contact the maintainers or open an issue to collect feedbacks before starting your PR.
These implementations have been tested on several datasets (see the example scripts) and should match the performances of the original implementations (e.g. ~93 F1 on SQuAD for BERT Whole-Word-Masking, ~88 F1 on RocStories for OpenAI GPT, ~18.3 perplexity on WikiText 103 for Transformer-XL, ~0.916 Pearson R coefficient on STS-B for XLNet). You can find more details on the performances in the Examples section of the [documentation](https://huggingface.co/transformers/examples.html).
## Online demo
**[Write With Transformer](https://transformer.huggingface.co)**, built by the Hugging Face team at transformer.huggingface.co, is the official demo of this repos text generation capabilities.
You can use it to experiment with completions generated by `GPT2Model`, `TransfoXLModel`, and `XLNetModel`.
> “🦄 Write with transformer is to writing what calculators are to calculus.”
![write_with_transformer](https://transformer.huggingface.co/front/assets/thumbnail-large.png)
## Quick tour
Let's do a very quick overview of the model architectures in 🤗 Transformers. Detailed examples for each model architecture (Bert, GPT, GPT-2, Transformer-XL, XLNet and XLM) can be found in the [full documentation](https://huggingface.co/transformers/).
```python
import torch
from transformers import *
# Transformers has a unified API
# for 10 transformer architectures and 30 pretrained weights.
# Model | Tokenizer | Pretrained weights shortcut
MODELS = [(BertModel, BertTokenizer, 'bert-base-uncased'),
(OpenAIGPTModel, OpenAIGPTTokenizer, 'openai-gpt'),
(GPT2Model, GPT2Tokenizer, 'gpt2'),
(CTRLModel, CTRLTokenizer, 'ctrl'),
(TransfoXLModel, TransfoXLTokenizer, 'transfo-xl-wt103'),
(XLNetModel, XLNetTokenizer, 'xlnet-base-cased'),
(XLMModel, XLMTokenizer, 'xlm-mlm-enfr-1024'),
(DistilBertModel, DistilBertTokenizer, 'distilbert-base-cased'),
(RobertaModel, RobertaTokenizer, 'roberta-base'),
(XLMRobertaModel, XLMRobertaTokenizer, 'xlm-roberta-base'),
]
# To use TensorFlow 2.0 versions of the models, simply prefix the class names with 'TF', e.g. `TFRobertaModel` is the TF 2.0 counterpart of the PyTorch model `RobertaModel`
# Let's encode some text in a sequence of hidden-states using each model:
for model_class, tokenizer_class, pretrained_weights in MODELS:
# Load pretrained model/tokenizer
tokenizer = tokenizer_class.from_pretrained(pretrained_weights)
model = model_class.from_pretrained(pretrained_weights)
# Encode text
input_ids = torch.tensor([tokenizer.encode("Here is some text to encode", add_special_tokens=True)]) # Add special tokens takes care of adding [CLS], [SEP], <s>... tokens in the right way for each model.
with torch.no_grad():
last_hidden_states = model(input_ids)[0] # Models outputs are now tuples
# Each architecture is provided with several class for fine-tuning on down-stream tasks, e.g.
BERT_MODEL_CLASSES = [BertModel, BertForPreTraining, BertForMaskedLM, BertForNextSentencePrediction,
BertForSequenceClassification, BertForTokenClassification, BertForQuestionAnswering]
# All the classes for an architecture can be initiated from pretrained weights for this architecture
# Note that additional weights added for fine-tuning are only initialized
# and need to be trained on the down-stream task
pretrained_weights = 'bert-base-uncased'
tokenizer = BertTokenizer.from_pretrained(pretrained_weights)
for model_class in BERT_MODEL_CLASSES:
# Load pretrained model/tokenizer
model = model_class.from_pretrained(pretrained_weights)
# Models can return full list of hidden-states & attentions weights at each layer
model = model_class.from_pretrained(pretrained_weights,
output_hidden_states=True,
output_attentions=True)
input_ids = torch.tensor([tokenizer.encode("Let's see all hidden-states and attentions on this text")])
all_hidden_states, all_attentions = model(input_ids)[-2:]
# Models are compatible with Torchscript
model = model_class.from_pretrained(pretrained_weights, torchscript=True)
traced_model = torch.jit.trace(model, (input_ids,))
# Simple serialization for models and tokenizers
model.save_pretrained('./directory/to/save/') # save
model = model_class.from_pretrained('./directory/to/save/') # re-load
tokenizer.save_pretrained('./directory/to/save/') # save
tokenizer = BertTokenizer.from_pretrained('./directory/to/save/') # re-load
# SOTA examples for GLUE, SQUAD, text generation...
```
## Quick tour TF 2.0 training and PyTorch interoperability
Let's do a quick example of how a TensorFlow 2.0 model can be trained in 12 lines of code with 🤗 Transformers and then loaded in PyTorch for fast inspection/tests.
```python
import tensorflow as tf
import tensorflow_datasets
from transformers import *
# Load dataset, tokenizer, model from pretrained model/vocabulary
tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
model = TFBertForSequenceClassification.from_pretrained('bert-base-cased')
data = tensorflow_datasets.load('glue/mrpc')
# Prepare dataset for GLUE as a tf.data.Dataset instance
train_dataset = glue_convert_examples_to_features(data['train'], tokenizer, max_length=128, task='mrpc')
valid_dataset = glue_convert_examples_to_features(data['validation'], tokenizer, max_length=128, task='mrpc')
train_dataset = train_dataset.shuffle(100).batch(32).repeat(2)
valid_dataset = valid_dataset.batch(64)
# Prepare training: Compile tf.keras model with optimizer, loss and learning rate schedule
optimizer = tf.keras.optimizers.Adam(learning_rate=3e-5, epsilon=1e-08, clipnorm=1.0)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
metric = tf.keras.metrics.SparseCategoricalAccuracy('accuracy')
model.compile(optimizer=optimizer, loss=loss, metrics=[metric])
# Train and evaluate using tf.keras.Model.fit()
history = model.fit(train_dataset, epochs=2, steps_per_epoch=115,
validation_data=valid_dataset, validation_steps=7)
# Load the TensorFlow model in PyTorch for inspection
model.save_pretrained('./save/')
pytorch_model = BertForSequenceClassification.from_pretrained('./save/', from_tf=True)
# Quickly test a few predictions - MRPC is a paraphrasing task, let's see if our model learned the task
sentence_0 = "This research was consistent with his findings."
sentence_1 = "His findings were compatible with this research."
sentence_2 = "His findings were not compatible with this research."
inputs_1 = tokenizer(sentence_0, sentence_1, add_special_tokens=True, return_tensors='pt')
inputs_2 = tokenizer(sentence_0, sentence_2, add_special_tokens=True, return_tensors='pt')
pred_1 = pytorch_model(inputs_1['input_ids'], token_type_ids=inputs_1['token_type_ids'])[0].argmax().item()
pred_2 = pytorch_model(inputs_2['input_ids'], token_type_ids=inputs_2['token_type_ids'])[0].argmax().item()
print("sentence_1 is", "a paraphrase" if pred_1 else "not a paraphrase", "of sentence_0")
print("sentence_2 is", "a paraphrase" if pred_2 else "not a paraphrase", "of sentence_0")
```
## Quick tour of the fine-tuning/usage scripts
**Important**
Before running the fine-tuning scripts, please read the
[instructions](#run-the-examples) on how to
setup your environment to run the examples.
The library comprises several example scripts with SOTA performances for NLU and NLG tasks:
- `run_glue.py`: an example fine-tuning sequence classification models on nine different GLUE tasks (*sequence-level classification*)
- `run_squad.py`: an example fine-tuning question answering models on the question answering dataset SQuAD 2.0 (*token-level classification*)
- `run_ner.py`: an example fine-tuning token classification models on named entity recognition (*token-level classification*)
- `run_generation.py`: an example using GPT, GPT-2, CTRL, Transformer-XL and XLNet for conditional language generation
- other model-specific examples (see the documentation).
Here are three quick usage examples for these scripts:
### `run_glue.py`: Fine-tuning on GLUE tasks for sequence classification
The [General Language Understanding Evaluation (GLUE) benchmark](https://gluebenchmark.com/) is a collection of nine sentence- or sentence-pair language understanding tasks for evaluating and analyzing natural language understanding systems.
Before running any of these GLUE tasks you should download the
[GLUE data](https://gluebenchmark.com/tasks) by running
[this script](https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e)
and unpack it to some directory `$GLUE_DIR`.
You should also install the additional packages required by the examples:
```shell
pip install -r ./examples/requirements.txt
```
```shell
export GLUE_DIR=/path/to/glue
export TASK_NAME=MRPC
python ./examples/text-classification/run_glue.py \
--model_name_or_path bert-base-uncased \
--task_name $TASK_NAME \
--do_train \
--do_eval \
--data_dir $GLUE_DIR/$TASK_NAME \
--max_seq_length 128 \
--per_device_eval_batch_size=8 \
--per_device_train_batch_size=8 \
--learning_rate 2e-5 \
--num_train_epochs 3.0 \
--output_dir /tmp/$TASK_NAME/
```
where task name can be one of CoLA, SST-2, MRPC, STS-B, QQP, MNLI, QNLI, RTE, WNLI.
The dev set results will be present within the text file 'eval_results.txt' in the specified output_dir. In case of MNLI, since there are two separate dev sets, matched and mismatched, there will be a separate output folder called '/tmp/MNLI-MM/' in addition to '/tmp/MNLI/'.
#### Fine-tuning XLNet model on the STS-B regression task
This example code fine-tunes XLNet on the STS-B corpus using parallel training on a server with 4 V100 GPUs.
Parallel training is a simple way to use several GPUs (but is slower and less flexible than distributed training, see below).
```shell
export GLUE_DIR=/path/to/glue
python ./examples/text-classification/run_glue.py \
--model_name_or_path xlnet-large-cased \
--do_train \
--do_eval \
--task_name=sts-b \
--data_dir=${GLUE_DIR}/STS-B \
--output_dir=./proc_data/sts-b-110 \
--max_seq_length=128 \
--per_device_eval_batch_size=8 \
--per_device_train_batch_size=8 \
--gradient_accumulation_steps=1 \
--max_steps=1200 \
--model_name=xlnet-large-cased \
--overwrite_output_dir \
--overwrite_cache \
--warmup_steps=120
```
On this machine we thus have a batch size of 32, please increase `gradient_accumulation_steps` to reach the same batch size if you have a smaller machine. These hyper-parameters should result in a Pearson correlation coefficient of `+0.917` on the development set.
#### Fine-tuning Bert model on the MRPC classification task
This example code fine-tunes the Bert Whole Word Masking model on the Microsoft Research Paraphrase Corpus (MRPC) corpus using distributed training on 8 V100 GPUs to reach a F1 > 92.
```bash
python -m torch.distributed.launch --nproc_per_node 8 ./examples/text-classification/run_glue.py \
--model_name_or_path bert-large-uncased-whole-word-masking \
--task_name MRPC \
--do_train \
--do_eval \
--data_dir $GLUE_DIR/MRPC/ \
--max_seq_length 128 \
--per_device_eval_batch_size=8 \
--per_device_train_batch_size=8 \
--learning_rate 2e-5 \
--num_train_epochs 3.0 \
--output_dir /tmp/mrpc_output/ \
--overwrite_output_dir \
--overwrite_cache \
```
Training with these hyper-parameters gave us the following results:
```bash
acc = 0.8823529411764706
acc_and_f1 = 0.901702786377709
eval_loss = 0.3418912578906332
f1 = 0.9210526315789473
global_step = 174
loss = 0.07231863956341798
```
### `run_squad.py`: Fine-tuning on SQuAD for question-answering
This example code fine-tunes BERT on the SQuAD dataset using distributed training on 8 V100 GPUs and Bert Whole Word Masking uncased model to reach a F1 > 93 on SQuAD:
```bash
python -m torch.distributed.launch --nproc_per_node=8 ./examples/question-answering/run_squad.py \
--model_type bert \
--model_name_or_path bert-large-uncased-whole-word-masking \
--do_train \
--do_eval \
--train_file $SQUAD_DIR/train-v1.1.json \
--predict_file $SQUAD_DIR/dev-v1.1.json \
--learning_rate 3e-5 \
--num_train_epochs 2 \
--max_seq_length 384 \
--doc_stride 128 \
--output_dir ../models/wwm_uncased_finetuned_squad/ \
--per_device_eval_batch_size=3 \
--per_device_train_batch_size=3 \
```
Training with these hyper-parameters gave us the following results:
```bash
python $SQUAD_DIR/evaluate-v1.1.py $SQUAD_DIR/dev-v1.1.json ../models/wwm_uncased_finetuned_squad/predictions.json
{"exact_match": 86.91579943235573, "f1": 93.1532499015869}
```
This is the model provided as `bert-large-uncased-whole-word-masking-finetuned-squad`.
### `run_generation.py`: Text generation with GPT, GPT-2, CTRL, Transformer-XL and XLNet
A conditional generation script is also included to generate text from a prompt.
The generation script includes the [tricks](https://github.com/rusiaaman/XLNet-gen#methodology) proposed by Aman Rusia to get high-quality generation with memory models like Transformer-XL and XLNet (include a predefined text to make short inputs longer).
Here is how to run the script with the small version of OpenAI GPT-2 model:
```shell
python ./examples/text-generation/run_generation.py \
--model_type=gpt2 \
--length=20 \
--model_name_or_path=gpt2 \
```
and from the Salesforce CTRL model:
```shell
python ./examples/text-generation/run_generation.py \
--model_type=ctrl \
--length=20 \
--model_name_or_path=ctrl \
--temperature=0 \
--repetition_penalty=1.2 \
```
## Quick tour of model sharing
Starting with `v2.2.2`, you can now upload and share your fine-tuned models with the community, using the <abbr title="Command-line interface">CLI</abbr> that's built-in to the library.
**First, create an account on [https://huggingface.co/join](https://huggingface.co/join)**. Optionally, join an existing organization or create a new one. Then:
```shell
transformers-cli login
# log in using the same credentials as on huggingface.co
```
Upload your model:
```shell
transformers-cli upload ./path/to/pretrained_model/
# ^^ Upload folder containing weights/tokenizer/config
# saved via `.save_pretrained()`
transformers-cli upload ./config.json [--filename folder/foobar.json]
# ^^ Upload a single file
# (you can optionally override its filename, which can be nested inside a folder)
```
If you want your model to be namespaced by your organization name rather than your username, add the following flag to any command:
```shell
--organization organization_name
```
Your model will then be accessible through its identifier, a concatenation of your username (or organization name) and the folder name above:
```python
"username/pretrained_model"
# or if an org:
"organization_name/pretrained_model"
```
**Please add a README.md model card** to the repo under `model_cards/` with: model description, training params (dataset, preprocessing, hardware used, hyperparameters), evaluation results, intended uses & limitations, etc.
Your model now has a page on huggingface.co/models 🔥
Anyone can load it from code:
```python
tokenizer = AutoTokenizer.from_pretrained("namespace/pretrained_model")
model = AutoModel.from_pretrained("namespace/pretrained_model")
```
List all your files on S3:
```shell
transformers-cli s3 ls
```
You can also delete unneeded files:
```shell
transformers-cli s3 rm …
```
## Quick tour of pipelines
New in version `v2.3`: `Pipeline` are high-level objects which automatically handle tokenization, running your data through a transformers model
and outputting the result in a structured object.
You can create `Pipeline` objects for the following down-stream tasks:
- `feature-extraction`: Generates a tensor representation for the input sequence
- `ner`: Generates named entity mapping for each word in the input sequence.
- `sentiment-analysis`: Gives the polarity (positive / negative) of the whole input sequence.
- `text-classification`: Initialize a `TextClassificationPipeline` directly, or see `sentiment-analysis` for an example.
- `question-answering`: Provided some context and a question refering to the context, it will extract the answer to the question in the context.
- `fill-mask`: Takes an input sequence containing a masked token (e.g. `<mask>`) and return list of most probable filled sequences, with their probabilities.
- `summarization`
- `translation_xx_to_yy`
```python
>>> from transformers import pipeline
# Allocate a pipeline for sentiment-analysis
>>> nlp = pipeline('sentiment-analysis')
>>> nlp('We are very happy to include pipeline into the transformers repository.')
[{'label': 'POSITIVE', 'score': 0.9978193640708923}]
# Allocate a pipeline for question-answering
>>> nlp = pipeline('question-answering')
>>> nlp({
... 'question': 'What is the name of the repository ?',
... 'context': 'Pipeline have been included in the huggingface/transformers repository'
... })
{'score': 0.5135612454720828, 'start': 35, 'end': 59, 'answer': 'huggingface/transformers'}
```
## Migrating from pytorch-transformers to transformers
Here is a quick summary of what you should take care of when migrating from `pytorch-transformers` to `transformers`.
### Positional order of some models' keywords inputs (`attention_mask`, `token_type_ids`...) changed
To be able to use Torchscript (see #1010, #1204 and #1195) the specific order of some models **keywords inputs** (`attention_mask`, `token_type_ids`...) has been changed.
If you used to call the models with keyword names for keyword arguments, e.g. `model(inputs_ids, attention_mask=attention_mask, token_type_ids=token_type_ids)`, this should not cause any change.
If you used to call the models with positional inputs for keyword arguments, e.g. `model(inputs_ids, attention_mask, token_type_ids)`, you may have to double check the exact order of input arguments.
## Migrating from pytorch-pretrained-bert to transformers
Here is a quick summary of what you should take care of when migrating from `pytorch-pretrained-bert` to `transformers`.
### Models always output `tuples`
The main breaking change when migrating from `pytorch-pretrained-bert` to `transformers` is that every model's forward method always outputs a `tuple` with various elements depending on the model and the configuration parameters.
The exact content of the tuples for each model is detailed in the models' docstrings and the [documentation](https://huggingface.co/transformers/).
In pretty much every case, you will be fine by taking the first element of the output as the output you previously used in `pytorch-pretrained-bert`.
Here is a `pytorch-pretrained-bert` to `transformers` conversion example for a `BertForSequenceClassification` classification model:
```python
# Let's load our model
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
# If you used to have this line in pytorch-pretrained-bert:
loss = model(input_ids, labels=labels)
# Now just use this line in transformers to extract the loss from the output tuple:
outputs = model(input_ids, labels=labels)
loss = outputs[0]
# In transformers you can also have access to the logits:
loss, logits = outputs[:2]
# And even the attention weights if you configure the model to output them (and other outputs too, see the docstrings and documentation)
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', output_attentions=True)
outputs = model(input_ids, labels=labels)
loss, logits, attentions = outputs
```
### Using hidden states
By enabling the configuration option `output_hidden_states`, it was possible to retrieve the last hidden states of the encoder. In `pytorch-transformers` as well as `transformers` the return value has changed slightly: `all_hidden_states` now also includes the hidden state of the embeddings in addition to those of the encoding layers. This allows users to easily access the embeddings final state.
### Serialization
Breaking change in the `from_pretrained()` method:
1. Models are now set in evaluation mode by default when instantiated with the `from_pretrained()` method. To train them, don't forget to set them back in training mode (`model.train()`) to activate the dropout modules.
2. The additional `*input` and `**kwargs` arguments supplied to the `from_pretrained()` method used to be directly passed to the underlying model's class `__init__()` method. They are now used to update the model configuration attribute instead, which can break derived model classes built based on the previous `BertForSequenceClassification` examples. We are working on a way to mitigate this breaking change in [#866](https://github.com/huggingface/transformers/pull/866) by forwarding the the model's `__init__()` method (i) the provided positional arguments and (ii) the keyword arguments which do not match any configuration class attributes.
Also, while not a breaking change, the serialization methods have been standardized and you probably should switch to the new method `save_pretrained(save_directory)` if you were using any other serialization method before.
Here is an example:
```python
### Let's load a model and tokenizer
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
### Do some stuff to our model and tokenizer
# Ex: add new tokens to the vocabulary and embeddings of our model
tokenizer.add_tokens(['[SPECIAL_TOKEN_1]', '[SPECIAL_TOKEN_2]'])
model.resize_token_embeddings(len(tokenizer))
# Train our model
train(model)
### Now let's save our model and tokenizer to a directory
model.save_pretrained('./my_saved_model_directory/')
tokenizer.save_pretrained('./my_saved_model_directory/')
### Reload the model and the tokenizer
model = BertForSequenceClassification.from_pretrained('./my_saved_model_directory/')
tokenizer = BertTokenizer.from_pretrained('./my_saved_model_directory/')
```
### Optimizers: BertAdam & OpenAIAdam are now AdamW, schedules are standard PyTorch schedules
The two optimizers previously included, `BertAdam` and `OpenAIAdam`, have been replaced by a single `AdamW` optimizer which has a few differences:
- it only implements weights decay correction,
- schedules are now externals (see below),
- gradient clipping is now also external (see below).
The new optimizer `AdamW` matches PyTorch `Adam` optimizer API and let you use standard PyTorch or apex methods for the schedule and clipping.
The schedules are now standard [PyTorch learning rate schedulers](https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate) and not part of the optimizer anymore.
Here is a conversion examples from `BertAdam` with a linear warmup and decay schedule to `AdamW` and the same schedule:
```python
# Parameters:
lr = 1e-3
max_grad_norm = 1.0
num_training_steps = 1000
num_warmup_steps = 100
warmup_proportion = float(num_warmup_steps) / float(num_training_steps) # 0.1
### Previously BertAdam optimizer was instantiated like this:
optimizer = BertAdam(model.parameters(), lr=lr, schedule='warmup_linear', warmup=warmup_proportion, t_total=num_training_steps)
### and used like this:
for batch in train_data:
loss = model(batch)
loss.backward()
optimizer.step()
### In Transformers, optimizer and schedules are splitted and instantiated like this:
optimizer = AdamW(model.parameters(), lr=lr, correct_bias=False) # To reproduce BertAdam specific behavior set correct_bias=False
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=num_warmup_steps, num_training_steps=num_training_steps) # PyTorch scheduler
### and used like this:
for batch in train_data:
model.train()
loss = model(batch)
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm) # Gradient clipping is not in AdamW anymore (so you can use amp without issue)
optimizer.step()
scheduler.step()
optimizer.zero_grad()
```
22. **[DPR](https://github.com/facebookresearch/DPR)** (from Facebook) released with the paper [Dense Passage Retrieval
for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) by Vladimir Karpukhin, Barlas Oğuz, Sewon
Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
23. **[Pegasus](https://github.com/google-research/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777)> by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
24. **[MBart](https://github.com/pytorch/fairseq/tree/master/examples/mbart)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
25. **[LXMERT](https://github.com/airsplay/lxmert)** (from UNC Chapel Hill) released with the paper [LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering](https://arxiv.org/abs/1908.07490) by Hao Tan and Mohit Bansal.
26. **[Funnel Transformer](https://github.com/laiguokun/Funnel-Transformer)** (from CMU/Google Brain) released with the paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
27. **[LayoutLM](https://github.com/microsoft/unilm/tree/master/layoutlm)** (from Microsoft Research Asia) released with the paper [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
28. **[Other community models](https://huggingface.co/models)**, contributed by the [community](https://huggingface.co/users).
29. Want to contribute a new model? We have added a **detailed guide and templates** to guide you in the process of adding a new model. You can find them in the [`templates`](./templates) folder of the repository. Be sure to check the [contributing guidelines](./CONTRIBUTING.md) and contact the maintainers or open an issue to collect feedbacks before starting your PR.
These implementations have been tested on several datasets (see the example scripts) and should match the performances of the original implementations. You can find more details on the performances in the Examples section of the [documentation](https://huggingface.co/transformers/examples.html).
## Learn more
| Section | Description |
|-|-|
| [Documentation](https://huggingface.co/transformers/) | Full API documentation and tutorials |
| [Task summary](https://huggingface.co/transformers/task_summary.html) | Tasks supported by 🤗 Transformers |
| [Preprocessing tutorial](https://huggingface.co/transformers/preprocessing.html) | Using the `Tokenizer` class to prepare data for the models |
| [Training and fine-tuning](https://huggingface.co/transformers/training.html) | Using the models provided by 🤗 Transformers in a PyTorch/TensorFlow training loop and the `Trainer` API |
| [Quick tour: Fine-tuning/usage scripts](https://github.com/huggingface/transformers/tree/master/examples) | Example scripts for fine-tuning models on a wide range of tasks |
| [Model sharing and uploading](https://huggingface.co/transformers/model_sharing.html) | Upload and share your fine-tuned models with the community |
| [Migration](https://huggingface.co/transformers/migration.html) | Migrate to 🤗 Transformers from `pytorch-transformers` or `pytorch-pretrained-bert` |
## Citation
We now have a paper you can cite for the 🤗 Transformers library:
We now have a [paper](https://arxiv.org/abs/1910.03771) you can cite for the 🤗 Transformers library:
```bibtex
@article{Wolf2019HuggingFacesTS,
title={HuggingFace's Transformers: State-of-the-art Natural Language Processing},
author={Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and R'emi Louf and Morgan Funtowicz and Jamie Brew},
author={Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and Rémi Louf and Morgan Funtowicz and Joe Davison and Sam Shleifer and Patrick von Platen and Clara Ma and Yacine Jernite and Julien Plu and Canwen Xu and Teven Le Scao and Sylvain Gugger and Mariama Drame and Quentin Lhoest and Alexander M. Rush},
journal={ArXiv},
year={2019},
volume={abs/1910.03771}

View File

@@ -4,3 +4,7 @@ coverage:
default:
informational: true
patch: off
comment:
require_changes: true # only comment if there was change in coverage
require_head: yes # don't report if there is no head coverage report
require_base: yes # don't report if there is no base coverage report

View File

@@ -1,23 +0,0 @@
cd docs
function deploy_doc(){
echo "Creating doc at commit $1 and pushing to folder $2"
git checkout $1
if [ ! -z "$2" ]
then
echo "Pushing version" $2
make clean && make html && scp -r -oStrictHostKeyChecking=no _build/html $doc:$dir/$2
else
echo "Pushing master"
make clean && make html && scp -r -oStrictHostKeyChecking=no _build/html/* $doc:$dir
fi
}
deploy_doc "master"
deploy_doc "b33a385" v1.0.0
deploy_doc "fe02e45" v1.1.0
deploy_doc "89fd345" v1.2.0
deploy_doc "fc9faa8" v2.0.0
deploy_doc "3ddce1d" v2.1.1
deploy_doc "f2f3294" v2.2.0
deploy_doc "d0f8b9a" v2.3.0

View File

@@ -0,0 +1,65 @@
FROM google/cloud-sdk:slim
# Build args.
ARG GITHUB_REF=refs/heads/master
# TODO: This Dockerfile installs pytorch/xla 3.6 wheels. There are also 3.7
# wheels available; see below.
ENV PYTHON_VERSION=3.6
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
cmake \
git \
curl \
ca-certificates
# Install conda and python.
# NOTE new Conda does not forward the exit status... https://github.com/conda/conda/issues/8385
RUN curl -o ~/miniconda.sh https://repo.anaconda.com/miniconda/Miniconda3-4.7.12-Linux-x86_64.sh && \
chmod +x ~/miniconda.sh && \
~/miniconda.sh -b && \
rm ~/miniconda.sh
ENV PATH=/root/miniconda3/bin:$PATH
RUN conda create -y --name container python=$PYTHON_VERSION
# Run the rest of commands within the new conda env.
# Use absolute path to appease Codefactor.
SHELL ["/root/miniconda3/bin/conda", "run", "-n", "container", "/bin/bash", "-c"]
RUN conda install -y python=$PYTHON_VERSION mkl
RUN pip uninstall -y torch && \
# Python 3.7 wheels are available. Replace cp36-cp36m with cp37-cp37m
gsutil cp 'gs://tpu-pytorch/wheels/torch-nightly-cp${PYTHON_VERSION/./}-cp${PYTHON_VERSION/./}m-linux_x86_64.whl' . && \
gsutil cp 'gs://tpu-pytorch/wheels/torch_xla-nightly-cp${PYTHON_VERSION/./}-cp${PYTHON_VERSION/./}m-linux_x86_64.whl' . && \
gsutil cp 'gs://tpu-pytorch/wheels/torchvision-nightly-cp${PYTHON_VERSION/./}-cp${PYTHON_VERSION/./}m-linux_x86_64.whl' . && \
pip install 'torch-nightly-cp${PYTHON_VERSION/./}-cp${PYTHON_VERSION/./}m-linux_x86_64.whl' && \
pip install 'torch_xla-nightly-cp${PYTHON_VERSION/./}-cp${PYTHON_VERSION/./}m-linux_x86_64.whl' && \
pip install 'torchvision-nightly-cp${PYTHON_VERSION/./}-cp${PYTHON_VERSION/./}m-linux_x86_64.whl' && \
rm 'torch-nightly-cp${PYTHON_VERSION/./}-cp${PYTHON_VERSION/./}m-linux_x86_64.whl' && \
rm 'torch_xla-nightly-cp${PYTHON_VERSION/./}-cp${PYTHON_VERSION/./}m-linux_x86_64.whl' && \
rm 'torchvision-nightly-cp${PYTHON_VERSION/./}-cp${PYTHON_VERSION/./}m-linux_x86_64.whl' && \
apt-get install -y libomp5
ENV LD_LIBRARY_PATH=root/miniconda3/envs/container/lib
# Install huggingface/transformers at the current PR, plus dependencies.
RUN git clone https://github.com/huggingface/transformers.git && \
cd transformers && \
git fetch origin $GITHUB_REF:CI && \
git checkout CI && \
cd .. && \
pip install ./transformers && \
pip install -r ./transformers/examples/requirements.txt && \
pip install pytest
RUN python -c "import torch_xla; print(torch_xla.__version__)"
RUN python -c "import transformers as trf; print(trf.__version__)"
RUN conda init bash
COPY docker-entrypoint.sh /usr/local/bin/
RUN chmod +x /usr/local/bin/docker-entrypoint.sh
ENTRYPOINT ["/usr/local/bin/docker-entrypoint.sh"]
CMD ["bash"]

View File

@@ -0,0 +1,38 @@
local base = import 'templates/base.libsonnet';
local tpus = import 'templates/tpus.libsonnet';
local utils = import "templates/utils.libsonnet";
local volumes = import "templates/volumes.libsonnet";
local bertBaseCased = base.BaseTest {
frameworkPrefix: "hf",
modelName: "bert-base-cased",
mode: "example",
configMaps: [],
timeout: 3600, # 1 hour, in seconds
image: std.extVar('image'),
imageTag: std.extVar('image-tag'),
tpuSettings+: {
softwareVersion: "pytorch-nightly",
},
accelerator: tpus.v3_8,
volumeMap+: {
datasets: volumes.PersistentVolumeSpec {
name: "huggingface-cluster-disk",
mountPath: "/datasets",
},
},
command: utils.scriptCommand(
|||
python -m pytest -s transformers/examples/test_xla_examples.py -v
test_exit_code=$?
echo "\nFinished running commands.\n"
test $test_exit_code -eq 0
|||
),
};
bertBaseCased.oneshotJob

View File

@@ -0,0 +1,32 @@
apiVersion: v1
kind: PersistentVolume
metadata:
name: huggingface-cluster-disk
spec:
storageClassName: ""
capacity:
storage: 500Gi
accessModes:
- ReadOnlyMany
claimRef:
namespace: default
name: huggingface-cluster-disk-claim
gcePersistentDisk:
pdName: huggingface-cluster-disk
fsType: ext4
readOnly: true
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: huggingface-cluster-disk-claim
spec:
# Specify "" as the storageClassName so it matches the PersistentVolume's StorageClass.
# A nil storageClassName value uses the default StorageClass. For details, see
# https://kubernetes.io/docs/concepts/storage/persistent-volumes/#class-1
storageClassName: ""
accessModes:
- ReadOnlyMany
resources:
requests:
storage: 1Ki

View File

@@ -0,0 +1,8 @@
#!/bin/bash
source ~/.bashrc
echo "running docker-entrypoint.sh"
conda activate container
echo $KUBE_GOOGLE_CLOUD_TPU_ENDPOINTS
echo "printed TPU info"
export XRT_TPU_CONFIG="tpu_worker;0;${KUBE_GOOGLE_CLOUD_TPU_ENDPOINTS:7}"
exec "$@"#!/bin/bash

View File

@@ -88,20 +88,25 @@ The `huggingface/transformers` documentation follows the
[Google documentation](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html) style. It is
mostly written in ReStructuredText
([Sphinx simple documentation](https://www.sphinx-doc.org/en/master/usage/restructuredtext/index.html),
[Sourceforge complete documentation](https://docutils.sourceforge.io/docs/ref/rst/restructuredtext.html))
[Sourceforge complete documentation](https://docutils.sourceforge.io/docs/ref/rst/restructuredtext.html)).
### Adding a new section
A section is a page held in the `Notes` toc-tree on the documentation. Adding a new section is done in two steps:
### Adding a new tutorial
Adding a new tutorial or section is done in two steps:
- Add a new file under `./source`. This file can either be ReStructuredText (.rst) or Markdown (.md).
- Link that file in `./source/index.rst` on the correct toc-tree.
Make sure to put your new file under the proper section. It's unlikely to go in the first section (*Get Started*), so
depending on the intended targets (beginners, more advanced users or researchers) it should go in section two, three or
four.
### Adding a new model
When adding a new model:
- Create a file `xxx.rst` under `./source/model_doc`.
- Create a file `xxx.rst` under `./source/model_doc` (don't hesitate to copy an existing file as template).
- Link that file in `./source/index.rst` on the `model_doc` toc-tree.
- Write a short overview of the model:
- Overview with paper & authors
@@ -120,18 +125,18 @@ When adding a new model:
These classes should be added using the RST syntax. Usually as follows:
```
XXXConfig
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XXXConfig
:members:
```
This will include every public method of the configuration. If for some reason you wish for a method not to be
displayed in the documentation, you can do so by specifying which methods should be in the docs:
This will include every public method of the configuration that is documented. If for some reason you wish for a method
not to be displayed in the documentation, you can do so by specifying which methods should be in the docs:
```
XXXTokenizer
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XXXTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
@@ -142,13 +147,17 @@ XXXTokenizer
### Writing source documentation
Values that should be put in `code` should either be surrounded by double backticks: \`\`like so\`\` or be written as
an object using the :obj: syntax: :obj:\`like so\`.
an object using the :obj: syntax: :obj:\`like so\`. Note that argument names and objects like True, None or any strings
should usually be put in `code`.
When mentionning a class, it is recommended to use the :class: syntax as the mentioned class will be automatically
linked by Sphinx: :class:\`transformers.XXXClass\`
linked by Sphinx: :class:\`~transformers.XXXClass\`
When mentioning a function, it is recommended to use the :func: syntax as the mentioned method will be automatically
linked by Sphinx: :func:\`transformers.XXXClass.method\`
When mentioning a function, it is recommended to use the :func: syntax as the mentioned function will be automatically
linked by Sphinx: :func:\`~transformers.function\`.
When mentioning a method, it is recommended to use the :meth: syntax as the mentioned method will be automatically
linked by Sphinx: :meth:\`~transformers.XXXClass.method\`.
Links should be done as so (note the double underscore at the end): \`text for the link <./local-link-or-global-link#loc>\`__
@@ -165,13 +174,34 @@ Here's an example showcasing everything so far:
input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`):
Indices of input sequence tokens in the vocabulary.
Indices can be obtained using :class:`transformers.AlbertTokenizer`.
See :func:`transformers.PreTrainedTokenizer.encode` and
:func:`transformers.PreTrainedTokenizer.__call__` for details.
Indices can be obtained using :class:`~transformers.AlbertTokenizer`.
See :meth:`~transformers.PreTrainedTokenizer.encode` and
:meth:`~transformers.PreTrainedTokenizer.__call__` for details.
`What are input IDs? <../glossary.html#input-ids>`__
```
For optional arguments or arguments with defaults we follow the following syntax: imagine we have a function with the
following signature:
```
def my_function(x: str = None, a: float = 1):
```
then its documentation should look like this:
```
Args:
x (:obj:`str`, `optional`):
This argument controls ...
a (:obj:`float`, `optional`, defaults to 1):
This argument is used to ...
```
Note that we always omit the "defaults to :obj:\`None\`" when None is the default for any argument. Also note that even
if the first line describing your argument type and its default gets long, you can't break it on several lines. You can
however write as many lines as you want in the indented description (see the example above with `input_ids`).
#### Writing a multi-line code block
Multi-line code blocks can be useful for displaying examples. They are done like so:
@@ -186,6 +216,9 @@ Example::
The `Example` string at the beginning can be replaced by anything as long as there are two semicolons following it.
We follow the [doctest](https://docs.python.org/3/library/doctest.html) syntax for the examples to automatically test
the results stay consistent with the library.
#### Writing a return block
Arguments should be defined with the `Args:` prefix, followed by a line return and an indentation.
@@ -207,5 +240,5 @@ Here's an example for a single value return:
```
Returns:
A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.
:obj:`List[int]`: A list of integers in the range [0, 1] --- 1 for a special token, 0 for a sequence token.
```

View File

@@ -1,5 +1,36 @@
/* Our DOM objects */
/* Colab dropdown */
.colab-dropdown {
position: relative;
display: inline-block;
}
.colab-dropdown-content {
display: none;
position: absolute;
background-color: #f9f9f9;
min-width: 117px;
box-shadow: 0px 8px 16px 0px rgba(0,0,0,0.2);
z-index: 1;
}
.colab-dropdown-content button {
color: #6670FF;
background-color: #f9f9f9;
font-size: 12px;
border: none;
min-width: 117px;
padding: 5px 5px;
text-decoration: none;
display: block;
}
.colab-dropdown-content button:hover {background-color: #eee;}
.colab-dropdown:hover .colab-dropdown-content {display: block;}
/* Version control */
.version-button {
@@ -94,6 +125,12 @@ a.copybtn {
background-color: #6670FF;
}
/* The section headers in the toc tree */
.wy-menu-vertical p.caption{
background-color: #4d59ff;
line-height: 40px;
}
/* The selected items in the toc tree */
.wy-menu-vertical li.current{
background-color: #A6B0FF;

View File

@@ -1,10 +1,13 @@
// These two things need to be updated at each release for the version selector.
// Last stable version
const stableVersion = "v3.0.0"
const stableVersion = "v3.3.0"
// Dictionary doc folder to label
const versionMapping = {
"master": "master",
"": "v3.0.0 (stable)",
"": "v3.3.0",
"v3.2.0": "v3.2.0",
"v3.1.0": "v3.1.0 (stable)",
"v3.0.2": "v3.0.0/v3.0.1/v3.0.2",
"v2.11.0": "v2.11.0",
"v2.10.0": "v2.10.0",
"v2.9.1": "v2.9.0/v2.9.1",
@@ -21,6 +24,18 @@ const versionMapping = {
"v1.1.0": "v1.1.0",
"v1.0.0": "v1.0.0"
}
// The page that have a notebook and therefore should have the open in colab badge.
const hasNotebook = [
"benchmarks",
"custom_datasets",
"multilingual",
"perplexity",
"preprocessing",
"quicktour",
"task_summary",
"tokenizer_summary",
"training"
];
function addIcon() {
const huggingFaceLogo = "https://huggingface.co/landing/assets/transformers-docs/huggingface_logo.svg";
@@ -82,6 +97,26 @@ function addGithubButton() {
document.querySelector(".wy-side-nav-search .icon-home").insertAdjacentHTML('afterend', div);
}
function addColabLink() {
const parts = location.toString().split('/');
const pageName = parts[parts.length - 1].split(".")[0];
if (hasNotebook.includes(pageName)) {
const baseURL = "https://colab.research.google.com/github/huggingface/notebooks/blob/master/transformers_doc/"
const linksColab = `
<div class="colab-dropdown">
<img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg">
<div class="colab-dropdown-content">
<button onclick=" window.open('${baseURL}${pageName}.ipynb')">Mixed</button>
<button onclick=" window.open('${baseURL}pytorch/${pageName}.ipynb')">PyTorch</button>
<button onclick=" window.open('${baseURL}tensorflow/${pageName}.ipynb')">TensorFlow</button>
</div>
</div>`
const leftMenu = document.querySelector(".wy-breadcrumbs-aside")
leftMenu.innerHTML = linksColab + '\n' + leftMenu.innerHTML
}
}
function addVersionControl() {
// To grab the version currently in view, we parse the url
const parts = location.toString().split('/');
@@ -149,6 +184,7 @@ function addHfMenu() {
<div class="menu">
<a href="/welcome">🔥 Sign in</a>
<a href="/models">🚀 Models</a>
<a href="http://discuss.huggingface.co">💬 Forum</a>
</div>
`;
document.body.insertAdjacentHTML('afterbegin', div);
@@ -254,6 +290,7 @@ function onLoad() {
addGithubButton();
parseGithubButtons();
addHfMenu();
addColabLink();
platformToggle();
}

View File

@@ -1,12 +1,12 @@
Benchmarks
==========
=======================================================================================================================
Let's take a look at how 🤗 Transformer models can be benchmarked, best practices, and already available benchmarks.
A notebook explaining in more detail how to benchmark 🤗 Transformer models can be found `here <https://github.com/huggingface/transformers/blob/master/notebooks/05-benchmark.ipynb>`__.
How to benchmark 🤗 Transformer models
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The classes :class:`~transformers.PyTorchBenchmark` and :class:`~transformers.TensorFlowBenchmark` allow to flexibly benchmark 🤗 Transformer models.
The benchmark classes allow us to measure the `peak memory usage` and `required time` for both
@@ -40,12 +40,12 @@ There are many more parameters that can be configured via the benchmark argument
``src/transformers/benchmark/benchmark_args_utils.py``, ``src/transformers/benchmark/benchmark_args.py`` (for PyTorch) and ``src/transformers/benchmark/benchmark_args_tf.py`` (for Tensorflow).
Alternatively, running the following shell commands from root will print out a descriptive list of all configurable parameters for PyTorch and Tensorflow respectively.
.. code-block::
.. code-block:: bash
>>> ## PYTORCH CODE
## PYTORCH CODE
python examples/benchmarking/run_benchmark.py --help
>>> ## TENSORFLOW CODE
## TENSORFLOW CODE
python examples/benchmarking/run_benchmark_tf.py --help
@@ -300,7 +300,7 @@ deciding for which configuration the model should be trained.
Benchmark best practices
~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This section lists a couple of best practices one should be aware of when benchmarking a model.
@@ -311,7 +311,7 @@ This section lists a couple of best practices one should be aware of when benchm
Sharing your benchmark
~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Previously all available core models (10 at the time) have been benchmarked for `inference time`, across many different settings: using PyTorch, with
and without TorchScript, using TensorFlow, with and without XLA. All of those tests were done across CPUs (except for

View File

@@ -1,5 +1,5 @@
BERTology
---------
-----------------------------------------------------------------------------------------------------------------------
There is a growing field of study concerned with investigating the inner working of large-scale transformers like BERT (that some call "BERTology"). Some good examples of this field are:

View File

@@ -26,7 +26,7 @@ author = u'huggingface'
# The short X.Y version
version = u''
# The full version, including alpha/beta/rc tags
release = u'3.0.0'
release = u'3.3.1'
# -- General configuration ---------------------------------------------------
@@ -76,7 +76,8 @@ exclude_patterns = [u'_build', 'Thumbs.db', '.DS_Store']
pygments_style = None
# Remove the prompt when copying examples
copybutton_prompt_text = ">>> "
copybutton_prompt_text = r">>> |\.\.\. "
copybutton_prompt_is_regexp = True
# -- Options for HTML output -------------------------------------------------

View File

@@ -1,5 +1,5 @@
Converting Tensorflow Checkpoints
================================================
=======================================================================================================================
A command-line interface is provided to convert original Bert/GPT/GPT-2/Transformer-XL/XLNet/XLM checkpoints in models than be loaded using the ``from_pretrained`` methods of the library.
@@ -10,7 +10,7 @@ A command-line interface is provided to convert original Bert/GPT/GPT-2/Transfor
The documentation below reflects the **transformers-cli convert** command format.
BERT
^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You can convert any TensorFlow checkpoint for BERT (in particular `the pre-trained models released by Google <https://github.com/google-research/bert#pre-trained-models>`_\ ) in a PyTorch save file by using the `convert_bert_original_tf_checkpoint_to_pytorch.py <https://github.com/huggingface/transformers/blob/master/src/transformers/convert_bert_original_tf_checkpoint_to_pytorch.py>`_ script.
@@ -34,7 +34,7 @@ Here is an example of the conversion process for a pre-trained ``BERT-Base Uncas
You can download Google's pre-trained models for the conversion `here <https://github.com/google-research/bert#pre-trained-models>`__.
ALBERT
^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Convert TensorFlow model checkpoints of ALBERT to PyTorch using the `convert_albert_original_tf_checkpoint_to_pytorch.py <https://github.com/huggingface/transformers/blob/master/src/transformers/convert_bert_original_tf_checkpoint_to_pytorch.py>`_ script.
@@ -54,7 +54,7 @@ Here is an example of the conversion process for the pre-trained ``ALBERT Base``
You can download Google's pre-trained models for the conversion `here <https://github.com/google-research/albert#pre-trained-models>`__.
OpenAI GPT
^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here is an example of the conversion process for a pre-trained OpenAI GPT model, assuming that your NumPy checkpoint save as the same format than OpenAI pretrained model (see `here <https://github.com/openai/finetune-transformer-lm>`__\ )
@@ -70,7 +70,7 @@ Here is an example of the conversion process for a pre-trained OpenAI GPT model,
OpenAI GPT-2
^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here is an example of the conversion process for a pre-trained OpenAI GPT-2 model (see `here <https://github.com/openai/gpt-2>`__\ )
@@ -85,7 +85,7 @@ Here is an example of the conversion process for a pre-trained OpenAI GPT-2 mode
[--finetuning_task_name OPENAI_GPT2_FINETUNED_TASK]
Transformer-XL
^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here is an example of the conversion process for a pre-trained Transformer-XL model (see `here <https://github.com/kimiyoung/transformer-xl/tree/master/tf#obtain-and-evaluate-pretrained-sota-models>`__\ )
@@ -101,7 +101,7 @@ Here is an example of the conversion process for a pre-trained Transformer-XL mo
XLNet
^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here is an example of the conversion process for a pre-trained XLNet model:
@@ -118,7 +118,7 @@ Here is an example of the conversion process for a pre-trained XLNet model:
XLM
^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here is an example of the conversion process for a pre-trained XLM model:

View File

@@ -0,0 +1,715 @@
Fine-tuning with custom datasets
=======================================================================================================================
.. note::
The datasets used in this tutorial are available and can be more easily accessed using the
`🤗 NLP library <https://github.com/huggingface/nlp>`_. We do not use this library to access the datasets here
since this tutorial meant to illustrate how to work with your own data. A brief of introduction can be found
at the end of the tutorial in the section ":ref:`nlplib`".
This tutorial will take you through several examples of using 🤗 Transformers models with your own datasets. The
guide shows one of many valid workflows for using these models and is meant to be illustrative rather than
definitive. We show examples of reading in several data formats, preprocessing the data for several types of tasks,
and then preparing the data into PyTorch/TensorFlow ``Dataset`` objects which can easily be used either with
:class:`~transformers.Trainer`/:class:`~transformers.TFTrainer` or with native PyTorch/TensorFlow.
We include several examples, each of which demonstrates a different type of common downstream task:
- :ref:`seq_imdb`
- :ref:`tok_ner`
- :ref:`qa_squad`
- :ref:`resources`
.. _seq_imdb:
Sequence Classification with IMDb Reviews
-----------------------------------------------------------------------------------------------------------------------
.. note::
This dataset can be explored in the Hugging Face model hub (`IMDb <https://huggingface.co/datasets/imdb>`_), and can
be alternatively downloaded with the 🤗 NLP library with ``load_dataset("imdb")``.
In this example, we'll show how to download, tokenize, and train a model on the IMDb reviews dataset. This task
takes the text of a review and requires the model to predict whether the sentiment of the review is positive or
negative. Let's start by downloading the dataset from the
`Large Movie Review Dataset <http://ai.stanford.edu/~amaas/data/sentiment/>`_ webpage.
.. code-block:: bash
wget http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
tar -xf aclImdb_v1.tar.gz
This data is organized into ``pos`` and ``neg`` folders with one text file per example. Let's write a function that can
read this in.
.. code-block:: python
from pathlib import Path
def read_imdb_split(split_dir):
split_dir = Path(split_dir)
texts = []
labels = []
for label_dir in ["pos", "neg"]:
for text_file in (split_dir/label_dir).iterdir():
texts.append(text_file.read_text())
labels.append(0 if label_dir is "neg" else 1)
return texts, labels
train_texts, train_labels = read_imdb_split('aclImdb/train')
test_texts, test_labels = read_imdb_split('aclImdb/test')
We now have a train and test dataset, but let's also also create a validation set which we can use for for
evaluation and tuning without training our test set results. Sklearn has a convenient utility for creating such
splits:
.. code-block:: python
from sklearn.model_selection import train_test_split
train_texts, val_texts, train_labels, val_labels = train_test_split(train_texts, train_labels, test_size=.2)
Alright, we've read in our dataset. Now let's tackle tokenization. We'll eventually train a classifier using
pre-trained DistilBert, so let's use the DistilBert tokenizer.
.. code-block:: python
from transformers import DistilBertTokenizerFast
tokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-uncased')
Now we can simply pass our texts to the tokenizer. We'll pass ``truncation=True`` and ``padding=True``, which will
ensure that all of our sequences are padded to the same length and are truncated to be no longer model's maximum
input length. This will allow us to feed batches of sequences into the model at the same time.
.. code-block:: python
train_encodings = tokenizer(train_texts, truncation=True, padding=True)
val_encodings = tokenizer(val_texts, truncation=True, padding=True)
test_encodings = tokenizer(test_texts, truncation=True, padding=True)
Now, let's turn our labels and encodings into a Dataset object. In PyTorch, this is done by subclassing a
``torch.utils.data.Dataset`` object and implementing ``__len__`` and ``__getitem__``. In TensorFlow, we pass our input encodings and
labels to the ``from_tensor_slices`` constructor method. We put the data in this format so that the data can be
easily batched such that each key in the batch encoding corresponds to a named parameter of the
:meth:`~transformers.DistilBertForSequenceClassification.forward` method of the model we will train.
.. code-block:: python
## PYTORCH CODE
import torch
class IMDbDataset(torch.utils.data.Dataset):
def __init__(self, encodings, labels):
self.encodings = encodings
self.labels = labels
def __getitem__(self, idx):
item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
item['labels'] = torch.tensor(self.labels[idx])
return item
def __len__(self):
return len(self.labels)
train_dataset = IMDbDataset(train_encodings, train_labels)
val_dataset = IMDbDataset(val_encodings, val_labels)
test_dataset = IMDbDataset(test_encodings, test_labels)
## TENSORFLOW CODE
import tensorflow as tf
train_dataset = tf.data.Dataset.from_tensor_slices((
dict(train_encodings),
train_labels
))
val_dataset = tf.data.Dataset.from_tensor_slices((
dict(val_encodings),
val_labels
))
test_dataset = tf.data.Dataset.from_tensor_slices((
dict(test_encodings),
test_labels
))
Now that our datasets our ready, we can fine-tune a model either with the 🤗
:class:`~transformers.Trainer`/:class:`~transformers.TFTrainer` or with native PyTorch/TensorFlow. See
:doc:`training <training>`.
.. _ft_trainer:
Fine-tuning with Trainer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The steps above prepared the datasets in the way that the trainer is expected. Now all we need to do is create a
model to fine-tune, define the :class:`~transformers.TrainingArguments`/:class:`~transformers.TFTrainingArguments`
and instantiate a :class:`~transformers.Trainer`/:class:`~transformers.TFTrainer`.
.. code-block:: python
## PYTORCH CODE
from transformers import DistilBertForSequenceClassification, Trainer, TrainingArguments
training_args = TrainingArguments(
output_dir='./results', # output directory
num_train_epochs=3, # total number of training epochs
per_device_train_batch_size=16, # batch size per device during training
per_device_eval_batch_size=64, # batch size for evaluation
warmup_steps=500, # number of warmup steps for learning rate scheduler
weight_decay=0.01, # strength of weight decay
logging_dir='./logs', # directory for storing logs
logging_steps=10,
)
model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased")
trainer = Trainer(
model=model, # the instantiated 🤗 Transformers model to be trained
args=training_args, # training arguments, defined above
train_dataset=train_dataset, # training dataset
eval_dataset=val_dataset # evaluation dataset
)
trainer.train()
## TENSORFLOW CODE
from transformers import TFDistilBertForSequenceClassification, TFTrainer, TFTrainingArguments
training_args = TFTrainingArguments(
output_dir='./results', # output directory
num_train_epochs=3, # total number of training epochs
per_device_train_batch_size=16, # batch size per device during training
per_device_eval_batch_size=64, # batch size for evaluation
warmup_steps=500, # number of warmup steps for learning rate scheduler
weight_decay=0.01, # strength of weight decay
logging_dir='./logs', # directory for storing logs
logging_steps=10,
)
with training_args.strategy.scope():
model = TFDistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased")
trainer = TFTrainer(
model=model, # the instantiated 🤗 Transformers model to be trained
args=training_args, # training arguments, defined above
train_dataset=train_dataset, # training dataset
eval_dataset=val_dataset # evaluation dataset
)
trainer.train()
.. _ft_native:
Fine-tuning with native PyTorch/TensorFlow
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We can also train use native PyTorch or TensorFlow:
.. code-block:: python
## PYTORCH CODE
from torch.utils.data import DataLoader
from transformers import DistilBertForSequenceClassification, AdamW
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
model = DistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')
model.to(device)
model.train()
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
optim = AdamW(model.parameters(), lr=5e-5)
for epoch in range(3):
for batch in train_loader:
optim.zero_grad()
input_ids = batch['input_ids'].to(device)
attention_mask = batch['attention_mask'].to(device)
labels = batch['labels'].to(device)
outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
loss = outputs[0]
loss.backward()
optim.step()
model.eval()
## TENSORFLOW CODE
from transformers import TFDistilBertForSequenceClassification
model = TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')
optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
model.compile(optimizer=optimizer, loss=model.compute_loss) # can also use any keras loss fn
model.fit(train_dataset.shuffle(1000).batch(16), epochs=3, batch_size=16)
.. _tok_ner:
Token Classification with W-NUT Emerging Entities
-----------------------------------------------------------------------------------------------------------------------
.. note::
This dataset can be explored in the Hugging Face model hub (`WNUT-17 <https://huggingface.co/datasets/wnut_17>`_), and can
be alternatively downloaded with the 🤗 NLP library with ``load_dataset("wnut_17")``.
Next we will look at token classification. Rather than classifying an entire sequence, this task classifies token by
token. We'll demonstrate how to do this with
`Named Entity Recognition <http://nlpprogress.com/english/named_entity_recognition.html>`_, which involves
identifying tokens which correspond to a predefined set of "entities". Specifically, we'll use the
`W-NUT Emerging and Rare entities <http://noisy-text.github.io/2017/emerging-rare-entities.html>`_ corpus. The data
is given as a collection of pre-tokenized documents where each token is assigned a tag.
Let's start by downloading the data.
.. code-block:: bash
wget http://noisy-text.github.io/2017/files/wnut17train.conll
In this case, we'll just download the train set, which is a single text file. Each line of the file contains either
(1) a word and tag separated by a tab, or (2) a blank line indicating the end of a document. Let's write a
function to read this in. We'll take in the file path and return ``token_docs`` which is a list of lists of token
strings, and ``token_tags`` which is a list of lists of tag strings.
.. code-block:: python
from pathlib import Path
import re
def read_wnut(file_path):
file_path = Path(file_path)
raw_text = file_path.read_text().strip()
raw_docs = re.split(r'\n\t?\n', raw_text)
token_docs = []
tag_docs = []
for doc in raw_docs:
tokens = []
tags = []
for line in doc.split('\n'):
token, tag = line.split('\t')
tokens.append(token)
tags.append(tag)
token_docs.append(tokens)
tag_docs.append(tags)
return token_docs, tag_docs
texts, tags = read_wnut('wnut17train.conll')
Just to see what this data looks like, let's take a look at a segment of the first document.
.. code-block:: python
>>> print(texts[0][10:17], tags[0][10:17], sep='\n')
['for', 'two', 'weeks', '.', 'Empire', 'State', 'Building']
['O', 'O', 'O', 'O', 'B-location', 'I-location', 'I-location']
``location`` is an entity type, ``B-`` indicates the beginning of an entity, and ``I-`` indicates consecutive positions of
the same entity ("Empire State Building" is considered one entity). ``O`` indicates the token does not correspond to
any entity.
Now that we've read the data in, let's create a train/validation split:
.. code-block:: python
from sklearn.model_selection import train_test_split
train_texts, val_texts, train_tags, val_tags = train_test_split(texts, tags, test_size=.2)
Next, let's create encodings for our tokens and tags. For the tags, we can start by just create a simple mapping
which we'll use in a moment:
.. code-block:: python
unique_tags = set(tag for doc in tags for tag in doc)
tag2id = {tag: id for id, tag in enumerate(unique_tags)}
id2tag = {id: tag for tag, id in tag2id.items()}
To encode the tokens, we'll use a pre-trained DistilBert tokenizer. We can tell the tokenizer that we're dealing
with ready-split tokens rather than full sentence strings by passing ``is_split_into_words=True``. We'll also pass
``padding=True`` and ``truncation=True`` to pad the sequences to be the same length. Lastly, we can tell the model
to return information about the tokens which are split by the wordpiece tokenization process, which we will need in
a moment.
.. code-block:: python
from transformers import DistilBertTokenizerFast
tokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-cased')
train_encodings = tokenizer(train_texts, is_split_into_words=True, return_offsets_mapping=True, padding=True, truncation=True)
val_encodings = tokenizer(val_texts, is_split_into_words=True, return_offsets_mapping=True, padding=True, truncation=True)
Great, so now our tokens are nicely encoded in the format that they need to be in to feed them into our DistilBert
model below.
Now we arrive at a common obstacle with using pre-trained models for token-level classification: many of the tokens
in the W-NUT corpus are not in DistilBert's vocabulary. Bert and many models like it use a method called WordPiece
Tokenization, meaning that single words are split into multiple tokens such that each token is likely to be in
the vocabulary. For example, DistilBert's tokenizer would split the Twitter handle ``@huggingface`` into the tokens
``['@', 'hugging', '##face']``. This is a problem for us because we have exactly one tag per token. If the tokenizer
splits a token into multiple sub-tokens, then we will end up with a mismatch between our tokens and our labels.
One way to handle this is to only train on the tag labels for the first subtoken of a split token. We can do this in
🤗 Transformers by setting the labels we wish to ignore to ``-100``. In the example above, if the label for
``@HuggingFace`` is ``3`` (indexing ``B-corporation``), we would set the labels of ``['@', 'hugging', '##face']`` to
``[3, -100, -100]``.
Let's write a function to do this. This is where we will use the ``offset_mapping`` from the tokenizer as mentioned
above. For each sub-token returned by the tokenizer, the offset mapping gives us a tuple indicating the sub-token's
start position and end position relative to the original token it was split from. That means that if the first
position in the tuple is anything other than ``0``, we will set its corresponding label to ``-100``. While we're at
it, we can also set labels to ``-100`` if the second position of the offset mapping is ``0``, since this means it must
be a special token like ``[PAD]`` or ``[CLS]``.
.. note::
Due to a recently fixed bug, -1 must be used instead of -100 when using TensorFlow in 🤗 Transformers <= 3.02.
.. code-block:: python
import numpy as np
def encode_tags(tags, encodings):
labels = [[tag2id[tag] for tag in doc] for doc in tags]
encoded_labels = []
for doc_labels, doc_offset in zip(labels, encodings.offset_mapping):
# create an empty array of -100
doc_enc_labels = np.ones(len(doc_offset),dtype=int) * -100
arr_offset = np.array(doc_offset)
# set labels whose first offset position is 0 and the second is not 0
doc_enc_labels[(arr_offset[:,0] == 0) & (arr_offset[:,1] != 0)] = doc_labels
encoded_labels.append(doc_enc_labels.tolist())
return encoded_labels
train_labels = encode_tags(train_tags, train_encodings)
val_labels = encode_tags(val_tags, val_encodings)
The hard part is now done. Just as in the sequence classification example above, we can create a dataset object:
.. code-block:: python
## PYTORCH CODE
import torch
class WNUTDataset(torch.utils.data.Dataset):
def __init__(self, encodings, labels):
self.encodings = encodings
self.labels = labels
def __getitem__(self, idx):
item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
item['labels'] = torch.tensor(self.labels[idx])
return item
def __len__(self):
return len(self.labels)
train_encodings.pop("offset_mapping") # we don't want to pass this to the model
val_encodings.pop("offset_mapping")
train_dataset = WNUTDataset(train_encodings, train_labels)
val_dataset = WNUTDataset(val_encodings, val_labels)
## TENSORFLOW CODE
import tensorflow as tf
train_encodings.pop("offset_mapping") # we don't want to pass this to the model
val_encodings.pop("offset_mapping")
train_dataset = tf.data.Dataset.from_tensor_slices((
dict(train_encodings),
train_labels
))
val_dataset = tf.data.Dataset.from_tensor_slices((
dict(val_encodings),
val_labels
))
Now load in a token classification model and specify the number of labels:
.. code-block:: python
## PYTORCH CODE
from transformers import DistilBertForTokenClassification
model = DistilBertForTokenClassification.from_pretrained('distilbert-base-cased', num_labels=len(unique_tags))
## TENSORFLOW CODE
from transformers import TFDistilBertForTokenClassification
model = TFDistilBertForTokenClassification.from_pretrained('distilbert-base-cased', num_labels=len(unique_tags))
The data and model are both ready to go. You can train the model either with
:class:`~transformers.Trainer`/:class:`~transformers.TFTrainer` or with native PyTorch/TensorFlow, exactly as in the
sequence classification example above.
- :ref:`ft_trainer`
- :ref:`ft_native`
.. _qa_squad:
Question Answering with SQuAD 2.0
-----------------------------------------------------------------------------------------------------------------------
.. note::
This dataset can be explored in the Hugging Face model hub (`SQuAD V2 <https://huggingface.co/datasets/squad_v2>`_), and can
be alternatively downloaded with the 🤗 NLP library with ``load_dataset("squad_v2")``.
Question answering comes in many forms. In this example, we'll look at the particular type of extractive QA that
involves answering a question about a passage by highlighting the segment of the passage that answers the question.
This involves fine-tuning a model which predicts a start position and an end position in the passage. We will use the
`Stanford Question Answering Dataset (SQuAD) 2.0 <https://rajpurkar.github.io/SQuAD-explorer/>`_.
We will start by downloading the data:
.. code-block:: bash
mkdir squad
wget https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v2.0.json -O squad/train-v2.0.json
wget https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v2.0.json -O squad/dev-v2.0.json
Each split is in a structured json file with a number of questions and answers for each passage (or context). We'll
take this apart into parallel lists of contexts, questions, and answers (note that the contexts here are repeated
since there are multiple questions per context):
.. code-block:: python
import json
from pathlib import Path
def read_squad(path):
path = Path(path)
with open(path, 'rb') as f:
squad_dict = json.load(f)
contexts = []
questions = []
answers = []
for group in squad_dict['data']:
for passage in group['paragraphs']:
context = passage['context']
for qa in passage['qas']:
question = qa['question']
for answer in qa['answers']:
contexts.append(context)
questions.append(question)
answers.append(answer)
return contexts, questions, answers
train_contexts, train_questions, train_answers = read_squad('squad/train-v2.0.json')
val_contexts, val_questions, val_answers = read_squad('squad/dev-v2.0.json')
The contexts and questions are just strings. The answers are dicts containing the subsequence of the passage with
the correct answer as well as an integer indicating the character at which the answer begins. In order to train a
model on this data we need (1) the tokenized context/question pairs, and (2) integers indicating at which *token*
positions the answer begins and ends.
First, let's get the *character* position at which the answer ends in the passage (we are given the starting
position). Sometimes SQuAD answers are off by one or two characters, so we will also adjust for that.
.. code-block:: python
def add_end_idx(answers, contexts):
for answer, context in zip(answers, contexts):
gold_text = answer['text']
start_idx = answer['answer_start']
end_idx = start_idx + len(gold_text)
# sometimes squad answers are off by a character or two fix this
if context[start_idx:end_idx] == gold_text:
answer['answer_end'] = end_idx
elif context[start_idx-1:end_idx-1] == gold_text:
answer['answer_start'] = start_idx - 1
answer['answer_end'] = end_idx - 1 # When the gold label is off by one character
elif context[start_idx-2:end_idx-2] == gold_text:
answer['answer_start'] = start_idx - 2
answer['answer_end'] = end_idx - 2 # When the gold label is off by two characters
add_end_idx(train_answers, train_contexts)
add_end_idx(val_answers, val_contexts)
Now ``train_answers`` and ``val_answers`` include the character end positions and the corrected start positions.
Next, let's tokenize our context/question pairs. 🤗 Tokenizers can accept parallel lists of sequences and encode
them together as sequence pairs.
.. code-block:: python
from transformers import DistilBertTokenizerFast
tokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-uncased')
train_encodings = tokenizer(train_contexts, train_questions, truncation=True, padding=True)
val_encodings = tokenizer(val_contexts, val_questions, truncation=True, padding=True)
Next we need to convert our character start/end positions to token start/end positions. When using 🤗 Fast
Tokenizers, we can use the built in :func:`~transformers.BatchEncoding.char_to_token` method.
.. code-block:: python
def add_token_positions(encodings, answers):
start_positions = []
end_positions = []
for i in range(len(answers)):
start_positions.append(encodings.char_to_token(i, answers[i]['answer_start']))
end_positions.append(encodings.char_to_token(i, answers[i]['answer_end'] - 1))
# if None, the answer passage has been truncated
if start_positions[-1] is None:
start_positions[-1] = tokenizer.model_max_length
if end_positions[-1] is None:
end_positions[-1] = tokenizer.model_max_length
encodings.update({'start_positions': start_positions, 'end_positions': end_positions})
add_token_positions(train_encodings, train_answers)
add_token_positions(val_encodings, val_answers)
Our data is ready. Let's just put it in a PyTorch/TensorFlow dataset so that we can easily use it for
training. In PyTorch, we define a custom ``Dataset`` class. In TensorFlow, we pass a tuple of
``(inputs_dict, labels_dict)`` to the ``from_tensor_slices`` method.
.. code-block:: python
## PYTORCH CODE
import torch
class SquadDataset(torch.utils.data.Dataset):
def __init__(self, encodings):
self.encodings = encodings
def __getitem__(self, idx):
return {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
def __len__(self):
return len(self.encodings.input_ids)
train_dataset = SquadDataset(train_encodings)
val_dataset = SquadDataset(val_encodings)
## TENSORFLOW CODE
import tensorflow as tf
train_dataset = tf.data.Dataset.from_tensor_slices((
{key: train_encodings[key] for key in ['input_ids', 'attention_mask']},
{key: train_encodings[key] for key in ['start_positions', 'end_positions']}
))
val_dataset = tf.data.Dataset.from_tensor_slices((
{key: val_encodings[key] for key in ['input_ids', 'attention_mask']},
{key: val_encodings[key] for key in ['start_positions', 'end_positions']}
))
Now we can use a DistilBert model with a QA head for training:
.. code-block:: python
## PYTORCH CODE
from transformers import DistilBertForQuestionAnswering
model = DistilBertForQuestionAnswering.from_pretrained("distilbert-base-uncased")
## TENSORFLOW CODE
from transformers import TFDistilBertForQuestionAnswering
model = TFDistilBertForQuestionAnswering.from_pretrained("distilbert-base-uncased")
The data and model are both ready to go. You can train the model with
:class:`~transformers.Trainer`/:class:`~transformers.TFTrainer` exactly as in the sequence classification example
above. If using native PyTorch, replace ``labels`` with ``start_positions`` and ``end_positions`` in the training
example. If using Keras's ``fit``, we need to make a minor modification to handle this example since it involves
multiple model outputs.
- :ref:`ft_trainer`
.. code-block:: python
## PYTORCH CODE
from torch.utils.data import DataLoader
from transformers import AdamW
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
model.to(device)
model.train()
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
optim = AdamW(model.parameters(), lr=5e-5)
for epoch in range(3):
for batch in train_loader:
optim.zero_grad()
input_ids = batch['input_ids'].to(device)
attention_mask = batch['attention_mask'].to(device)
start_positions = batch['start_positions'].to(device)
end_positions = batch['end_positions'].to(device)
outputs = model(input_ids, attention_mask=attention_mask, start_positions=start_positions, end_positions=end_positions)
loss = outputs[0]
loss.backward()
optim.step()
model.eval()
## TENSORFLOW CODE
# Keras will expect a tuple when dealing with labels
train_dataset = train_dataset.map(lambda x, y: (x, (y['start_positions'], y['end_positions'])))
# Keras will assign a separate loss for each output and add them together. So we'll just use the standard CE loss
# instead of using the built-in model.compute_loss, which expects a dict of outputs and averages the two terms.
# Note that this means the loss will be 2x of when using TFTrainer since we're adding instead of averaging them.
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.distilbert.return_dict = False # if using 🤗 Transformers >3.02, make sure outputs are tuples
optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
model.compile(optimizer=optimizer, loss=loss) # can also use any keras loss fn
model.fit(train_dataset.shuffle(1000).batch(16), epochs=3, batch_size=16)
.. _resources:
Additional Resources
-----------------------------------------------------------------------------------------------------------------------
- `How to train a new language model from scratch using Transformers and Tokenizers
<https://huggingface.co/blog/how-to-train>`_. Blog post showing the steps to load in Esperanto data and train a
masked language model from scratch.
- :doc:`Preprocessing <preprocessing>`. Docs page on data preprocessing.
- :doc:`Training <training>`. Docs page on training and fine-tuning.
.. _nlplib:
Using the 🤗 NLP Datasets & Metrics library
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This tutorial demonstrates how to read in datasets from various raw text formats and prepare them for training with
🤗 Transformers so that you can do the same thing with your own custom datasets. However, we recommend users use the
`🤗 NLP library <https://github.com/huggingface/nlp>`_ for working with the 150+ datasets included in the
`hub <https://huggingface.co/datasets>`_, including the three datasets used in this tutorial. As a very brief overview,
we will show how to use the NLP library to download and prepare the IMDb dataset from the first example,
:ref:`seq_imdb`.
Start by downloading the dataset:
.. code-block:: python
from nlp import load_dataset
train = load_dataset("imdb", split="train")
Each dataset has multiple columns corresponding to different features. Let's see what our columns are.
.. code-block:: python
>>> print(train.column_names)
['label', 'text']
Great. Now let's tokenize the text. We can do this using the ``map`` method. We'll also rename the ``label`` column
to ``labels`` to match the model's input arguments.
.. code-block:: python
train = train.map(lambda batch: tokenizer(batch["text"], truncation=True, padding=True), batched=True)
train.rename_column_("label", "labels")
Lastly, we can use the ``set_format`` method to determine which columns and in what data format we want to access
dataset elements.
.. code-block:: python
## PYTORCH CODE
>>> train.set_format("torch", columns=["input_ids", "attention_mask", "labels"])
>>> {key: val.shape for key, val in train[0].items()})
{'labels': torch.Size([]), 'input_ids': torch.Size([512]), 'attention_mask': torch.Size([512])}
## TENSORFLOW CODE
>>> train.set_format("tensorflow", columns=["input_ids", "attention_mask", "labels"])
>>> {key: val.shape for key, val in train[0].items()})
{'labels': TensorShape([]), 'input_ids': TensorShape([512]), 'attention_mask': TensorShape([512])}
We now have a fully-prepared dataset. Check out `the 🤗 NLP docs <https://huggingface.co/nlp/processing.html>`_ for
a more thorough introduction.

View File

@@ -1,13 +1,13 @@
Glossary
^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
General terms
-------------
-----------------------------------------------------------------------------------------------------------------------
- autoencoding models: see MLM
- autoregressive models: see CLM
- CLM: causal language modeling, a pretraining task where the model reads the texts in order and has to predict the
next word. It's usually done by reading the whole sentence but using a mask inside the model to hide the future
next word. It's usually done by reading the whole sentence but using a mask inside the model to hide the future
tokens at a certain timestep.
- MLM: masked language modeling, a pretraining task where the model sees a corrupted version of the texts, usually done
by masking some tokens randomly, and has to predict the original text.
@@ -18,7 +18,7 @@ General terms
- NLU: natural language understanding, all tasks related to understanding what is in a text (for instance classifying
the whole text, individual words)
- pretrained model: a model that has been pretrained on some data (for instance all of Wikipedia). Pretraining methods
involve a self-supervised objective, which can be reading the text and trying to predict the next word (see CLM) or
involve a self-supervised objective, which can be reading the text and trying to predict the next word (see CLM) or
masking some words and trying to predict them (see MLM).
- RNN: recurrent neural network, a type of model that uses a loop over a layer to process texts.
- seq2seq or sequence-to-sequence: models that generate a new sequence from an input, like translation models, or
@@ -27,7 +27,7 @@ General terms
or a punctuation symbol.
Model inputs
------------
-----------------------------------------------------------------------------------------------------------------------
Every model is different yet bears similarities with the others. Therefore most models use the same inputs, which are
detailed here alongside usage examples.
@@ -35,7 +35,7 @@ detailed here alongside usage examples.
.. _input-ids:
Input IDs
~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The input ids are often the only required parameters to be passed to the model as input. *They are token indices,
numerical representations of tokens building the sequences that will be used as input by the model*.
@@ -43,7 +43,7 @@ numerical representations of tokens building the sequences that will be used as
Each tokenizer works differently but the underlying mechanism remains the same. Here's an example using the BERT
tokenizer, which is a `WordPiece <https://arxiv.org/pdf/1609.08144.pdf>`__ tokenizer:
::
.. code-block::
>>> from transformers import BertTokenizer
>>> tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
@@ -52,15 +52,15 @@ tokenizer, which is a `WordPiece <https://arxiv.org/pdf/1609.08144.pdf>`__ token
The tokenizer takes care of splitting the sequence into tokens available in the tokenizer vocabulary.
::
.. code-block::
>>> tokenized_sequence = tokenizer.tokenize(sequence)
The tokens are either words or subwords. Here for instance, "VRAM" wasn't in the model vocabulary, so it's been split
in "V", "RA" and "M". To indicate those tokens are not separate words but parts of the same word, a double-dash is
in "V", "RA" and "M". To indicate those tokens are not separate words but parts of the same word, a double-hash prefix is
added for "RA" and "M":
::
.. code-block::
>>> print(tokenized_sequence)
['A', 'Titan', 'R', '##T', '##X', 'has', '24', '##GB', 'of', 'V', '##RA', '##M']
@@ -69,28 +69,31 @@ These tokens can then be converted into IDs which are understandable by the mode
the sentence to the tokenizer, which leverages the Rust implementation of
`huggingface/tokenizers <https://github.com/huggingface/tokenizers>`__ for peak performance.
::
.. code-block::
>>> encoded_sequence = tokenizer(sequence)["input_ids"]
>>> inputs = tokenizer(sequence)
The tokenizer returns a dictionary with all the arguments necessary for its corresponding model to work properly. The
token indices are under the key "input_ids":
::
.. code-block::
>>> encoded_sequence = inputs["input_ids"]
>>> print(encoded_sequence)
[101, 138, 18696, 155, 1942, 3190, 1144, 1572, 13745, 1104, 159, 9664, 2107, 102]
Note that the tokenizer automatically adds "special tokens" (if the associated model rely on them) which are special
IDs the model sometimes uses. If we decode the previous sequence of ids,
Note that the tokenizer automatically adds "special tokens" (if the associated model relies on them) which are special
IDs the model sometimes uses.
::
If we decode the previous sequence of ids,
.. code-block::
>>> decoded_sequence = tokenizer.decode(encoded_sequence)
we will see
we will see
::
.. code-block::
>>> print(decoded_sequence)
[CLS] A Titan RTX has 24GB of VRAM [SEP]
@@ -100,14 +103,14 @@ because this is the way a :class:`~transformers.BertModel` is going to expect it
.. _attention-mask:
Attention mask
~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The attention mask is an optional argument used when batching sequences together. This argument indicates to the
model which tokens should be attended to, and which should not.
For example, consider these two sequences:
::
.. code-block::
>>> from transformers import BertTokenizer
>>> tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
@@ -120,34 +123,34 @@ For example, consider these two sequences:
The encoded versions have different lengths:
::
.. code-block::
>>> len(encoded_sequence_a), len(encoded_sequence_b)
(8, 19)
Therefore, we can't be put then together in a same tensor as-is. The first sequence needs to be padded up to the length
Therefore, we can't put them together in the same tensor as-is. The first sequence needs to be padded up to the length
of the second one, or the second one needs to be truncated down to the length of the first one.
In the first case, the list of IDs will be extended by the padding indices. We can pass a list to the tokenizer and ask
it to pad like this:
::
.. code-block::
>>> padded_sequences = tokenizer([sequence_a, sequence_b], padding=True)
We can see that 0s have been added on the right of the first sentence to make it the same length as the second one:
::
.. code-block::
>>> padded_sequences["input_ids"]
[[101, 1188, 1110, 170, 1603, 4954, 119, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [101, 1188, 1110, 170, 1897, 1263, 4954, 119, 1135, 1110, 1120, 1655, 2039, 1190, 1103, 4954, 138, 119, 102]]
This can then be converted into a tensor in PyTorch or TensorFlow. The attention mask is a binary tensor indicating
the position of the padded indices so that the model does not attend to them. For the
:class:`~transformers.BertTokenizer`, :obj:`1` indicate a value that should be attended to while :obj:`0` indicate
:class:`~transformers.BertTokenizer`, :obj:`1` indicates a value that should be attended to, while :obj:`0` indicates
a padded value. This attention mask is in the dictionary returned by the tokenizer under the key "attention_mask":
::
.. code-block::
>>> padded_sequences["attention_mask"]
[[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]
@@ -155,20 +158,20 @@ a padded value. This attention mask is in the dictionary returned by the tokeniz
.. _token-type-ids:
Token Type IDs
~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Some models' purpose is to do sequence classification or question answering. These require two different sequences to
be encoded in the same input IDs. They are usually separated by special tokens, such as the classifier and separator
be joined in a single "input_ids" entry, which usually is performed with the help of special tokens, such as the classifier (``[CLS]``) and separator (``[SEP]``)
tokens. For example, the BERT model builds its two sequence input as such:
::
.. code-block::
>>> # [CLS] SEQUENCE_A [SEP] SEQUENCE_B [SEP]
We can use our tokenizer to automatically generate such a sentence by passing the two sequences as two arguments (and
not a list like before) like this:
We can use our tokenizer to automatically generate such a sentence by passing the two sequences to ``tokenizer`` as two arguments (and
not a list, like before) like this:
::
.. code-block::
>>> from transformers import BertTokenizer
>>> tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
@@ -180,36 +183,36 @@ not a list like before) like this:
which will return:
::
.. code-block::
>>> print(decoded)
[CLS] HuggingFace is based in NYC [SEP] Where is HuggingFace based? [SEP]
This is enough for some models to understand where one sequence ends and where another begins. However, other models
such as BERT have an additional mechanism, which are the token type IDs (also called segment IDs). They are a binary
mask identifying the different sequences in the model.
This is enough for some models to understand where one sequence ends and where another begins. However, other models,
such as BERT, also deploy token type IDs (also called segment IDs). They are represented as a binary
mask identifying the two types of sequence in the model.
The tokenizer returns in the dictionary under the key "token_type_ids":
The tokenizer returns this mask as the "token_type_ids" entry:
::
.. code-block::
>>> encoded_dict['token_type_ids']
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1]
The first sequence, the "context" used for the question, has all its tokens represented by :obj:`0`, whereas the
question has all its tokens represented by :obj:`1`. Some models, like :class:`~transformers.XLNetModel` use an
additional token represented by a :obj:`2`.
The first sequence, the "context" used for the question, has all its tokens represented by a :obj:`0`, whereas the
second sequence, corresponding to the "question", has all its tokens represented by a :obj:`1`.
Some models, like :class:`~transformers.XLNetModel` use an additional token represented by a :obj:`2`.
.. _position-ids:
Position IDs
~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The position IDs are used by the model to identify which token is at which position. Contrary to RNNs that have the
position of each token embedded within them, transformers are unaware of the position of each token. The position
IDs are created for this purpose.
Contrary to RNNs that have the position of each token embedded within them,
transformers are unaware of the position of each token. Therefore, the position IDs (``position_ids``) are used by the model to identify each token's position in the list of tokens.
They are an optional parameter. If no position IDs are passed to the model, they are automatically created as absolute
They are an optional parameter. If no ``position_ids`` is passed to the model, the IDs are automatically created as absolute
positional embeddings.
Absolute positional embeddings are selected in the range ``[0, config.max_position_embeddings - 1]``. Some models
@@ -218,17 +221,17 @@ use other types of positional embeddings, such as sinusoidal position embeddings
.. _feed-forward-chunking:
Feed Forward Chunking
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In transformers two feed forward layers usually follows the self attention layer in each residual attention block.
In each residual attention block in transformers the self-attention layer is usually followed by 2 feed forward layers.
The intermediate embedding size of the feed forward layers is often bigger than the hidden size of the model (e.g.,
for ``bert-base-uncased``).
for ``bert-base-uncased``).
For an input of size ``[batch_size, sequence_length]``, the memory required to store the intermediate feed forward
embeddings ``[batch_size, sequence_length, config.intermediate_size]`` can account for a large fraction of the memory
use. The authors of `Reformer: The Efficient Transformer <https://arxiv.org/abs/2001.04451>`_ noticed that since the
computation is independent of the ``sequence_length`` dimension, it is mathematically equivalent to compute the output
embeddings of both feed forward layers ``[batch_size, config.hidden_size]_0, ..., [batch_size, config.hidden_size]_n``
embeddings of both feed forward layers ``[batch_size, config.hidden_size]_0, ..., [batch_size, config.hidden_size]_n``
individually and concat them afterward to ``[batch_size, sequence_length, config.hidden_size]`` with
``n = sequence_length``, which trades increased computation time against reduced memory use, but yields a
mathematically **equivalent** result.

Binary file not shown.

After

Width:  |  Height:  |  Size: 352 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 418 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 373 KiB

View File

@@ -1,17 +1,17 @@
Transformers
================================================================================================================================================
=======================================================================================================================
State-of-the-art Natural Language Processing for Pytorch and TensorFlow 2.0.
🤗 Transformers (formerly known as `pytorch-transformers` and `pytorch-pretrained-bert`) provides general-purpose
architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet...) for Natural Language Understanding (NLU) and Natural
Language Generation (NLG) with over 32+ pretrained models in 100+ languages and deep interoperability between
🤗 Transformers (formerly known as `pytorch-transformers` and `pytorch-pretrained-bert`) provides general-purpose
architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet...) for Natural Language Understanding (NLU) and Natural
Language Generation (NLG) with over 32+ pretrained models in 100+ languages and deep interoperability between
TensorFlow 2.0 and PyTorch.
This is the documentation of our repository `transformers <https://github.com/huggingface/transformers>`_.
Features
---------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
- High performance on NLU and NLG tasks
- Low barrier to entry for educators and practitioners
@@ -36,7 +36,7 @@ Choose the right framework for every part of a model's lifetime:
- Seamlessly pick the right framework for training, evaluation, production
Contents
---------------------------------
-----------------------------------------------------------------------------------------------------------------------
The documentation is organized in five parts:
@@ -46,7 +46,10 @@ The documentation is organized in five parts:
- **ADVANCED GUIDES** contains more advanced guides that are more specific to a given script or part of the library.
- **RESEARCH** focuses on tutorials that have less to do with how to use the library but more about general resarch in
transformers model
- **PACKAGE REFERENCE** contains the documentation of each public class and function.
- The three last section contain the documentation of each public class and function, grouped in:
- **MAIN CLASSES** for the main classes exposing the important APIs of the library.
- **MODELS** for the classes and functions related to each model implemented in the library.
- **INTERNAL HELPERS** for the classes and functions we use internally.
The library currently contains PyTorch and Tensorflow implementations, pre-trained model weights, usage scripts and
conversion utilities for the following models:
@@ -121,7 +124,26 @@ conversion utilities for the following models:
trained using `OPUS <http://opus.nlpl.eu/>`_ pretrained_models data by Jörg Tiedemann.
21. `Longformer <https://github.com/allenai/longformer>`_ (from AllenAI) released with the paper `Longformer: The
Long-Document Transformer <https://arxiv.org/abs/2004.05150>`_ by Iz Beltagy, Matthew E. Peters, and Arman Cohan.
22. `Other community models <https://huggingface.co/models>`_, contributed by the `community
22. `DPR <https://github.com/facebookresearch/DPR>`_ (from Facebook) released with the paper `Dense Passage Retrieval
for Open-Domain Question Answering <https://arxiv.org/abs/2004.04906>`_ by Vladimir Karpukhin, Barlas Oğuz, Sewon
Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
23. `Pegasus <https://github.com/google-research/pegasus>`_ (from Google) released with the paper `PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization
<https://arxiv.org/abs/1912.08777>`_ by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
24. `MBart <https://github.com/pytorch/fairseq/tree/master/examples/mbart>`_ (from Facebook) released with the paper `Multilingual Denoising Pre-training for Neural Machine Translation <https://arxiv.org/abs/2001.08210>`_ by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov,
Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
25. `LXMERT <https://github.com/airsplay/lxmert>`_ (from UNC Chapel Hill) released with the paper `LXMERT: Learning
Cross-Modality Encoder Representations from Transformers for Open-Domain Question
Answering <https://arxiv.org/abs/1908.07490>`_ by Hao Tan and Mohit Bansal.
26. `Funnel Transformer <https://github.com/laiguokun/Funnel-Transformer>`_ (from CMU/Google Brain) released with the paper
`Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing
<https://arxiv.org/abs/2006.03236>`_ by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
27. `Bert For Sequence Generation <https://tfhub.dev/s?module-type=text-generation&subtype=module,placeholder>`_ (from Google) released with the paper
`Leveraging Pre-trained Checkpoints for Sequence Generation Tasks
<https://arxiv.org/abs/1907.12461>`_ by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
28. `LayoutLM <https://github.com/microsoft/unilm/tree/master/layoutlm>`_ (from Microsoft Research Asia) released with the paper
`LayoutLM: Pre-training of Text and Layout for Document Image Understanding
<https://arxiv.org/abs/1912.13318>`_ by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
29. `Other community models <https://huggingface.co/models>`_, contributed by the `community
<https://huggingface.co/users>`_.
.. toctree::
@@ -151,51 +173,78 @@ conversion utilities for the following models:
pretrained_models
examples
custom_datasets
notebooks
converting_tensorflow_models
migration
torchscript
contributing
testing
serialization
.. toctree::
:maxdepth: 2
:caption: Research
bertology
perplexity
benchmarks
.. toctree::
:maxdepth: 2
:caption: Package Reference
:caption: Main Classes
main_classes/configuration
main_classes/logging
main_classes/model
main_classes/tokenizer
main_classes/pipelines
main_classes/optimizer_schedules
main_classes/output
main_classes/pipelines
main_classes/processors
main_classes/tokenizer
main_classes/trainer
model_doc/auto
model_doc/encoderdecoder
model_doc/bert
model_doc/gpt
model_doc/transformerxl
model_doc/gpt2
model_doc/xlm
model_doc/xlnet
model_doc/roberta
model_doc/distilbert
model_doc/ctrl
model_doc/camembert
.. toctree::
:maxdepth: 2
:caption: Models
model_doc/albert
model_doc/xlmroberta
model_doc/flaubert
model_doc/auto
model_doc/bart
model_doc/t5
model_doc/electra
model_doc/bert
model_doc/bertgeneration
model_doc/camembert
model_doc/ctrl
model_doc/dialogpt
model_doc/reformer
model_doc/marian
model_doc/distilbert
model_doc/dpr
model_doc/electra
model_doc/encoderdecoder
model_doc/flaubert
model_doc/fsmt
model_doc/funnel
model_doc/layoutlm
model_doc/longformer
model_doc/retribert
model_doc/lxmert
model_doc/marian
model_doc/mbart
model_doc/mobilebert
model_doc/gpt
model_doc/gpt2
model_doc/pegasus
model_doc/rag
model_doc/reformer
model_doc/retribert
model_doc/roberta
model_doc/t5
model_doc/transformerxl
model_doc/xlm
model_doc/xlmroberta
model_doc/xlnet
.. toctree::
:maxdepth: 2
:caption: Internal Helpers
internal/modeling_utils
internal/pipelines_utils
internal/tokenization_utils

View File

@@ -22,13 +22,13 @@ When TensorFlow 2.0 and/or PyTorch has been installed, 🤗 Transformers can be
pip install transformers
```
Alternatively, for CPU-support only, you can install 🤗 Transformers and PyTorch in one line with
Alternatively, for CPU-support only, you can install 🤗 Transformers and PyTorch in one line with:
```bash
pip install transformers[torch]
```
or 🤗 Transformers and TensorFlow 2.0 in one line with
or 🤗 Transformers and TensorFlow 2.0 in one line with:
```bash
pip install transformers[tf-cpu]
@@ -73,8 +73,8 @@ This library provides pretrained models that will be downloaded and cached local
folder given by the shell environment variable ``TRANSFORMERS_CACHE``. The default value for it will be the PyTorch
cache home followed by ``/transformers/`` (even if you don't have PyTorch installed). This is (by order of priority):
* shell environment variable ``ENV_TORCH_HOME``
* shell environment variable ``ENV_XDG_CACHE_HOME`` + ``/torch/``
* shell environment variable ``TORCH_HOME``
* shell environment variable ``XDG_CACHE_HOME`` + ``/torch/``
* default: ``~/.cache/torch/``
So if you don't have any specific environment variable set, the cache directory will be at

View File

@@ -0,0 +1,88 @@
Custom Layers and Utilities
-----------------------------------------------------------------------------------------------------------------------
This page lists all the custom layers used by the library, as well as the utility functions it provides for modeling.
Most of those are only useful if you are studying the code of the models in the library.
Pytorch custom modules
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_utils.Conv1D
.. autoclass:: transformers.modeling_utils.PoolerStartLogits
:members: forward
.. autoclass:: transformers.modeling_utils.PoolerEndLogits
:members: forward
.. autoclass:: transformers.modeling_utils.PoolerAnswerClass
:members: forward
.. autoclass:: transformers.modeling_utils.SquadHeadOutput
.. autoclass:: transformers.modeling_utils.SQuADHead
:members: forward
.. autoclass:: transformers.modeling_utils.SequenceSummary
:members: forward
PyTorch Helper Functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: transformers.apply_chunking_to_forward
.. autofunction:: transformers.modeling_utils.find_pruneable_heads_and_indices
.. autofunction:: transformers.modeling_utils.prune_layer
.. autofunction:: transformers.modeling_utils.prune_conv1d_layer
.. autofunction:: transformers.modeling_utils.prune_linear_layer
TensorFlow custom layers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_utils.TFConv1D
.. autoclass:: transformers.modeling_tf_utils.TFSharedEmbeddings
:members: call
.. autoclass:: transformers.modeling_tf_utils.TFSequenceSummary
:members: call
TensorFlow loss functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_utils.TFCausalLanguageModelingLoss
:members:
.. autoclass:: transformers.modeling_tf_utils.TFMaskedLanguageModelingLoss
:members:
.. autoclass:: transformers.modeling_tf_utils.TFMultipleChoiceLoss
:members:
.. autoclass:: transformers.modeling_tf_utils.TFQuestionAnsweringLoss
:members:
.. autoclass:: transformers.modeling_tf_utils.TFSequenceClassificationLoss
:members:
.. autoclass:: transformers.modeling_tf_utils.TFTokenClassificationLoss
:members:
TensorFlow Helper Functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: transformers.modeling_tf_utils.cast_bool_to_primitive
.. autofunction:: transformers.modeling_tf_utils.get_initializer
.. autofunction:: transformers.modeling_tf_utils.keras_serializable
.. autofunction:: transformers.modeling_tf_utils.shape_list

View File

@@ -0,0 +1,40 @@
Utilities for pipelines
-----------------------------------------------------------------------------------------------------------------------
This page lists all the utility functions the library provides for pipelines.
Most of those are only useful if you are studying the code of the models in the library.
Argument handling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.pipelines.ArgumentHandler
.. autoclass:: transformers.pipelines.ZeroShotClassificationArgumentHandler
.. autoclass:: transformers.pipelines.QuestionAnsweringArgumentHandler
Data format
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.pipelines.PipelineDataFormat
:members:
.. autoclass:: transformers.pipelines.CsvPipelineDataFormat
:members:
.. autoclass:: transformers.pipelines.JsonPipelineDataFormat
:members:
.. autoclass:: transformers.pipelines.PipedPipelineDataFormat
:members:
Utilities
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: transformers.pipelines.get_framework
.. autoclass:: transformers.pipelines.PipelineException

View File

@@ -0,0 +1,38 @@
Utilities for Tokenizers
-----------------------------------------------------------------------------------------------------------------------
This page lists all the utility functions used by the tokenizers, mainly the class
:class:`~transformers.tokenization_utils_base.PreTrainedTokenizerBase` that implements the common methods between
:class:`~transformers.PreTrainedTokenizer` and :class:`~transformers.PreTrainedTokenizerFast` and the mixin
:class:`~transformers.tokenization_utils_base.SpecialTokensMixin`.
Most of those are only useful if you are studying the code of the tokenizers in the library.
PreTrainedTokenizerBase
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.tokenization_utils_base.PreTrainedTokenizerBase
:special-members: __call__
:members:
SpecialTokensMixin
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.tokenization_utils_base.SpecialTokensMixin
:members:
Enums and namedtuples
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.tokenization_utils_base.ExplicitEnum
.. autoclass:: transformers.tokenization_utils_base.PaddingStrategy
.. autoclass:: transformers.tokenization_utils_base.TensorType
.. autoclass:: transformers.tokenization_utils_base.TruncationStrategy
.. autoclass:: transformers.tokenization_utils_base.CharSpan
.. autoclass:: transformers.tokenization_utils_base.TokenSpan

View File

@@ -1,10 +1,13 @@
Configuration
----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
The base class ``PretrainedConfig`` implements the common methods for loading/saving a configuration either from a local file or directory, or from a pretrained model configuration provided by the library (downloaded from HuggingFace's AWS S3 repository).
The base class :class:`~transformers.PretrainedConfig` implements the common methods for loading/saving a configuration
either from a local file or directory, or from a pretrained model configuration provided by the library (downloaded
from HuggingFace's AWS S3 repository).
``PretrainedConfig``
~~~~~~~~~~~~~~~~~~~~~
PretrainedConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.PretrainedConfig
:members:

View File

@@ -0,0 +1,58 @@
Logging
-----------------------------------------------------------------------------------------------------------------------
🤗 Transformers has a centralized logging system, so that you can setup the verbosity of the library easily.
Currently the default verbosity of the library is ``WARNING``.
To change the level of verbosity, just use one of the direct setters. For instance, here is how to change the verbosity
to the INFO level.
.. code-block:: python
import transformers
transformers.logging.set_verbosity_info()
You can also use the environment variable ``TRANSFORMERS_VERBOSITY`` to override the default verbosity. You can set it
to one of the following: ``debug``, ``info``, ``warning``, ``error``, ``critical``. For example:
.. code-block:: bash
TRANSFORMERS_VERBOSITY=error ./myprogram.py
All the methods of this logging module are documented below, the main ones are
:func:`transformers.logging.get_verbosity` to get the current level of verbosity in the logger and
:func:`transformers.logging.set_verbosity` to set the verbosity to the level of your choice. In order (from the least
verbose to the most verbose), those levels (with their corresponding int values in parenthesis) are:
- :obj:`transformers.logging.CRITICAL` or :obj:`transformers.logging.FATAL` (int value, 50): only report the most
critical errors.
- :obj:`transformers.logging.ERROR` (int value, 40): only report errors.
- :obj:`transformers.logging.WARNING` or :obj:`transformers.logging.WARN` (int value, 30): only reports error and
warnings. This the default level used by the library.
- :obj:`transformers.logging.INFO` (int value, 20): reports error, warnings and basic information.
- :obj:`transformers.logging.DEBUG` (int value, 10): report all information.
Base setters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: transformers.logging.set_verbosity_error
.. autofunction:: transformers.logging.set_verbosity_warning
.. autofunction:: transformers.logging.set_verbosity_info
.. autofunction:: transformers.logging.set_verbosity_debug
Other functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: transformers.logging.get_verbosity
.. autofunction:: transformers.logging.set_verbosity
.. autofunction:: transformers.logging.get_logger
.. autofunction:: transformers.logging.enable_explicit_format
.. autofunction:: transformers.logging.reset_format

View File

@@ -1,27 +1,55 @@
Models
----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
The base class ``PreTrainedModel`` implements the common methods for loading/saving a model either from a local file or directory, or from a pretrained model configuration provided by the library (downloaded from HuggingFace's AWS S3 repository).
The base classes :class:`~transformers.PreTrainedModel` and :class:`~transformers.TFPreTrainedModel` implement the
common methods for loading/saving a model either from a local file or directory, or from a pretrained model
configuration provided by the library (downloaded from HuggingFace's AWS S3 repository).
``PreTrainedModel`` also implements a few methods which are common among all the models to:
:class:`~transformers.PreTrainedModel` and :class:`~transformers.TFPreTrainedModel` also implement a few methods which
are common among all the models to:
- resize the input token embeddings when new tokens are added to the vocabulary
- prune the attention heads of the model.
``PreTrainedModel``
~~~~~~~~~~~~~~~~~~~~~
The other methods that are common to each model are defined in :class:`~transformers.modeling_utils.ModuleUtilsMixin`
(for the PyTorch models) and :class:`~transformers.modeling_tf_utils.TFModuleUtilsMixin` (for the TensorFlow models) or
for text generation, :class:`~transformers.generation_utils.GenerationMixin` (for the PyTorch models) and
:class:`~transformers.generation_tf_utils.TFGenerationMixin` (for the TensorFlow models)
PreTrainedModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.PreTrainedModel
:members:
``Helper Functions``
~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: transformers.apply_chunking_to_forward
ModuleUtilsMixin
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_utils.ModuleUtilsMixin
:members:
``TFPreTrainedModel``
~~~~~~~~~~~~~~~~~~~~~
TFPreTrainedModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFPreTrainedModel
:members:
TFModelUtilsMixin
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_utils.TFModelUtilsMixin
:members:
Generative models
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.generation_utils.GenerationMixin
:members:
.. autoclass:: transformers.generation_tf_utils.TFGenerationMixin
:members:

View File

@@ -1,5 +1,5 @@
Optimization
----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
The ``.optimization`` module provides:
@@ -7,24 +7,29 @@ The ``.optimization`` module provides:
- several schedules in the form of schedule objects that inherit from ``_LRSchedule``:
- a gradient accumulation class to accumulate the gradients of multiple batches
``AdamW`` (PyTorch)
~~~~~~~~~~~~~~~~~~~
AdamW (PyTorch)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AdamW
:members:
``AdamWeightDecay`` (TensorFlow)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
AdaFactor (PyTorch)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.Adafactor
AdamWeightDecay (TensorFlow)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AdamWeightDecay
.. autofunction:: transformers.create_optimizer
Schedules
~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Learning Rate Schedules (Pytorch)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. autofunction:: transformers.get_constant_schedule
@@ -57,16 +62,16 @@ Learning Rate Schedules (Pytorch)
:target: /imgs/warmup_linear_schedule.png
:alt:
``Warmup`` (TensorFlow)
^^^^^^^^^^^^^^^^^^^^^^^
Warmup (TensorFlow)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. autoclass:: transformers.WarmUp
:members:
Gradient Strategies
~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
``GradientAccumulator`` (TensorFlow)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
GradientAccumulator (TensorFlow)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. autoclass:: transformers.GradientAccumulator

View File

@@ -0,0 +1,260 @@
Model outputs
-----------------------------------------------------------------------------------------------------------------------
PyTorch models have outputs that are instances of subclasses of :class:`~transformers.file_utils.ModelOutput`. Those
are data structures containing all the information returned by the model, but that can also be used as tuples or
dictionaries.
Let's see of this looks on an example:
.. code-block::
from transformers import BertTokenizer, BertForSequenceClassification
import torch
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
labels = torch.tensor([1]).unsqueeze(0) # Batch size 1
outputs = model(**inputs, labels=labels)
The ``outputs`` object is a :class:`~transformers.modeling_outputs.SequenceClassifierOutput`, as we can see in the
documentation of that class below, it means it has an optional ``loss``, a ``logits`` an optional ``hidden_states`` and
an optional ``attentions`` attribute. Here we have the ``loss`` since we passed along ``labels``, but we don't have
``hidden_states`` and ``attentions`` because we didn't pass ``output_hidden_states=True`` or
``output_attentions=True``.
You can access each attribute as you would usually do, and if that attribute has not been returned by the model, you
will get ``None``. Here for instance ``outputs.loss`` is the loss computed by the model, and ``outputs.attentions`` is
``None``.
When considering our ``outputs`` object as tuple, it only considers the attributes that don't have ``None`` values.
Here for instance, it has two elements, ``loss`` then ``logits``, so
.. code-block::
outputs[:2]
will return the tuple ``(outputs.loss, outputs.logits)`` for instance.
When considering our ``outputs`` object as dictionary, it only considers the attributes that don't have ``None``
values. Here for instance, it has two keys that are ``loss`` and ``logits``.
We document here the generic model outputs that are used by more than one model type. Specific output types are
documented on their corresponding model page.
ModelOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.file_utils.ModelOutput
:members:
BaseModelOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.BaseModelOutput
:members:
BaseModelOutputWithPooling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.BaseModelOutputWithPooling
:members:
BaseModelOutputWithPast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.BaseModelOutputWithPast
:members:
Seq2SeqModelOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.Seq2SeqModelOutput
:members:
CausalLMOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.CausalLMOutput
:members:
CausalLMOutputWithPast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.CausalLMOutputWithPast
:members:
MaskedLMOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.MaskedLMOutput
:members:
Seq2SeqLMOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.Seq2SeqLMOutput
:members:
NextSentencePredictorOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.NextSentencePredictorOutput
:members:
SequenceClassifierOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.SequenceClassifierOutput
:members:
Seq2SeqSequenceClassifierOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.Seq2SeqSequenceClassifierOutput
:members:
MultipleChoiceModelOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.MultipleChoiceModelOutput
:members:
TokenClassifierOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.TokenClassifierOutput
:members:
QuestionAnsweringModelOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.QuestionAnsweringModelOutput
:members:
Seq2SeqQuestionAnsweringModelOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.Seq2SeqQuestionAnsweringModelOutput
:members:
TFBaseModelOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_outputs.TFBaseModelOutput
:members:
TFBaseModelOutputWithPooling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_outputs.TFBaseModelOutputWithPooling
:members:
TFBaseModelOutputWithPast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_outputs.TFBaseModelOutputWithPast
:members:
TFSeq2SeqModelOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_outputs.TFSeq2SeqModelOutput
:members:
TFCausalLMOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_outputs.TFCausalLMOutput
:members:
TFCausalLMOutputWithPast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_outputs.TFCausalLMOutputWithPast
:members:
TFMaskedLMOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_outputs.TFMaskedLMOutput
:members:
TFSeq2SeqLMOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_outputs.TFSeq2SeqLMOutput
:members:
TFNextSentencePredictorOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_outputs.TFNextSentencePredictorOutput
:members:
TFSequenceClassifierOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_outputs.TFSequenceClassifierOutput
:members:
TFSeq2SeqSequenceClassifierOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_outputs.TFSeq2SeqSequenceClassifierOutput
:members:
TFMultipleChoiceModelOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_outputs.TFMultipleChoiceModelOutput
:members:
TFTokenClassifierOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_outputs.TFTokenClassifierOutput
:members:
TFQuestionAnsweringModelOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_outputs.TFQuestionAnsweringModelOutput
:members:
TFSeq2SeqQuestionAnsweringModelOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_outputs.TFSeq2SeqQuestionAnsweringModelOutput
:members:

View File

@@ -1,18 +1,30 @@
Pipelines
----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
The pipelines are a great and easy way to use models for inference. These pipelines are objects that abstract most
of the complex code from the library, offering a simple API dedicated to several tasks, including Named Entity
Recognition, Masked Language Modeling, Sentiment Analysis, Feature Extraction and Question Answering.
Recognition, Masked Language Modeling, Sentiment Analysis, Feature Extraction and Question Answering. See the
:doc:`task summary <../task_summary>` for examples of use.
There are two categories of pipeline abstractions to be aware about:
- The :func:`~transformers.pipeline` which is the most powerful object encapsulating all other pipelines
- The other task-specific pipelines, such as :class:`~transformers.TokenClassificationPipeline`
or :class:`~transformers.QuestionAnsweringPipeline`
- The :func:`~transformers.pipeline` which is the most powerful object encapsulating all other pipelines.
- The other task-specific pipelines:
- :class:`~transformers.ConversationalPipeline`
- :class:`~transformers.FeatureExtractionPipeline`
- :class:`~transformers.FillMaskPipeline`
- :class:`~transformers.QuestionAnsweringPipeline`
- :class:`~transformers.SummarizationPipeline`
- :class:`~transformers.TextClassificationPipeline`
- :class:`~transformers.TextGenerationPipeline`
- :class:`~transformers.TokenClassificationPipeline`
- :class:`~transformers.TranslationPipeline`
- :class:`~transformers.ZeroShotClassificationPipeline`
- :class:`~transformers.Text2TextGenerationPipeline`
The pipeline abstraction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The `pipeline` abstraction is a wrapper around all the other available pipelines. It is instantiated as any
other pipeline but requires an additional argument which is the `task`.
@@ -21,53 +33,88 @@ other pipeline but requires an additional argument which is the `task`.
The task specific pipelines
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Parent class: Pipeline
=========================================
ConversationalPipeline
=======================================================================================================================
.. autoclass:: transformers.Pipeline
:members: predict, transform, save_pretrained
.. autoclass:: transformers.Conversation
TokenClassificationPipeline
==========================================
.. autoclass:: transformers.TokenClassificationPipeline
NerPipeline
==========================================
This class is an alias of the :class:`~transformers.TokenClassificationPipeline` defined above. Please refer to that pipeline for
documentation and usage examples.
FillMaskPipeline
==========================================
.. autoclass:: transformers.FillMaskPipeline
.. autoclass:: transformers.ConversationalPipeline
:special-members: __call__
:members:
FeatureExtractionPipeline
==========================================
=======================================================================================================================
.. autoclass:: transformers.FeatureExtractionPipeline
:special-members: __call__
:members:
TextClassificationPipeline
==========================================
FillMaskPipeline
=======================================================================================================================
.. autoclass:: transformers.TextClassificationPipeline
.. autoclass:: transformers.FillMaskPipeline
:special-members: __call__
:members:
NerPipeline
=======================================================================================================================
This class is an alias of the :class:`~transformers.TokenClassificationPipeline` defined below. Please refer to that
pipeline for documentation and usage examples.
QuestionAnsweringPipeline
==========================================
=======================================================================================================================
.. autoclass:: transformers.QuestionAnsweringPipeline
:special-members: __call__
:members:
SummarizationPipeline
==========================================
=======================================================================================================================
.. autoclass:: transformers.SummarizationPipeline
:special-members: __call__
:members:
TextClassificationPipeline
=======================================================================================================================
.. autoclass:: transformers.TextClassificationPipeline
:special-members: __call__
:members:
TextGenerationPipeline
==========================================
=======================================================================================================================
.. autoclass:: transformers.TextGenerationPipeline
:special-members: __call__
:members:
Text2TextGenerationPipeline
=======================================================================================================================
.. autoclass:: transformers.Text2TextGenerationPipeline
:special-members: __call__
:members:
TokenClassificationPipeline
=======================================================================================================================
.. autoclass:: transformers.TokenClassificationPipeline
:special-members: __call__
:members:
ZeroShotClassificationPipeline
=======================================================================================================================
.. autoclass:: transformers.ZeroShotClassificationPipeline
:special-members: __call__
:members:
Parent class: :obj:`Pipeline`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.Pipeline
:members:

View File

@@ -1,11 +1,11 @@
Processors
----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
This library includes processors for several traditional tasks. These processors can be used to process a dataset into
examples that can be fed to a model.
Processors
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All processors follow the same architecture which is that of the
:class:`~transformers.data.processors.utils.DataProcessor`. The processor returns a list
@@ -26,7 +26,7 @@ of :class:`~transformers.data.processors.utils.InputExample`. These
GLUE
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
`General Language Understanding Evaluation (GLUE) <https://gluebenchmark.com/>`__ is a benchmark that evaluates
the performance of models across a diverse set of existing NLU tasks. It was released together with the paper
@@ -52,13 +52,13 @@ Additionally, the following method can be used to load values from a data file
.. automethod:: transformers.data.processors.glue.glue_convert_examples_to_features
Example usage
^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
An example using these processors is given in the `run_glue.py <https://github.com/huggingface/pytorch-transformers/blob/master/examples/text-classification/run_glue.py>`__ script.
XNLI
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
`The Cross-Lingual NLI Corpus (XNLI) <https://www.nyu.edu/projects/bowman/xnli/>`__ is a benchmark that evaluates
the quality of cross-lingual text representations.
@@ -78,7 +78,7 @@ An example using these processors is given in the
SQuAD
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
`The Stanford Question Answering Dataset (SQuAD) <https://rajpurkar.github.io/SQuAD-explorer//>`__ is a benchmark that evaluates
the performance of models on question answering. Two versions are available, v1.1 and v2.0. The first version (v1.1) was released together with the paper
@@ -88,7 +88,7 @@ the paper `Know What You Don't Know: Unanswerable Questions for SQuAD <https://a
This library hosts a processor for each of the two versions:
Processors
^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Those processors are:
- :class:`~transformers.data.processors.utils.SquadV1Processor`
@@ -109,7 +109,7 @@ Examples are given below.
Example usage
^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here is an example using the processors as well as the conversion method using data files:
Example::

View File

@@ -1,40 +1,59 @@
Tokenizer
----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
A tokenizer is in charge of preparing the inputs for a model. The library comprise tokenizers for all the models. Most of the tokenizers are available in two flavors: a full python implementation and a "Fast" implementation based on the Rust library `tokenizers`. The "Fast" implementations allows (1) a significant speed-up in particular when doing batched tokenization and (2) additional methods to map between the original string (character and words) and the token space (e.g. getting the index of the token comprising a given character or the span of characters corresponding to a given token). Currently no "Fast" implementation is available for the SentencePiece-based tokenizers (for T5, ALBERT, CamemBERT, XLMRoBERTa and XLNet models).
A tokenizer is in charge of preparing the inputs for a model. The library contains tokenizers for all the models. Most
of the tokenizers are available in two flavors: a full python implementation and a "Fast" implementation based on the
Rust library `tokenizers <https://github.com/huggingface/tokenizers>`__. The "Fast" implementations allows:
The base classes ``PreTrainedTokenizer`` and ``PreTrainedTokenizerFast`` implements the common methods for encoding string inputs in model inputs (see below) and instantiating/saving python and "Fast" tokenizers either from a local file or directory or from a pretrained tokenizer provided by the library (downloaded from HuggingFace's AWS S3 repository).
1. a significant speed-up in particular when doing batched tokenization and
2. additional methods to map between the original string (character and words) and the token space (e.g. getting the
index of the token comprising a given character or the span of characters corresponding to a given token). Currently
no "Fast" implementation is available for the SentencePiece-based tokenizers (for T5, ALBERT, CamemBERT, XLMRoBERTa
and XLNet models).
``PreTrainedTokenizer`` and ``PreTrainedTokenizerFast`` thus implements the main methods for using all the tokenizers:
The base classes :class:`~transformers.PreTrainedTokenizer` and :class:`~transformers.PreTrainedTokenizerFast`
implement the common methods for encoding string inputs in model inputs (see below) and instantiating/saving python and
"Fast" tokenizers either from a local file or directory or from a pretrained tokenizer provided by the library
(downloaded from HuggingFace's AWS S3 repository). They both rely on
:class:`~transformers.tokenization_utils_base.PreTrainedTokenizerBase` that contains the common methods, and
:class:`~transformers.tokenization_utils_base.SpecialTokensMixin`.
- tokenizing (spliting strings in sub-word token strings), converting tokens strings to ids and back, and encoding/decoding (i.e. tokenizing + convert to integers),
- adding new tokens to the vocabulary in a way that is independant of the underlying structure (BPE, SentencePiece...),
- managing special tokens like mask, beginning-of-sentence, etc tokens (adding them, assigning them to attributes in the tokenizer for easy access and making sure they are not split during tokenization)
:class:`~transformers.PreTrainedTokenizer` and :class:`~transformers.PreTrainedTokenizerFast` thus implement the main
methods for using all the tokenizers:
``BatchEncoding`` holds the output of the tokenizer's encoding methods (``__call__``, ``encode_plus`` and ``batch_encode_plus``) and is derived from a Python dictionary. When the tokenizer is a pure python tokenizer, this class behave just like a standard python dictionary and hold the various model inputs computed by these methodes (``input_ids``, ``attention_mask``...). When the tokenizer is a "Fast" tokenizer (i.e. backed by HuggingFace tokenizers library), this class provides in addition several advanced alignement methods which can be used to map between the original string (character and words) and the token space (e.g. getting the index of the token comprising a given character or the span of characters corresponding to a given token).
- Tokenizing (splitting strings in sub-word token strings), converting tokens strings to ids and back, and
encoding/decoding (i.e., tokenizing and converting to integers).
- Adding new tokens to the vocabulary in a way that is independent of the underlying structure (BPE, SentencePiece...).
- Managing special tokens (like mask, beginning-of-sentence, etc.): adding them, assigning them to attributes in the
tokenizer for easy access and making sure they are not split during tokenization.
``PreTrainedTokenizer``
~~~~~~~~~~~~~~~~~~~~~~~~
:class:`~transformers.BatchEncoding` holds the output of the tokenizer's encoding methods (``__call__``,
``encode_plus`` and ``batch_encode_plus``) and is derived from a Python dictionary. When the tokenizer is a pure python
tokenizer, this class behaves just like a standard python dictionary and holds the various model inputs computed by these
methods (``input_ids``, ``attention_mask``...). When the tokenizer is a "Fast" tokenizer (i.e., backed by HuggingFace
`tokenizers library <https://github.com/huggingface/tokenizers>`__), this class provides in addition several advanced
alignment methods which can be used to map between the original string (character and words) and the token space (e.g.,
getting the index of the token comprising a given character or the span of characters corresponding to a given token).
PreTrainedTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.PreTrainedTokenizer
:special-members: __call__
:members:
``PreTrainedTokenizerFast``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
PreTrainedTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.PreTrainedTokenizerFast
:special-members: __call__
:members:
``BatchEncoding``
~~~~~~~~~~~~~~~~~~~~~~~~
BatchEncoding
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BatchEncoding
:members:
``SpecialTokensMixin``
~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.SpecialTokensMixin
:members:

View File

@@ -1,45 +1,75 @@
Trainer
----------
The :class:`~transformers.Trainer` and :class:`~transformers.TFTrainer` classes provide an API for feature-complete
training in most standard use cases. It's used in most of the :doc:`example scripts <../examples>`.
Before instantiating your :class:`~transformers.Trainer`/:class:`~transformers.TFTrainer`, create a
:class:`~transformers.TrainingArguments`/:class:`~transformers.TFTrainingArguments` to access all the points of
customization during training.
The API supports distributed training on multiple GPUs/TPUs, mixed precision through `NVIDIA Apex
<https://github.com/NVIDIA/apex>`__ for PyTorch and :obj:`tf.keras.mixed_precision` for TensorFlow.
``Trainer``
~~~~~~~~~~~
.. autoclass:: transformers.Trainer
:members:
``TFTrainer``
~~~~~~~~~~~~~
.. autoclass:: transformers.TFTrainer
:members:
``TrainingArguments``
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TrainingArguments
:members:
``TFTrainingArguments``
~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFTrainingArguments
:members:
Utilities
~~~~~~~~~
.. autoclass:: transformers.EvalPrediction
.. autofunction:: transformers.set_seed
.. autofunction:: transformers.torch_distributed_zero_first
Trainer
-----------------------------------------------------------------------------------------------------------------------
The :class:`~transformers.Trainer` and :class:`~transformers.TFTrainer` classes provide an API for feature-complete
training in most standard use cases. It's used in most of the :doc:`example scripts <../examples>`.
Before instantiating your :class:`~transformers.Trainer`/:class:`~transformers.TFTrainer`, create a
:class:`~transformers.TrainingArguments`/:class:`~transformers.TFTrainingArguments` to access all the points of
customization during training.
The API supports distributed training on multiple GPUs/TPUs, mixed precision through `NVIDIA Apex
<https://github.com/NVIDIA/apex>`__ for PyTorch and :obj:`tf.keras.mixed_precision` for TensorFlow.
Both :class:`~transformers.Trainer` and :class:`~transformers.TFTrainer` contain the basic training loop supporting the
previous features. To inject custom behavior you can subclass them and override the following methods:
- **get_train_dataloader**/**get_train_tfdataset** -- Creates the training DataLoader (PyTorch) or TF Dataset.
- **get_eval_dataloader**/**get_eval_tfdataset** -- Creates the evaulation DataLoader (PyTorch) or TF Dataset.
- **get_test_dataloader**/**get_test_tfdataset** -- Creates the test DataLoader (PyTorch) or TF Dataset.
- **log** -- Logs information on the various objects watching training.
- **setup_wandb** -- Setups wandb (see `here <https://docs.wandb.com/huggingface>`__ for more information).
- **create_optimizer_and_scheduler** -- Setups the optimizer and learning rate scheduler if they were not passed at
init.
- **compute_loss** - Computes the loss on a batch of training inputs.
- **training_step** -- Performs a training step.
- **prediction_step** -- Performs an evaluation/test step.
- **run_model** (TensorFlow only) -- Basic pass through the model.
- **evaluate** -- Runs an evaluation loop and returns metrics.
- **predict** -- Returns predictions (with metrics if labels are available) on a test set.
Here is an example of how to customize :class:`~transformers.Trainer` using a custom loss function:
.. code-block:: python
from transformers import Trainer
class MyTrainer(Trainer):
def compute_loss(self, model, inputs):
labels = inputs.pop("labels")
outputs = models(**inputs)
logits = outputs[0]
return my_custom_loss(logits, labels)
Trainer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.Trainer
:members:
TFTrainer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFTrainer
:members:
TrainingArguments
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TrainingArguments
:members:
TFTrainingArguments
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFTrainingArguments
:members:
Utilities
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.EvalPrediction
.. autofunction:: transformers.set_seed
.. autofunction:: transformers.torch_distributed_zero_first

View File

@@ -1,15 +1,16 @@
ALBERT
----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The ALBERT model was proposed in `ALBERT: A Lite BERT for Self-supervised Learning of Language Representations <https://arxiv.org/abs/1909.11942>`_
by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut. It presents
two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT:
The ALBERT model was proposed in `ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
<https://arxiv.org/abs/1909.11942>`__ by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma,
Radu Soricut. It presents two parameter-reduction techniques to lower memory consumption and increase the training
speed of BERT:
- Splitting the embedding matrix into two smaller matrices
- Using repeating layers split among groups
- Splitting the embedding matrix into two smaller matrices.
- Using repeating layers split among groups.
The abstract from the paper is the following:
@@ -30,102 +31,126 @@ Tips:
similar to a BERT-like architecture with the same number of hidden layers as it has to iterate through the same
number of (repeating) layers.
The original code can be found `here <https://github.com/google-research/ALBERT>`_.
The original code can be found `here <https://github.com/google-research/ALBERT>`__.
AlbertConfig
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AlbertConfig
:members:
AlbertTokenizer
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AlbertTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary
Albert specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_albert.AlbertForPreTrainingOutput
:members:
.. autoclass:: transformers.modeling_tf_albert.TFAlbertForPreTrainingOutput
:members:
AlbertModel
~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AlbertModel
:members:
:members: forward
AlbertForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AlbertForPreTraining
:members: forward
AlbertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AlbertForMaskedLM
:members:
:members: forward
AlbertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AlbertForSequenceClassification
:members:
:members: forward
AlbertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AlbertForMultipleChoice
:members:
AlbertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AlbertForTokenClassification
:members:
:members: forward
AlbertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AlbertForQuestionAnswering
:members:
:members: forward
TFAlbertModel
~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAlbertModel
:members:
:members: call
TFAlbertForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAlbertForPreTraining
:members: call
TFAlbertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAlbertForMaskedLM
:members:
:members: call
TFAlbertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAlbertForSequenceClassification
:members:
:members: call
TFAlbertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAlbertForMultipleChoice
:members:
:members: call
TFAlbertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAlbertForTokenClassification
:members:
:members: call
TFAlbertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAlbertForQuestionAnswering
:members:
:members: call

View File

@@ -1,109 +1,131 @@
AutoModels
-----------
AutoClasses
-----------------------------------------------------------------------------------------------------------------------
In many cases, the architecture you want to use can be guessed from the name or the path of the pretrained model you
are supplying to the ``from_pretrained`` method.
are supplying to the :obj:`from_pretrained()` method.
AutoClasses are here to do this job for you so that you automatically retrieve the relevant model given the name/path
to the pretrained weights/config/vocabulary:
to the pretrained weights/config/vocabulary.
Instantiating one of ``AutoModel``, ``AutoConfig`` and ``AutoTokenizer`` will directly create a class of the relevant
architecture (ex: ``model = AutoModel.from_pretrained('bert-base-cased')`` will create a instance of
:class:`~transformers.BertModel`).
Instantiating one of :class:`~transformers.AutoConfig`, :class:`~transformers.AutoModel`, and
:class:`~transformers.AutoTokenizer` will directly create a class of the relevant architecture. For instance
``AutoConfig``
~~~~~~~~~~~~~~~~~~~~~
.. code-block:: python
model = AutoModel.from_pretrained('bert-base-cased')
will create a model that is an instance of :class:`~transformers.BertModel`.
There is one class of :obj:`AutoModel` for each task, and for each backend (PyTorch or TensorFlow).
AutoConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AutoConfig
:members:
``AutoTokenizer``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
AutoTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AutoTokenizer
:members:
``AutoModel``
~~~~~~~~~~~~~~~~~~~~~
AutoModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AutoModel
:members:
``AutoModelForPreTraining``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
AutoModelForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AutoModelForPreTraining
:members:
``AutoModelWithLMHead``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
AutoModelWithLMHead
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AutoModelWithLMHead
:members:
``AutoModelForSequenceClassification``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
AutoModelForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AutoModelForSequenceClassification
:members:
``AutoModelForQuestionAnswering``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
AutoModelForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AutoModelForMultipleChoice
:members:
AutoModelForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AutoModelForTokenClassification
:members:
AutoModelForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AutoModelForQuestionAnswering
:members:
``AutoModelForTokenClassification``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AutoModelForTokenClassification
:members:
``TFAutoModel``
~~~~~~~~~~~~~~~~~~~~~
TFAutoModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAutoModel
:members:
``TFAutoModelForPreTraining``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TFAutoModelForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAutoModelForPreTraining
:members:
``TFAutoModelWithLMHead``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TFAutoModelWithLMHead
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAutoModelWithLMHead
:members:
``TFAutoModelForSequenceClassification``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TFAutoModelForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAutoModelForSequenceClassification
:members:
``TFAutoModelForQuestionAnswering``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TFAutoModelForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAutoModelForQuestionAnswering
.. autoclass:: transformers.TFAutoModelForMultipleChoice
:members:
``TFAutoModelForTokenClassification``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TFAutoModelForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAutoModelForTokenClassification
:members:
TFAutoModelForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAutoModelForQuestionAnswering
:members:

View File

@@ -1,11 +1,11 @@
Bart
----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
**DISCLAIMER:** If you see something strange,
file a `Github Issue <https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`__ and assign
@sshleifer
Overview
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The Bart model was `proposed <https://arxiv.org/abs/1910.13461>`_ by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer on 29 Oct, 2019.
According to the abstract,
@@ -17,30 +17,41 @@ According to the abstract,
The Authors' code can be found `here <https://github.com/pytorch/fairseq/tree/master/examples/bart>`_
Implementation Notes:
Implementation Notes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Bart doesn't use :obj:`token_type_ids` for sequence classification. Use BartTokenizer.encode to get the proper splitting.
- The forward pass of ``BartModel`` will create decoder inputs (using the helper function ``transformers.modeling_bart._prepare_bart_decoder_inputs``) if they are not passed. This is different than some other modeling APIs.
- Model predictions are intended to be identical to the original implementation. This only works, however, if the string you pass to ``fairseq.encode`` starts with a space.
- ``BartForConditionalGeneration.generate`` should be used for conditional generation tasks like summarization, see the example in that docstrings
- Models that load the ``"facebook/bart-large-cnn"`` weights will not have a ``mask_token_id``, or be able to perform mask filling tasks.
- for training/forward passes that don't involve beam search, pass ``use_cache=False``
BartForConditionalGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BartForConditionalGeneration
:members: forward
BartConfig
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BartConfig
:members:
BartTokenizer
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BartTokenizer
:members:
BartModel
~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BartModel
:members: forward
@@ -49,23 +60,16 @@ BartModel
BartForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BartForSequenceClassification
:members: forward
BartForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BartForQuestionAnswering
:members: forward
BartForConditionalGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BartForConditionalGeneration
:members: generate, forward

View File

@@ -1,13 +1,13 @@
BERT
----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The BERT model was proposed in `BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding <https://arxiv.org/abs/1810.04805>`__
by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova. It's a bidirectional transformer
pre-trained using a combination of masked language modeling objective and next sentence prediction
on a large corpus comprising the Toronto Book Corpus and Wikipedia.
The BERT model was proposed in `BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
<https://arxiv.org/abs/1810.04805>`__ by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova. It's a
bidirectional transformer pretrained using a combination of masked language modeling objective and next sentence
prediction on a large corpus comprising the Toronto Book Corpus and Wikipedia.
The abstract from the paper is the following:
@@ -27,25 +27,20 @@ Tips:
- BERT is a model with absolute position embeddings so it's usually advised to pad the inputs on
the right rather than the left.
- BERT was trained with a masked language modeling (MLM) objective. It is therefore efficient at predicting masked
tokens and at NLU in general, but is not optimal for text generation. Models trained with a causal language
modeling (CLM) objective are better in that regard.
- Alongside MLM, BERT was trained using a next sentence prediction (NSP) objective using the [CLS] token as a sequence
approximate. The user may use this token (the first token in a sequence built with special tokens) to get a sequence
prediction rather than a token prediction. However, averaging over the sequence may yield better results than using
the [CLS] token.
- BERT was trained with the masked language modeling (MLM) and next sentence prediction (NSP) objectives. It is
efficient at predicting masked tokens and at NLU in general, but is not optimal for text generation.
The original code can be found `here <https://github.com/google-research/bert>`_.
The original code can be found `here <https://github.com/google-research/bert>`__.
BertConfig
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertConfig
:members:
BertTokenizer
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
@@ -53,120 +48,143 @@ BertTokenizer
BertTokenizerFast
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertTokenizerFast
:members:
Bert specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_bert.BertForPreTrainingOutput
:members:
.. autoclass:: transformers.modeling_tf_bert.TFBertForPreTrainingOutput
:members:
BertModel
~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertModel
:members:
:members: forward
BertForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertForPreTraining
:members:
:members: forward
BertModelLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertLMHeadModel
:members: forward
BertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertForMaskedLM
:members:
:members: forward
BertForNextSentencePrediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertForNextSentencePrediction
:members:
:members: forward
BertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertForSequenceClassification
:members:
:members: forward
BertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertForMultipleChoice
:members:
:members: forward
BertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertForTokenClassification
:members:
:members: forward
BertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertForQuestionAnswering
:members:
:members: forward
TFBertModel
~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBertModel
:members:
:members: call
TFBertForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBertForPreTraining
:members:
:members: call
TFBertModelLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBertLMHeadModel
:members: call
TFBertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBertForMaskedLM
:members:
:members: call
TFBertForNextSentencePrediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBertForNextSentencePrediction
:members:
:members: call
TFBertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBertForSequenceClassification
:members:
:members: call
TFBertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBertForMultipleChoice
:members:
:members: call
TFBertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBertForTokenClassification
:members:
:members: call
TFBertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBertForQuestionAnswering
:members:
:members: call

View File

@@ -0,0 +1,96 @@
BertGeneration
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The BertGeneration model is a BERT model that can be leveraged for sequence-to-sequence tasks using
:class:`~transformers.EncoderDecoderModel` as proposed in `Leveraging Pre-trained Checkpoints for Sequence Generation
Tasks <https://arxiv.org/abs/1907.12461>`__ by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
The abstract from the paper is the following:
*Unsupervised pre-training of large neural models has recently revolutionized Natural Language Processing. By
warm-starting from the publicly released checkpoints, NLP practitioners have pushed the state-of-the-art on multiple
benchmarks while saving significant amounts of compute time. So far the focus has been mainly on the Natural Language
Understanding tasks. In this paper, we demonstrate the efficacy of pre-trained checkpoints for Sequence Generation. We
developed a Transformer-based sequence-to-sequence model that is compatible with publicly available pre-trained BERT,
GPT-2 and RoBERTa checkpoints and conducted an extensive empirical study on the utility of initializing our model, both
encoder and decoder, with these checkpoints. Our models result in new state-of-the-art results on Machine Translation,
Text Summarization, Sentence Splitting, and Sentence Fusion.*
Usage:
- The model can be used in combination with the :class:`~transformers.EncoderDecoderModel` to leverage two pretrained
BERT checkpoints for subsequent fine-tuning.
:: code-block
# leverage checkpoints for Bert2Bert model...
# use BERT's cls token as BOS token and sep token as EOS token
encoder = BertGenerationEncoder.from_pretrained("bert-large-uncased", bos_token_id=101, eos_token_id=102)
# add cross attention layers and use BERT's cls token as BOS token and sep token as EOS token
decoder = BertGenerationDecoder.from_pretrained("bert-large-uncased", add_cross_attention=True, is_decoder=True, bos_token_id=101, eos_token_id=102)
bert2bert = EncoderDecoderModel(encoder=encoder, decoder=decoder)
# create tokenizer...
tokenizer = BertTokenizer.from_pretrained("bert-large-uncased")
input_ids = tokenizer('This is a long article to summarize', add_special_tokens=False, return_tensors="pt").input_ids
labels = tokenizer('This is a short summary', return_tensors="pt").input_ids
# train...
loss = bert2bert(input_ids=input_ids, decoder_input_ids=labels, labels=labels, return_dict=True).loss
loss.backward()
- Pretrained :class:`~transformers.EncoderDecoderModel` are also directly available in the model hub, e.g.,
:: code-block
# instantiate sentence fusion model
sentence_fuser = EncoderDecoderModel.from_pretrained("google/roberta2roberta_L-24_discofuse")
tokenizer = AutoTokenizer.from_pretrained("google/roberta2roberta_L-24_discofuse")
input_ids = tokenizer('This is the first sentence. This is the second sentence.', add_special_tokens=False, return_tensors="pt").input_ids
outputs = sentence_fuser.generate(input_ids)
print(tokenizer.decode(outputs[0]))
Tips:
- :class:`~transformers.BertGenerationEncoder` and :class:`~transformers.BertGenerationDecoder` should be used in
combination with :class:`~transformers.EncoderDecoder`.
- For summarization, sentence splitting, sentence fusion and translation, no special tokens are required for the input.
Therefore, no EOS token should be added to the end of the input.
The original code can be found `here <https://tfhub.dev/s?module-type=text-generation&subtype=module,placeholder>`__.
BertGenerationConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertGenerationConfig
:members:
BertGenerationTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertGenerationTokenizer
:members: save_vocabulary
BertGenerationEncoder
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertGenerationEncoder
:members: forward
BertGenerationDecoder
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertGenerationDecoder
:members: forward

View File

@@ -1,8 +1,8 @@
CamemBERT
----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The CamemBERT model was proposed in `CamemBERT: a Tasty French Language Model <https://arxiv.org/abs/1911.03894>`__
by Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suárez, Yoann Dupont, Laurent Romary, Éric Villemonte de la
@@ -22,20 +22,20 @@ pretrained model for CamemBERT hoping to foster research and downstream applicat
Tips:
- This implementation is the same as RoBERTa. Refer to the `documentation of RoBERTa <./roberta.html>`__ for usage
- This implementation is the same as RoBERTa. Refer to the :doc:`documentation of RoBERTa <roberta>` for usage
examples as well as the information relative to the inputs and outputs.
The original code can be found `here <https://camembert-model.fr/>`_.
The original code can be found `here <https://camembert-model.fr/>`__.
CamembertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CamembertConfig
:members:
CamembertTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CamembertTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
@@ -43,77 +43,91 @@ CamembertTokenizer
CamembertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CamembertModel
:members:
CamembertForCausalLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CamembertForCausalLM
:members:
CamembertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CamembertForMaskedLM
:members:
CamembertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CamembertForSequenceClassification
:members:
CamembertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CamembertForMultipleChoice
:members:
CamembertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CamembertForTokenClassification
:members:
CamembertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CamembertForQuestionAnswering
:members:
TFCamembertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFCamembertModel
:members:
TFCamembertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFCamembertForMaskedLM
:members:
TFCamembertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFCamembertForSequenceClassification
:members:
TFCamembertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFCamembertForMultipleChoice
:members:
TFCamembertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFCamembertForTokenClassification
:members:
TFCamembertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFCamembertForQuestionAnswering
:members:

View File

@@ -1,12 +1,12 @@
CTRL
----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
CTRL model was proposed in `CTRL: A Conditional Transformer Language Model for Controllable Generation <https://arxiv.org/abs/1909.05858>`_
by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
It's a causal (unidirectional) transformer pre-trained using language modeling on a very large
CTRL model was proposed in `CTRL: A Conditional Transformer Language Model for Controllable Generation
<https://arxiv.org/abs/1909.05858>`_ by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and
Richard Socher. It's a causal (unidirectional) transformer pre-trained using language modeling on a very large
corpus of ~140 GB of text data with the first token reserved as a control code (such as Links, Books, Wikipedia etc.).
The abstract from the paper is the following:
@@ -31,50 +31,50 @@ Tips:
it can be observed in the `run_generation.py` example script.
- The PyTorch models can take the `past` as input, which is the previously computed key/value attention pairs. Using
this `past` value prevents the model from re-computing pre-computed values in the context of text generation.
See `reusing the past in generative models <../quickstart.html#using-the-past>`_ for more information on the usage
See `reusing the past in generative models <../quickstart.html#using-the-past>`__ for more information on the usage
of this argument.
The original code can be found `here <https://github.com/salesforce/ctrl>`_.
The original code can be found `here <https://github.com/salesforce/ctrl>`__.
CTRLConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CTRLConfig
:members:
CTRLTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CTRLTokenizer
:members: save_vocabulary
CTRLModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CTRLModel
:members:
:members: forward
CTRLLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CTRLLMHeadModel
:members:
:members: forward
TFCTRLModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFCTRLModel
:members:
:members: call
TFCTRLLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFCTRLLMHeadModel
:members:
:members: call

View File

@@ -1,8 +1,8 @@
DialoGPT
----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
DialoGPT was proposed in
`DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation <https://arxiv.org/abs/1911.00536>`_

View File

@@ -1,14 +1,15 @@
DistilBERT
----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The DistilBERT model was proposed in the blog post
`Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT <https://medium.com/huggingface/distilbert-8cf3380435b5>`__,
and the paper `DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter <https://arxiv.org/abs/1910.01108>`__.
DistilBERT is a small, fast, cheap and light Transformer model trained by distilling Bert base. It has 40% less
parameters than `bert-base-uncased`, runs 60% faster while preserving over 95% of Bert's performances as measured on
`Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT
<https://medium.com/huggingface/distilbert-8cf3380435b5>`__, and the paper `DistilBERT, a distilled version of BERT:
smaller, faster, cheaper and lighter <https://arxiv.org/abs/1910.01108>`__.
DistilBERT is a small, fast, cheap and light Transformer model trained by distilling BERT base. It has 40% less
parameters than `bert-base-uncased`, runs 60% faster while preserving over 95% of BERT's performances as measured on
the GLUE language understanding benchmark.
The abstract from the paper is the following:
@@ -27,113 +28,115 @@ on-device study.*
Tips:
- DistilBert doesn't have `token_type_ids`, you don't need to indicate which token belongs to which segment. Just separate your segments with the separation token `tokenizer.sep_token` (or `[SEP]`)
- DistilBert doesn't have options to select the input positions (`position_ids` input). This could be added if necessary though, just let's us know if you need this option.
- DistilBERT doesn't have :obj:`token_type_ids`, you don't need to indicate which token belongs to which segment. Just
separate your segments with the separation token :obj:`tokenizer.sep_token` (or :obj:`[SEP]`).
- DistilBERT doesn't have options to select the input positions (:obj:`position_ids` input). This could be added if
necessary though, just let us know if you need this option.
The original code can be found `here <https://github.com/huggingface/transformers/tree/master/examples/distillation>`_.
The original code can be found `here <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__.
DistilBertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DistilBertConfig
:members:
DistilBertTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DistilBertTokenizer
:members:
DistilBertTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DistilBertTokenizerFast
:members:
DistilBertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DistilBertModel
:members:
:members: forward
DistilBertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DistilBertForMaskedLM
:members:
:members: forward
DistilBertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DistilBertForSequenceClassification
:members:
:members: forward
DistilBertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DistilBertForMultipleChoice
:members:
:members: forward
DistilBertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DistilBertForTokenClassification
:members:
:members: forward
DistilBertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DistilBertForQuestionAnswering
:members:
:members: forward
TFDistilBertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFDistilBertModel
:members:
:members: call
TFDistilBertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFDistilBertForMaskedLM
:members:
:members: call
TFDistilBertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFDistilBertForSequenceClassification
:members:
:members: call
TFDistilBertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFDistilBertForMultipleChoice
:members:
:members: call
TFDistilBertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFDistilBertForTokenClassification
:members:
:members: call
TFDistilBertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFDistilBertForQuestionAnswering
:members:
:members: call

View File

@@ -0,0 +1,101 @@
DPR
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Dense Passage Retrieval (DPR) is a set of tools and models for state-of-the-art open-domain Q&A research.
It was intorduced in `Dense Passage Retrieval for Open-Domain Question Answering <https://arxiv.org/abs/2004.04906>`__
by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, Wen-tau Yih.
The abstract from the paper is the following:
*Open-domain question answering relies on efficient passage retrieval to select candidate contexts, where traditional
sparse vector space models, such as TF-IDF or BM25, are the de facto method. In this work, we show that retrieval can
be practically implemented using dense representations alone, where embeddings are learned from a small number of
questions and passages by a simple dual-encoder framework. When evaluated on a wide range of open-domain QA datasets,
our dense retriever outperforms a strong Lucene-BM25 system largely by 9%-19% absolute in terms of top-20 passage
retrieval accuracy, and helps our end-to-end QA system establish new state-of-the-art on multiple open-domain QA
benchmarks.*
The original code can be found `here <https://github.com/facebookresearch/DPR>`__.
DPRConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DPRConfig
:members:
DPRContextEncoderTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DPRContextEncoderTokenizer
:members:
DPRContextEncoderTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DPRContextEncoderTokenizerFast
:members:
DPRQuestionEncoderTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DPRQuestionEncoderTokenizer
:members:
DPRQuestionEncoderTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DPRQuestionEncoderTokenizerFast
:members:
DPRReaderTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DPRReaderTokenizer
:members:
DPRReaderTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DPRReaderTokenizerFast
:members:
DPR specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_dpr.DPRContextEncoderOutput
:members:
.. autoclass:: transformers.modeling_dpr.DPRQuestionEncoderOutput
:members:
.. autoclass:: transformers.modeling_dpr.DPRReaderOutput
:members:
DPRContextEncoder
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DPRContextEncoder
:members: forward
DPRQuestionEncoder
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DPRQuestionEncoder
:members: forward
DPRReader
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DPRReader
:members: forward

View File

@@ -1,14 +1,14 @@
ELECTRA
----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The ELECTRA model was proposed in the paper.
`ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators <https://openreview.net/pdf?id=r1xMH1BtvB>`__.
ELECTRA is a new pre-training approach which trains two transformer models: the generator and the discriminator. The
generator's role is to replace tokens in a sequence, and is therefore trained as a masked language model. The discriminator,
which is the model we're interested in, tries to identify which tokens were replaced by the generator in the sequence.
The ELECTRA model was proposed in the paper `ELECTRA: Pre-training Text Encoders as Discriminators Rather Than
Generators <https://openreview.net/pdf?id=r1xMH1BtvB>`__. ELECTRA is a new pretraining approach which trains two
transformer models: the generator and the discriminator. The generator's role is to replace tokens in a sequence, and
is therefore trained as a masked language model. The discriminator, which is the model we're interested in, tries to
identify which tokens were replaced by the generator in the sequence.
The abstract from the paper is the following:
@@ -35,114 +35,146 @@ compute and outperforms them when using the same amount of compute.*
Tips:
- ELECTRA is the pre-training approach, therefore there is nearly no changes done to the underlying model: BERT. The
only change is the separation of the embedding size and the hidden size -> The embedding size is generally smaller,
- ELECTRA is the pretraining approach, therefore there is nearly no changes done to the underlying model: BERT. The
only change is the separation of the embedding size and the hidden size: the embedding size is generally smaller,
while the hidden size is larger. An additional projection layer (linear) is used to project the embeddings from
their embedding size to the hidden size. In the case where the embedding size is the same as the hidden size, no
projection layer is used.
- The ELECTRA checkpoints saved using `Google Research's implementation <https://github.com/google-research/electra>`__
contain both the generator and discriminator. The conversion script requires the user to name which model to export
into the correct architecture. Once converted to the HuggingFace format, these checkpoints may be loaded into all
available ELECTRA models, however. This means that the discriminator may be loaded in the `ElectraForMaskedLM` model,
and the generator may be loaded in the `ElectraForPreTraining` model (the classification head will be randomly
initialized as it doesn't exist in the generator).
available ELECTRA models, however. This means that the discriminator may be loaded in the
:class:`~transformers.ElectraForMaskedLM` model, and the generator may be loaded in the
:class:`~transformers.ElectraForPreTraining` model (the classification head will be randomly initialized as it
doesn't exist in the generator).
The original code can be found `here <https://github.com/google-research/electra>`_.
The original code can be found `here <https://github.com/google-research/electra>`__.
ElectraConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ElectraConfig
:members:
ElectraTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ElectraTokenizer
:members:
ElectraTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ElectraTokenizerFast
:members:
Electra specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_electra.ElectraForPreTrainingOutput
:members:
.. autoclass:: transformers.modeling_tf_electra.TFElectraForPreTrainingOutput
:members:
ElectraModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ElectraModel
:members:
:members: forward
ElectraForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ElectraForPreTraining
:members:
:members: forward
ElectraForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ElectraForMaskedLM
:members:
:members: forward
ElectraForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ElectraForSequenceClassification
:members:
:members: forward
ElectraForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ElectraForMultipleChoice
:members: forward
ElectraForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ElectraForTokenClassification
:members:
:members: forward
ElectraForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ElectraForQuestionAnswering
:members:
:members: forward
TFElectraModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFElectraModel
:members:
:members: call
TFElectraForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFElectraForPreTraining
:members:
:members: call
TFElectraForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFElectraForMaskedLM
:members:
:members: call
TFElectraForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFElectraForSequenceClassification
:members: call
TFElectraForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFElectraForMultipleChoice
:members: call
TFElectraForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFElectraForTokenClassification
:members:
:members: call
TFElectraForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFElectraForQuestionAnswering
:members:
:members: call

View File

@@ -1,23 +1,30 @@
Encoder Decoder Models
------------------------
-----------------------------------------------------------------------------------------------------------------------
This class can wrap an encoder model, such as ``BertModel`` and a decoder modeling with a language modeling head, such as ``BertForMaskedLM`` into a encoder-decoder model.
The :class:`~transformers.EncoderDecoderModel` can be used to initialize a sequence-to-sequence model with any
pretrained autoencoding model as the encoder and any pretrained autoregressive model as the decoder.
The ``EncoderDecoderModel`` class allows to instantiate a encoder decoder model using the ``from_encoder_decoder_pretrain`` class method taking a pretrained encoder and pretrained decoder model as an input.
The ``EncoderDecoderModel`` is saved using the standard ``save_pretrained()`` method and can also again be loaded using the standard ``from_pretrained()`` method.
The effectiveness of initializing sequence-to-sequence models with pretrained checkpoints for sequence generation tasks
was shown in `Leveraging Pre-trained Checkpoints for Sequence Generation Tasks <https://arxiv.org/abs/1907.12461>`__ by
Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
An application of this architecture could be *summarization* using two pretrained Bert models as is shown in the paper: `Text Summarization with Pretrained Encoders <https://arxiv.org/abs/1910.13461>`_ by Yang Liu and Mirella Lapata.
After such an :class:`~transformers.EncoderDecoderModel` has been trained/fine-tuned, it can be saved/loaded just like
any other models (see the examples for more information).
An application of this architecture could be to leverage two pretrained :class:`~transformers.BertModel` as the encoder
and decoder for a summarization model as was shown in: `Text Summarization with Pretrained Encoders
<https://arxiv.org/abs/1908.08345>`__ by Yang Liu and Mirella Lapata.
``EncoderDecoderConfig``
~~~~~~~~~~~~~~~~~~~~~~~~~
EncoderDecoderConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.EncoderDecoderConfig
:members:
``EncoderDecoderModel``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
EncoderDecoderModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.EncoderDecoderModel
:members:
:members: forward

View File

@@ -1,12 +1,12 @@
FlauBERT
----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The FlauBERT model was proposed in the paper
`FlauBERT: Unsupervised Language Model Pre-training for French <https://arxiv.org/abs/1912.05372>`__ by Hang Le et al.
It's a transformer pre-trained using a masked language modeling (MLM) objective (BERT-like).
The FlauBERT model was proposed in the paper `FlauBERT: Unsupervised Language Model Pre-training for French
<https://arxiv.org/abs/1912.05372>`__ by Hang Le et al. It's a transformer model pretrained using a masked language
modeling (MLM) objective (like BERT).
The abstract from the paper is the following:
@@ -23,95 +23,109 @@ of the time they outperform other pre-training approaches. Different versions of
evaluation protocol for the downstream tasks, called FLUE (French Language Understanding Evaluation), are shared
to the research community for further reproducible experiments in French NLP.*
The original code can be found `here <https://github.com/getalp/Flaubert>`_.
The original code can be found `here <https://github.com/getalp/Flaubert>`__.
FlaubertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaubertConfig
:members:
FlaubertTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaubertTokenizer
:members:
FlaubertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaubertModel
:members:
:members: forward
FlaubertWithLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaubertWithLMHeadModel
:members:
:members: forward
FlaubertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaubertForSequenceClassification
:members:
:members: forward
FlaubertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaubertForMultipleChoice
:members: forward
FlaubertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaubertForTokenClassification
:members: forward
FlaubertForQuestionAnsweringSimple
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaubertForQuestionAnsweringSimple
:members:
:members: forward
FlaubertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaubertForQuestionAnswering
:members:
:members: forward
TFFlaubertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFlaubertModel
:members:
:members: call
TFFlaubertWithLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFlaubertWithLMHeadModel
:members:
:members: call
TFFlaubertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFlaubertForSequenceClassification
:members:
:members: call
TFFlaubertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFlaubertForMultipleChoice
:members:
:members: call
TFFlaubertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFlaubertForTokenClassification
:members:
:members: call
TFFlaubertForQuestionAnsweringSimple
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFlaubertForQuestionAnsweringSimple
:members:
:members: call

View File

@@ -0,0 +1,61 @@
FSMT
-----------------------------------------------------------------------------------------------------------------------
**DISCLAIMER:** If you see something strange, file a `Github Issue
<https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`__ and assign
@stas00.
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
FSMT (FairSeq MachineTranslation) models were introduced in `Facebook FAIR's WMT19 News Translation Task Submission
<https://arxiv.org/abs/1907.06616>`__ by Nathan Ng, Kyra Yee, Alexei Baevski, Myle Ott, Michael Auli, Sergey Edunov.
The abstract of the paper is the following:
*This paper describes Facebook FAIR's submission to the WMT19 shared news translation task. We participate in two
language pairs and four language directions, English <-> German and English <-> Russian. Following our submission from
last year, our baseline systems are large BPE-based transformer models trained with the Fairseq sequence modeling
toolkit which rely on sampled back-translations. This year we experiment with different bitext data filtering schemes,
as well as with adding filtered back-translated data. We also ensemble and fine-tune our models on domain-specific
data, then decode using noisy channel model reranking. Our submissions are ranked first in all four directions of the
human evaluation campaign. On En->De, our system significantly outperforms other systems as well as human translations.
This system improves upon our WMT'18 submission by 4.5 BLEU points.*
The original code can be found here <https://github.com/pytorch/fairseq/tree/master/examples/wmt19>__.
Implementation Notes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- FSMT uses source and target vocabulary pairs that aren't combined into one. It doesn't share embeddings tokens
either. Its tokenizer is very similar to :class:`~transformers.XLMTokenizer` and the main model is derived from
:class:`~transformers.BartModel`.
FSMTConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FSMTConfig
:members:
FSMTTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FSMTTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, prepare_seq2seq_batch, save_vocabulary
FSMTModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FSMTModel
:members: forward
FSMTForConditionalGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FSMTForConditionalGeneration
:members: forward

View File

@@ -0,0 +1,184 @@
Funnel Transformer
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The Funnel Transformer model was proposed in the paper `Funnel-Transformer: Filtering out Sequential Redundancy for
Efficient Language Processing <https://arxiv.org/abs/2006.03236>`__. It is a bidirectional transformer model, like
BERT, but with a pooling operation after each block of layers, a bit like in traditional convolutional neural networks
(CNN) in computer vision.
The abstract from the paper is the following:
*With the success of language pretraining, it is highly desirable to develop more efficient architectures of good
scalability that can exploit the abundant unlabeled data at a lower cost. To improve the efficiency, we examine the
much-overlooked redundancy in maintaining a full-length token-level presentation, especially for tasks that only
require a single-vector presentation of the sequence. With this intuition, we propose Funnel-Transformer which
gradually compresses the sequence of hidden states to a shorter one and hence reduces the computation cost. More
importantly, by re-investing the saved FLOPs from length reduction in constructing a deeper or wider model, we further
improve the model capacity. In addition, to perform token-level predictions as required by common pretraining
objectives, Funnel-Transformer is able to recover a deep representation for each token from the reduced hidden sequence
via a decoder. Empirically, with comparable or fewer FLOPs, Funnel-Transformer outperforms the standard Transformer on
a wide variety of sequence-level prediction tasks, including text classification, language understanding, and reading
comprehension.*
Tips:
- Since Funnel Transformer uses pooling, the sequence length of the hidden states changes after each block of layers.
The base model therefore has a final sequence length that is a quarter of the original one. This model can be used
directly for tasks that just require a sentence summary (like sequence classification or multiple choice). For other
tasks, the full model is used; this full model has a decoder that upsamples the final hidden states to the same
sequence length as the input.
- The Funnel Transformer checkpoints are all available with a full version and a base version. The first ones should
be used for :class:`~transformers.FunnelModel`, :class:`~transformers.FunnelForPreTraining`,
:class:`~transformers.FunnelForMaskedLM`, :class:`~transformers.FunnelForTokenClassification` and
class:`~transformers.FunnelForQuestionAnswering`. The second ones should be used for
:class:`~transformers.FunnelBaseModel`, :class:`~transformers.FunnelForSequenceClassification` and
:class:`~transformers.FunnelForMultipleChoice`.
The original code can be found `here <https://github.com/laiguokun/Funnel-Transformer>`__.
FunnelConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FunnelConfig
:members:
FunnelTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FunnelTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary
FunnelTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FunnelTokenizerFast
:members:
Funnel specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_funnel.FunnelForPreTrainingOutput
:members:
.. autoclass:: transformers.modeling_tf_funnel.TFFunnelForPreTrainingOutput
:members:
FunnelBaseModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FunnelBaseModel
:members: forward
FunnelModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FunnelModel
:members: forward
FunnelModelForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FunnelForPreTraining
:members: forward
FunnelForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FunnelForMaskedLM
:members: forward
FunnelForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FunnelForSequenceClassification
:members: forward
FunnelForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FunnelForMultipleChoice
:members: forward
FunnelForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FunnelForTokenClassification
:members: forward
FunnelForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FunnelForQuestionAnswering
:members: forward
TFFunnelBaseModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFunnelBaseModel
:members: call
TFFunnelModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFunnelModel
:members: call
TFFunnelModelForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFunnelForPreTraining
:members: call
TFFunnelForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFunnelForMaskedLM
:members: call
TFFunnelForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFunnelForSequenceClassification
:members: call
TFFunnelForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFunnelForMultipleChoice
:members: call
TFFunnelForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFunnelForTokenClassification
:members: call
TFFunnelForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFunnelForQuestionAnswering
:members: call

View File

@@ -1,12 +1,14 @@
OpenAI GPT
----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
OpenAI GPT model was proposed in `Improving Language Understanding by Generative Pre-Training <https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf>`__
OpenAI GPT model was proposed in `Improving Language Understanding by Generative Pre-Training
<https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf>`__
by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever. It's a causal (unidirectional)
transformer pre-trained using language modeling on a large corpus will long range dependencies, the Toronto Book Corpus.
transformer pre-trained using language modeling on a large corpus will long range dependencies, the Toronto Book
Corpus.
The abstract from the paper is the following:
@@ -36,7 +38,7 @@ Tips:
`Write With Transformer <https://transformer.huggingface.co/doc/gpt>`__ is a webapp created and hosted by
Hugging Face showcasing the generative capabilities of several models. GPT is one of them.
The original code can be found `here <https://github.com/openai/finetune-transformer-lm>`_.
The original code can be found `here <https://github.com/openai/finetune-transformer-lm>`__.
Note:
@@ -46,68 +48,78 @@ If you want to reproduce the original tokenization process of the `OpenAI GPT` p
pip install spacy ftfy==4.4.3
python -m spacy download en
If you don't install ``ftfy`` and ``SpaCy``, the :class:`transformers.OpenAIGPTTokenizer` will default to tokenize using
BERT's :obj:`BasicTokenizer` followed by Byte-Pair Encoding (which should be fine for most usage, don't
If you don't install ``ftfy`` and ``SpaCy``, the :class:`~transformers.OpenAIGPTTokenizer` will default to tokenize
using BERT's :obj:`BasicTokenizer` followed by Byte-Pair Encoding (which should be fine for most usage, don't
worry).
OpenAIGPTConfig
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.OpenAIGPTConfig
:members:
OpenAIGPTTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.OpenAIGPTTokenizer
:members: save_vocabulary
OpenAIGPTTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.OpenAIGPTTokenizerFast
:members:
OpenAI specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_openai.OpenAIGPTDoubleHeadsModelOutput
:members:
.. autoclass:: transformers.modeling_tf_openai.TFOpenAIGPTDoubleHeadsModelOutput
:members:
OpenAIGPTModel
~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.OpenAIGPTModel
:members:
:members: forward
OpenAIGPTLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.OpenAIGPTLMHeadModel
:members:
:members: forward
OpenAIGPTDoubleHeadsModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.OpenAIGPTDoubleHeadsModel
:members:
:members: forward
TFOpenAIGPTModel
~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFOpenAIGPTModel
:members:
:members: call
TFOpenAIGPTLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFOpenAIGPTLMHeadModel
:members:
:members: call
TFOpenAIGPTDoubleHeadsModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFOpenAIGPTDoubleHeadsModel
:members:
:members: call

View File

@@ -1,14 +1,13 @@
OpenAI GPT2
----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
OpenAI GPT-2 model was proposed in
`Language Models are Unsupervised Multitask Learners <https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf>`_
by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
It's a causal (unidirectional) transformer pre-trained using language modeling on a very large
corpus of ~40 GB of text data.
OpenAI GPT-2 model was proposed in `Language Models are Unsupervised Multitask Learners
<https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf>`_
by Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei and Ilya Sutskever. It's a causal (unidirectional)
transformer pretrained using language modeling on a very large corpus of ~40 GB of text data.
The abstract from the paper is the following:
@@ -27,74 +26,84 @@ Tips:
it can be observed in the `run_generation.py` example script.
- The PyTorch models can take the `past` as input, which is the previously computed key/value attention pairs. Using
this `past` value prevents the model from re-computing pre-computed values in the context of text generation.
See `reusing the past in generative models <../quickstart.html#using-the-past>`_ for more information on the usage
See `reusing the past in generative models <../quickstart.html#using-the-past>`__ for more information on the usage
of this argument.
`Write With Transformer <https://transformer.huggingface.co/doc/gpt2-large>`__ is a webapp created and hosted by
Hugging Face showcasing the generative capabilities of several models. GPT-2 is one of them and is available in five
different sizes: small, medium, large, xl and a distilled version of the small checkpoint: distilgpt-2.
different sizes: small, medium, large, xl and a distilled version of the small checkpoint: `distilgpt-2`.
The original code can be found `here <https://openai.com/blog/better-language-models/>`_.
The original code can be found `here <https://openai.com/blog/better-language-models/>`__.
GPT2Config
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.GPT2Config
:members:
GPT2Tokenizer
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.GPT2Tokenizer
:members: save_vocabulary
GPT2TokenizerFast
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.GPT2TokenizerFast
:members:
GPT2 specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_gpt2.GPT2DoubleHeadsModelOutput
:members:
.. autoclass:: transformers.modeling_tf_gpt2.TFGPT2DoubleHeadsModelOutput
:members:
GPT2Model
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.GPT2Model
:members:
:members: forward
GPT2LMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.GPT2LMHeadModel
:members:
:members: forward
GPT2DoubleHeadsModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.GPT2DoubleHeadsModel
:members:
:members: forward
TFGPT2Model
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFGPT2Model
:members:
:members: call
TFGPT2LMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFGPT2LMHeadModel
:members:
:members: call
TFGPT2DoubleHeadsModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFGPT2DoubleHeadsModel
:members:
:members: call

View File

@@ -0,0 +1,55 @@
LayoutLM
----------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
The LayoutLM model was proposed in `LayoutLM: Pre-training of Text and Layout for Document Image Understanding <https://arxiv.org/abs/1912.13318>`__
by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, and Ming Zhou. It's a simple but effective pre-training method
of text and layout for document image understanding and information extraction tasks, such as form understanding and receipt understanding.
The abstract from the paper is the following:
*Pre-training techniques have been verified successfully in a variety of NLP tasks in recent years. Despite the widespread use of pre-training models for NLP applications, they almost exclusively focus on text-level manipulation, while neglecting layout and style information that is vital for document image understanding. In this paper, we propose the \textbf{LayoutLM} to jointly model interactions between text and layout information across scanned document images, which is beneficial for a great number of real-world document image understanding tasks such as information extraction from scanned documents. Furthermore, we also leverage image features to incorporate words' visual information into LayoutLM. To the best of our knowledge, this is the first time that text and layout are jointly learned in a single framework for document-level pre-training. It achieves new state-of-the-art results in several downstream tasks, including form understanding (from 70.72 to 79.27), receipt understanding (from 94.02 to 95.24) and document image classification (from 93.07 to 94.42).*
Tips:
- LayoutLM has an extra input called :obj:`bbox`, which is the bounding boxes of the input tokens.
- The :obj:`bbox` requires the data that on 0-1000 scale, which means you should normalize the bounding box before passing them into model.
The original code can be found `here <https://github.com/microsoft/unilm/tree/master/layoutlm>`_.
LayoutLMConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LayoutLMConfig
:members:
LayoutLMTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LayoutLMTokenizer
:members:
LayoutLMModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LayoutLMModel
:members:
LayoutLMForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LayoutLMForMaskedLM
:members:
LayoutLMForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LayoutLMForTokenClassification
:members:

View File

@@ -1,104 +1,155 @@
Longformer
----------------------------------------------------
**DISCLAIMER:** This model is still a work in progress, if you see something strange,
file a `Github Issue <https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`_
-----------------------------------------------------------------------------------------------------------------------
**DISCLAIMER:** This model is still a work in progress, if you see something strange, file a `Github Issue
<https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`__.
Overview
~~~~~~~~~
The Longformer model was presented in `Longformer: The Long-Document Transformer <https://arxiv.org/pdf/2004.05150.pdf>`_ by Iz Beltagy, Matthew E. Peters, Arman Cohan.
Here the abstract:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*Transformer-based models are unable to process long sequences due to their self-attention operation, which scales quadratically with the sequence length. To address this limitation, we introduce the Longformer with an attention mechanism that scales linearly with sequence length, making it easy to process documents of thousands of tokens or longer. Longformer's attention mechanism is a drop-in replacement for the standard self-attention and combines a local windowed attention with a task motivated global attention. Following prior work on long-sequence transformers, we evaluate Longformer on character-level language modeling and achieve state-of-the-art results on text8 and enwik8. In contrast to most prior work, we also pretrain Longformer and finetune it on a variety of downstream tasks. Our pretrained Longformer consistently outperforms RoBERTa on long document tasks and sets new state-of-the-art results on WikiHop and TriviaQA.*
The Longformer model was presented in `Longformer: The Long-Document Transformer
<https://arxiv.org/pdf/2004.05150.pdf>`__ by Iz Beltagy, Matthew E. Peters, Arman Cohan.
The Authors' code can be found `here <https://github.com/allenai/longformer>`_ .
The abstract from the paper is the following:
*Transformer-based models are unable to process long sequences due to their self-attention operation, which scales
quadratically with the sequence length. To address this limitation, we introduce the Longformer with an attention
mechanism that scales linearly with sequence length, making it easy to process documents of thousands of tokens or
longer. Longformer's attention mechanism is a drop-in replacement for the standard self-attention and combines a local
windowed attention with a task motivated global attention. Following prior work on long-sequence transformers, we
evaluate Longformer on character-level language modeling and achieve state-of-the-art results on text8 and enwik8. In
contrast to most prior work, we also pretrain Longformer and finetune it on a variety of downstream tasks. Our
pretrained Longformer consistently outperforms RoBERTa on long document tasks and sets new state-of-the-art results on
WikiHop and TriviaQA.*
The Authors' code can be found `here <https://github.com/allenai/longformer>`__.
Longformer Self Attention
~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Longformer self attention employs self attention on both a "local" context and a "global" context.
Most tokens only attend "locally" to each other meaning that each token attends to its :math:`\frac{1}{2} w` previous tokens and :math:`\frac{1}{2} w` succeding tokens with :math:`w` being the window length as defined in `config.attention_window`. Note that `config.attention_window` can be of type ``list`` to define a different :math:`w` for each layer.
A selecetd few tokens attend "globally" to all other tokens, as it is conventionally done for all tokens in *e.g.* `BertSelfAttention`.
Most tokens only attend "locally" to each other meaning that each token attends to its :math:`\frac{1}{2} w` previous
tokens and :math:`\frac{1}{2} w` succeding tokens with :math:`w` being the window length as defined in
:obj:`config.attention_window`. Note that :obj:`config.attention_window` can be of type :obj:`List` to define a
different :math:`w` for each layer. A selected few tokens attend "globally" to all other tokens, as it is
conventionally done for all tokens in :obj:`BertSelfAttention`.
Note that "locally" and "globally" attending tokens are projected by different query, key and value matrices.
Also note that every "locally" attending token not only attends to tokens within its window :math:`w`, but also to all "globally" attending tokens so that global attention is *symmetric*.
Also note that every "locally" attending token not only attends to tokens within its window :math:`w`, but also to all
"globally" attending tokens so that global attention is *symmetric*.
The user can define which tokens attend "locally" and which tokens attend "globally" by setting the tensor `global_attention_mask` at run-time appropriately. `Longformer` employs the following logic for `global_attention_mask`: `0` - the token attends "locally", `1` - token attends "globally". For more information please also refer to :func:`~transformers.LongformerModel.forward` method.
The user can define which tokens attend "locally" and which tokens attend "globally" by setting the tensor
:obj:`global_attention_mask` at run-time appropriately. All Longformer models employ the following logic for
:obj:`global_attention_mask`:
Using Longformer self attention, the memory and time complexity of the query-key matmul operation, which usually represents the memory and time bottleneck, can be reduced from :math:`\mathcal{O}(n_s \times n_s)` to :math:`\mathcal{O}(n_s \times w)`, with :math:`n_s` being the sequence length and :math:`w` being the average window size. It is assumed that the number of "globally" attending tokens is insignificant as compared to the number of "locally" attending tokens.
- 0: the token attends "locally",
- 1: the token attends "globally".
For more information, please refer to the official `paper <https://arxiv.org/pdf/2004.05150.pdf>`_ .
For more information please also refer to :meth:`~transformers.LongformerModel.forward` method.
Using Longformer self attention, the memory and time complexity of the query-key matmul operation, which usually
represents the memory and time bottleneck, can be reduced from :math:`\mathcal{O}(n_s \times n_s)` to
:math:`\mathcal{O}(n_s \times w)`, with :math:`n_s` being the sequence length and :math:`w` being the average window
size. It is assumed that the number of "globally" attending tokens is insignificant as compared to the number of
"locally" attending tokens.
For more information, please refer to the official `paper <https://arxiv.org/pdf/2004.05150.pdf>`__.
Training
~~~~~~~~~~~~~~~~~~~~
``LongformerForMaskedLM`` is trained the exact same way, ``RobertaForMaskedLM`` is trained and
should be used as follows:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
::
:class:`~transformers.LongformerForMaskedLM` is trained the exact same way :class:`~transformers.RobertaForMaskedLM` is
trained and should be used as follows:
input_ids = tokenizer.encode('This is a sentence from [MASK] training data', return_tensors='pt')
mlm_labels = tokenizer.encode('This is a sentence from the training data', return_tensors='pt')
.. code-block::
loss = model(input_ids, labels=input_ids, masked_lm_labels=mlm_labels)[0]
input_ids = tokenizer.encode('This is a sentence from [MASK] training data', return_tensors='pt')
mlm_labels = tokenizer.encode('This is a sentence from the training data', return_tensors='pt')
loss = model(input_ids, labels=input_ids, masked_lm_labels=mlm_labels)[0]
LongformerConfig
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LongformerConfig
:members:
LongformerTokenizer
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LongformerTokenizer
:members:
LongformerTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LongformerTokenizerFast
:members:
LongformerModel
~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LongformerModel
:members:
:members: forward
LongformerForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LongformerForMaskedLM
:members:
:members: forward
LongformerForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LongformerForSequenceClassification
:members:
:members: forward
LongformerForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LongformerForMultipleChoice
:members:
:members: forward
LongformerForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LongformerForTokenClassification
:members:
:members: forward
LongformerForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LongformerForQuestionAnswering
:members:
:members: forward
TFLongformerModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFLongformerModel
:members: call
TFLongformerForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFLongformerForMaskedLM
:members: call
TFLongformerForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFLongformerForQuestionAnswering
:members: call

View File

@@ -0,0 +1,116 @@
LXMERT
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The LXMERT model was proposed in `LXMERT: Learning Cross-Modality Encoder Representations from Transformers
<https://arxiv.org/abs/1908.07490>`__ by Hao Tan & Mohit Bansal. It is a series of bidirectional transformer encoders
(one for the vision modality, one for the language modality, and then one to fuse both modalities) pretrained using a
combination of masked language modeling, visual-language text alignment, ROI-feature regression, masked
visual-attribute modeling, masked visual-object modeling, and visual-question answering objectives.
The pretraining consists of multiple multi-modal datasets: MSCOCO, Visual-Genome + Visual-Genome Question Answering,
VQA 2.0, and GQA.
The abstract from the paper is the following:
*Vision-and-language reasoning requires an understanding of visual concepts, language semantics, and, most importantly,
the alignment and relationships between these two modalities. We thus propose the LXMERT (Learning Cross-Modality
Encoder Representations from Transformers) framework to learn these vision-and-language connections. In LXMERT, we
build a large-scale Transformer model that consists of three encoders: an object relationship encoder, a language
encoder, and a cross-modality encoder. Next, to endow our model with the capability of connecting vision and language
semantics, we pre-train the model with large amounts of image-and-sentence pairs, via five diverse representative
pre-training tasks: masked language modeling, masked object prediction (feature regression and label classification),
cross-modality matching, and image question answering. These tasks help in learning both intra-modality and
cross-modality relationships. After fine-tuning from our pretrained parameters, our model achieves the state-of-the-art
results on two visual question answering datasets (i.e., VQA and GQA). We also show the generalizability of our
pretrained cross-modality model by adapting it to a challenging visual-reasoning task, NLVR, and improve the previous
best result by 22% absolute (54% to 76%). Lastly, we demonstrate detailed ablation studies to prove that both our novel
model components and pretraining strategies significantly contribute to our strong results; and also present several
attention visualizations for the different encoders*
Tips:
- Bounding boxes are not necessary to be used in the visual feature embeddings, any kind of visual-spacial features
will work.
- Both the language hidden states and the visual hidden states that LXMERT outputs are passed through the
cross-modality layer, so they contain information from both modalities. To access a modality that only attends to
itself, select the vision/language hidden states from the first input in the tuple.
- The bidirectional cross-modality encoder attention only returns attention values when the language modality is used
as the input and the vision modality is used as the context vector. Further, while the cross-modality encoder
contains self-attention for each respective modality and cross-attention, only the cross attention is returned and
both self attention outputs are disregarded.
The original code can be found `here <https://github.com/airsplay/lxmert>`__.
LxmertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LxmertConfig
:members:
LxmertTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LxmertTokenizer
:members:
LxmertTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LxmertTokenizerFast
:members:
Lxmert specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_lxmert.LxmertModelOutput
:members:
.. autoclass:: transformers.modeling_lxmert.LxmertForPreTrainingOutput
:members:
.. autoclass:: transformers.modeling_lxmert.LxmertForQuestionAnsweringOutput
:members:
.. autoclass:: transformers.modeling_tf_lxmert.TFLxmertModelOutput
:members:
.. autoclass:: transformers.modeling_tf_lxmert.TFLxmertForPreTrainingOutput
:members:
LxmertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LxmertModel
:members: forward
LxmertForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LxmertForPreTraining
:members: forward
LxmertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LxmertForQuestionAnswering
:members: forward
TFLxmertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFLxmertModel
:members: call
TFLxmertForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFLxmertForPreTraining
:members: call

View File

@@ -1,14 +1,14 @@
MarianMT
----------------------------------------------------
**DISCLAIMER:** If you see something strange,
file a `Github Issue <https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`__ and assign
-----------------------------------------------------------------------------------------------------------------------
**Bugs:** If you see something strange,
file a `Github Issue <https://github.com/huggingface/transformers/issues/new?assignees=sshleifer&labels=&template=bug-report.md&title>`__ and assign
@sshleifer. Translations should be similar, but not identical to, output in the test set linked to in each model card.
Implementation Notes
~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Each model is about 298 MB on disk, there are 1,000+ models.
- The list of supported language pairs can be found `here <https://huggingface.co/Helsinki-NLP>`__.
- The 1,000+ models were originally trained by `Jörg Tiedemann <https://researchportal.helsinki.fi/en/persons/j%C3%B6rg-tiedemann>`__ using the `Marian <https://marian-nmt.github.io/>`_ C++ library, which supports fast training and translation.
- models were originally trained by `Jörg Tiedemann <https://researchportal.helsinki.fi/en/persons/j%C3%B6rg-tiedemann>`__ using the `Marian <https://marian-nmt.github.io/>`_ C++ library, which supports fast training and translation.
- All models are transformer encoder-decoders with 6 layers in each component. Each model's performance is documented in a model card.
- The 80 opus models that require BPE preprocessing are not supported.
- The modeling code is the same as ``BartForConditionalGeneration`` with a few minor modifications:
@@ -19,14 +19,14 @@ Implementation Notes
- Code to bulk convert models can be found in ``convert_marian_to_pytorch.py``
Naming
~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- All model names use the following format: ``Helsinki-NLP/opus-mt-{src}-{tgt}``
- The language codes used to name models are inconsistent. Two digit codes can usually be found `here <https://developers.google.com/admin-sdk/directory/v1/languages>`_, three digit codes require googling "language code {code}".
- Codes formatted like ``es_AR`` are usually ``code_{region}``. That one is spanish documents from Argentina.
Multilingual Models
~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All model names use the following format: ``Helsinki-NLP/opus-mt-{src}-{tgt}``:
- if ``src`` is in all caps, the model supports multiple input languages, you can figure out which ones by looking at the model card, or the Group Members `mapping <https://gist.github.com/sshleifer/6d20e7761931b08e73c3219027b97b8a>`_ .
@@ -48,7 +48,7 @@ Example of translating english to many romance languages, using language codes:
tokenizer = MarianTokenizer.from_pretrained(model_name)
print(tokenizer.supported_language_codes)
model = MarianMTModel.from_pretrained(model_name)
translated = model.generate(**tokenizer.prepare_translation_batch(src_text))
translated = model.generate(**tokenizer.prepare_seq2seq_batch(src_text))
tgt_text = [tokenizer.decode(t, skip_special_tokens=True) for t in translated]
# ["c'est une phrase en anglais que nous voulons traduire en français",
# 'Isto deve ir para o português.',
@@ -86,26 +86,26 @@ Code to see available pretrained models:
suffix = [x.split('/')[1] for x in model_ids]
multi_models = [f'{org}/{s}' for s in suffix if s != s.lower()]
MarianMTModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Pytorch version of marian-nmt's transformer.h (c++). Designed for the OPUS-NMT translation checkpoints.
Model API is identical to BartForConditionalGeneration.
Available models are listed at `Model List <https://huggingface.co/models?search=Helsinki-NLP>`__
This class inherits nearly all functionality from ``BartForConditionalGeneration``, see that page for method signatures.
MarianConfig
~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MarianConfig
:members:
MarianTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MarianTokenizer
:members: prepare_translation_batch
:members: prepare_seq2seq_batch
MarianMTModel
~~~~~~~~~~~~~
Pytorch version of marian-nmt's transformer.h (c++). Designed for the OPUS-NMT translation checkpoints.
Model API is identical to BartForConditionalGeneration.
Available models are listed at `Model List <https://huggingface.co/models?search=Helsinki-NLP>`__
This class inherits all functionality from ``BartForConditionalGeneration``, see that page for method signatures.
.. autoclass:: transformers.MarianMTModel
:members:

View File

@@ -0,0 +1,76 @@
MBart
-----------------------------------------------------------------------------------------------------------------------
**DISCLAIMER:** If you see something strange,
file a `Github Issue <https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`__ and assign
@sshleifer
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The MBart model was presented in `Multilingual Denoising Pre-training for Neural Machine Translation <https://arxiv.org/abs/2001.08210>`_ by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov
Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer. According to the abstract,
MBART is a sequence-to-sequence denoising auto-encoder pre-trained on large-scale monolingual corpora in many languages using the BART objective. mBART is one of the first methods for pre-training a complete sequence-to-sequence model by denoising full texts in multiple languages, while previous approaches have focused only on the encoder, decoder, or reconstructing parts of the text.
The Authors' code can be found `here <https://github.com/pytorch/fairseq/tree/master/examples/mbart>`__
Training
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
MBart is a multilingual encoder-decoder (seq-to-seq) model primarily intended for translation task.
As the model is multilingual it expects the sequences in a different format. A special language id token
is added in both the source and target text. The source text format is ``X [eos, src_lang_code]``
where ``X`` is the source text. The target text format is ```[tgt_lang_code] X [eos]```. ```bos``` is never used.
The ```MBartTokenizer.prepare_seq2seq_batch``` handles this automatically and should be used to encode
the sequences for seq-2-seq fine-tuning.
- Supervised training
.. code-block::
example_english_phrase = "UN Chief Says There Is No Military Solution in Syria"
expected_translation_romanian = "Şeful ONU declară că nu există o soluţie militară în Siria"
batch = tokenizer.prepare_seq2seq_batch(example_english_phrase, src_lang="en_XX", tgt_lang="ro_RO", tgt_texts=expected_translation_romanian)
input_ids = batch["input_ids"]
target_ids = batch["decoder_input_ids"]
decoder_input_ids = target_ids[:, :-1].contiguous()
labels = target_ids[:, 1:].clone()
model(input_ids=input_ids, decoder_input_ids=decoder_input_ids, labels=labels) #forward
- Generation
While generating the target text set the `decoder_start_token_id` to the target language id.
The following example shows how to translate English to Romanian using the ```facebook/mbart-large-en-ro``` model.
.. code-block::
from transformers import MBartForConditionalGeneration, MBartTokenizer
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-en-ro")
tokenizer = MBartTokenizer.from_pretrained("facebook/mbart-large-en-ro")
article = "UN Chief Says There Is No Military Solution in Syria"
batch = tokenizer.prepare_seq2seq_batch(src_texts=[article], src_lang="en_XX")
translated_tokens = model.generate(**batch, decoder_start_token_id=tokenizer.lang_code_to_id["ro_RO"])
translation = tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)[0]
assert translation == "Şeful ONU declară că nu există o soluţie militară în Siria"
MBartConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MBartConfig
:members:
MBartTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MBartTokenizer
:members: build_inputs_with_special_tokens, prepare_seq2seq_batch
MBartForConditionalGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MBartForConditionalGeneration
:members: generate, forward

View File

@@ -1,13 +1,13 @@
MobileBERT
----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The MobileBERT model was proposed in `MobileBERT: a Compact Task-Agnostic BERT
for Resource-Limited Devices <https://arxiv.org/abs/2004.02984>`__
by Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou. It's a bidirectional transformer
based on the BERT model, which is compressed and accelerated using several approaches.
The MobileBERT model was proposed in `MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices
<https://arxiv.org/abs/2004.02984>`__ by Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny
Zhou. It's a bidirectional transformer based on the BERT model, which is compressed and accelerated using several
approaches.
The abstract from the paper is the following:
@@ -32,138 +32,146 @@ Tips:
It is therefore efficient at predicting masked tokens and at NLU in general, but is not optimal for
text generation. Models trained with a causal language modeling (CLM) objective are better in that regard.
The original code can be found `here <https://github.com/google-research/mobilebert>`_.
The original code can be found `here <https://github.com/google-research/mobilebert>`__.
MobileBertConfig
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MobileBertConfig
:members:
MobileBertTokenizer
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MobileBertTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary
:members:
MobileBertTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MobileBertTokenizerFast
:members:
MobileBert specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_mobilebert.MobileBertForPreTrainingOutput
:members:
.. autoclass:: transformers.modeling_tf_mobilebert.TFMobileBertForPreTrainingOutput
:members:
MobileBertModel
~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MobileBertModel
:members:
:members: forward
MobileBertForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MobileBertForPreTraining
:members:
:members: forward
MobileBertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MobileBertForMaskedLM
:members:
:members: forward
MobileBertForNextSentencePrediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MobileBertForNextSentencePrediction
:members:
:members: forward
MobileBertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MobileBertForSequenceClassification
:members:
:members: forward
MobileBertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MobileBertForMultipleChoice
:members:
:members: forward
MobileBertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MobileBertForTokenClassification
:members:
:members: forward
MobileBertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MobileBertForQuestionAnswering
:members:
:members: forward
TFMobileBertModel
~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFMobileBertModel
:members:
:members: call
TFMobileBertForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFMobileBertForPreTraining
:members:
:members: call
TFMobileBertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFMobileBertForMaskedLM
:members:
:members: call
TFMobileBertForNextSentencePrediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFMobileBertForNextSentencePrediction
:members:
:members: call
TFMobileBertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFMobileBertForSequenceClassification
:members:
:members: call
TFMobileBertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFMobileBertForMultipleChoice
:members:
:members: call
TFMobileBertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFMobileBertForTokenClassification
:members:
:members: call
TFMobileBertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFMobileBertForQuestionAnswering
:members:
:members: call

View File

@@ -0,0 +1,117 @@
Pegasus
-----------------------------------------------------------------------------------------------------------------------
**DISCLAIMER:** If you see something strange,
file a `Github Issue <https://github.com/huggingface/transformers/issues/new?assignees=sshleifer&labels=&template=bug-report.md&title>`__ and assign
@sshleifer.
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The Pegasus model was proposed in `PEGASUS: Pre-training with Extracted Gap-sentences for
Abstractive Summarization <https://arxiv.org/pdf/1912.08777.pdf>`_ by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu on Dec 18, 2019.
According to the abstract,
- Pegasus' pretraining task is intentionally similar to summarization: important sentences are removed/masked from an input document and are generated together as one output sequence from the remaining sentences, similar to an extractive summary.
- Pegasus achieves SOTA summarization performance on all 12 downstream tasks, as measured by ROUGE and human eval.
The Authors' code can be found `here <https://github.com/google-research/pegasus>`_.
Checkpoints
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All the `checkpoints <https://huggingface.co/models?search=pegasus>`_ are finetuned for summarization, besides ``pegasus-large``, whence the other checkpoints are finetuned.
- Each checkpoint is 2.2 GB on disk and 568M parameters.
- FP16 is not supported (help/ideas on this appreciated!).
- Summarizing xsum in fp32 takes about 400ms/sample, with default parameters on a v100 GPU.
- For XSUM, The paper reports rouge1,rouge2, rougeL of paper: 47.21/24.56/39.25. As of Aug 9, this port scores 46.91/24.34/39.1.
The gap is likely because of different alpha/length_penalty implementations in beam search.
Implementation Notes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- All models are transformer encoder-decoders with 16 layers in each component.
- The implementation is completely inherited from ``BartForConditionalGeneration``
- Some key configuration differences:
- static, sinusoidal position embeddings
- no ``layernorm_embedding`` (``PegasusConfig.normalize_embedding=False``)
- the model starts generating with pad_token_id (which has 0 token_embedding) as the prefix.
- ``num_beams=8``
- All pretrained pegasus checkpoints are the same besides three attributes: ``tokenizer.model_max_length`` (max input size), ``max_length`` (max num tokens to generate) and ``length_penalty``
- Code to convert checkpoints trained in the author's `repo <https://github.com/google-research/pegasus>`_ can be found in ``convert_pegasus_tf_to_pytorch.py``
Usage Example
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code-block:: python
from transformers import PegasusForConditionalGeneration, PegasusTokenizer
import torch
src_text = [
""" PG&E stated it scheduled the blackouts in response to forecasts for high winds amid dry conditions. The aim is to reduce the risk of wildfires. Nearly 800 thousand customers were scheduled to be affected by the shutoffs which were expected to last through at least midday tomorrow."""
]
model_name = 'google/pegasus-xsum'
torch_device = 'cuda' if torch.cuda.is_available() else 'cpu'
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name).to(torch_device)
batch = tokenizer.prepare_seq2seq_batch(src_text, truncation=True, padding='longest').to(torch_device)
translated = model.generate(**batch)
tgt_text = tokenizer.batch_decode(translated, skip_special_tokens=True)
assert tgt_text[0] == "California's largest electricity provider has turned off power to hundreds of thousands of customers."
PegasusForConditionalGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This class inherits all functionality from ``BartForConditionalGeneration``, see that page for method signatures.
Available models are listed at `Model List <https://huggingface.co/models?search=pegasus>`__
.. autoclass:: transformers.PegasusForConditionalGeneration
:members:
PegasusConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This config fully inherits from ``BartConfig``, but pegasus uses different default values:
Up to date parameter values can be seen in `S3 <https://s3.amazonaws.com/models.huggingface.co/bert/google/pegasus-xsum/config.json>`_.
As of Aug 10, 2020, they are:
.. code-block:: python
dict(
vocab_size=96103,
max_position_embeddings=512,
d_model=1024,
encoder_ffn_dim=4096,
decoder_ffn_dim=4096,
encoder_attention_heads=16,
decoder_attention_heads=16,
encoder_layers=16,
decoder_layers=16,
dropout=0.1,
attention_dropout=0.1,
activation_dropout=0.1,
pad_token_id=0,
eos_token_id=1,
is_encoder_decoder=True,
normalize_before=True,
scale_embedding=True,
normalize_embedding=False,
add_final_layer_norm=True,
static_position_embeddings=True,
num_beams=8,
activation_function="relu",
)
PegasusTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
warning: ``add_tokens`` does not work at the moment.
.. autoclass:: transformers.PegasusTokenizer
:members: __call__, prepare_seq2seq_batch

View File

@@ -0,0 +1,91 @@
RAG
----------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
Retrieval-augmented generation ("RAG") models combine the powers of pretrained dense retrieval (DPR) and
sequence-to-sequence models. RAG models retrieve documents, pass them to a seq2seq model, then marginalize to generate
outputs. The retriever and seq2seq modules are initialized from pretrained models, and fine-tuned jointly, allowing
both retrieval and generation to adapt to downstream tasks.
It is based on the paper `Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
<https://arxiv.org/abs/2005.11401>`__ by Patrick Lewis, Ethan Perez, Aleksandara Piktus, Fabio Petroni, Vladimir
Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela.
The abstract from the paper is the following:
*Large pre-trained language models have been shown to store factual knowledge
in their parameters, and achieve state-of-the-art results when fine-tuned on
downstream NLP tasks. However, their ability to access and precisely manipulate
knowledge is still limited, and hence on knowledge-intensive tasks, their
performance lags behind task-specific architectures. Additionally, providing
provenance for their decisions and updating their world knowledge remain open
research problems. Pre-trained models with a differentiable access mechanism to
explicit nonparametric memory can overcome this issue, but have so far been only
investigated for extractive downstream tasks. We explore a general-purpose
fine-tuning recipe for retrieval-augmented generation (RAG) — models which combine
pre-trained parametric and non-parametric memory for language generation. We
introduce RAG models where the parametric memory is a pre-trained seq2seq model and
the non-parametric memory is a dense vector index of Wikipedia, accessed with
a pre-trained neural retriever. We compare two RAG formulations, one which
conditions on the same retrieved passages across the whole generated sequence, the
other can use different passages per token. We fine-tune and evaluate our models
on a wide range of knowledge-intensive NLP tasks and set the state-of-the-art
on three open domain QA tasks, outperforming parametric seq2seq models and
task-specific retrieve-and-extract architectures. For language generation tasks, we
find that RAG models generate more specific, diverse and factual language than a
state-of-the-art parametric-only seq2seq baseline.*
RagConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RagConfig
:members:
RagTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RagTokenizer
:members: prepare_seq2seq_batch
Rag specific outputs
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_rag.RetrievAugLMMarginOutput
:members:
.. autoclass:: transformers.modeling_rag.RetrievAugLMOutput
:members:
RAGRetriever
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RagRetriever
:members:
RagModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RagModel
:members: forward
RagSequenceForGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RagSequenceForGeneration
:members: forward, generate
RagTokenForGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RagTokenForGeneration
:members: forward, generate

View File

@@ -1,20 +1,37 @@
Reformer
----------------------------------------------------
**DISCLAIMER:** This model is still a work in progress, if you see something strange,
file a `Github Issue <https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`_
-----------------------------------------------------------------------------------------------------------------------
**DISCLAIMER:** This model is still a work in progress, if you see something strange, file a `Github Issue
<https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`__.
Overview
~~~~~~~~~~
The Reformer model was presented in `Reformer: The Efficient Transformer <https://arxiv.org/abs/2001.04451.pdf>`_ by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
Here the abstract:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*Large Transformer models routinely achieve state-of-the-art results on a number of tasks but training these models can be prohibitively costly, especially on long sequences. We introduce two techniques to improve the efficiency of Transformers. For one, we replace dot-product attention by one that uses locality-sensitive hashing, changing its complexity from O(L^2) to O(Llog(L)), where L is the length of the sequence. Furthermore, we use reversible residual layers instead of the standard residuals, which allows storing activations only once in the training process instead of N times, where N is the number of layers. The resulting model, the Reformer, performs on par with Transformer models while being much more memory-efficient and much faster on long sequences.*
The Reformer model was proposed in the paper `Reformer: The Efficient Transformer
<https://arxiv.org/abs/2001.04451.pdf>`__ by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
The Authors' code can be found `here <https://github.com/google/trax/tree/master/trax/models/reformer>`_ .
The abstract from the paper is the following:
*Large Transformer models routinely achieve state-of-the-art results on a number of tasks but training these models can
be prohibitively costly, especially on long sequences. We introduce two techniques to improve the efficiency of
Transformers. For one, we replace dot-product attention by one that uses locality-sensitive hashing, changing its
complexity from O(L^2) to O(Llog(L)), where L is the length of the sequence. Furthermore, we use reversible residual
layers instead of the standard residuals, which allows storing activations only once in the training process instead of
N times, where N is the number of layers. The resulting model, the Reformer, performs on par with Transformer models
while being much more memory-efficient and much faster on long sequences.*
The Authors' code can be found `here <https://github.com/google/trax/tree/master/trax/models/reformer>`__.
Axial Positional Encodings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Axial Positional Encodings were first implemented in Google's `trax library <https://github.com/google/trax/blob/4d99ad4965bab1deba227539758d59f0df0fef48/trax/layers/research/position_encodings.py#L29>`_ and developed by the authors of this model's paper. In models that are treating very long input sequences, the conventional position id encodings store an embedings vector of size :math:`d` being the ``config.hidden_size`` for every position :math:`i, \ldots, n_s`, with :math:`n_s` being ``config.max_embedding_size``. *E.g.*, having a sequence length of :math:`n_s = 2^{19} \approx 0.5M` and a ``config.hidden_size`` of :math:`d = 2^{10} \approx 1000` would result in a position encoding matrix:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Axial Positional Encodings were first implemented in Google's `trax library
<https://github.com/google/trax/blob/4d99ad4965bab1deba227539758d59f0df0fef48/trax/layers/research/position_encodings.py#L29>`__
and developed by the authors of this model's paper. In models that are treating very long input sequences, the
conventional position id encodings store an embedings vector of size :math:`d` being the :obj:`config.hidden_size` for
every position :math:`i, \ldots, n_s`, with :math:`n_s` being :obj:`config.max_embedding_size`. This means that having
a sequence length of :math:`n_s = 2^{19} \approx 0.5M` and a ``config.hidden_size`` of :math:`d = 2^{10} \approx 1000`
would result in a position encoding matrix:
.. math::
X_{i,j}, \text{ with } i \in \left[1,\ldots, d\right] \text{ and } j \in \left[1,\ldots, n_s\right]
@@ -42,87 +59,127 @@ Therefore the following holds:
X^{2}_{i - d^1, l}, & \text{if } i \ge d^1 \text{ with } l = \lfloor\frac{j}{n_s^1}\rfloor
\end{cases}
Intuitively, this means that a position embedding vector :math:`x_j \in \mathbb{R}^{d}` is now the composition of two factorized embedding vectors: :math:`x^1_{k, l} + x^2_{l, k}`, where as the ``config.max_embedding_size`` dimension :math:`j` is factorized into :math:`k \text{ and } l`.
This design ensures that each position embedding vector :math:`x_j` is unique.
Intuitively, this means that a position embedding vector :math:`x_j \in \mathbb{R}^{d}` is now the composition of two
factorized embedding vectors: :math:`x^1_{k, l} + x^2_{l, k}`, where as the :obj:`config.max_embedding_size` dimension
:math:`j` is factorized into :math:`k \text{ and } l`. This design ensures that each position embedding vector
:math:`x_j` is unique.
Using the above example again, axial position encoding with :math:`d^1 = 2^5, d^2 = 2^5, n_s^1 = 2^9, n_s^2 = 2^{10}` can drastically reduced the number of parameters to :math:`2^{14} + 2^{15} \approx 49000` parameters.
In practice, the parameter ``config.axial_pos_embds_dim`` is set to ``list``:math:`(d^1, d^2)` which sum has to be equal to ``config.hidden_size`` and ``config.axial_pos_shape`` is set to ``list``:math:`(n_s^1, n_s^2)` and which product has to be equal to ``config.max_embedding_size`` which during training has to be equal to the ``sequence length`` of the ``input_ids``.
Using the above example again, axial position encoding with :math:`d^1 = 2^5, d^2 = 2^5, n_s^1 = 2^9, n_s^2 = 2^{10}`
can drastically reduced the number of parameters to :math:`2^{14} + 2^{15} \approx 49000` parameters.
In practice, the parameter :obj:`config.axial_pos_embds_dim` is set to a tuple :math:`(d^1, d^2)` which sum has to
be equal to :obj:`config.hidden_size` and :obj:`config.axial_pos_shape` is set to a tuple :math:`(n_s^1, n_s^2)` which
product has to be equal to :obj:`config.max_embedding_size`, which during training has to be equal to the
`sequence length` of the :obj:`input_ids`.
LSH Self Attention
~~~~~~~~~~~~~~~~~~~~
In Locality sensitive hashing (LSH) self attention the key and query projection weights are tied. Therefore, the key query embedding vectors are also tied.
LSH self attention uses the locality sensitive
hashing mechanism proposed in `Practical and Optimal LSH for Angular Distance <https://arxiv.org/abs/1509.02897>`_ to assign each of the tied key query embedding vectors to one of ``config.num_buckets`` possible buckets. The premise is that the more "similar" key query embedding vectors (in terms of *cosine similarity*) are to each other, the more likely they are assigned to the same bucket.
The accuracy of the LSH mechanism can be improved by increasing ``config.num_hashes`` or directly the argument ``num_hashes`` of the forward function so that the output of the LSH self attention better approximates the output of the "normal" full self attention.
The buckets are then sorted and chunked into query key embedding vector chunks each of length ``config.lsh_chunk_length``. For each chunk, the query embedding vectors attend to its key vectors (which are tied to themselves) and to the key embedding vectors of ``config.lsh_num_chunks_before`` previous neighboring chunks and ``config.lsh_num_chunks_after`` following neighboring chunks.
For more information, see the `original Paper <https://arxiv.org/abs/2001.04451>`_ or this great `blog post <https://www.pragmatic.ml/reformer-deep-dive/>`_.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In Locality sensitive hashing (LSH) self attention the key and query projection weights are tied. Therefore, the key
query embedding vectors are also tied. LSH self attention uses the locality sensitive hashing mechanism proposed in
`Practical and Optimal LSH for Angular Distance <https://arxiv.org/abs/1509.02897>`__ to assign each of the tied key
query embedding vectors to one of :obj:`config.num_buckets` possible buckets. The premise is that the more "similar"
key query embedding vectors (in terms of *cosine similarity*) are to each other, the more likely they are assigned to
the same bucket.
Note that ``config.num_buckets`` can also be factorized into a ``list``:math:`(n_{\text{buckets}}^1, n_{\text{buckets}}^2)`. This way instead of assigning the query key embedding vectors to one of :math:`(1,\ldots, n_{\text{buckets}})` they are assigned to one of :math:`(1-1,\ldots, n_{\text{buckets}}^1-1, \ldots, 1-n_{\text{buckets}}^2, \ldots, n_{\text{buckets}}^1-n_{\text{buckets}}^2)`. This is crucial for very long sequences to save memory.
The accuracy of the LSH mechanism can be improved by increasing :obj:`config.num_hashes` or directly the argument
:obj:`num_hashes` of the forward function so that the output of the LSH self attention better approximates the output
of the "normal" full self attention. The buckets are then sorted and chunked into query key embedding vector chunks
each of length :obj:`config.lsh_chunk_length`. For each chunk, the query embedding vectors attend to its key vectors
(which are tied to themselves) and to the key embedding vectors of :obj:`config.lsh_num_chunks_before` previous
neighboring chunks and :obj:`config.lsh_num_chunks_after` following neighboring chunks.
When training a model from scratch, it is recommended to leave ``config.num_buckets=None``, so that depending on the sequence length a good value for ``num_buckets`` is calculated on the fly. This value will then automatically be saved in the config and should be reused for inference.
For more information, see the `original Paper <https://arxiv.org/abs/2001.04451>`__ or this great `blog post
<https://www.pragmatic.ml/reformer-deep-dive/>`__.
Using LSH self attention, the memory and time complexity of the query-key matmul operation can be reduced from :math:`\mathcal{O}(n_s \times n_s)` to :math:`\mathcal{O}(n_s \times \log(n_s))`, which usually represents the memory and time bottleneck in a transformer model, with :math:`n_s` being the sequence length.
Note that :obj:`config.num_buckets` can also be factorized into a list
:math:`(n_{\text{buckets}}^1, n_{\text{buckets}}^2)`. This way instead of assigning the query key embedding vectors to
one of :math:`(1,\ldots, n_{\text{buckets}})` they are assigned to one of
:math:`(1-1,\ldots, n_{\text{buckets}}^1-1, \ldots, 1-n_{\text{buckets}}^2, \ldots, n_{\text{buckets}}^1-n_{\text{buckets}}^2)`.
This is crucial for very long sequences to save memory.
When training a model from scratch, it is recommended to leave :obj:`config.num_buckets=None`, so that depending on the
sequence length a good value for :obj:`num_buckets` is calculated on the fly. This value will then automatically be
saved in the config and should be reused for inference.
Using LSH self attention, the memory and time complexity of the query-key matmul operation can be reduced from
:math:`\mathcal{O}(n_s \times n_s)` to :math:`\mathcal{O}(n_s \times \log(n_s))`, which usually represents the memory
and time bottleneck in a transformer model, with :math:`n_s` being the sequence length.
Local Self Attention
~~~~~~~~~~~~~~~~~~~~
Local self attention is essentially a "normal" self attention layer with
key, query and value projections, but is chunked so that in each chunk of length ``config.local_chunk_length`` the query embedding vectors only attends to the key embedding vectors in its chunk and to the key embedding vectors of ``config.local_num_chunks_before`` previous neighboring chunks and ``config.local_num_chunks_after`` following neighboring chunks.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Using Local self attention, the memory and time complexity of the query-key matmul operation can be reduced from :math:`\mathcal{O}(n_s \times n_s)` to :math:`\mathcal{O}(n_s \times \log(n_s))`, which usually represents the memory and time bottleneck in a transformer model, with :math:`n_s` being the sequence length.
Local self attention is essentially a "normal" self attention layer with key, query and value projections, but is
chunked so that in each chunk of length :obj:`config.local_chunk_length` the query embedding vectors only attends to
the key embedding vectors in its chunk and to the key embedding vectors of :obj:`config.local_num_chunks_before`
previous neighboring chunks and :obj:`config.local_num_chunks_after` following neighboring chunks.
Using Local self attention, the memory and time complexity of the query-key matmul operation can be reduced from
:math:`\mathcal{O}(n_s \times n_s)` to :math:`\mathcal{O}(n_s \times \log(n_s))`, which usually represents the memory
and time bottleneck in a transformer model, with :math:`n_s` being the sequence length.
Training
~~~~~~~~~~~~~~~~~~~~
During training, we must ensure that the sequence length is set to a value that can be divided by the least common multiple of ``config.lsh_chunk_length`` and ``config.local_chunk_length`` and that the parameters of the Axial Positional Encodings are correctly set as described above. Reformer is very memory efficient so that the model can easily be trained on sequences as long as 64000 tokens.
For training, the ``ReformerModelWithLMHead`` should be used as follows:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
::
During training, we must ensure that the sequence length is set to a value that can be divided by the least common
multiple of :obj:`config.lsh_chunk_length` and :obj:`config.local_chunk_length` and that the parameters of the Axial
Positional Encodings are correctly set as described above. Reformer is very memory efficient so that the model can
easily be trained on sequences as long as 64000 tokens.
For training, the :class:`~transformers.ReformerModelWithLMHead` should be used as follows:
.. code-block::
input_ids = tokenizer.encode('This is a sentence from the training data', return_tensors='pt')
loss = model(input_ids, labels=input_ids)[0]
ReformerConfig
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ReformerConfig
:members:
ReformerTokenizer
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ReformerTokenizer
:members:
:members: save_vocabulary
ReformerModel
~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ReformerModel
:members:
:members: forward
ReformerModelWithLMHead
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ReformerModelWithLMHead
:members:
:members: forward
ReformerForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ReformerForMaskedLM
:members:
:members: forward
ReformerForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ReformerForSequenceClassification
:members: forward
ReformerForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ReformerForQuestionAnswering
:members:
:members: forward

View File

@@ -1,39 +1,40 @@
RetriBERT
----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The RetriBERT model was proposed in the blog post
`Explain Anything Like I'm Five: A Model for Open Domain Long Form Question Answering <https://yjernite.github.io/lfqa.html>`__,
RetriBERT is a small model that uses either a single or pair of Bert encoders with lower-dimension projection for dense semantic indexing of text.
The RetriBERT model was proposed in the blog post `Explain Anything Like I'm Five: A Model for Open Domain Long Form
Question Answering <https://yjernite.github.io/lfqa.html>`__. RetriBERT is a small model that uses either a single or
pair of BERT encoders with lower-dimension projection for dense semantic indexing of text.
Code to train and use the model can be found `here <https://github.com/huggingface/transformers/tree/master/examples/distillation>`_.
Code to train and use the model can be found `here
<https://github.com/huggingface/transformers/tree/master/examples/distillation>`__.
RetriBertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RetriBertConfig
:members:
RetriBertTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RetriBertTokenizer
:members:
RetriBertTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RetriBertTokenizerFast
:members:
RetriBertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RetriBertModel
:members:
:members: forward

View File

@@ -1,12 +1,12 @@
RoBERTa
----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The RoBERTa model was proposed in `RoBERTa: A Robustly Optimized BERT Pretraining Approach <https://arxiv.org/abs/1907.11692>`_
by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer,
Veselin Stoyanov. It is based on Google's BERT model released in 2018.
The RoBERTa model was proposed in `RoBERTa: A Robustly Optimized BERT Pretraining Approach
<https://arxiv.org/abs/1907.11692>`_ by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer
Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov. It is based on Google's BERT model released in 2018.
It builds on BERT and modifies key hyperparameters, removing the next-sentence pretraining
objective and training with much larger mini-batches and learning rates.
@@ -27,22 +27,23 @@ Tips:
- This implementation is the same as :class:`~transformers.BertModel` with a tiny embeddings tweak as well as a
setup for Roberta pretrained models.
- RoBERTa has the same architecture as BERT, but uses a byte-level BPE as a tokenizer (same as GPT-2) and uses a
different pre-training scheme.
- RoBERTa doesn't have `token_type_ids`, you don't need to indicate which token belongs to which segment. Just separate your segments with the separation token `tokenizer.sep_token` (or `</s>`)
- `Camembert <./camembert.html>`__ is a wrapper around RoBERTa. Refer to this page for usage examples.
different pretraining scheme.
- RoBERTa doesn't have :obj:`token_type_ids`, you don't need to indicate which token belongs to which segment. Just
separate your segments with the separation token :obj:`tokenizer.sep_token` (or :obj:`</s>`)
- :doc:`CamemBERT <camembert>` is a wrapper around RoBERTa. Refer to this page for usage examples.
The original code can be found `here <https://github.com/pytorch/fairseq/tree/master/examples/roberta>`_.
RobertaConfig
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RobertaConfig
:members:
RobertaTokenizer
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RobertaTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
@@ -50,91 +51,98 @@ RobertaTokenizer
RobertaTokenizerFast
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RobertaTokenizerFast
:members: build_inputs_with_special_tokens
RobertaModel
~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RobertaModel
:members:
:members: forward
RobertaForCausalLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RobertaForCausalLM
:members: forward
RobertaForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RobertaForMaskedLM
:members:
:members: forward
RobertaForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RobertaForSequenceClassification
:members:
:members: forward
RobertaForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RobertaForMultipleChoice
:members:
:members: forward
RobertaForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RobertaForTokenClassification
:members:
:members: forward
RobertaForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RobertaForQuestionAnswering
:members:
:members: forward
TFRobertaModel
~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFRobertaModel
:members:
:members: call
TFRobertaForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFRobertaForMaskedLM
:members:
:members: call
TFRobertaForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFRobertaForSequenceClassification
:members:
:members: call
TFRobertaForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFRobertaForMultipleChoice
:members:
:members: call
TFRobertaForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFRobertaForTokenClassification
:members:
:members: call
TFRobertaForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFRobertaForQuestionAnswering
:members:
:members: call

View File

@@ -1,60 +1,79 @@
T5
----------------------------------------------------
**DISCLAIMER:** This model is still a work in progress, if you see something strange,
file a `Github Issue <https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`_
-----------------------------------------------------------------------------------------------------------------------
**DISCLAIMER:** This model is still a work in progress, if you see something strange, file a `Github Issue
<https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`__.
Overview
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The T5 model was presented in `Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer <https://arxiv.org/pdf/1910.10683.pdf>`_ by Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu in
Here the abstract:
The T5 model was presented in `Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
<https://arxiv.org/pdf/1910.10683.pdf>`_ by Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang,
Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu.
*Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice.
In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format.
Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks.
By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.
To facilitate future work on transfer learning for NLP, we release our dataset, pre-trained models, and code.*
The abstract from the paper is the following:
*Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream
task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning
has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of
transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a
text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer
approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration
with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering
summarization, question answering, text classification, and more. To facilitate future work on transfer learning for
NLP, we release our dataset, pre-trained models, and code.*
Tips:
- T5 is an encoder-decoder model pre-trained on a multi-task mixture of unsupervised
and supervised tasks and for which each task is converted into a text-to-text format.
T5 works well on a variety of tasks out-of-the-box by prepending a different prefix to the input corresponding to each task, e.g.: for translation: *translate English to German: ..., summarize: ...*.
For more information about which prefix to use, it is easiest to look into Appendix D of the `paper <https://arxiv.org/pdf/1910.10683.pdf>`_ .
- For sequence to sequence generation, it is recommended to use ``T5ForConditionalGeneration.generate()``. The method takes care of feeding the encoded input via cross-attention layers to the decoder and auto-regressively generates the decoder output.
- T5 is an encoder-decoder model pre-trained on a multi-task mixture of unsupervised and supervised tasks and for which
each task is converted into a text-to-text format. T5 works well on a variety of tasks out-of-the-box by prepending a
different prefix to the input corresponding to each task, e.g., for translation: *translate English to German: ...*,
for summarization: *summarize: ...*.
For more information about which prefix to use, it is easiest to look into Appendix D of the `paper
<https://arxiv.org/pdf/1910.10683.pdf>`__.
- For sequence-to-sequence generation, it is recommended to use :obj:`T5ForConditionalGeneration.generate()``. This
method takes care of feeding the encoded input via cross-attention layers to the decoder and auto-regressively
generates the decoder output.
- T5 uses relative scalar embeddings. Encoder input padding can be done on the left and on the right.
The original code can be found `here <https://github.com/google-research/text-to-text-transfer-transformer>`_.
The original code can be found `here <https://github.com/google-research/text-to-text-transfer-transformer>`__.
Training
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
T5 is an encoder-decoder model and converts all NLP problems into a text-to-text format. It is trained using teacher forcing.
This means that for training we always need an input sequence and a target sequence.
The input sequence is fed to the model using ``input_ids``. The target sequence is shifted to the right, *i.e.* prepended by a start-sequence token and fed to the decoder using the `decoder_input_ids`. In teacher-forcing style, the target sequence is then appended by the EOS token and corresponds to the ``labels``. The PAD token is hereby used as the start-sequence token.
T5 can be trained / fine-tuned both in a supervised and unsupervised fashion.
T5 is an encoder-decoder model and converts all NLP problems into a text-to-text format. It is trained using teacher
forcing. This means that for training we always need an input sequence and a target sequence. The input sequence is fed
to the model using :obj:`input_ids``. The target sequence is shifted to the right, i.e., prepended by a start-sequence
token and fed to the decoder using the :obj:`decoder_input_ids`. In teacher-forcing style, the target sequence is then
appended by the EOS token and corresponds to the :obj:`labels`. The PAD token is hereby used as the start-sequence
token. T5 can be trained / fine-tuned both in a supervised and unsupervised fashion.
- Unsupervised denoising training
In this setup spans of the input sequence are masked by so-called sentinel tokens (*a.k.a* unique mask tokens)
and the output sequence is formed as a concatenation of the same sentinel tokens and the *real* masked tokens.
Each sentinel token represents a unique mask token for this sentence and should start with ``<extra_id_1>``, ``<extra_id_2>``, ... up to ``<extra_id_100>``. As a default 100 sentinel tokens are available in ``T5Tokenizer``.
*E.g.* the sentence "The cute dog walks in the park" with the masks put on "cute dog" and "the" should be processed as follows:
Each sentinel token represents a unique mask token for this sentence and should start with :obj:`<extra_id_0>`,
:obj:`<extra_id_1>`, ... up to :obj:`<extra_id_99>`. As a default, 100 sentinel tokens are available in
:class:`~transformers.T5Tokenizer`.
For instance, the sentence "The cute dog walks in the park" with the masks put on "cute dog" and "the" should be
processed as follows:
::
.. code-block::
input_ids = tokenizer.encode('The <extra_id_1> walks in <extra_id_2> park', return_tensors='pt')
labels = tokenizer.encode('<extra_id_1> cute dog <extra_id_2> the <extra_id_3> </s>', return_tensors='pt')
input_ids = tokenizer.encode('The <extra_id_0> walks in <extra_id_1> park', return_tensors='pt')
labels = tokenizer.encode('<extra_id_0> cute dog <extra_id_1> the <extra_id_2> </s>', return_tensors='pt')
# the forward function automatically creates the correct decoder_input_ids
model(input_ids=input_ids, labels=labels)
- Supervised training
In this setup the input sequence and output sequence are standard sequence to sequence input output mapping.
In translation, *e.g.* the input sequence "The house is wonderful." and output sequence "Das Haus ist wunderbar." should
be processed as follows:
In this setup the input sequence and output sequence are standard sequence-to-sequence input output mapping.
In translation, for instance with the input sequence "The house is wonderful." and output sequence "Das Haus ist
wunderbar.", the sentences should be processed as follows:
::
.. code-block::
input_ids = tokenizer.encode('translate English to German: The house is wonderful. </s>', return_tensors='pt')
labels = tokenizer.encode('Das Haus ist wunderbar. </s>', return_tensors='pt')
@@ -63,43 +82,43 @@ T5 can be trained / fine-tuned both in a supervised and unsupervised fashion.
T5Config
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.T5Config
:members:
T5Tokenizer
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.T5Tokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary
create_token_type_ids_from_sequences, prepare_seq2seq_batch, save_vocabulary
T5Model
~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.T5Model
:members:
:members: forward
T5ForConditionalGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.T5ForConditionalGeneration
:members:
:members: forward
TFT5Model
~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFT5Model
:members:
:members: call
TFT5ForConditionalGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFT5ForConditionalGeneration
:members:
:members: call

View File

@@ -1,15 +1,14 @@
Transformer XL
----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The Transformer-XL model was proposed in
`Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context <https://arxiv.org/abs/1901.02860>`__
by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
It's a causal (uni-directional) transformer with relative positioning (sinusoïdal) embeddings which can reuse
previously computed hidden-states to attend to longer context (memory).
This model also uses adaptive softmax inputs and outputs (tied).
The Transformer-XL model was proposed in `Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
<https://arxiv.org/abs/1901.02860>`__ by Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan
Salakhutdinov. It's a causal (uni-directional) transformer with relative positioning (sinusoïdal) embeddings which can
reuse previously computed hidden-states to attend to longer context (memory). This model also uses adaptive softmax
inputs and outputs (tied).
The abstract from the paper is the following:
@@ -30,53 +29,69 @@ Tips:
The original implementation trains on SQuAD with padding on the left, therefore the padding defaults are set to left.
- Transformer-XL is one of the few models that has no sequence length limit.
The original code can be found `here <https://github.com/kimiyoung/transformer-xl>`_.
The original code can be found `here <https://github.com/kimiyoung/transformer-xl>`__.
TransfoXLConfig
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TransfoXLConfig
:members:
TransfoXLTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TransfoXLTokenizer
:members: save_vocabulary
TransfoXLTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TransfoXLTokenizerFast
:members:
TransfoXL specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_transfo_xl.TransfoXLModelOutput
:members:
.. autoclass:: transformers.modeling_transfo_xl.TransfoXLLMHeadModelOutput
:members:
.. autoclass:: transformers.modeling_tf_transfo_xl.TFTransfoXLModelOutput
:members:
.. autoclass:: transformers.modeling_tf_transfo_xl.TFTransfoXLLMHeadModelOutput
:members:
TransfoXLModel
~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TransfoXLModel
:members:
:members: forward
TransfoXLLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TransfoXLLMHeadModel
:members:
:members: forward
TFTransfoXLModel
~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFTransfoXLModel
:members:
:members: call
TFTransfoXLLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFTransfoXLLMHeadModel
:members:
:members: call

View File

@@ -1,15 +1,15 @@
XLM
----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The XLM model was proposed in `Cross-lingual Language Model Pretraining <https://arxiv.org/abs/1901.07291>`_
by Guillaume Lample*, Alexis Conneau*. It's a transformer pre-trained using one of the following objectives:
The XLM model was proposed in `Cross-lingual Language Model Pretraining <https://arxiv.org/abs/1901.07291>`__ by
Guillaume Lample, Alexis Conneau. It's a transformer pretrained using one of the following objectives:
- a causal language modeling (CLM) objective (next token prediction),
- a masked language modeling (MLM) objective (Bert-like), or
- a Translation Language Modeling (TLM) object (extension of Bert's MLM to multiple language inputs)
- a masked language modeling (MLM) objective (BERT-like), or
- a Translation Language Modeling (TLM) object (extension of BERT's MLM to multiple language inputs)
The abstract from the paper is the following:
@@ -27,98 +27,120 @@ Tips:
- XLM has many different checkpoints, which were trained using different objectives: CLM, MLM or TLM. Make sure to
select the correct objective for your task (e.g. MLM checkpoints are not suitable for generation).
- XLM has multilingual checkpoints which leverage a specific `lang` parameter. Check out the
`multi-lingual <../multilingual.html>`__ page for more information.
- XLM has multilingual checkpoints which leverage a specific :obj:`lang` parameter. Check out the
:doc:`multi-lingual <../multilingual>` page for more information.
The original code can be found `here <https://github.com/facebookresearch/XLM/>`_.
The original code can be found `here <https://github.com/facebookresearch/XLM/>`__.
XLMConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMConfig
:members:
XLMTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary
XLM specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_xlm.XLMForQuestionAnsweringOutput
:members:
XLMModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMModel
:members:
:members: forward
XLMWithLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMWithLMHeadModel
:members:
:members: forward
XLMForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMForSequenceClassification
:members:
:members: forward
XLMForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMForMultipleChoice
:members: forward
XLMForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMForTokenClassification
:members: forward
XLMForQuestionAnsweringSimple
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMForQuestionAnsweringSimple
:members:
:members: forward
XLMForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMForQuestionAnswering
:members:
:members: forward
TFXLMModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLMModel
:members:
:members: call
TFXLMWithLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLMWithLMHeadModel
:members:
:members: call
TFXLMForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLMForSequenceClassification
:members:
:members: call
TFXLMForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLMForMultipleChoice
:members:
:members: call
TFXLMForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLMForTokenClassification
:members:
:members: call
TFXLMForQuestionAnsweringSimple
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLMForQuestionAnsweringSimple
:members:
:members: call

View File

@@ -1,13 +1,14 @@
XLM-RoBERTa
------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The XLM-RoBERTa model was proposed in `Unsupervised Cross-lingual Representation Learning at Scale <https://arxiv.org/abs/1911.02116>`__
by Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán,
Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov. It is based on Facebook's RoBERTa model released in 2019.
It is a large multi-lingual language model, trained on 2.5TB of filtered CommonCrawl data.
The XLM-RoBERTa model was proposed in `Unsupervised Cross-lingual Representation Learning at Scale
<https://arxiv.org/abs/1911.02116>`__ by Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume
Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov. It is based on Facebook's
RoBERTa model released in 2019. It is a large multi-lingual language model, trained on 2.5TB of filtered CommonCrawl
data.
The abstract from the paper is the following:
@@ -25,24 +26,24 @@ and XNLI benchmarks. We will make XLM-R code, data, and models publicly availabl
Tips:
- XLM-R is a multilingual model trained on 100 different languages. Unlike some XLM multilingual models, it does
not require `lang` tensors to understand which language is used, and should be able to determine the correct
- XLM-RoBERTa is a multilingual model trained on 100 different languages. Unlike some XLM multilingual models, it does
not require :obj:`lang` tensors to understand which language is used, and should be able to determine the correct
language from the input ids.
- This implementation is the same as RoBERTa. Refer to the `documentation of RoBERTa <./roberta.html>`__ for usage
- This implementation is the same as RoBERTa. Refer to the :doc:`documentation of RoBERTa <roberta>` for usage
examples as well as the information relative to the inputs and outputs.
The original code can be found `here <https://github.com/pytorch/fairseq/tree/master/examples/xlmr>`_.
The original code can be found `here <https://github.com/pytorch/fairseq/tree/master/examples/xlmr>`__.
XLMRobertaConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMRobertaConfig
:members:
XLMRobertaTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMRobertaTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
@@ -50,84 +51,91 @@ XLMRobertaTokenizer
XLMRobertaModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMRobertaModel
:members:
:members: forward
XLMRobertaForCausalLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMRobertaForCausalLM
:members: forward
XLMRobertaForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMRobertaForMaskedLM
:members:
:members: forward
XLMRobertaForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMRobertaForSequenceClassification
:members:
:members: forward
XLMRobertaForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMRobertaForMultipleChoice
:members:
:members: forward
XLMRobertaForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMRobertaForTokenClassification
:members:
:members: forward
XLMRobertaForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMRobertaForQuestionAnswering
:members:
:members: forward
TFXLMRobertaModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLMRobertaModel
:members:
:members: call
TFXLMRobertaForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLMRobertaForMaskedLM
:members:
:members: call
TFXLMRobertaForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLMRobertaForSequenceClassification
:members:
:members: call
TFXLMRobertaForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLMRobertaForMultipleChoice
:members:
:members: call
TFXLMRobertaForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLMRobertaForTokenClassification
:members:
:members: call
TFXLMRobertaForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLMRobertaForQuestionAnswering
:members:
:members: call

View File

@@ -1,14 +1,14 @@
XLNet
----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The XLNet model was proposed in `XLNet: Generalized Autoregressive Pretraining for Language Understanding <https://arxiv.org/abs/1906.08237>`_
by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
XLnet is an extension of the Transformer-XL model pre-trained using an autoregressive method
to learn bidirectional contexts by maximizing the expected likelihood over all permutations
of the input sequence factorization order.
The XLNet model was proposed in `XLNet: Generalized Autoregressive Pretraining for Language Understanding
<https://arxiv.org/abs/1906.08237>`_ by Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov,
Quoc V. Le. XLnet is an extension of the Transformer-XL model pre-trained using an autoregressive method to learn
bidirectional contexts by maximizing the expected likelihood over all permutations of the input sequence factorization
order.
The abstract from the paper is the following:
@@ -24,118 +24,161 @@ a large margin, including question answering, natural language inference, sentim
Tips:
- The specific attention pattern can be controlled at training and test time using the `perm_mask` input.
- Due to the difficulty of training a fully auto-regressive model over various factorization order,
XLNet is pretrained using only a sub-set of the output tokens as target which are selected
with the `target_mapping` input.
- To use XLNet for sequential decoding (i.e. not in fully bi-directional setting), use the `perm_mask` and
`target_mapping` inputs to control the attention span and outputs (see examples in `examples/text-generation/run_generation.py`)
- The specific attention pattern can be controlled at training and test time using the :obj:`perm_mask` input.
- Due to the difficulty of training a fully auto-regressive model over various factorization order, XLNet is pretrained
using only a sub-set of the output tokens as target which are selected with the :obj:`target_mapping` input.
- To use XLNet for sequential decoding (i.e. not in fully bi-directional setting), use the :obj:`perm_mask` and
:obj:`target_mapping` inputs to control the attention span and outputs (see examples in
`examples/text-generation/run_generation.py`)
- XLNet is one of the few models that has no sequence length limit.
The original code can be found `here <https://github.com/zihangdai/xlnet/>`_.
The original code can be found `here <https://github.com/zihangdai/xlnet/>`__.
XLNetConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLNetConfig
:members:
XLNetTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLNetTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary
XLNet specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_xlnet.XLNetModelOutput
:members:
.. autoclass:: transformers.modeling_xlnet.XLNetLMHeadModelOutput
:members:
.. autoclass:: transformers.modeling_xlnet.XLNetForSequenceClassificationOutput
:members:
.. autoclass:: transformers.modeling_xlnet.XLNetForMultipleChoiceOutput
:members:
.. autoclass:: transformers.modeling_xlnet.XLNetForTokenClassificationOutput
:members:
.. autoclass:: transformers.modeling_xlnet.XLNetForQuestionAnsweringSimpleOutput
:members:
.. autoclass:: transformers.modeling_xlnet.XLNetForQuestionAnsweringOutput
:members:
.. autoclass:: transformers.modeling_tf_xlnet.TFXLNetModelOutput
:members:
.. autoclass:: transformers.modeling_tf_xlnet.TFXLNetLMHeadModelOutput
:members:
.. autoclass:: transformers.modeling_tf_xlnet.TFXLNetForSequenceClassificationOutput
:members:
.. autoclass:: transformers.modeling_tf_xlnet.TFXLNetForMultipleChoiceOutput
:members:
.. autoclass:: transformers.modeling_tf_xlnet.TFXLNetForTokenClassificationOutput
:members:
.. autoclass:: transformers.modeling_tf_xlnet.TFXLNetForQuestionAnsweringSimpleOutput
:members:
XLNetModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLNetModel
:members:
:members: forward
XLNetLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLNetLMHeadModel
:members:
:members: forward
XLNetForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLNetForSequenceClassification
:members:
:members: forward
XLNetForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLNetForMultipleChoice
:members:
:members: forward
XLNetForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLNetForTokenClassification
:members:
:members: forward
XLNetForQuestionAnsweringSimple
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLNetForQuestionAnsweringSimple
:members:
:members: forward
XLNetForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLNetForQuestionAnswering
:members:
:members: forward
TFXLNetModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLNetModel
:members:
:members: call
TFXLNetLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLNetLMHeadModel
:members:
:members: call
TFXLNetForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLNetForSequenceClassification
:members:
:members: call
TFLNetForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLNetForMultipleChoice
:members:
:members: call
TFXLNetForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLNetForTokenClassification
:members:
:members: call
TFXLNetForQuestionAnsweringSimple
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLNetForQuestionAnsweringSimple
:members:
:members: call

View File

@@ -1,217 +1,222 @@
Model sharing and uploading
===========================
In this page, we will show you how to share a model you have trained or fine-tuned on new data with the community on
the `model hub <https://huggingface.co/models>`__.
.. note::
You will need to create an account on `huggingface.co <https://huggingface.co/join>`__ for this.
Optionally, you can join an existing organization or create a new one.
Prepare your model for uploading
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We have seen in the :doc:`training tutorial <training>`: how to fine-tune a model on a given task. You have probably
done something similar on your task, either using the model directly in your own training loop or using the
:class:`~.transformers.Trainer`/:class:`~.transformers.TFTrainer` class. Let's see how you can share the result on
the `model hub <https://huggingface.co/models>`__.
Basic steps
^^^^^^^^^^^
..
When #5258 is merged, we can remove the need to create the directory.
First, pick a directory with the name you want your model to have on the model hub (its full name will then be
`username/awesome-name-you-picked` or `organization/awesome-name-you-picked`) and create it with either
::
mkdir path/to/awesome-name-you-picked
or in python
::
import os
os.makedirs("path/to/awesome-name-you-picked")
then you can save your model and tokenizer with:
::
model.save_pretrained("path/to/awesome-name-you-picked")
tokenizer.save_pretrained("path/to/awesome-name-you-picked")
Or, if you're using the Trainer API
::
trainer.save_model("path/to/awesome-name-you-picked")
tokenizer.save_pretrained("path/to/awesome-name-you-picked")
Make your model work on all frameworks
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
..
TODO Sylvain: make this automatic during the upload
You probably have your favorite framework, but so will other users! That's why it's best to upload your model with both
PyTorch `and` TensorFlow checkpoints to make it easier to use (if you skip this step, users will still be able to load
your model in another framework, but it will be slower, as it will have to be converted on the fly). Don't worry, it's super easy to do (and in a future version,
it will all be automatic). You will need to install both PyTorch and TensorFlow for this step, but you don't need to
worry about the GPU, so it should be very easy. Check the
`TensorFlow installation page <https://www.tensorflow.org/install/pip#tensorflow-2.0-rc-is-available>`__
and/or the `PyTorch installation page <https://pytorch.org/get-started/locally/#start-locally>`__ to see how.
First check that your model class exists in the other framework, that is try to import the same model by either adding
or removing TF. For instance, if you trained a :class:`~transformers.DistilBertForSequenceClassification`, try to
type
::
from transformers import TFDistilBertForSequenceClassification
and if you trained a :class:`~transformers.TFDistilBertForSequenceClassification`, try to
type
::
from transformers import DistilBertForSequenceClassification
This will give back an error if your model does not exist in the other framework (something that should be pretty rare
since we're aiming for full parity between the two frameworks). In this case, skip this and go to the next step.
Now, if you trained your model in PyTorch and have to create a TensorFlow version, adapt the following code to your
model class:
::
tf_model = TFDistilBertForSequenceClassification.from_pretrained("path/to/awesome-name-you-picked", from_pt=True)
tf_model.save_pretrained("path/to/awesome-name-you-picked")
and if you trained your model in TensorFlow and have to create a PyTorch version, adapt the following code to your
model class:
::
pt_model = DistilBertForSequenceClassification.from_pretrained("path/to/awesome-name-you-picked", from_tf=True)
pt_model.save_pretrained("path/to/awesome-name-you-picked")
That's all there is to it!
Check the directory before uploading
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Make sure there are no garbage files in the directory you'll upload. It should only have:
- a `config.json` file, which saves the :doc:`configuration <main_classes/configuration>` of your model ;
- a `pytorch_model.bin` file, which is the PyTorch checkpoint (unless you can't have it for some reason) ;
- a `tf_model.h5` file, which is the TensorFlow checkpoint (unless you can't have it for some reason) ;
- a `special_tokens_map.json`, which is part of your :doc:`tokenizer <main_classes/tokenizer>` save;
- a `tokenizer_config.json`, which is part of your :doc:`tokenizer <main_classes/tokenizer>` save;
- a `vocab.txt`, which is the vocabulary of your tokenizer, part of your :doc:`tokenizer <main_classes/tokenizer>`
save;
- maybe a `added_tokens.json`, which is part of your :doc:`tokenizer <main_classes/tokenizer>` save.
Other files can safely be deleted.
Upload your model with the CLI
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Now go in a terminal and run the following command. It should be in the virtual enviromnent where you installed 🤗
Transformers, since that command :obj:`transformers-cli` comes from the library.
::
transformers-cli login
Then log in using the same credentials as on huggingface.co. To upload your model, just type
::
transformers-cli upload path/to/awesome-name-you-picked/
This will upload the folder containing the weights, tokenizer and configuration we prepared in the previous section.
If you want to upload a single file (a new version of your model, or the other framework checkpoint you want to add),
just type:
::
transformers-cli upload path/to/awesome-name-you-picked/that-file
or
::
transformers-cli upload path/to/awesome-name-you-picked/that-file --filename awesome-name-you-picked/new_name
if you want to change its filename.
This uploads the model to your personal account. If you want your model to be namespaced by your organization name
rather than your username, add the following flag to any command:
::
--organization organization_name
so for instance:
::
transformers-cli upload path/to/awesome-name-you-picked/ --organization organization_name
Your model will then be accessible through its identifier, which is, as we saw above,
`username/awesome-name-you-picked` or `organization/awesome-name-you-picked`.
Add a model card
^^^^^^^^^^^^^^^^
To make sure everyone knows what your model can do, what its limitations and potential bias or ethetical
considerations, please add a README.md model card to the 🤗 Transformers repo under `model_cards/`. It should then be
placed in a subfolder with your username or organization, then another subfolder named like your model
(`awesome-name-you-picked`). Or just click on the "Create a model card on GitHub" button on the model page, it will
get you directly to the right location. If you need one, `here <https://github.com/huggingface/model_card>`__ is a
model card template (meta-suggestions are welcome).
If your model is fine-tuned from another model coming from the model hub (all 🤗 Transformers pretrained models do),
don't forget to link to its model card so that people can fully trace how your model was built.
If you have never made a pull request to the 🤗 Transformers repo, look at the
:doc:`contributing guide <contributing>` to see the steps to follow.
.. Note::
You can also send your model card in the folder you uploaded with the CLI by placing it in a `README.md` file
inside `path/to/awesome-name-you-picked/`.
Using your model
^^^^^^^^^^^^^^^^
Your model now has a page on huggingface.co/models 🔥
Anyone can load it from code:
::
tokenizer = AutoTokenizer.from_pretrained("namespace/awesome-name-you-picked")
model = AutoModel.from_pretrained("namespace/awesome-name-you-picked")
Additional commands
^^^^^^^^^^^^^^^^^^^
You can list all the files you uploaded on the hub like this:
::
transformers-cli s3 ls
You can also delete unneeded files with
::
transformers-cli s3 rm awesome-name-you-picked/filename
Model sharing and uploading
=======================================================================================================================
In this page, we will show you how to share a model you have trained or fine-tuned on new data with the community on
the `model hub <https://huggingface.co/models>`__.
.. note::
You will need to create an account on `huggingface.co <https://huggingface.co/join>`__ for this.
Optionally, you can join an existing organization or create a new one.
Prepare your model for uploading
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We have seen in the :doc:`training tutorial <training>`: how to fine-tune a model on a given task. You have probably
done something similar on your task, either using the model directly in your own training loop or using the
:class:`~.transformers.Trainer`/:class:`~.transformers.TFTrainer` class. Let's see how you can share the result on
the `model hub <https://huggingface.co/models>`__.
Basic steps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
..
When #5258 is merged, we can remove the need to create the directory.
First, pick a directory with the name you want your model to have on the model hub (its full name will then be
`username/awesome-name-you-picked` or `organization/awesome-name-you-picked`) and create it with either
.. code-block::
mkdir path/to/awesome-name-you-picked
or in python
.. code-block::
import os
os.makedirs("path/to/awesome-name-you-picked")
then you can save your model and tokenizer with:
.. code-block::
model.save_pretrained("path/to/awesome-name-you-picked")
tokenizer.save_pretrained("path/to/awesome-name-you-picked")
Or, if you're using the Trainer API
.. code-block::
trainer.save_model("path/to/awesome-name-you-picked")
tokenizer.save_pretrained("path/to/awesome-name-you-picked")
Make your model work on all frameworks
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
..
TODO Sylvain: make this automatic during the upload
You probably have your favorite framework, but so will other users! That's why it's best to upload your model with both
PyTorch `and` TensorFlow checkpoints to make it easier to use (if you skip this step, users will still be able to load
your model in another framework, but it will be slower, as it will have to be converted on the fly). Don't worry, it's super easy to do (and in a future version,
it will all be automatic). You will need to install both PyTorch and TensorFlow for this step, but you don't need to
worry about the GPU, so it should be very easy. Check the
`TensorFlow installation page <https://www.tensorflow.org/install/pip#tensorflow-2.0-rc-is-available>`__
and/or the `PyTorch installation page <https://pytorch.org/get-started/locally/#start-locally>`__ to see how.
First check that your model class exists in the other framework, that is try to import the same model by either adding
or removing TF. For instance, if you trained a :class:`~transformers.DistilBertForSequenceClassification`, try to
type
.. code-block::
from transformers import TFDistilBertForSequenceClassification
and if you trained a :class:`~transformers.TFDistilBertForSequenceClassification`, try to
type
.. code-block::
from transformers import DistilBertForSequenceClassification
This will give back an error if your model does not exist in the other framework (something that should be pretty rare
since we're aiming for full parity between the two frameworks). In this case, skip this and go to the next step.
Now, if you trained your model in PyTorch and have to create a TensorFlow version, adapt the following code to your
model class:
.. code-block::
tf_model = TFDistilBertForSequenceClassification.from_pretrained("path/to/awesome-name-you-picked", from_pt=True)
tf_model.save_pretrained("path/to/awesome-name-you-picked")
and if you trained your model in TensorFlow and have to create a PyTorch version, adapt the following code to your
model class:
.. code-block::
pt_model = DistilBertForSequenceClassification.from_pretrained("path/to/awesome-name-you-picked", from_tf=True)
pt_model.save_pretrained("path/to/awesome-name-you-picked")
That's all there is to it!
Check the directory before uploading
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Make sure there are no garbage files in the directory you'll upload. It should only have:
- a `config.json` file, which saves the :doc:`configuration <main_classes/configuration>` of your model ;
- a `pytorch_model.bin` file, which is the PyTorch checkpoint (unless you can't have it for some reason) ;
- a `tf_model.h5` file, which is the TensorFlow checkpoint (unless you can't have it for some reason) ;
- a `special_tokens_map.json`, which is part of your :doc:`tokenizer <main_classes/tokenizer>` save;
- a `tokenizer_config.json`, which is part of your :doc:`tokenizer <main_classes/tokenizer>` save;
- files named `vocab.json`, `vocab.txt`, `merges.txt`, or similar, which contain the vocabulary of your tokenizer, part of your :doc:`tokenizer <main_classes/tokenizer>` save;
- maybe a `added_tokens.json`, which is part of your :doc:`tokenizer <main_classes/tokenizer>` save.
Other files can safely be deleted.
Upload your model with the CLI
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Now go in a terminal and run the following command. It should be in the virtual enviromnent where you installed 🤗
Transformers, since that command :obj:`transformers-cli` comes from the library.
.. code-block::
transformers-cli login
Then log in using the same credentials as on huggingface.co. To upload your model, just type
.. code-block::
transformers-cli upload path/to/awesome-name-you-picked/
This will upload the folder containing the weights, tokenizer and configuration we prepared in the previous section.
By default you will be prompted to confirm that you want these files to be uploaded. If you are uploading multiple models and need to script that process, you can add `-y` to bypass the prompt. For example:
.. code-block::
transformers-cli upload -y path/to/awesome-name-you-picked/
If you want to upload a single file (a new version of your model, or the other framework checkpoint you want to add),
just type:
.. code-block::
transformers-cli upload path/to/awesome-name-you-picked/that-file
or
.. code-block::
transformers-cli upload path/to/awesome-name-you-picked/that-file --filename awesome-name-you-picked/new_name
if you want to change its filename.
This uploads the model to your personal account. If you want your model to be namespaced by your organization name
rather than your username, add the following flag to any command:
.. code-block::
--organization organization_name
so for instance:
.. code-block::
transformers-cli upload path/to/awesome-name-you-picked/ --organization organization_name
Your model will then be accessible through its identifier, which is, as we saw above,
`username/awesome-name-you-picked` or `organization/awesome-name-you-picked`.
Add a model card
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
To make sure everyone knows what your model can do, what its limitations and potential bias or ethetical
considerations, please add a README.md model card to the 🤗 Transformers repo under `model_cards/`. It should then be
placed in a subfolder with your username or organization, then another subfolder named like your model
(`awesome-name-you-picked`). Or just click on the "Create a model card on GitHub" button on the model page, it will
get you directly to the right location. If you need one, `here <https://github.com/huggingface/model_card>`__ is a
model card template (meta-suggestions are welcome).
If your model is fine-tuned from another model coming from the model hub (all 🤗 Transformers pretrained models do),
don't forget to link to its model card so that people can fully trace how your model was built.
If you have never made a pull request to the 🤗 Transformers repo, look at the
:doc:`contributing guide <contributing>` to see the steps to follow.
.. Note::
You can also send your model card in the folder you uploaded with the CLI by placing it in a `README.md` file
inside `path/to/awesome-name-you-picked/`.
Using your model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Your model now has a page on huggingface.co/models 🔥
Anyone can load it from code:
.. code-block::
tokenizer = AutoTokenizer.from_pretrained("namespace/awesome-name-you-picked")
model = AutoModel.from_pretrained("namespace/awesome-name-you-picked")
Additional commands
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You can list all the files you uploaded on the hub like this:
.. code-block::
transformers-cli s3 ls
You can also delete unneeded files with
.. code-block::
transformers-cli s3 rm awesome-name-you-picked/filename

File diff suppressed because it is too large Load Diff

View File

@@ -1,5 +1,5 @@
Multi-lingual models
================================================
=======================================================================================================================
Most of the models available in this library are mono-lingual models (English, Chinese and German). A few
multi-lingual models are available and have a different mechanisms than mono-lingual models.
@@ -8,13 +8,13 @@ This page details the usage of these models.
The two models that currently support multiple languages are BERT and XLM.
XLM
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
XLM has a total of 10 different checkpoints, only one of which is mono-lingual. The 9 remaining model checkpoints can
be split in two categories: the checkpoints that make use of language embeddings, and those that don't
XLM & Language Embeddings
------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
This section concerns the following checkpoints:
@@ -82,7 +82,7 @@ The example `run_generation.py <https://github.com/huggingface/transformers/blob
can generate text using the CLM checkpoints from XLM, using the language embeddings.
XLM without Language Embeddings
------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
This section concerns the following checkpoints:
@@ -94,7 +94,7 @@ sentence representations, differently from previously-mentioned XLM checkpoints.
BERT
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
BERT has two checkpoints that can be used for multi-lingual tasks:
@@ -105,7 +105,7 @@ These checkpoints do not require language embeddings at inference time. They sho
used in the context and infer accordingly.
XLM-RoBERTa
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
XLM-RoBERTa was trained on 2.5TB of newly created clean CommonCrawl data in 100 languages. It provides strong
gains over previously released multi-lingual models like mBERT or XLM on downstream taks like classification,

151
docs/source/perplexity.rst Normal file
View File

@@ -0,0 +1,151 @@
Perplexity of fixed-length models
=======================================================================================================================
Perplexity (PPL) is one of the most common metrics for evaluating language
models. Before diving in, we should note that the metric applies specifically
to classical language models (sometimes called autoregressive or causal
language models) and is not well defined for masked language models like BERT
(see :doc:`summary of the models <model_summary>`).
Perplexity is defined as the exponentiated average log-likelihood of a
sequence. If we have a tokenized sequence :math:`X = (x_0, x_1, \dots, x_t)`,
then the perplexity of :math:`X` is,
.. math::
\text{PPL}(X)
= \exp \left\{ {-\frac{1}{t}\sum_i^t \log p_\theta (x_i|x_{<i}) } \right\}
where :math:`\log p_\theta (x_i|x_{<i})` is the log-likelihood of the ith
token conditioned on the preceding tokens :math:`x_{<i}` according to our
model. Intuitively, it can be thought of as an evaluation of the model's
ability to predict uniformly among the set of specified tokens in a corpus.
Importantly, this means that the tokenization procedure has a direct impact
on a model's perplexity which should always be taken into consideration when
comparing different models.
This is also equivalent to the exponentiation of the cross-entropy between
the data and model predictions. For more intuition about perplexity and its
relationship to Bits Per Character (BPC) and data compression, check out this
`fantastic blog post on The Gradient
<https://thegradient.pub/understanding-evaluation-metrics-for-language-models/>`_.
Calculating PPL with fixed-length models
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If we weren't limited by a model's context size, we would evaluate the
model's perplexity by autoregressively factorizing a sequence and
conditioning on the entire preceding subsequence at each step, as shown
below.
.. image:: imgs/ppl_full.gif
:width: 600
:alt: Full decomposition of a sequence with unlimited context length
When working with approximate models, however, we typically have a constraint
on the number of tokens the model can process. The largest version
of :doc:`GPT-2 <model_doc/gpt2>`, for example, has a fixed length of 1024
tokens, so we cannot calculate :math:`p_\theta(x_t|x_{<t})` directly when
:math:`t` is greater than 1024.
Instead, the sequence is typically broken into subsequences equal to the
model's maximum input size. If a model's max input size is :math:`k`, we
then approximate the likelihood of a token :math:`x_t` by conditioning only
on the :math:`k-1` tokens that precede it rather than the entire context.
When evaluating the model's perplexity of a sequence, a tempting but
suboptimal approach is to break the sequence into disjoint chunks and
add up the decomposed log-likelihoods of each segment independently.
.. image:: imgs/ppl_chunked.gif
:width: 600
:alt: Suboptimal PPL not taking advantage of full available context
This is quick to compute since the perplexity of each segment can be computed
in one forward pass, but serves as a poor approximation of the
fully-factorized perplexity and will typically yield a higher (worse) PPL
because the model will have less context at most of the prediction steps.
Instead, the PPL of fixed-length models should be evaluated with a
sliding-window strategy. This involves repeatedly sliding the
context window so that the model has more context when making each
prediction.
.. image:: imgs/ppl_sliding.gif
:width: 600
:alt: Sliding window PPL taking advantage of all available context
This is a closer approximation to the true decomposition of the
sequence probability and will typically yield a more favorable score.
The downside is that it requires a separate forward pass for each token in
the corpus. A good practical compromise is to employ a strided sliding
window, moving the context by larger strides rather than sliding by 1 token a
time. This allows computation to procede much faster while still giving the
model a large context to make predictions at each step.
Example: Calculating perplexity with GPT-2 in 🤗 Transformers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Let's demonstrate this process with GPT-2.
.. code-block:: python
from transformers import GPT2LMHeadModel, GPT2TokenizerFast
device = 'cuda'
model_id = 'gpt2-large'
model = GPT2LMHeadModel.from_pretrained(model_id).to(device)
tokenizer = GPT2TokenizerFast.from_pretrained(model_id)
We'll load in the WikiText-2 dataset and evaluate the perplexity using a few
different sliding-window strategies. Since this dataset is small and we're
just doing one forward pass over the set, we can just load and encode the
entire dataset in memory.
.. code-block:: python
from nlp import load_dataset
test = load_dataset('wikitext', 'wikitext-2-raw-v1', split='test')
encodings = tokenizer('\n\n'.join(test['text']), return_tensors='pt')
With 🤗 Transformers, we can simply pass the ``input_ids`` as the ``labels``
to our model, and the average log-likelihood for each token is returned as
the loss. With our sliding window approach, however, there is overlap in the
tokens we pass to the model at each iteration. We don't want the
log-likelihood for the tokens we're just treating as context to be included
in our loss, so we can set these targets to ``-100`` so that they are
ignored. The following is an example of how we could do this with a stride of
``512``. This means that the model will have at least 512 tokens for context
when calculating the conditional likelihood of any one token (provided there
are 512 preceding tokens available to condition on).
.. code-block:: python
max_length = model.config.n_positions
stride = 512
lls = []
for i in tqdm(range(0, encodings.input_ids.size(1), stride)):
begin_loc = max(i + stride - max_length, 0)
end_loc = i + stride
input_ids = encodings.input_ids[:,begin_loc:end_loc].to(device)
target_ids = input_ids.clone()
target_ids[:,:-stride] = -100
with torch.no_grad():
outputs = model(input_ids, labels=target_ids)
log_likelihood = outputs[0] * stride
lls.append(log_likelihood)
ppl = torch.exp(torch.stack(lls).sum() / i)
Running this with the stride length equal to the max input length is
equivalent to the suboptimal, non-sliding-window strategy we discussed above.
The smaller the stride, the more context the model will have in making each
prediction, and the better the reported perplexity will typically be.
When we run the above with ``stride = 1024``, i.e. no overlap, the resulting
PPL is ``19.64``, which is about the same as the ``19.93`` reported in the
GPT-2 paper. By using ``stride = 512`` and thereby employing our striding
window strategy, this jumps down to ``16.53``. This is not only a more
favorable score, but is calculated in a way that is closer to the true
autoregressive decomposition of a sequence likelihood.

View File

@@ -1,5 +1,5 @@
Philosophy
==========
=======================================================================================================================
🤗 Transformers is an opinionated library built for:
@@ -45,12 +45,12 @@ A few other goals:
- A simple/consistent way to add new tokens to the vocabulary and embeddings for fine-tuning.
- Simple ways to mask and prune transformer heads.
- Switch easily between PyTorch and TensorFlow 2.0, allowing training using one framwork and inference using another.
- Switch easily between PyTorch and TensorFlow 2.0, allowing training using one framework and inference using another.
Main concepts
~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The library is build around three types of classes for each model:
The library is built around three types of classes for each model:
- **Model classes** such as :class:`~transformers.BertModel`, which are 30+ PyTorch models
(`torch.nn.Module <https://pytorch.org/docs/stable/nn.html#torch.nn.Module>`__) or Keras models
@@ -65,9 +65,9 @@ The library is build around three types of classes for each model:
All these classes can be instantiated from pretrained instances and saved locally using two methods:
- :obj:`from_pretrained()` let you instantiate a model/configuration/tokenizer from a pretrained version either
- :obj:`from_pretrained()` lets you instantiate a model/configuration/tokenizer from a pretrained version either
provided by the library itself (the suported models are provided in the list :doc:`here <pretrained_models>`
or stored locally (or on a server) by the user,
- :obj:`save_pretrained()` let you save a model/configuration/tokenizer locally so that it can be reloaded using
- :obj:`save_pretrained()` lets you save a model/configuration/tokenizer locally so that it can be reloaded using
:obj:`from_pretrained()`.

View File

@@ -1,373 +1,343 @@
Preprocessing data
==================
In this tutorial, we'll explore how to preprocess your data using 🤗 Transformers. The main tool for this is what we
call a :doc:`tokenizer <main_classes/tokenizer>`. You can build one using the tokenizer class associated to the model
you would like to use, or directly with the :class:`~transformers.AutoTokenizer` class.
As we saw in the :doc:`quicktour </quicktour>`, the tokenizer will first split a given text in words (or part of words,
punctuation symbols, etc.) usually called `tokens`. Then it will convert those `tokens` into numbers, to be able to
build a tensor out of them and feed them to the model. It will also add any additional inputs the model might expect to
work properly.
.. note::
If you plan on using a pretrained model, it's important to use the associated pretrained tokenizer: it will split
the text you give it in tokens the same way for the pretraining corpus, and it will use the same correspondence
token to index (that we usually call a `vocab`) as during pretraining.
To automatically download the vocab used during pretraining or fine-tuning a given model, you can use the
:func:`~transformers.AutoTokenizer.from_pretrained` method:
::
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('bert-base-cased')
Base use
~~~~~~~~
A :class:`~transformers.PreTrainedTokenizer` has many methods, but the only one you need to remember for preprocessing
is its ``__call__``: you just need to feed your sentence to your tokenizer object.
::
encoded_input = tokenizer("Hello, I'm a single sentence!")
print(encoded_input)
This will return a dictionary string to list of ints like this one:
::
{'input_ids': [101, 138, 18696, 155, 1942, 3190, 1144, 1572, 13745, 1104, 159, 9664, 2107, 102],
'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
The `input_ids <glossary.html#input-ids>`__ are the indices corresponding to each token in our sentence. We will see
below what the `attention_mask <glossary.html#attention-mask>`__ is used for and in
:ref:`the next section <sentence-pairs>` the goal of `token_type_ids <glossary.html#token-type-ids>`__.
The tokenizer can decode a list of token ids in a proper sentence:
::
tokenizer.decode(encoded_input["input_ids"])
which should return
::
"[CLS] Hello, I'm a single sentence! [SEP]"
As you can see, the tokenizer automatically added some special tokens that the model expect. Not all model need special
tokens; for instance, if we had used` gtp2-medium` instead of `bert-base-cased` to create our tokenizer, we would have
seen the same sentence as the original one here. You can disable this behavior (which is only advised if you have added
those special tokens yourself) by passing ``add_special_tokens=False``.
If you have several sentences you want to process, you can do this efficiently by sending them as a list to the
tokenizer:
::
batch_sentences = ["Hello I'm a single sentence",
"And another sentence",
"And the very very last one"]
encoded_inputs = tokenizer(batch_sentences)
print(encoded_inputs)
We get back a dictionary once again, this time with values being list of list of ints:
::
{'input_ids': [[101, 8667, 146, 112, 182, 170, 1423, 5650, 102],
[101, 1262, 1330, 5650, 102],
[101, 1262, 1103, 1304, 1304, 1314, 1141, 102]],
'token_type_ids': [[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0]],
'attention_mask': [[1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1]]}
If the purpose of sending several sentences at a time to the tokenizer is to build a batch to feed the model, you will
probably want:
- To pad each sentence to the maximum length there is in your batch.
- To truncate each sentence to the maximum length the model can accept (if applicable).
- To return tensors.
You can do all of this by using the following options when feeding your list of sentences to the tokenizer:
::
## PYTORCH CODE
batch = tokenizer(batch_sentences, padding=True, truncation=True, return_tensors="pt")
print(batch)
## TENSORFLOW CODE
batch = tokenizer(batch_sentences, padding=True, truncation=True, return_tensors="tf")
print(batch)
which should now return a dictionary string to tensor like this:
::
{'input_ids': tensor([[ 101, 8667, 146, 112, 182, 170, 1423, 5650, 102],
[ 101, 1262, 1330, 5650, 102, 0, 0, 0, 0],
[ 101, 1262, 1103, 1304, 1304, 1314, 1141, 102, 0]]),
'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0]]),
'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1, 1, 1, 0]])}
We can now see what the `attention_mask <glossary.html#attention-mask>`__ is all about: it points out which tokens the
model should pay attention to and which ones it should not (because they represent padding in this case).
Note that if your model does not have a maximum length associated to it, the command above will throw a warning. You
can safely ignore it. You can also pass ``verbose=False`` to stop the tokenizer to throw those kinds of warnings.
.. _sentence-pairs:
Preprocessing pairs of sentences
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Sometimes you need to feed pair of sentences to your model. For instance, if you want to classify if two sentences in a
pair are similar, or for question-answering models, which take a context and a question. For BERT models, the input is
then represented like this:
::
[CLS] Sequence A [SEP] Sequence B [SEP]
You can encode a pair of sentences in the format expected by your model by supplying the two sentences as two arguments
(not a list since a list of two sentences will be interpreted as a batch of two single sentences, as we saw before).
::
encoded_input = tokenizer("How old are you?", "I'm 6 years old")
print(encoded_input)
This will once again return a dict string to list of ints:
::
{'input_ids': [101, 1731, 1385, 1132, 1128, 136, 102, 146, 112, 182, 127, 1201, 1385, 102],
'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
This shows us what the `token_type_ids <glossary.html#token-type-ids>`__ are for: they indicate to the model which part
of the inputs correspond to the first sentence and which part corresponds to the second sentence. Note that
`token_type_ids` are not required or handled by all models. By default, a tokenizer will only return the inputs that
its associated model expects. You can force the return (or the non-return) of any of those special arguments by
using ``return_input_ids`` or ``return_token_type_ids``.
If we decode the token ids we obtained, we will see that the special tokens have been properly added.
::
tokenizer.decode(encoded_input["input_ids"])
will return:
::
"[CLS] How old are you? [SEP] I'm 6 years old [SEP]"
If you have a list of pairs of sequences you want to process, you should feed them as two lists to your tokenizer: the
list of first sentences and the list of second sentences:
::
batch_sentences = ["Hello I'm a single sentence",
"And another sentence",
"And the very very last one"]
batch_of_second_sentences = ["I'm a sentence that goes with the first sentence",
"And I should be encoded with the second sentence",
"And I go with the very last one"]
encoded_inputs = tokenizer(batch_sentences, batch_of_second_sentences)
print(encoded_inputs)
will return a dict with the values being list of lists of ints:
::
{'input_ids': [[101, 8667, 146, 112, 182, 170, 1423, 5650, 102, 146, 112, 182, 170, 5650, 1115, 2947, 1114, 1103, 1148, 5650, 102],
[101, 1262, 1330, 5650, 102, 1262, 146, 1431, 1129, 12544, 1114, 1103, 1248, 5650, 102],
[101, 1262, 1103, 1304, 1304, 1314, 1141, 102, 1262, 146, 1301, 1114, 1103, 1304, 1314, 1141, 102]],
'token_type_ids': [[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1]],
'attention_mask': [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]}
To double-check what is fed to the model, we can decode each list in `input_ids` one by one:
::
for ids in encoded_inputs["input_ids"]:
print(tokenizer.decode(ids))
which will return:
::
[CLS] Hello I'm a single sentence [SEP] I'm a sentence that goes with the first sentence [SEP]
[CLS] And another sentence [SEP] And I should be encoded with the second sentence [SEP]
[CLS] And the very very last one [SEP] And I go with the very last one [SEP]
Once again, you can automatically pad your inputs to the maximum sentence length in the batch, truncate to the maximum
length the model can accept and return tensors directly with the following:
::
## PYTORCH CODE
batch = tokenizer(batch_sentences, batch_of_second_sentences, padding=True, truncation=True, return_tensors="pt")
## TENSORFLOW CODE
batch = tokenizer(batch_sentences, batch_of_second_sentences, padding=True, truncation=True, return_tensors="tf")
Everything you always wanted to know about padding and truncation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We have seen the commands that will work for most cases (pad your batch to the length of the maximum sentence and
truncate to the maximum length the mode can accept). However, the API supports more strategies if you need them. The
three arguments you need to know for this are :obj:`padding`, :obj:`truncation` and :obj:`max_length`.
- :obj:`padding` controls the padding. It can be a boolean or a string which should be:
- :obj:`True` or :obj:`'longest'` to pad to the longest sequence in the batch (doing no padding if you only provide
a single sequence).
- :obj:`'max_length'` to pad to a length specified by the :obj:`max_length` argument or the maximum length accepted
by the model if no :obj:`max_length` is provided (``max_length=None``). If you only provide a single sequence,
padding will still be applied to it.
- :obj:`False` or :obj:`'do_not_pad'` to not pad the sequences. As we have seen before, this is the default
behavior.
- :obj:`truncation` controls the truncation. It can be a boolean or a string which should be:
- :obj:`True` or :obj:`'only_first'` truncate to a maximum length specified by the :obj:`max_length` argument or
the maximum length accepted by the model if no :obj:`max_length` is provided (``max_length=None``). This will
only truncate the first sentence of a pair if a pair of sequence (or a batch of pairs of sequences) is provided.
- :obj:`'only_second'` truncate to a maximum length specified by the :obj:`max_length` argument or the maximum
length accepted by the model if no :obj:`max_length` is provided (``max_length=None``). This will only truncate
the second sentence of a pair if a pair of sequence (or a batch of pairs of sequences) is provided.
- :obj:`'longest_first'` truncate to a maximum length specified by the :obj:`max_length` argument or the maximum
length accepted by the model if no :obj:`max_length` is provided (``max_length=None``). This will truncate token
by token, removing a token from the longest sequence in the pair until the proper length is reached.
- :obj:`False` or :obj:`'do_not_truncate'` to not truncate the sequences. As we have seen before, this is the
default behavior.
- :obj:`max_length` to control the length of the padding/truncation. It can be an integer or :obj:`None`, in which case
it will default to the maximum length the model can accept. If the model has no specific maximum input length,
truncation/padding to :obj:`max_length` is deactivated.
Here is a table summarizing the recommend way to setup padding and truncation. If you use pair of inputs sequence in
any of the following examples, you can replace :obj:`truncation=True` by a :obj:`STRATEGY` selected in
:obj:`['only_first', 'only_second', 'longest_first']`, i.e. :obj:`truncation='only_second'` or
:obj:`truncation= 'longest_first'` to control how both sequence in the pair are truncated as detailed before.
+--------------------------------------+-----------------------------------+---------------------------------------------------------------------------------------------+
| Truncation | Padding | Instruction |
+======================================+===================================+=============================================================================================+
| no truncation | no padding | :obj:`tokenizer(batch_sentences)` |
| +-----------------------------------+---------------------------------------------------------------------------------------------+
| | padding to max sequence in batch | :obj:`tokenizer(batch_sentences, padding=True)` or |
| | | :obj:`tokenizer(batch_sentences, padding='longest')` |
| +-----------------------------------+---------------------------------------------------------------------------------------------+
| | padding to max model input length | :obj:`tokenizer(batch_sentences, padding='max_length')` |
| +-----------------------------------+---------------------------------------------------------------------------------------------+
| | padding to specific length | :obj:`tokenizer(batch_sentences, padding='max_length', max_length=42)` |
+--------------------------------------+-----------------------------------+---------------------------------------------------------------------------------------------+
| truncation to max model input length | no padding | :obj:`tokenizer(batch_sentences, truncation=True)` or |
| | | :obj:`tokenizer(batch_sentences, truncation=STRATEGY)` |
| +-----------------------------------+---------------------------------------------------------------------------------------------+
| | padding to max sequence in batch | :obj:`tokenizer(batch_sentences, padding=True, truncation=True)` or |
| | | :obj:`tokenizer(batch_sentences, padding=True, truncation=STRATEGY)` |
| +-----------------------------------+---------------------------------------------------------------------------------------------+
| | padding to max model input length | :obj:`tokenizer(batch_sentences, padding='max_length', truncation=True)` or |
| | | :obj:`tokenizer(batch_sentences, padding='max_length', truncation=STRATEGY)` |
| +-----------------------------------+---------------------------------------------------------------------------------------------+
| | padding to specific length | Not possible |
+--------------------------------------+-----------------------------------+---------------------------------------------------------------------------------------------+
| truncation to specific length | no padding | :obj:`tokenizer(batch_sentences, truncation=True, max_length=42)` or |
| | | :obj:`tokenizer(batch_sentences, truncation=STRATEGY, max_length=42)` |
| +-----------------------------------+---------------------------------------------------------------------------------------------+
| | padding to max sequence in batch | :obj:`tokenizer(batch_sentences, padding=True, truncation=True, max_length=42)` or |
| | | :obj:`tokenizer(batch_sentences, padding=True, truncation=STRATEGY, max_length=42)` |
| +-----------------------------------+---------------------------------------------------------------------------------------------+
| | padding to max model input length | Not possible |
| +-----------------------------------+---------------------------------------------------------------------------------------------+
| | padding to specific length | :obj:`tokenizer(batch_sentences, padding='max_length', truncation=True, max_length=42)` or |
| | | :obj:`tokenizer(batch_sentences, padding='max_length', truncation=STRATEGY, max_length=42)` |
+--------------------------------------+-----------------------------------+---------------------------------------------------------------------------------------------+
Pre-tokenized inputs
~~~~~~~~~~~~~~~~~~~~
The tokenizer also accept pre-tokenized inputs. This is particularly useful when you want to compute labels and extract
predictions in `named entity recognition (NER) <https://en.wikipedia.org/wiki/Named-entity_recognition>`__ or
`part-of-speech tagging (POS tagging) <https://en.wikipedia.org/wiki/Part-of-speech_tagging>`__.
If you want to use pre-tokenized inputs, just set :obj:`is_pretokenized=True` when passing your inputs to the
tokenizer. For instance:
::
encoded_input = tokenizer(["Hello", "I'm", "a", "single", "sentence"], is_pretokenized=True)
print(encoded_input)
will return:
::
{'input_ids': [101, 8667, 146, 112, 182, 170, 1423, 5650, 102],
'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0],
'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1]}
Note that the tokenizer still adds the ids of special tokens (if applicable) unless you pass
``add_special_tokens=False``.
This works exactly as before for batch of sentences or batch of pairs of sentences. You can encode a batch of sentences
like this:
::
batch_sentences = [["Hello", "I'm", "a", "single", "sentence"],
["And", "another", "sentence"],
["And", "the", "very", "very", "last", "one"]]
encoded_inputs = tokenizer(batch_sentences, is_pretokenized=True)
or a batch of pair sentences like this:
::
batch_of_second_sentences = [["I'm", "a", "sentence", "that", "goes", "with", "the", "first", "sentence"],
["And", "I", "should", "be", "encoded", "with", "the", "second", "sentence"],
["And", "I", "go", "with", "the", "very", "last", "one"]]
encoded_inputs = tokenizer(batch_sentences, batch_of_second_sentences, is_pretokenized=True)
And you can add padding, truncation as well as directly return tensors like before:
::
## PYTORCH CODE
batch = tokenizer(batch_sentences,
batch_of_second_sentences,
is_pretokenized=True,
padding=True,
truncation=True,
return_tensors="pt")
## TENSORFLOW CODE
batch = tokenizer(batch_sentences,
batch_of_second_sentences,
is_pretokenized=True,
padding=True,
truncation=True,
return_tensors="tf")
Preprocessing data
=======================================================================================================================
In this tutorial, we'll explore how to preprocess your data using 🤗 Transformers. The main tool for this is what we
call a :doc:`tokenizer <main_classes/tokenizer>`. You can build one using the tokenizer class associated to the model
you would like to use, or directly with the :class:`~transformers.AutoTokenizer` class.
As we saw in the :doc:`quicktour </quicktour>`, the tokenizer will first split a given text in words (or part of words,
punctuation symbols, etc.) usually called `tokens`. Then it will convert those `tokens` into numbers, to be able to
build a tensor out of them and feed them to the model. It will also add any additional inputs the model might expect to
work properly.
.. note::
If you plan on using a pretrained model, it's important to use the associated pretrained tokenizer: it will split
the text you give it in tokens the same way for the pretraining corpus, and it will use the same correspondence
token to index (that we usually call a `vocab`) as during pretraining.
To automatically download the vocab used during pretraining or fine-tuning a given model, you can use the
:func:`~transformers.AutoTokenizer.from_pretrained` method:
.. code-block::
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('bert-base-cased')
Base use
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A :class:`~transformers.PreTrainedTokenizer` has many methods, but the only one you need to remember for preprocessing
is its ``__call__``: you just need to feed your sentence to your tokenizer object.
.. code-block::
>>> encoded_input = tokenizer("Hello, I'm a single sentence!")
>>> print(encoded_input)
{'input_ids': [101, 138, 18696, 155, 1942, 3190, 1144, 1572, 13745, 1104, 159, 9664, 2107, 102],
'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
This returns a dictionary string to list of ints.
The `input_ids <glossary.html#input-ids>`__ are the indices corresponding to each token in our sentence. We will see
below what the `attention_mask <glossary.html#attention-mask>`__ is used for and in
:ref:`the next section <sentence-pairs>` the goal of `token_type_ids <glossary.html#token-type-ids>`__.
The tokenizer can decode a list of token ids in a proper sentence:
.. code-block::
>>> tokenizer.decode(encoded_input["input_ids"])
"[CLS] Hello, I'm a single sentence! [SEP]"
As you can see, the tokenizer automatically added some special tokens that the model expect. Not all model need special
tokens; for instance, if we had used` gtp2-medium` instead of `bert-base-cased` to create our tokenizer, we would have
seen the same sentence as the original one here. You can disable this behavior (which is only advised if you have added
those special tokens yourself) by passing ``add_special_tokens=False``.
If you have several sentences you want to process, you can do this efficiently by sending them as a list to the
tokenizer:
.. code-block::
>>> batch_sentences = ["Hello I'm a single sentence",
... "And another sentence",
... "And the very very last one"]
>>> encoded_inputs = tokenizer(batch_sentences)
>>> print(encoded_inputs)
{'input_ids': [[101, 8667, 146, 112, 182, 170, 1423, 5650, 102],
[101, 1262, 1330, 5650, 102],
[101, 1262, 1103, 1304, 1304, 1314, 1141, 102]],
'token_type_ids': [[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0]],
'attention_mask': [[1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1]]}
We get back a dictionary once again, this time with values being list of list of ints.
If the purpose of sending several sentences at a time to the tokenizer is to build a batch to feed the model, you will
probably want:
- To pad each sentence to the maximum length there is in your batch.
- To truncate each sentence to the maximum length the model can accept (if applicable).
- To return tensors.
You can do all of this by using the following options when feeding your list of sentences to the tokenizer:
.. code-block::
>>> ## PYTORCH CODE
>>> batch = tokenizer(batch_sentences, padding=True, truncation=True, return_tensors="pt")
>>> print(batch)
{'input_ids': tensor([[ 101, 8667, 146, 112, 182, 170, 1423, 5650, 102],
[ 101, 1262, 1330, 5650, 102, 0, 0, 0, 0],
[ 101, 1262, 1103, 1304, 1304, 1314, 1141, 102, 0]]),
'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0]]),
'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1, 1, 1, 0]])}
>>> ## TENSORFLOW CODE
>>> batch = tokenizer(batch_sentences, padding=True, truncation=True, return_tensors="tf")
>>> print(batch)
{'input_ids': tf.Tensor([[ 101, 8667, 146, 112, 182, 170, 1423, 5650, 102],
[ 101, 1262, 1330, 5650, 102, 0, 0, 0, 0],
[ 101, 1262, 1103, 1304, 1304, 1314, 1141, 102, 0]]),
'token_type_ids': tf.Tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0]]),
'attention_mask': tf.Tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1, 1, 1, 0]])}
It returns a dictionary string to tensor. We can now see what the `attention_mask <glossary.html#attention-mask>`__ is
all about: it points out which tokens the model should pay attention to and which ones it should not (because they
represent padding in this case).
Note that if your model does not have a maximum length associated to it, the command above will throw a warning. You
can safely ignore it. You can also pass ``verbose=False`` to stop the tokenizer to throw those kinds of warnings.
.. _sentence-pairs:
Preprocessing pairs of sentences
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Sometimes you need to feed pair of sentences to your model. For instance, if you want to classify if two sentences in a
pair are similar, or for question-answering models, which take a context and a question. For BERT models, the input is
then represented like this: :obj:`[CLS] Sequence A [SEP] Sequence B [SEP]`
You can encode a pair of sentences in the format expected by your model by supplying the two sentences as two arguments
(not a list since a list of two sentences will be interpreted as a batch of two single sentences, as we saw before).
This will once again return a dict string to list of ints:
.. code-block::
>>> encoded_input = tokenizer("How old are you?", "I'm 6 years old")
>>> print(encoded_input)
{'input_ids': [101, 1731, 1385, 1132, 1128, 136, 102, 146, 112, 182, 127, 1201, 1385, 102],
'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
This shows us what the `token_type_ids <glossary.html#token-type-ids>`__ are for: they indicate to the model which part
of the inputs correspond to the first sentence and which part corresponds to the second sentence. Note that
`token_type_ids` are not required or handled by all models. By default, a tokenizer will only return the inputs that
its associated model expects. You can force the return (or the non-return) of any of those special arguments by
using ``return_input_ids`` or ``return_token_type_ids``.
If we decode the token ids we obtained, we will see that the special tokens have been properly added.
.. code-block::
>>> tokenizer.decode(encoded_input["input_ids"])
"[CLS] How old are you? [SEP] I'm 6 years old [SEP]"
If you have a list of pairs of sequences you want to process, you should feed them as two lists to your tokenizer: the
list of first sentences and the list of second sentences:
.. code-block::
>>> batch_sentences = ["Hello I'm a single sentence",
... "And another sentence",
... "And the very very last one"]
>>> batch_of_second_sentences = ["I'm a sentence that goes with the first sentence",
... "And I should be encoded with the second sentence",
... "And I go with the very last one"]
>>> encoded_inputs = tokenizer(batch_sentences, batch_of_second_sentences)
>>> print(encoded_inputs)
{'input_ids': [[101, 8667, 146, 112, 182, 170, 1423, 5650, 102, 146, 112, 182, 170, 5650, 1115, 2947, 1114, 1103, 1148, 5650, 102],
[101, 1262, 1330, 5650, 102, 1262, 146, 1431, 1129, 12544, 1114, 1103, 1248, 5650, 102],
[101, 1262, 1103, 1304, 1304, 1314, 1141, 102, 1262, 146, 1301, 1114, 1103, 1304, 1314, 1141, 102]],
'token_type_ids': [[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1]],
'attention_mask': [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]}
As we can see, it returns a dictionary with the values being list of lists of ints.
To double-check what is fed to the model, we can decode each list in `input_ids` one by one:
.. code-block::
>>> for ids in encoded_inputs["input_ids"]:
>>> print(tokenizer.decode(ids))
[CLS] Hello I'm a single sentence [SEP] I'm a sentence that goes with the first sentence [SEP]
[CLS] And another sentence [SEP] And I should be encoded with the second sentence [SEP]
[CLS] And the very very last one [SEP] And I go with the very last one [SEP]
Once again, you can automatically pad your inputs to the maximum sentence length in the batch, truncate to the maximum
length the model can accept and return tensors directly with the following:
.. code-block::
## PYTORCH CODE
batch = tokenizer(batch_sentences, batch_of_second_sentences, padding=True, truncation=True, return_tensors="pt")
## TENSORFLOW CODE
batch = tokenizer(batch_sentences, batch_of_second_sentences, padding=True, truncation=True, return_tensors="tf")
Everything you always wanted to know about padding and truncation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We have seen the commands that will work for most cases (pad your batch to the length of the maximum sentence and
truncate to the maximum length the mode can accept). However, the API supports more strategies if you need them. The
three arguments you need to know for this are :obj:`padding`, :obj:`truncation` and :obj:`max_length`.
- :obj:`padding` controls the padding. It can be a boolean or a string which should be:
- :obj:`True` or :obj:`'longest'` to pad to the longest sequence in the batch (doing no padding if you only provide
a single sequence).
- :obj:`'max_length'` to pad to a length specified by the :obj:`max_length` argument or the maximum length accepted
by the model if no :obj:`max_length` is provided (``max_length=None``). If you only provide a single sequence,
padding will still be applied to it.
- :obj:`False` or :obj:`'do_not_pad'` to not pad the sequences. As we have seen before, this is the default
behavior.
- :obj:`truncation` controls the truncation. It can be a boolean or a string which should be:
- :obj:`True` or :obj:`'only_first'` truncate to a maximum length specified by the :obj:`max_length` argument or
the maximum length accepted by the model if no :obj:`max_length` is provided (``max_length=None``). This will
only truncate the first sentence of a pair if a pair of sequence (or a batch of pairs of sequences) is provided.
- :obj:`'only_second'` truncate to a maximum length specified by the :obj:`max_length` argument or the maximum
length accepted by the model if no :obj:`max_length` is provided (``max_length=None``). This will only truncate
the second sentence of a pair if a pair of sequence (or a batch of pairs of sequences) is provided.
- :obj:`'longest_first'` truncate to a maximum length specified by the :obj:`max_length` argument or the maximum
length accepted by the model if no :obj:`max_length` is provided (``max_length=None``). This will truncate token
by token, removing a token from the longest sequence in the pair until the proper length is reached.
- :obj:`False` or :obj:`'do_not_truncate'` to not truncate the sequences. As we have seen before, this is the
default behavior.
- :obj:`max_length` to control the length of the padding/truncation. It can be an integer or :obj:`None`, in which case
it will default to the maximum length the model can accept. If the model has no specific maximum input length,
truncation/padding to :obj:`max_length` is deactivated.
Here is a table summarizing the recommend way to setup padding and truncation. If you use pair of inputs sequence in
any of the following examples, you can replace :obj:`truncation=True` by a :obj:`STRATEGY` selected in
:obj:`['only_first', 'only_second', 'longest_first']`, i.e. :obj:`truncation='only_second'` or
:obj:`truncation= 'longest_first'` to control how both sequence in the pair are truncated as detailed before.
+--------------------------------------+-----------------------------------+---------------------------------------------------------------------------------------------+
| Truncation | Padding | Instruction |
+======================================+===================================+=============================================================================================+
| no truncation | no padding | :obj:`tokenizer(batch_sentences)` |
| +-----------------------------------+---------------------------------------------------------------------------------------------+
| | padding to max sequence in batch | :obj:`tokenizer(batch_sentences, padding=True)` or |
| | | :obj:`tokenizer(batch_sentences, padding='longest')` |
| +-----------------------------------+---------------------------------------------------------------------------------------------+
| | padding to max model input length | :obj:`tokenizer(batch_sentences, padding='max_length')` |
| +-----------------------------------+---------------------------------------------------------------------------------------------+
| | padding to specific length | :obj:`tokenizer(batch_sentences, padding='max_length', max_length=42)` |
+--------------------------------------+-----------------------------------+---------------------------------------------------------------------------------------------+
| truncation to max model input length | no padding | :obj:`tokenizer(batch_sentences, truncation=True)` or |
| | | :obj:`tokenizer(batch_sentences, truncation=STRATEGY)` |
| +-----------------------------------+---------------------------------------------------------------------------------------------+
| | padding to max sequence in batch | :obj:`tokenizer(batch_sentences, padding=True, truncation=True)` or |
| | | :obj:`tokenizer(batch_sentences, padding=True, truncation=STRATEGY)` |
| +-----------------------------------+---------------------------------------------------------------------------------------------+
| | padding to max model input length | :obj:`tokenizer(batch_sentences, padding='max_length', truncation=True)` or |
| | | :obj:`tokenizer(batch_sentences, padding='max_length', truncation=STRATEGY)` |
| +-----------------------------------+---------------------------------------------------------------------------------------------+
| | padding to specific length | Not possible |
+--------------------------------------+-----------------------------------+---------------------------------------------------------------------------------------------+
| truncation to specific length | no padding | :obj:`tokenizer(batch_sentences, truncation=True, max_length=42)` or |
| | | :obj:`tokenizer(batch_sentences, truncation=STRATEGY, max_length=42)` |
| +-----------------------------------+---------------------------------------------------------------------------------------------+
| | padding to max sequence in batch | :obj:`tokenizer(batch_sentences, padding=True, truncation=True, max_length=42)` or |
| | | :obj:`tokenizer(batch_sentences, padding=True, truncation=STRATEGY, max_length=42)` |
| +-----------------------------------+---------------------------------------------------------------------------------------------+
| | padding to max model input length | Not possible |
| +-----------------------------------+---------------------------------------------------------------------------------------------+
| | padding to specific length | :obj:`tokenizer(batch_sentences, padding='max_length', truncation=True, max_length=42)` or |
| | | :obj:`tokenizer(batch_sentences, padding='max_length', truncation=STRATEGY, max_length=42)` |
+--------------------------------------+-----------------------------------+---------------------------------------------------------------------------------------------+
Pre-tokenized inputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The tokenizer also accept pre-tokenized inputs. This is particularly useful when you want to compute labels and extract
predictions in `named entity recognition (NER) <https://en.wikipedia.org/wiki/Named-entity_recognition>`__ or
`part-of-speech tagging (POS tagging) <https://en.wikipedia.org/wiki/Part-of-speech_tagging>`__.
.. warning::
Pre-tokenized does not mean your inputs are already tokenized (you wouldn't need to pass them though the tokenizer
if that was the case) but just split into words (which is often the first step in subword tokenization algorithms
like BPE).
If you want to use pre-tokenized inputs, just set :obj:`is_split_into_words=True` when passing your inputs to the
tokenizer. For instance, we have:
.. code-block::
>>> encoded_input = tokenizer(["Hello", "I'm", "a", "single", "sentence"], is_split_into_words=True)
>>> print(encoded_input)
{'input_ids': [101, 8667, 146, 112, 182, 170, 1423, 5650, 102],
'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0],
'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1]}
Note that the tokenizer still adds the ids of special tokens (if applicable) unless you pass
``add_special_tokens=False``.
This works exactly as before for batch of sentences or batch of pairs of sentences. You can encode a batch of sentences
like this:
.. code-block::
batch_sentences = [["Hello", "I'm", "a", "single", "sentence"],
["And", "another", "sentence"],
["And", "the", "very", "very", "last", "one"]]
encoded_inputs = tokenizer(batch_sentences, is_split_into_words=True)
or a batch of pair sentences like this:
.. code-block::
batch_of_second_sentences = [["I'm", "a", "sentence", "that", "goes", "with", "the", "first", "sentence"],
["And", "I", "should", "be", "encoded", "with", "the", "second", "sentence"],
["And", "I", "go", "with", "the", "very", "last", "one"]]
encoded_inputs = tokenizer(batch_sentences, batch_of_second_sentences, is_split_into_words=True)
And you can add padding, truncation as well as directly return tensors like before:
.. code-block::
## PYTORCH CODE
batch = tokenizer(batch_sentences,
batch_of_second_sentences,
is_split_into_words=True,
padding=True,
truncation=True,
return_tensors="pt")
## TENSORFLOW CODE
batch = tokenizer(batch_sentences,
batch_of_second_sentences,
is_split_into_words=True,
padding=True,
truncation=True,
return_tensors="tf")

View File

@@ -1,359 +1,418 @@
Pretrained models
================================================
=======================================================================================================================
Here is the full list of the currently provided pretrained models together with a short presentation of each model.
For a list that includes community-uploaded models, refer to `https://huggingface.co/models <https://huggingface.co/models>`__.
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| Architecture | Shortcut name | Details of the model |
+===================+============================================================+=======================================================================================================================================+
| BERT | ``bert-base-uncased`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on lower-cased English text. |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bert-large-uncased`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters. |
| | | | Trained on lower-cased English text. |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bert-base-cased`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on cased English text. |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bert-large-cased`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters. |
| | | | Trained on cased English text. |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bert-base-multilingual-uncased`` | | (Original, not recommended) 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on lower-cased text in the top 102 languages with the largest Wikipedias |
| | | |
| | | (see `details <https://github.com/google-research/bert/blob/master/multilingual.md>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bert-base-multilingual-cased`` | | (New, **recommended**) 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on cased text in the top 104 languages with the largest Wikipedias |
| | | |
| | | (see `details <https://github.com/google-research/bert/blob/master/multilingual.md>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bert-base-chinese`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on cased Chinese Simplified and Traditional text. |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bert-base-german-cased`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on cased German text by Deepset.ai |
| | | |
| | | (see `details on deepset.ai website <https://deepset.ai/german-bert>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bert-large-uncased-whole-word-masking`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters. |
| | | | Trained on lower-cased English text using Whole-Word-Masking |
| | | |
| | | (see `details <https://github.com/google-research/bert/#bert>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bert-large-cased-whole-word-masking`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters. |
| | | | Trained on cased English text using Whole-Word-Masking |
| | | |
| | | (see `details <https://github.com/google-research/bert/#bert>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bert-large-uncased-whole-word-masking-finetuned-squad`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters. |
| | | | The ``bert-large-uncased-whole-word-masking`` model fine-tuned on SQuAD |
| | | |
| | | (see details of fine-tuning in the `example section <https://github.com/huggingface/transformers/tree/master/examples>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bert-large-cased-whole-word-masking-finetuned-squad`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters |
| | | | The ``bert-large-cased-whole-word-masking`` model fine-tuned on SQuAD |
| | | |
| | | (see `details of fine-tuning in the example section <https://huggingface.co/transformers/examples.html>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bert-base-cased-finetuned-mrpc`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | The ``bert-base-cased`` model fine-tuned on MRPC |
| | | |
| | | (see `details of fine-tuning in the example section <https://huggingface.co/transformers/examples.html>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bert-base-german-dbmdz-cased`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on cased German text by DBMDZ |
| | | |
| | | (see `details on dbmdz repository <https://github.com/dbmdz/german-bert>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bert-base-german-dbmdz-uncased`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on uncased German text by DBMDZ |
| | | |
| | | (see `details on dbmdz repository <https://github.com/dbmdz/german-bert>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``cl-tohoku/bert-base-japanese`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on Japanese text. Text is tokenized with MeCab and WordPiece. |
| | | | `MeCab <https://taku910.github.io/mecab/>`__ is required for tokenization. |
| | | |
| | | (see `details on cl-tohoku repository <https://github.com/cl-tohoku/bert-japanese>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``cl-tohoku/bert-base-japanese-whole-word-masking`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on Japanese text using Whole-Word-Masking. Text is tokenized with MeCab and WordPiece. |
| | | | `MeCab <https://taku910.github.io/mecab/>`__ is required for tokenization. |
| | | |
| | | (see `details on cl-tohoku repository <https://github.com/cl-tohoku/bert-japanese>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``cl-tohoku/bert-base-japanese-char`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on Japanese text. Text is tokenized into characters. |
| | | |
| | | (see `details on cl-tohoku repository <https://github.com/cl-tohoku/bert-japanese>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``cl-tohoku/bert-base-japanese-char-whole-word-masking`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on Japanese text using Whole-Word-Masking. Text is tokenized into characters. |
| | | |
| | | (see `details on cl-tohoku repository <https://github.com/cl-tohoku/bert-japanese>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``TurkuNLP/bert-base-finnish-cased-v1`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on cased Finnish text. |
| | | |
| | | (see `details on turkunlp.org <http://turkunlp.org/FinBERT/>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``TurkuNLP/bert-base-finnish-uncased-v1`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on uncased Finnish text. |
| | | |
| | | (see `details on turkunlp.org <http://turkunlp.org/FinBERT/>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``wietsedv/bert-base-dutch-cased`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on cased Dutch text. |
| | | |
| | | (see `details on wietsedv repository <https://github.com/wietsedv/bertje/>`__). |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| GPT | ``openai-gpt`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | OpenAI GPT English model |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| GPT-2 | ``gpt2`` | | 12-layer, 768-hidden, 12-heads, 117M parameters. |
| | | | OpenAI GPT-2 English model |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``gpt2-medium`` | | 24-layer, 1024-hidden, 16-heads, 345M parameters. |
| | | | OpenAI's Medium-sized GPT-2 English model |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``gpt2-large`` | | 36-layer, 1280-hidden, 20-heads, 774M parameters. |
| | | | OpenAI's Large-sized GPT-2 English model |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``gpt2-xl`` | | 48-layer, 1600-hidden, 25-heads, 1558M parameters. |
| | | | OpenAI's XL-sized GPT-2 English model |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| Transformer-XL | ``transfo-xl-wt103`` | | 18-layer, 1024-hidden, 16-heads, 257M parameters. |
| | | | English model trained on wikitext-103 |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| XLNet | ``xlnet-base-cased`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | XLNet English model |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``xlnet-large-cased`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters. |
| | | | XLNet Large English model |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| XLM | ``xlm-mlm-en-2048`` | | 12-layer, 2048-hidden, 16-heads |
| | | | XLM English model |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``xlm-mlm-ende-1024`` | | 6-layer, 1024-hidden, 8-heads |
| | | | XLM English-German model trained on the concatenation of English and German wikipedia |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``xlm-mlm-enfr-1024`` | | 6-layer, 1024-hidden, 8-heads |
| | | | XLM English-French model trained on the concatenation of English and French wikipedia |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``xlm-mlm-enro-1024`` | | 6-layer, 1024-hidden, 8-heads |
| | | | XLM English-Romanian Multi-language model |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``xlm-mlm-xnli15-1024`` | | 12-layer, 1024-hidden, 8-heads |
| | | | XLM Model pre-trained with MLM on the `15 XNLI languages <https://github.com/facebookresearch/XNLI>`__. |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``xlm-mlm-tlm-xnli15-1024`` | | 12-layer, 1024-hidden, 8-heads |
| | | | XLM Model pre-trained with MLM + TLM on the `15 XNLI languages <https://github.com/facebookresearch/XNLI>`__. |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``xlm-clm-enfr-1024`` | | 6-layer, 1024-hidden, 8-heads |
| | | | XLM English-French model trained with CLM (Causal Language Modeling) on the concatenation of English and French wikipedia |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``xlm-clm-ende-1024`` | | 6-layer, 1024-hidden, 8-heads |
| | | | XLM English-German model trained with CLM (Causal Language Modeling) on the concatenation of English and German wikipedia |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``xlm-mlm-17-1280`` | | 16-layer, 1280-hidden, 16-heads |
| | | | XLM model trained with MLM (Masked Language Modeling) on 17 languages. |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``xlm-mlm-100-1280`` | | 16-layer, 1280-hidden, 16-heads |
| | | | XLM model trained with MLM (Masked Language Modeling) on 100 languages. |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| RoBERTa | ``roberta-base`` | | 12-layer, 768-hidden, 12-heads, 125M parameters |
| | | | RoBERTa using the BERT-base architecture |
| | | |
| | | (see `details <https://github.com/pytorch/fairseq/tree/master/examples/roberta>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``roberta-large`` | | 24-layer, 1024-hidden, 16-heads, 355M parameters |
| | | | RoBERTa using the BERT-large architecture |
| | | |
| | | (see `details <https://github.com/pytorch/fairseq/tree/master/examples/roberta>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``roberta-large-mnli`` | | 24-layer, 1024-hidden, 16-heads, 355M parameters |
| | | | ``roberta-large`` fine-tuned on `MNLI <http://www.nyu.edu/projects/bowman/multinli/>`__. |
| | | |
| | | (see `details <https://github.com/pytorch/fairseq/tree/master/examples/roberta>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``distilroberta-base`` | | 6-layer, 768-hidden, 12-heads, 82M parameters |
| | | | The DistilRoBERTa model distilled from the RoBERTa model `roberta-base` checkpoint. |
| | | |
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``roberta-base-openai-detector`` | | 12-layer, 768-hidden, 12-heads, 125M parameters |
| | | | ``roberta-base`` fine-tuned by OpenAI on the outputs of the 1.5B-parameter GPT-2 model. |
| | | |
| | | (see `details <https://github.com/openai/gpt-2-output-dataset/tree/master/detector>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``roberta-large-openai-detector`` | | 24-layer, 1024-hidden, 16-heads, 355M parameters |
| | | | ``roberta-large`` fine-tuned by OpenAI on the outputs of the 1.5B-parameter GPT-2 model. |
| | | |
| | | (see `details <https://github.com/openai/gpt-2-output-dataset/tree/master/detector>`__) |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| DistilBERT | ``distilbert-base-uncased`` | | 6-layer, 768-hidden, 12-heads, 66M parameters |
| | | | The DistilBERT model distilled from the BERT model `bert-base-uncased` checkpoint |
| | | |
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``distilbert-base-uncased-distilled-squad`` | | 6-layer, 768-hidden, 12-heads, 66M parameters |
| | | | The DistilBERT model distilled from the BERT model `bert-base-uncased` checkpoint, with an additional linear layer. |
| | | |
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``distilbert-base-cased`` | | 6-layer, 768-hidden, 12-heads, 65M parameters |
| | | | The DistilBERT model distilled from the BERT model `bert-base-cased` checkpoint |
| | | |
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``distilbert-base-cased-distilled-squad`` | | 6-layer, 768-hidden, 12-heads, 65M parameters |
| | | | The DistilBERT model distilled from the BERT model `bert-base-cased` checkpoint, with an additional question answering layer. |
| | | |
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``distilgpt2`` | | 6-layer, 768-hidden, 12-heads, 82M parameters |
| | | | The DistilGPT2 model distilled from the GPT2 model `gpt2` checkpoint. |
| | | |
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``distilbert-base-german-cased`` | | 6-layer, 768-hidden, 12-heads, 66M parameters |
| | | | The German DistilBERT model distilled from the German DBMDZ BERT model `bert-base-german-dbmdz-cased` checkpoint. |
| | | |
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``distilbert-base-multilingual-cased`` | | 6-layer, 768-hidden, 12-heads, 134M parameters |
| | | | The multilingual DistilBERT model distilled from the Multilingual BERT model `bert-base-multilingual-cased` checkpoint. |
| | | |
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| CTRL | ``ctrl`` | | 48-layer, 1280-hidden, 16-heads, 1.6B parameters |
| | | | Salesforce's Large-sized CTRL English model |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| CamemBERT | ``camembert-base`` | | 12-layer, 768-hidden, 12-heads, 110M parameters |
| | | | CamemBERT using the BERT-base architecture |
| | | |
| | | (see `details <https://github.com/pytorch/fairseq/tree/master/examples/camembert>`__) |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| ALBERT | ``albert-base-v1`` | | 12 repeating layers, 128 embedding, 768-hidden, 12-heads, 11M parameters |
| | | | ALBERT base model |
| | | |
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``albert-large-v1`` | | 24 repeating layers, 128 embedding, 1024-hidden, 16-heads, 17M parameters |
| | | | ALBERT large model |
| | | |
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``albert-xlarge-v1`` | | 24 repeating layers, 128 embedding, 2048-hidden, 16-heads, 58M parameters |
| | | | ALBERT xlarge model |
| | | |
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``albert-xxlarge-v1`` | | 12 repeating layer, 128 embedding, 4096-hidden, 64-heads, 223M parameters |
| | | | ALBERT xxlarge model |
| | | |
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``albert-base-v2`` | | 12 repeating layers, 128 embedding, 768-hidden, 12-heads, 11M parameters |
| | | | ALBERT base model with no dropout, additional training data and longer training |
| | | |
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``albert-large-v2`` | | 24 repeating layers, 128 embedding, 1024-hidden, 16-heads, 17M parameters |
| | | | ALBERT large model with no dropout, additional training data and longer training |
| | | |
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``albert-xlarge-v2`` | | 24 repeating layers, 128 embedding, 2048-hidden, 16-heads, 58M parameters |
| | | | ALBERT xlarge model with no dropout, additional training data and longer training |
| | | |
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``albert-xxlarge-v2`` | | 12 repeating layer, 128 embedding, 4096-hidden, 64-heads, 223M parameters |
| | | | ALBERT xxlarge model with no dropout, additional training data and longer training |
| | | |
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| T5 | ``t5-small`` | | ~60M parameters with 6-layers, 512-hidden-state, 2048 feed-forward hidden-state, 8-heads, |
| | | | Trained on English text: the Colossal Clean Crawled Corpus (C4) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``t5-base`` | | ~220M parameters with 12-layers, 768-hidden-state, 3072 feed-forward hidden-state, 12-heads, |
| | | | Trained on English text: the Colossal Clean Crawled Corpus (C4) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``t5-large`` | | ~770M parameters with 24-layers, 1024-hidden-state, 4096 feed-forward hidden-state, 16-heads, |
| | | | Trained on English text: the Colossal Clean Crawled Corpus (C4) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``t5-3B`` | | ~2.8B parameters with 24-layers, 1024-hidden-state, 16384 feed-forward hidden-state, 32-heads, |
| | | | Trained on English text: the Colossal Clean Crawled Corpus (C4) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``t5-11B`` | | ~11B parameters with 24-layers, 1024-hidden-state, 65536 feed-forward hidden-state, 128-heads, |
| | | | Trained on English text: the Colossal Clean Crawled Corpus (C4) |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| XLM-RoBERTa | ``xlm-roberta-base`` | | ~125M parameters with 12-layers, 768-hidden-state, 3072 feed-forward hidden-state, 8-heads, |
| | | | Trained on on 2.5 TB of newly created clean CommonCrawl data in 100 languages |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``xlm-roberta-large`` | | ~355M parameters with 24-layers, 1027-hidden-state, 4096 feed-forward hidden-state, 16-heads, |
| | | | Trained on 2.5 TB of newly created clean CommonCrawl data in 100 languages |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| FlauBERT | ``flaubert/flaubert_small_cased`` | | 6-layer, 512-hidden, 8-heads, 54M parameters |
| | | | FlauBERT small architecture |
| | | |
| | | (see `details <https://github.com/getalp/Flaubert>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``flaubert/flaubert_base_uncased`` | | 12-layer, 768-hidden, 12-heads, 137M parameters |
| | | | FlauBERT base architecture with uncased vocabulary |
| | | |
| | | (see `details <https://github.com/getalp/Flaubert>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``flaubert/flaubert_base_cased`` | | 12-layer, 768-hidden, 12-heads, 138M parameters |
| | | | FlauBERT base architecture with cased vocabulary |
| | | |
| | | (see `details <https://github.com/getalp/Flaubert>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``flaubert/flaubert_large_cased`` | | 24-layer, 1024-hidden, 16-heads, 373M parameters |
| | | | FlauBERT large architecture |
| | | |
| | | (see `details <https://github.com/getalp/Flaubert>`__) |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| Bart | ``facebook/bart-large`` | | 24-layer, 1024-hidden, 16-heads, 406M parameters |
| | | |
| | | (see `details <https://github.com/pytorch/fairseq/tree/master/examples/bart>`_) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``facebook/bart-base`` | | 12-layer, 768-hidden, 16-heads, 139M parameters |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``facebook/bart-large-mnli`` | | Adds a 2 layer classification head with 1 million parameters |
| | | | bart-large base architecture with a classification head, finetuned on MNLI |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``facebook/bart-large-cnn`` | | 12-layer, 1024-hidden, 16-heads, 406M parameters (same as base) |
| | | | bart-large base architecture finetuned on cnn summarization task |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``facebook/mbart-large-en-ro`` | | 12-layer, 1024-hidden, 16-heads, 880M parameters |
| | | | bart-large architecture pretrained on cc25 multilingual data , finetuned on WMT english romanian translation. |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| DialoGPT | ``DialoGPT-small`` | | 12-layer, 768-hidden, 12-heads, 124M parameters |
| | | | Trained on English text: 147M conversation-like exchanges extracted from Reddit. |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``DialoGPT-medium`` | | 24-layer, 1024-hidden, 16-heads, 355M parameters |
| | | | Trained on English text: 147M conversation-like exchanges extracted from Reddit. |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``DialoGPT-large`` | | 36-layer, 1280-hidden, 20-heads, 774M parameters |
| | | | Trained on English text: 147M conversation-like exchanges extracted from Reddit. |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| Reformer | ``reformer-enwik8`` | | 12-layer, 1024-hidden, 8-heads, 149M parameters |
| | | | Trained on English Wikipedia data - enwik8. |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``reformer-crime-and-punishment`` | | 6-layer, 256-hidden, 2-heads, 3M parameters |
| | | | Trained on English text: Crime and Punishment novel by Fyodor Dostoyevsky. |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| MarianMT | ``Helsinki-NLP/opus-mt-{src}-{tgt}`` | | 12-layer, 512-hidden, 8-heads, ~74M parameter Machine translation models. Parameter counts vary depending on vocab size. |
| | | | (see `model list <https://huggingface.co/Helsinki-NLP>`_) |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| Longformer | ``allenai/longformer-base-4096`` | | 12-layer, 768-hidden, 12-heads, ~149M parameters |
| | | | Starting from RoBERTa-base checkpoint, trained on documents of max length 4,096 |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``allenai/longformer-large-4096`` | | 24-layer, 1024-hidden, 16-heads, ~435M parameters |
| | | | Starting from RoBERTa-large checkpoint, trained on documents of max length 4,096 |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| Architecture | Shortcut name | Details of the model |
+====================+============================================================+=======================================================================================================================================+
| BERT | ``bert-base-uncased`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on lower-cased English text. |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bert-large-uncased`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters. |
| | | | Trained on lower-cased English text. |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bert-base-cased`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on cased English text. |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bert-large-cased`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters. |
| | | | Trained on cased English text. |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bert-base-multilingual-uncased`` | | (Original, not recommended) 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on lower-cased text in the top 102 languages with the largest Wikipedias |
| | | |
| | | (see `details <https://github.com/google-research/bert/blob/master/multilingual.md>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bert-base-multilingual-cased`` | | (New, **recommended**) 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on cased text in the top 104 languages with the largest Wikipedias |
| | | |
| | | (see `details <https://github.com/google-research/bert/blob/master/multilingual.md>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bert-base-chinese`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on cased Chinese Simplified and Traditional text. |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bert-base-german-cased`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on cased German text by Deepset.ai |
| | | |
| | | (see `details on deepset.ai website <https://deepset.ai/german-bert>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bert-large-uncased-whole-word-masking`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters. |
| | | | Trained on lower-cased English text using Whole-Word-Masking |
| | | |
| | | (see `details <https://github.com/google-research/bert/#bert>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bert-large-cased-whole-word-masking`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters. |
| | | | Trained on cased English text using Whole-Word-Masking |
| | | |
| | | (see `details <https://github.com/google-research/bert/#bert>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bert-large-uncased-whole-word-masking-finetuned-squad`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters. |
| | | | The ``bert-large-uncased-whole-word-masking`` model fine-tuned on SQuAD |
| | | |
| | | (see details of fine-tuning in the `example section <https://github.com/huggingface/transformers/tree/master/examples>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bert-large-cased-whole-word-masking-finetuned-squad`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters |
| | | | The ``bert-large-cased-whole-word-masking`` model fine-tuned on SQuAD |
| | | |
| | | (see `details of fine-tuning in the example section <https://huggingface.co/transformers/examples.html>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bert-base-cased-finetuned-mrpc`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | The ``bert-base-cased`` model fine-tuned on MRPC |
| | | |
| | | (see `details of fine-tuning in the example section <https://huggingface.co/transformers/examples.html>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bert-base-german-dbmdz-cased`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on cased German text by DBMDZ |
| | | |
| | | (see `details on dbmdz repository <https://github.com/dbmdz/german-bert>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bert-base-german-dbmdz-uncased`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on uncased German text by DBMDZ |
| | | |
| | | (see `details on dbmdz repository <https://github.com/dbmdz/german-bert>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``cl-tohoku/bert-base-japanese`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on Japanese text. Text is tokenized with MeCab and WordPiece and this requires some extra dependencies, |
| | | | `fugashi <https://github.com/polm/fugashi>`__ which is a wrapper around `MeCab <https://taku910.github.io/mecab/>`__. |
| | | | Use ``pip install transformers["ja"]`` (or ``pip install -e .["ja"]`` if you install from source) to install them. |
| | | |
| | | (see `details on cl-tohoku repository <https://github.com/cl-tohoku/bert-japanese>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``cl-tohoku/bert-base-japanese-whole-word-masking`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on Japanese text. Text is tokenized with MeCab and WordPiece and this requires some extra dependencies, |
| | | | `fugashi <https://github.com/polm/fugashi>`__ which is a wrapper around `MeCab <https://taku910.github.io/mecab/>`__. |
| | | | Use ``pip install transformers["ja"]`` (or ``pip install -e .["ja"]`` if you install from source) to install them. |
| | | |
| | | (see `details on cl-tohoku repository <https://github.com/cl-tohoku/bert-japanese>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``cl-tohoku/bert-base-japanese-char`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on Japanese text. Text is tokenized into characters. |
| | | |
| | | (see `details on cl-tohoku repository <https://github.com/cl-tohoku/bert-japanese>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``cl-tohoku/bert-base-japanese-char-whole-word-masking`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on Japanese text using Whole-Word-Masking. Text is tokenized into characters. |
| | | |
| | | (see `details on cl-tohoku repository <https://github.com/cl-tohoku/bert-japanese>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``TurkuNLP/bert-base-finnish-cased-v1`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on cased Finnish text. |
| | | |
| | | (see `details on turkunlp.org <http://turkunlp.org/FinBERT/>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``TurkuNLP/bert-base-finnish-uncased-v1`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on uncased Finnish text. |
| | | |
| | | (see `details on turkunlp.org <http://turkunlp.org/FinBERT/>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``wietsedv/bert-base-dutch-cased`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on cased Dutch text. |
| | | |
| | | (see `details on wietsedv repository <https://github.com/wietsedv/bertje/>`__). |
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| GPT | ``openai-gpt`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | OpenAI GPT English model |
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| GPT-2 | ``gpt2`` | | 12-layer, 768-hidden, 12-heads, 117M parameters. |
| | | | OpenAI GPT-2 English model |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``gpt2-medium`` | | 24-layer, 1024-hidden, 16-heads, 345M parameters. |
| | | | OpenAI's Medium-sized GPT-2 English model |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``gpt2-large`` | | 36-layer, 1280-hidden, 20-heads, 774M parameters. |
| | | | OpenAI's Large-sized GPT-2 English model |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``gpt2-xl`` | | 48-layer, 1600-hidden, 25-heads, 1558M parameters. |
| | | | OpenAI's XL-sized GPT-2 English model |
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| Transformer-XL | ``transfo-xl-wt103`` | | 18-layer, 1024-hidden, 16-heads, 257M parameters. |
| | | | English model trained on wikitext-103 |
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| XLNet | ``xlnet-base-cased`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | XLNet English model |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``xlnet-large-cased`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters. |
| | | | XLNet Large English model |
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| XLM | ``xlm-mlm-en-2048`` | | 12-layer, 2048-hidden, 16-heads |
| | | | XLM English model |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``xlm-mlm-ende-1024`` | | 6-layer, 1024-hidden, 8-heads |
| | | | XLM English-German model trained on the concatenation of English and German wikipedia |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``xlm-mlm-enfr-1024`` | | 6-layer, 1024-hidden, 8-heads |
| | | | XLM English-French model trained on the concatenation of English and French wikipedia |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``xlm-mlm-enro-1024`` | | 6-layer, 1024-hidden, 8-heads |
| | | | XLM English-Romanian Multi-language model |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``xlm-mlm-xnli15-1024`` | | 12-layer, 1024-hidden, 8-heads |
| | | | XLM Model pre-trained with MLM on the `15 XNLI languages <https://github.com/facebookresearch/XNLI>`__. |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``xlm-mlm-tlm-xnli15-1024`` | | 12-layer, 1024-hidden, 8-heads |
| | | | XLM Model pre-trained with MLM + TLM on the `15 XNLI languages <https://github.com/facebookresearch/XNLI>`__. |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``xlm-clm-enfr-1024`` | | 6-layer, 1024-hidden, 8-heads |
| | | | XLM English-French model trained with CLM (Causal Language Modeling) on the concatenation of English and French wikipedia |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``xlm-clm-ende-1024`` | | 6-layer, 1024-hidden, 8-heads |
| | | | XLM English-German model trained with CLM (Causal Language Modeling) on the concatenation of English and German wikipedia |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``xlm-mlm-17-1280`` | | 16-layer, 1280-hidden, 16-heads |
| | | | XLM model trained with MLM (Masked Language Modeling) on 17 languages. |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``xlm-mlm-100-1280`` | | 16-layer, 1280-hidden, 16-heads |
| | | | XLM model trained with MLM (Masked Language Modeling) on 100 languages. |
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| RoBERTa | ``roberta-base`` | | 12-layer, 768-hidden, 12-heads, 125M parameters |
| | | | RoBERTa using the BERT-base architecture |
| | | |
| | | (see `details <https://github.com/pytorch/fairseq/tree/master/examples/roberta>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``roberta-large`` | | 24-layer, 1024-hidden, 16-heads, 355M parameters |
| | | | RoBERTa using the BERT-large architecture |
| | | |
| | | (see `details <https://github.com/pytorch/fairseq/tree/master/examples/roberta>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``roberta-large-mnli`` | | 24-layer, 1024-hidden, 16-heads, 355M parameters |
| | | | ``roberta-large`` fine-tuned on `MNLI <http://www.nyu.edu/projects/bowman/multinli/>`__. |
| | | |
| | | (see `details <https://github.com/pytorch/fairseq/tree/master/examples/roberta>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``distilroberta-base`` | | 6-layer, 768-hidden, 12-heads, 82M parameters |
| | | | The DistilRoBERTa model distilled from the RoBERTa model `roberta-base` checkpoint. |
| | | |
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``roberta-base-openai-detector`` | | 12-layer, 768-hidden, 12-heads, 125M parameters |
| | | | ``roberta-base`` fine-tuned by OpenAI on the outputs of the 1.5B-parameter GPT-2 model. |
| | | |
| | | (see `details <https://github.com/openai/gpt-2-output-dataset/tree/master/detector>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``roberta-large-openai-detector`` | | 24-layer, 1024-hidden, 16-heads, 355M parameters |
| | | | ``roberta-large`` fine-tuned by OpenAI on the outputs of the 1.5B-parameter GPT-2 model. |
| | | |
| | | (see `details <https://github.com/openai/gpt-2-output-dataset/tree/master/detector>`__) |
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| DistilBERT | ``distilbert-base-uncased`` | | 6-layer, 768-hidden, 12-heads, 66M parameters |
| | | | The DistilBERT model distilled from the BERT model `bert-base-uncased` checkpoint |
| | | |
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``distilbert-base-uncased-distilled-squad`` | | 6-layer, 768-hidden, 12-heads, 66M parameters |
| | | | The DistilBERT model distilled from the BERT model `bert-base-uncased` checkpoint, with an additional linear layer. |
| | | |
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``distilbert-base-cased`` | | 6-layer, 768-hidden, 12-heads, 65M parameters |
| | | | The DistilBERT model distilled from the BERT model `bert-base-cased` checkpoint |
| | | |
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``distilbert-base-cased-distilled-squad`` | | 6-layer, 768-hidden, 12-heads, 65M parameters |
| | | | The DistilBERT model distilled from the BERT model `bert-base-cased` checkpoint, with an additional question answering layer. |
| | | |
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``distilgpt2`` | | 6-layer, 768-hidden, 12-heads, 82M parameters |
| | | | The DistilGPT2 model distilled from the GPT2 model `gpt2` checkpoint. |
| | | |
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``distilbert-base-german-cased`` | | 6-layer, 768-hidden, 12-heads, 66M parameters |
| | | | The German DistilBERT model distilled from the German DBMDZ BERT model `bert-base-german-dbmdz-cased` checkpoint. |
| | | |
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``distilbert-base-multilingual-cased`` | | 6-layer, 768-hidden, 12-heads, 134M parameters |
| | | | The multilingual DistilBERT model distilled from the Multilingual BERT model `bert-base-multilingual-cased` checkpoint. |
| | | |
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| CTRL | ``ctrl`` | | 48-layer, 1280-hidden, 16-heads, 1.6B parameters |
| | | | Salesforce's Large-sized CTRL English model |
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| CamemBERT | ``camembert-base`` | | 12-layer, 768-hidden, 12-heads, 110M parameters |
| | | | CamemBERT using the BERT-base architecture |
| | | |
| | | (see `details <https://github.com/pytorch/fairseq/tree/master/examples/camembert>`__) |
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| ALBERT | ``albert-base-v1`` | | 12 repeating layers, 128 embedding, 768-hidden, 12-heads, 11M parameters |
| | | | ALBERT base model |
| | | |
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``albert-large-v1`` | | 24 repeating layers, 128 embedding, 1024-hidden, 16-heads, 17M parameters |
| | | | ALBERT large model |
| | | |
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``albert-xlarge-v1`` | | 24 repeating layers, 128 embedding, 2048-hidden, 16-heads, 58M parameters |
| | | | ALBERT xlarge model |
| | | |
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``albert-xxlarge-v1`` | | 12 repeating layer, 128 embedding, 4096-hidden, 64-heads, 223M parameters |
| | | | ALBERT xxlarge model |
| | | |
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``albert-base-v2`` | | 12 repeating layers, 128 embedding, 768-hidden, 12-heads, 11M parameters |
| | | | ALBERT base model with no dropout, additional training data and longer training |
| | | |
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``albert-large-v2`` | | 24 repeating layers, 128 embedding, 1024-hidden, 16-heads, 17M parameters |
| | | | ALBERT large model with no dropout, additional training data and longer training |
| | | |
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``albert-xlarge-v2`` | | 24 repeating layers, 128 embedding, 2048-hidden, 16-heads, 58M parameters |
| | | | ALBERT xlarge model with no dropout, additional training data and longer training |
| | | |
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``albert-xxlarge-v2`` | | 12 repeating layer, 128 embedding, 4096-hidden, 64-heads, 223M parameters |
| | | | ALBERT xxlarge model with no dropout, additional training data and longer training |
| | | |
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| T5 | ``t5-small`` | | ~60M parameters with 6-layers, 512-hidden-state, 2048 feed-forward hidden-state, 8-heads, |
| | | | Trained on English text: the Colossal Clean Crawled Corpus (C4) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``t5-base`` | | ~220M parameters with 12-layers, 768-hidden-state, 3072 feed-forward hidden-state, 12-heads, |
| | | | Trained on English text: the Colossal Clean Crawled Corpus (C4) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``t5-large`` | | ~770M parameters with 24-layers, 1024-hidden-state, 4096 feed-forward hidden-state, 16-heads, |
| | | | Trained on English text: the Colossal Clean Crawled Corpus (C4) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``t5-3B`` | | ~2.8B parameters with 24-layers, 1024-hidden-state, 16384 feed-forward hidden-state, 32-heads, |
| | | | Trained on English text: the Colossal Clean Crawled Corpus (C4) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``t5-11B`` | | ~11B parameters with 24-layers, 1024-hidden-state, 65536 feed-forward hidden-state, 128-heads, |
| | | | Trained on English text: the Colossal Clean Crawled Corpus (C4) |
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| XLM-RoBERTa | ``xlm-roberta-base`` | | ~125M parameters with 12-layers, 768-hidden-state, 3072 feed-forward hidden-state, 8-heads, |
| | | | Trained on on 2.5 TB of newly created clean CommonCrawl data in 100 languages |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``xlm-roberta-large`` | | ~355M parameters with 24-layers, 1027-hidden-state, 4096 feed-forward hidden-state, 16-heads, |
| | | | Trained on 2.5 TB of newly created clean CommonCrawl data in 100 languages |
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| FlauBERT | ``flaubert/flaubert_small_cased`` | | 6-layer, 512-hidden, 8-heads, 54M parameters |
| | | | FlauBERT small architecture |
| | | |
| | | (see `details <https://github.com/getalp/Flaubert>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``flaubert/flaubert_base_uncased`` | | 12-layer, 768-hidden, 12-heads, 137M parameters |
| | | | FlauBERT base architecture with uncased vocabulary |
| | | |
| | | (see `details <https://github.com/getalp/Flaubert>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``flaubert/flaubert_base_cased`` | | 12-layer, 768-hidden, 12-heads, 138M parameters |
| | | | FlauBERT base architecture with cased vocabulary |
| | | |
| | | (see `details <https://github.com/getalp/Flaubert>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``flaubert/flaubert_large_cased`` | | 24-layer, 1024-hidden, 16-heads, 373M parameters |
| | | | FlauBERT large architecture |
| | | |
| | | (see `details <https://github.com/getalp/Flaubert>`__) |
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| Bart | ``facebook/bart-large`` | | 24-layer, 1024-hidden, 16-heads, 406M parameters |
| | | |
| | | (see `details <https://github.com/pytorch/fairseq/tree/master/examples/bart>`_) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``facebook/bart-base`` | | 12-layer, 768-hidden, 16-heads, 139M parameters |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``facebook/bart-large-mnli`` | | Adds a 2 layer classification head with 1 million parameters |
| | | | bart-large base architecture with a classification head, finetuned on MNLI |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``facebook/bart-large-cnn`` | | 12-layer, 1024-hidden, 16-heads, 406M parameters (same as base) |
| | | | bart-large base architecture finetuned on cnn summarization task |
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| DialoGPT | ``DialoGPT-small`` | | 12-layer, 768-hidden, 12-heads, 124M parameters |
| | | | Trained on English text: 147M conversation-like exchanges extracted from Reddit. |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``DialoGPT-medium`` | | 24-layer, 1024-hidden, 16-heads, 355M parameters |
| | | | Trained on English text: 147M conversation-like exchanges extracted from Reddit. |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``DialoGPT-large`` | | 36-layer, 1280-hidden, 20-heads, 774M parameters |
| | | | Trained on English text: 147M conversation-like exchanges extracted from Reddit. |
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| Reformer | ``reformer-enwik8`` | | 12-layer, 1024-hidden, 8-heads, 149M parameters |
| | | | Trained on English Wikipedia data - enwik8. |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``reformer-crime-and-punishment`` | | 6-layer, 256-hidden, 2-heads, 3M parameters |
| | | | Trained on English text: Crime and Punishment novel by Fyodor Dostoyevsky. |
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| MarianMT | ``Helsinki-NLP/opus-mt-{src}-{tgt}`` | | 12-layer, 512-hidden, 8-heads, ~74M parameter Machine translation models. Parameter counts vary depending on vocab size. |
| | | | (see `model list <https://huggingface.co/Helsinki-NLP>`_) |
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| Pegasus | ``google/pegasus-{dataset}`` | | 16-layer, 1024-hidden, 16-heads, ~568M parameter, 2.2 GB for summary. `model list <https://huggingface.co/models?search=pegasus>`__ |
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| Longformer | ``allenai/longformer-base-4096`` | | 12-layer, 768-hidden, 12-heads, ~149M parameters |
| | | | Starting from RoBERTa-base checkpoint, trained on documents of max length 4,096 |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``allenai/longformer-large-4096`` | | 24-layer, 1024-hidden, 16-heads, ~435M parameters |
| | | | Starting from RoBERTa-large checkpoint, trained on documents of max length 4,096 |
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| MBart | ``facebook/mbart-large-cc25`` | | 24-layer, 1024-hidden, 16-heads, 610M parameters |
| | | | mBART (bart-large architecture) model trained on 25 languages' monolingual corpus |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``facebook/mbart-large-en-ro`` | | 24-layer, 1024-hidden, 16-heads, 610M parameters |
| | | | mbart-large-cc25 model finetuned on WMT english romanian translation. |
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| Lxmert | ``lxmert-base-uncased`` | | 9-language layers, 9-relationship layers, and 12-cross-modality layers |
| | | | 768-hidden, 12-heads (for each layer) ~ 228M parameters |
| | | | Starting from lxmert-base checkpoint, trained on over 9 million image-text couplets from COCO, VisualGenome, GQA, VQA |
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| Funnel Transformer | ``funnel-transformer/small`` | | 14 layers: 3 blocks of 4 layers then 2 layers decoder, 768-hidden, 12-heads, 130M parameters |
| | | |
| | | (see `details <https://github.com/laiguokun/Funnel-Transformer>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``funnel-transformer/small-base`` | | 12 layers: 3 blocks of 4 layers (no decoder), 768-hidden, 12-heads, 115M parameters |
| | | |
| | | (see `details <https://github.com/laiguokun/Funnel-Transformer>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``funnel-transformer/medium`` | | 14 layers: 3 blocks 6, 3x2, 3x2 layers then 2 layers decoder, 768-hidden, 12-heads, 130M parameters |
| | | |
| | | (see `details <https://github.com/laiguokun/Funnel-Transformer>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``funnel-transformer/medium-base`` | | 12 layers: 3 blocks 6, 3x2, 3x2 layers(no decoder), 768-hidden, 12-heads, 115M parameters |
| | | |
| | | (see `details <https://github.com/laiguokun/Funnel-Transformer>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``funnel-transformer/intermediate`` | | 20 layers: 3 blocks of 6 layers then 2 layers decoder, 768-hidden, 12-heads, 177M parameters |
| | | |
| | | (see `details <https://github.com/laiguokun/Funnel-Transformer>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``funnel-transformer/intermediate-base`` | | 18 layers: 3 blocks of 6 layers (no decoder), 768-hidden, 12-heads, 161M parameters |
| | | |
| | | (see `details <https://github.com/laiguokun/Funnel-Transformer>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``funnel-transformer/large`` | | 26 layers: 3 blocks of 8 layers then 2 layers decoder, 1024-hidden, 12-heads, 386M parameters |
| | | |
| | | (see `details <https://github.com/laiguokun/Funnel-Transformer>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``funnel-transformer/large-base`` | | 24 layers: 3 blocks of 8 layers (no decoder), 1024-hidden, 12-heads, 358M parameters |
| | | |
| | | (see `details <https://github.com/laiguokun/Funnel-Transformer>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``funnel-transformer/xlarge`` | | 32 layers: 3 blocks of 10 layers then 2 layers decoder, 1024-hidden, 12-heads, 468M parameters |
| | | |
| | | (see `details <https://github.com/laiguokun/Funnel-Transformer>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``funnel-transformer/xlarge-base`` | | 30 layers: 3 blocks of 10 layers (no decoder), 1024-hidden, 12-heads, 440M parameters |
| | | |
| | | (see `details <https://github.com/laiguokun/Funnel-Transformer>`__) |
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| LayoutLM | ``microsoft/layoutlm-base-uncased`` | | 12 layers, 768-hidden, 12-heads, 113M parameters |
| | | |
| | | (see `details <https://github.com/microsoft/unilm/tree/master/layoutlm>`__) |
+ +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``microsoft/layoutlm-large-uncased`` | | 24 layers, 1024-hidden, 16-heads, 343M parameters |
| | | |
| | | (see `details <https://github.com/microsoft/unilm/tree/master/layoutlm>`__) |
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+

View File

@@ -1,5 +1,5 @@
Quick tour
==========
=======================================================================================================================
Let's have a quick look at the 🤗 Transformers library features. The library downloads pretrained models for
Natural Language Understanding (NLU) tasks, such as analyzing the sentiment of a text, and Natural Language Generation (NLG),
@@ -14,7 +14,7 @@ will dig a little bit more and see how the library gives you access to those mod
not, the code is expected to work for both backends without any change needed.
Getting started on a task with a pipeline
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The easiest way to use a pretrained model on a given task is to use :func:`~transformers.pipeline`. 🤗 Transformers
provides the following tasks out of the box:
@@ -108,11 +108,11 @@ any other model from the model hub):
>>> model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
>>> model = AutoModelForSequenceClassification.from_pretrained(model_name)
>>> tokenizer = AutoTokenizer.from_pretrained(model_name)
>>> pipe = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)
>>> classifier = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)
>>> ## TENSORFLOW CODE
>>> model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
>>> # This model only exists in PyTorch, so we use the `from_pt` flag to import that model in TensorFlow.
>>> model = TFAutoModelForSequenceClassification.from_pretrained(model_name, from_pt=True)
>>> model = TFAutoModelForSequenceClassification.from_pretrained(model_name, from_pt=True)
>>> tokenizer = AutoTokenizer.from_pretrained(model_name)
>>> classifier = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)
@@ -123,12 +123,12 @@ to share your fine-tuned model on the hub with the community, using :doc:`this t
.. _pretrained-model:
Under the hood: pretrained models
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Let's now see what happens beneath the hood when using those pipelines. As we saw, the model and tokenizer are created
using the :obj:`from_pretrained` method:
::
.. code-block::
>>> ## PYTORCH CODE
>>> from transformers import AutoTokenizer, AutoModelForSequenceClassification
@@ -142,11 +142,11 @@ using the :obj:`from_pretrained` method:
>>> tokenizer = AutoTokenizer.from_pretrained(model_name)
Using the tokenizer
^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
We mentioned the tokenizer is responsible for the preprocessing of your texts. First, it will split a given text in
words (or part of words, punctuation symbols, etc.) usually called `tokens`. There are multiple rules that can govern
that process (you can learn more about them in the :doc:`tokenizer_summary <tokenizer_summary>`, which is why we need
that process (you can learn more about them in the :doc:`tokenizer summary <tokenizer_summary>`, which is why we need
to instantiate the tokenizer using the name of the model, to make sure we use the same rules as when the model was
pretrained.
@@ -191,7 +191,7 @@ and get tensors back. You can specify all of that to the tokenizer:
... return_tensors="tf"
... )
The padding is automatically applied on the side the model expect it (in this case, on the right), with the
The padding is automatically applied on the side expected by the model (in this case, on the right), with the
padding token the model was pretrained with. The attention mask is also adapted to take the padding into account:
.. code-block::
@@ -210,11 +210,11 @@ padding token the model was pretrained with. The attention mask is also adapted
You can learn more about tokenizers :doc:`here <preprocessing>`.
Using the model
^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Once your input has been preprocessed by the tokenizer, you can directly send it to the model. As we mentioned, it will
contain all the relevant information the model needs. If you're using a TensorFlow model, you can directly pass the
dictionary keys to tensor, for a PyTorch model, you need to unpack the dictionary by adding :obj:`**`.
Once your input has been preprocessed by the tokenizer, you can send it directly to the model. As we mentioned, it will
contain all the relevant information the model needs. If you're using a TensorFlow model, you can pass the
dictionary keys directly to tensors, for a PyTorch model, you need to unpack the dictionary by adding :obj:`**`.
.. code-block::
@@ -235,9 +235,11 @@ final activations of the model.
>>> ## TENSORFLOW CODE
>>> print(tf_outputs)
(<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[-4.0832963 , 4.3364134 ],
[ 0.08181238, -0.04178794]], dtype=float32)>,)
array([[-4.0832963 , 4.336414 ],
[ 0.08181786, -0.04179301]], dtype=float32)>,)
The model can return more than just the final activations, which is why the output is a tuple. Here we only asked for
the final activations, so we get a tuple with one element.
.. note::
All 🤗 Transformers models (PyTorch or TensorFlow) return the activations of the model *before* the final
@@ -262,7 +264,7 @@ We can see we get the numbers from before:
>>> print(tf_predictions)
tf.Tensor(
[[2.2042994e-04 9.9977952e-01]
[5.3086078e-01 4.6913919e-01]], shape=(2, 2), dtype=float32)
[5.3086340e-01 4.6913657e-01]], shape=(2, 2), dtype=float32)
>>> ## PYTORCH CODE
>>> print(pt_predictions)
tensor([[2.2043e-04, 9.9978e-01],
@@ -285,9 +287,15 @@ training loop. 🤗 Transformers also provides a :class:`~transformers.Trainer`
you are using TensorFlow) class to help with your training (taking care of things such as distributed training, mixed
precision, etc.). See the :doc:`training tutorial <training>` for more details.
Once your model is fine-tuned, you can save it with its tokenizer the following way:
.. note::
::
Pytorch model outputs are special dataclasses so that you can get autocompletion for their attributes in an IDE.
They also behave like a tuple or a dictionary (e.g., you can index with an integer, a slice or a string) in which
case the attributes not set (that have :obj:`None` values) are ignored.
Once your model is fine-tuned, you can save it with its tokenizer in the following way:
.. code-block::
tokenizer.save_pretrained(save_directory)
model.save_pretrained(save_directory)
@@ -297,14 +305,14 @@ directory name instead of the model name. One cool feature of 🤗 Transformers
PyTorch and TensorFlow: any model saved as before can be loaded back either in PyTorch or TensorFlow. If you are
loading a saved PyTorch model in a TensorFlow model, use :func:`~transformers.TFAutoModel.from_pretrained` like this:
::
.. code-block::
tokenizer = AutoTokenizer.from_pretrained(save_directory)
model = TFAutoModel.from_pretrained(save_directory, from_pt=True)
and if you are loading a saved TensorFlow model in a PyTorch model, you should use the following code:
::
.. code-block::
tokenizer = AutoTokenizer.from_pretrained(save_directory)
model = AutoModel.from_pretrained(save_directory, from_tf=True)
@@ -312,7 +320,7 @@ and if you are loading a saved TensorFlow model in a PyTorch model, you should u
Lastly, you can also ask the model to return all hidden states and all attention weights if you need them:
::
.. code-block::
>>> ## PYTORCH CODE
>>> pt_outputs = pt_model(**pt_batch, output_hidden_states=True, output_attentions=True)
@@ -322,14 +330,16 @@ Lastly, you can also ask the model to return all hidden states and all attention
>>> all_hidden_states, all_attentions = tf_outputs[-2:]
Accessing the code
^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The :obj:`AutoModel` and :obj:`AutoTokenizer` classes are just shortcuts that will automatically work with any
pretrained model. Behind the scenes, the library has one model class per combination of architecture plus class, so the
code is easy to access and tweak if you need to.
In our previous example, the model was called "distilbert-base-uncased-finetuned-sst-2-english", which means it's
using the :doc:`DistilBERT </model_doc/distilbert>` architecture. The model automatically created is then a
using the :doc:`DistilBERT </model_doc/distilbert>` architecture. As
:class:`~transformers.AutoModelForSequenceClassification` (or :class:`~transformers.TFAutoModelForSequenceClassification`
if you are using TensorFlow) was used, the model automatically created is then a
:class:`~transformers.DistilBertForSequenceClassification`. You can look at its documentation for all details relevant
to that specific model, or browse the source code. This is how you would directly instantiate model and tokenizer
without the auto magic:
@@ -348,11 +358,11 @@ without the auto magic:
>>> tokenizer = DistilBertTokenizer.from_pretrained(model_name)
Customizing the model
^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If you want to change how the model itself is built, you can define your custom configuration class. Each architecture
comes with its own relevant configuration (in the case of DistilBERT, :class:`~transformers.DistilBertConfig`) which
allows you to specify any of the hidden dimension, dropout rate etc. If you do core modifications, like changing the
allows you to specify any of the hidden dimension, dropout rate, etc. If you do core modifications, like changing the
hidden size, you won't be able to use a pretrained model anymore and will need to train from scratch. You would then
instantiate the model directly from this configuration.

View File

@@ -0,0 +1,251 @@
***********************************************************************************************************************
Exporting transformers models
***********************************************************************************************************************
ONNX / ONNXRuntime
=======================================================================================================================
Projects `ONNX (Open Neural Network eXchange) <http://onnx.ai>`_ and `ONNXRuntime (ORT) <https://microsoft.github.io/onnxruntime/>`_ are part of an effort from leading industries in the AI field
to provide a unified and community-driven format to store and, by extension, efficiently execute neural network leveraging a variety
of hardware and dedicated optimizations.
Starting from transformers v2.10.0 we partnered with ONNX Runtime to provide an easy export of transformers models to
the ONNX format. You can have a look at the effort by looking at our joint blog post `Accelerate your NLP pipelines using
Hugging Face Transformers and ONNX Runtime <https://medium.com/microsoftazure/accelerate-your-nlp-pipelines-using-hugging-face-transformers-and-onnx-runtime-2443578f4333>`_.
Exporting a model is done through the script `convert_graph_to_onnx.py` at the root of the transformers sources.
The following command shows how easy it is to export a BERT model from the library, simply run:
.. code-block:: bash
python convert_graph_to_onnx.py --framework <pt, tf> --model bert-base-cased bert-base-cased.onnx
The conversion tool works for both PyTorch and Tensorflow models and ensures:
* The model and its weights are correctly initialized from the Hugging Face model hub or a local checkpoint.
* The inputs and outputs are correctly generated to their ONNX counterpart.
* The generated model can be correctly loaded through onnxruntime.
.. note::
Currently, inputs and outputs are always exported with dynamic sequence axes preventing some optimizations
on the ONNX Runtime. If you would like to see such support for fixed-length inputs/outputs, please
open up an issue on transformers.
Also, the conversion tool supports different options which let you tune the behavior of the generated model:
* **Change the target opset version of the generated model.** (More recent opset generally supports more operators and enables faster inference)
* **Export pipeline-specific prediction heads.** (Allow to export model along with its task-specific prediction head(s))
* **Use the external data format (PyTorch only).** (Lets you export model which size is above 2Gb (`More info <https://github.com/pytorch/pytorch/pull/33062>`_))
Optimizations
-----------------------------------------------------------------------------------------------------------------------
ONNXRuntime includes some transformers-specific transformations to leverage optimized operations in the graph.
Below are some of the operators which can be enabled to speed up inference through ONNXRuntime (*see note below*):
* Constant folding
* Attention Layer fusing
* Skip connection LayerNormalization fusing
* FastGeLU approximation
Some of the optimizations performed by ONNX runtime can be hardware specific and thus lead to different performances
if used on another machine with a different hardware configuration than the one used for exporting the model.
For this reason, when using ``convert_graph_to_onnx.py`` optimizations are not enabled,
ensuring the model can be easily exported to various hardware.
Optimizations can then be enabled when loading the model through ONNX runtime for inference.
.. note::
When quantization is enabled (see below), ``convert_graph_to_onnx.py`` script will enable optimizations on the model
because quantization would modify the underlying graph making it impossible for ONNX runtime to do the optimizations
afterwards.
.. note::
For more information about the optimizations enabled by ONNXRuntime, please have a look at the (`ONNXRuntime Github <https://github.com/microsoft/onnxruntime/tree/master/onnxruntime/python/tools/transformers>`_)
Quantization
-----------------------------------------------------------------------------------------------------------------------
ONNX exporter supports generating a quantized version of the model to allow efficient inference.
Quantization works by converting the memory representation of the parameters in the neural network
to a compact integer format. By default, weights of a neural network are stored as single-precision float (`float32`)
which can express a wide-range of floating-point numbers with decent precision.
These properties are especially interesting at training where you want fine-grained representation.
On the other hand, after the training phase, it has been shown one can greatly reduce the range and the precision of `float32` numbers
without changing the performances of the neural network.
More technically, `float32` parameters are converted to a type requiring fewer bits to represent each number, thus reducing
the overall size of the model. Here, we are enabling `float32` mapping to `int8` values (a non-floating, single byte, number representation)
according to the following formula:
.. math::
y_{float32} = scale * x_{int8} - zero\_point
.. note::
The quantization process will infer the parameter `scale` and `zero_point` from the neural network parameters
Leveraging tiny-integers has numerous advantages when it comes to inference:
* Storing fewer bits instead of 32 bits for the `float32` reduces the size of the model and makes it load faster.
* Integer operations execute a magnitude faster on modern hardware
* Integer operations require less power to do the computations
In order to convert a transformers model to ONNX IR with quantized weights you just need to specify ``--quantize``
when using ``convert_graph_to_onnx.py``. Also, you can have a look at the ``quantize()`` utility-method in this
same script file.
Example of quantized BERT model export:
.. code-block:: bash
python convert_graph_to_onnx.py --framework <pt, tf> --model bert-base-cased --quantize bert-base-cased.onnx
.. note::
Quantization support requires ONNX Runtime >= 1.4.0
.. note::
When exporting quantized model you will end up with two different ONNX files. The one specified at the end of the
above command will contain the original ONNX model storing `float32` weights.
The second one, with ``-quantized`` suffix, will hold the quantized parameters.
TorchScript
=======================================================================================================================
.. note::
This is the very beginning of our experiments with TorchScript and we are still exploring its capabilities
with variable-input-size models. It is a focus of interest to us and we will deepen our analysis in upcoming
releases, with more code examples, a more flexible implementation, and benchmarks comparing python-based codes
with compiled TorchScript.
According to Pytorch's documentation: "TorchScript is a way to create serializable and optimizable models from PyTorch code".
Pytorch's two modules `JIT and TRACE <https://pytorch.org/docs/stable/jit.html>`_ allow the developer to export
their model to be re-used in other programs, such as efficiency-oriented C++ programs.
We have provided an interface that allows the export of 🤗 Transformers models to TorchScript so that they can
be reused in a different environment than a Pytorch-based python program. Here we explain how to export and use our models using TorchScript.
Exporting a model requires two things:
* a forward pass with dummy inputs.
* model instantiation with the ``torchscript`` flag.
These necessities imply several things developers should be careful about. These are detailed below.
Implications
-----------------------------------------------------------------------------------------------------------------------
TorchScript flag and tied weights
-----------------------------------------------------------------------------------------------------------------------
This flag is necessary because most of the language models in this repository have tied weights between their
``Embedding`` layer and their ``Decoding`` layer. TorchScript does not allow the export of models that have tied weights, therefore
it is necessary to untie and clone the weights beforehand.
This implies that models instantiated with the ``torchscript`` flag have their ``Embedding`` layer and ``Decoding`` layer
separate, which means that they should not be trained down the line. Training would de-synchronize the two layers,
leading to unexpected results.
This is not the case for models that do not have a Language Model head, as those do not have tied weights. These models
can be safely exported without the ``torchscript`` flag.
Dummy inputs and standard lengths
-----------------------------------------------------------------------------------------------------------------------
The dummy inputs are used to do a model forward pass. While the inputs' values are propagating through the layers,
Pytorch keeps track of the different operations executed on each tensor. These recorded operations are then used
to create the "trace" of the model.
The trace is created relatively to the inputs' dimensions. It is therefore constrained by the dimensions of the dummy
input, and will not work for any other sequence length or batch size. When trying with a different size, an error such
as:
``The expanded size of the tensor (3) must match the existing size (7) at non-singleton dimension 2``
will be raised. It is therefore recommended to trace the model with a dummy input size at least as large as the largest
input that will be fed to the model during inference. Padding can be performed to fill the missing values. As the model
will have been traced with a large input size however, the dimensions of the different matrix will be large as well,
resulting in more calculations.
It is recommended to be careful of the total number of operations done on each input and to follow performance closely
when exporting varying sequence-length models.
Using TorchScript in Python
-----------------------------------------------------------------------------------------------------------------------
Below is an example, showing how to save, load models as well as how to use the trace for inference.
Saving a model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This snippet shows how to use TorchScript to export a ``BertModel``. Here the ``BertModel`` is instantiated
according to a ``BertConfig`` class and then saved to disk under the filename ``traced_bert.pt``
.. code-block:: python
from transformers import BertModel, BertTokenizer, BertConfig
import torch
enc = BertTokenizer.from_pretrained("bert-base-uncased")
# Tokenizing input text
text = "[CLS] Who was Jim Henson ? [SEP] Jim Henson was a puppeteer [SEP]"
tokenized_text = enc.tokenize(text)
# Masking one of the input tokens
masked_index = 8
tokenized_text[masked_index] = '[MASK]'
indexed_tokens = enc.convert_tokens_to_ids(tokenized_text)
segments_ids = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
# Creating a dummy input
tokens_tensor = torch.tensor([indexed_tokens])
segments_tensors = torch.tensor([segments_ids])
dummy_input = [tokens_tensor, segments_tensors]
# Initializing the model with the torchscript flag
# Flag set to True even though it is not necessary as this model does not have an LM Head.
config = BertConfig(vocab_size_or_config_json_file=32000, hidden_size=768,
num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072, torchscript=True)
# Instantiating the model
model = BertModel(config)
# The model needs to be in evaluation mode
model.eval()
# If you are instantiating the model with `from_pretrained` you can also easily set the TorchScript flag
model = BertModel.from_pretrained("bert-base-uncased", torchscript=True)
# Creating the trace
traced_model = torch.jit.trace(model, [tokens_tensor, segments_tensors])
torch.jit.save(traced_model, "traced_bert.pt")
Loading a model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This snippet shows how to load the ``BertModel`` that was previously saved to disk under the name ``traced_bert.pt``.
We are re-using the previously initialised ``dummy_input``.
.. code-block:: python
loaded_model = torch.jit.load("traced_bert.pt")
loaded_model.eval()
all_encoder_layers, pooled_output = loaded_model(*dummy_input)
Using a traced model for inference
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Using the traced model for inference is as simple as using its ``__call__`` dunder method:
.. code-block:: python
traced_model(tokens_tensor, segments_tensors)

View File

@@ -1,5 +1,5 @@
Summary of the tasks
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This page shows the most frequent use-cases when using the library. The models available allow for many different
configurations and a great versatility in use-cases. The most simple ones are presented here, showcasing usage
@@ -15,18 +15,17 @@ checkpoints are usually pre-trained on a large corpus of data and fine-tuned on
following:
- Not all models were fine-tuned on all tasks. If you want to fine-tune a model on a specific task, you can leverage
one of the `run_$TASK.py` script in the
`examples <https://github.com/huggingface/transformers/tree/master/examples>`_ directory.
one of the `run_$TASK.py` scripts in the
`examples <https://github.com/huggingface/transformers/tree/master/examples>`__ directory.
- Fine-tuned models were fine-tuned on a specific dataset. This dataset may or may not overlap with your use-case
and domain. As mentioned previously, you may leverage the
`examples <https://github.com/huggingface/transformers/tree/master/examples>`_ scripts to fine-tune your model, or you
`examples <https://github.com/huggingface/transformers/tree/master/examples>`__ scripts to fine-tune your model, or you
may create your own training script.
In order to do an inference on a task, several mechanisms are made available by the library:
- Pipelines: very easy-to-use abstractions, which require as little as two lines of code.
- Using a model directly with a tokenizer (PyTorch/TensorFlow): the full inference using the model. Less abstraction,
but much more powerful.
- Direct model use: Less abstractions, but more flexibility and power via a direct access to a tokenizer (PyTorch/TensorFlow) and full inference capacity.
Both approaches are showcased here.
@@ -39,15 +38,16 @@ Both approaches are showcased here.
This would produce random output.
Sequence Classification
--------------------------
-----------------------------------------------------------------------------------------------------------------------
Sequence classification is the task of classifying sequences according to a given number of classes. An example
of sequence classification is the GLUE dataset, which is entirely based on that task. If you would like to fine-tune
a model on a GLUE sequence classification task, you may leverage the
`run_glue.py <https://github.com/huggingface/transformers/tree/master/examples/text-classification/run_glue.py>`_ or
`run_tf_glue.py <https://github.com/huggingface/transformers/tree/master/examples/text-classification/run_tf_glue.py>`_ scripts.
`run_glue.py <https://github.com/huggingface/transformers/tree/master/examples/text-classification/run_glue.py>`__ and
`run_pl_glue.py <https://github.com/huggingface/transformers/tree/master/examples/text-classification/run_pl_glue.py>`__ or
`run_tf_glue.py <https://github.com/huggingface/transformers/tree/master/examples/text-classification/run_tf_glue.py>`__ scripts.
Here is an example using the pipelines do to sentiment analysis: identifying if a sequence is positive or negative.
Here is an example of using pipelines to do sentiment analysis: identifying if a sequence is positive or negative.
It leverages a fine-tuned model on sst2, which is a GLUE task.
This returns a label ("POSITIVE" or "NEGATIVE") alongside a score, as follows:
@@ -70,15 +70,17 @@ This returns a label ("POSITIVE" or "NEGATIVE") alongside a score, as follows:
Here is an example of doing a sequence classification using a model to determine if two sequences are paraphrases
of each other. The process is the following:
- Instantiate a tokenizer and a model from the checkpoint name. The model is identified as a BERT model and loads it
with the weights stored in the checkpoint.
- Build a sequence from the two sentences, with the correct model-specific separators token type ids
and attention masks (:func:`~transformers.PreTrainedTokenizer.encode` and
:func:`~transformers.PreTrainedTokenizer.__call__` take care of this)
- Pass this sequence through the model so that it is classified in one of the two available classes: 0
(not a paraphrase) and 1 (is a paraphrase)
- Compute the softmax of the result to get probabilities over the classes
- Print the results
1. Instantiate a tokenizer and a model from the checkpoint name. The model is
identified as a BERT model and loads it with the weights stored in the
checkpoint.
2. Build a sequence from the two sentences, with the correct model-specific
separators token type ids and attention masks
(:func:`~transformers.PreTrainedTokenizer.encode` and
:func:`~transformers.PreTrainedTokenizer.__call__` take care of this).
3. Pass this sequence through the model so that it is classified in one of the
two available classes: 0 (not a paraphrase) and 1 (is a paraphrase).
4. Compute the softmax of the result to get probabilities over the classes.
5. Print the results.
.. code-block::
@@ -98,8 +100,8 @@ of each other. The process is the following:
>>> paraphrase = tokenizer(sequence_0, sequence_2, return_tensors="pt")
>>> not_paraphrase = tokenizer(sequence_0, sequence_1, return_tensors="pt")
>>> paraphrase_classification_logits = model(**paraphrase)[0]
>>> not_paraphrase_classification_logits = model(**not_paraphrase)[0]
>>> paraphrase_classification_logits = model(**paraphrase).logits
>>> not_paraphrase_classification_logits = model(**not_paraphrase).logits
>>> paraphrase_results = torch.softmax(paraphrase_classification_logits, dim=1).tolist()[0]
>>> not_paraphrase_results = torch.softmax(not_paraphrase_classification_logits, dim=1).tolist()[0]
@@ -150,13 +152,16 @@ of each other. The process is the following:
is paraphrase: 6%
Extractive Question Answering
----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune
a model on a SQuAD task, you may leverage the `run_squad.py`.
a model on a SQuAD task, you may leverage the
`run_squad.py <https://github.com/huggingface/transformers/tree/master/examples/question-answering/run_squad.py>`__ and
`run_tf_squad.py <https://github.com/huggingface/transformers/tree/master/examples/question-answering/run_tf_squad.py>`__ scripts.
Here is an example using the pipelines do to question answering: extracting an answer from a text given a question.
Here is an example of using pipelines to do question answering: extracting an answer from a text given a question.
It leverages a fine-tuned model on SQuAD.
.. code-block::
@@ -171,7 +176,7 @@ It leverages a fine-tuned model on SQuAD.
... a model on a SQuAD task, you may leverage the examples/question-answering/run_squad.py script.
... """
This returns an answer extracted from the text, a confidence score, alongside "start" and "end" values which
This returns an answer extracted from the text, a confidence score, alongside "start" and "end" values, which
are the positions of the extracted answer in the text.
.. code-block::
@@ -187,16 +192,19 @@ are the positions of the extracted answer in the text.
Here is an example of question answering using a model and a tokenizer. The process is the following:
- Instantiate a tokenizer and a model from the checkpoint name. The model is identified as a BERT model and loads it
with the weights stored in the checkpoint.
- Define a text and a few questions.
- Iterate over the questions and build a sequence from the text and the current question, with the correct
model-specific separators token type ids and attention masks
- Pass this sequence through the model. This outputs a range of scores across the entire sequence tokens (question and
text), for both the start and end positions.
- Compute the softmax of the result to get probabilities over the tokens
- Fetch the tokens from the identified start and stop values, convert those tokens to a string.
- Print the results
1. Instantiate a tokenizer and a model from the checkpoint name. The model is
identified as a BERT model and loads it with the weights stored in the
checkpoint.
2. Define a text and a few questions.
3. Iterate over the questions and build a sequence from the text and the current
question, with the correct model-specific separators token type ids and
attention masks.
4. Pass this sequence through the model. This outputs a range of scores across
the entire sequence tokens (question and text), for both the start and end
positions.
5. Compute the softmax of the result to get probabilities over the tokens.
6. Fetch the tokens from the identified start and stop values, convert those tokens to a string.
7. Print the results.
.. code-block::
@@ -289,10 +297,10 @@ Here is an example of question answering using a model and a tokenizer. The proc
Language Modeling
----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
Language modeling is the task of fitting a model to a corpus, which can be domain specific. All popular transformer
based models are trained using a variant of language modeling, e.g. BERT with masked language modeling, GPT-2 with
Language modeling is the task of fitting a model to a corpus, which can be domain specific. All popular transformer-based
models are trained using a variant of language modeling, e.g. BERT with masked language modeling, GPT-2 with
causal language modeling.
Language modeling can be useful outside of pre-training as well, for example to shift the model distribution to be
@@ -300,12 +308,12 @@ domain-specific: using a language model trained over a very large corpus, and th
or on scientific papers e.g. `LysandreJik/arxiv-nlp <https://huggingface.co/lysandre/arxiv-nlp>`__.
Masked Language Modeling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Masked language modeling is the task of masking tokens in a sequence with a masking token, and prompting the model to
fill that mask with an appropriate token. This allows the model to attend to both the right context (tokens on the
right of the mask) and the left context (tokens on the left of the mask). Such a training creates a strong basis
for downstream tasks requiring bi-directional context such as SQuAD (question answering,
for downstream tasks, requiring bi-directional context such as SQuAD (question answering,
see `Lewis, Lui, Goyal et al. <https://arxiv.org/abs/1910.13461>`__, part 4.2).
Here is an example of using pipelines to replace a mask from a sequence:
@@ -316,7 +324,7 @@ Here is an example of using pipelines to replace a mask from a sequence:
>>> nlp = pipeline("fill-mask")
This outputs the sequences with the mask filled, the confidence score as well as the token id in the tokenizer
This outputs the sequences with the mask filled, the confidence score, and the token id in the tokenizer
vocabulary:
.. code-block::
@@ -349,17 +357,19 @@ vocabulary:
'token': 17715,
'token_str': 'Ġprototype'}]
Here is an example doing masked language modeling using a model and a tokenizer. The process is the following:
Here is an example of doing masked language modeling using a model and a tokenizer. The process is the following:
- Instantiate a tokenizer and a model from the checkpoint name. The model is identified as a DistilBERT model and
loads it with the weights stored in the checkpoint.
- Define a sequence with a masked token, placing the :obj:`tokenizer.mask_token` instead of a word.
- Encode that sequence into IDs and find the position of the masked token in that list of IDs.
- Retrieve the predictions at the index of the mask token: this tensor has the same size as the vocabulary, and the
values are the scores attributed to each token. The model gives higher score to tokens he deems probable in that
context.
- Retrieve the top 5 tokens using the PyTorch :obj:`topk` or TensorFlow :obj:`top_k` methods.
- Replace the mask token by the tokens and print the results
1. Instantiate a tokenizer and a model from the checkpoint name. The model is
identified as a DistilBERT model and loads it with the weights stored in the
checkpoint.
2. Define a sequence with a masked token, placing the :obj:`tokenizer.mask_token` instead of a word.
3. Encode that sequence into a list of IDs and find the position of the masked token in that list.
4. Retrieve the predictions at the index of the mask token: this tensor has the
same size as the vocabulary, and the values are the scores attributed to each
token. The model gives higher score to tokens it deems probable in that
context.
5. Retrieve the top 5 tokens using the PyTorch :obj:`topk` or TensorFlow :obj:`top_k` methods.
6. Replace the mask token by the tokens and print the results
.. code-block::
@@ -375,7 +385,7 @@ Here is an example doing masked language modeling using a model and a tokenizer.
>>> input = tokenizer.encode(sequence, return_tensors="pt")
>>> mask_token_index = torch.where(input == tokenizer.mask_token_id)[1]
>>> token_logits = model(input)[0]
>>> token_logits = model(input).logits
>>> mask_token_logits = token_logits[0, mask_token_index, :]
>>> top_5_tokens = torch.topk(mask_token_logits, 5, dim=1).indices[0].tolist()
@@ -411,7 +421,7 @@ This prints five sequences, with the top 5 tokens predicted by the model:
Causal Language Modeling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Causal language modeling is the task of predicting the token following a sequence of tokens. In this situation, the
model only attends to the left context (tokens on the left of the mask). Such a training is particularly interesting
@@ -419,7 +429,7 @@ for generation tasks.
Usually, the next token is predicted by sampling from the logits of the last hidden state the model produces from the input sequence.
Here is an example using the tokenizer and model and leveraging the :func:`~transformers.PreTrainedModel.top_k_top_p_filtering` method to sample the next token following an input sequence of tokens.
Here is an example of using the tokenizer and model and leveraging the :func:`~transformers.PreTrainedModel.top_k_top_p_filtering` method to sample the next token following an input sequence of tokens.
.. code-block::
@@ -436,7 +446,7 @@ Here is an example using the tokenizer and model and leveraging the :func:`~tran
>>> input_ids = tokenizer.encode(sequence, return_tensors="pt")
>>> # get logits of last hidden state
>>> next_token_logits = model(input_ids)[0][:, -1, :]
>>> next_token_logits = model(input_ids).logits[:, -1, :]
>>> # filter
>>> filtered_next_token_logits = top_k_top_p_filtering(next_token_logits, top_k=50, top_p=1.0)
@@ -473,19 +483,19 @@ Here is an example using the tokenizer and model and leveraging the :func:`~tran
>>> resulting_string = tokenizer.decode(generated.numpy().tolist()[0])
This outputs a (hopefully) coherent next token following the original sequence, which is in our case is the word *has*:
This outputs a (hopefully) coherent next token following the original sequence, which in our case is the word *has*:
.. code-block::
print(resulting_string)
>>> print(resulting_string)
Hugging Face is based in DUMBO, New York City, and has
In the next section, we show how this functionality is leveraged in :func:`~transformers.PreTrainedModel.generate` to generate multiple tokens up to a user-defined length.
Text Generation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In text generation (*a.k.a* *open-ended text generation*) the goal is to create a coherent portion of text that is a continuation from the given context. As an example, is it shown how *GPT-2* can be used in pipelines to generate text. As a default all models apply *Top-K* sampling when used in pipelines as configured in their respective configurations (see `gpt-2 config <https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-config.json>`_ for example).
In text generation (*a.k.a* *open-ended text generation*) the goal is to create a coherent portion of text that is a continuation from the given context. The following example shows how *GPT-2* can be used in pipelines to generate text. As a default all models apply *Top-K* sampling when used in pipelines, as configured in their respective configurations (see `gpt-2 config <https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-config.json>`__ for example).
.. code-block::
@@ -497,10 +507,10 @@ In text generation (*a.k.a* *open-ended text generation*) the goal is to create
Here the model generates a random text with a total maximal length of *50* tokens from context *"As far as I am concerned, I will"*.
The default arguments of ``PreTrainedModel.generate()`` can directly be overriden in the pipeline as is shown above for the argument ``max_length``.
Here, the model generates a random text with a total maximal length of *50* tokens from context *"As far as I am concerned, I will"*.
The default arguments of ``PreTrainedModel.generate()`` can be directly overriden in the pipeline, as is shown above for the argument ``max_length``.
Here is an example for text generation using XLNet and its tokenzier.
Here is an example of text generation using ``XLNet`` and its tokenzier.
.. code-block::
@@ -556,24 +566,27 @@ Here is an example for text generation using XLNet and its tokenzier.
.. code-block::
print(generated)
>>> print(generated)
Today the weather is really nice and I am planning on anning on taking a nice...... of a great time!<eop>...............
Text generation is currently possible with *GPT-2*, *OpenAi-GPT*, *CTRL*, *XLNet*, *Transfo-XL* and *Reformer* in PyTorch and for most models in Tensorflow as well. As can be seen in the example above *XLNet* and *Transfo-xl* often need to be padded to work well.
GPT-2 is usually a good choice for *open-ended text generation* because it was trained on millions on webpages with a causal language modeling objective.
Text generation is currently possible with *GPT-2*, *OpenAi-GPT*, *CTRL*, *XLNet*, *Transfo-XL* and *Reformer* in PyTorch and for most models in Tensorflow as well. As can be seen in the example above *XLNet* and *Transfo-XL* often need to be padded to work well.
GPT-2 is usually a good choice for *open-ended text generation* because it was trained on millions of webpages with a causal language modeling objective.
For more information on how to apply different decoding strategies for text generation, please also refer to our generation blog post `here <https://huggingface.co/blog/how-to-generate>`_.
For more information on how to apply different decoding strategies for text generation, please also refer to our text generation blog post `here <https://huggingface.co/blog/how-to-generate>`__.
Named Entity Recognition
----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
Named Entity Recognition (NER) is the task of classifying tokens according to a class, for example identifying a
Named Entity Recognition (NER) is the task of classifying tokens according to a class, for example, identifying a
token as a person, an organisation or a location.
An example of a named entity recognition dataset is the CoNLL-2003 dataset, which is entirely based on that task.
If you would like to fine-tune a model on an NER task, you may leverage the `ner/run_ner.py` (PyTorch),
`ner/run_pl_ner.py` (leveraging pytorch-lightning) or the `ner/run_tf_ner.py` (TensorFlow) scripts.
If you would like to fine-tune a model on an NER task, you may leverage the
`run_ner.py <https://github.com/huggingface/transformers/tree/master/examples/token-classification/run_ner.py>`__ (PyTorch),
`run_pl_ner.py <https://github.com/huggingface/transformers/tree/master/examples/token-classification/run_pl_ner.py>`__ (leveraging pytorch-lightning) or the
`run_tf_ner.py <https://github.com/huggingface/transformers/tree/master/examples/token-classification/run_tf_ner.py>`__ (TensorFlow) scripts.
Here is an example using the pipelines do to named entity recognition, trying to identify tokens as belonging to one
Here is an example of using pipelines to do named entity recognition, specifically, trying to identify tokens as belonging to one
of 9 classes:
- O, Outside of a named entity
@@ -599,13 +612,12 @@ It leverages a fine-tuned model on CoNLL-2003, fine-tuned by `@stefan-it <https:
... "close to the Manhattan Bridge which is visible from the window."
This outputs a list of all words that have been identified as an entity from the 9 classes defined above. Here is the
This outputs a list of all words that have been identified as one of the entities from the 9 classes defined above. Here are the
expected results:
.. code-block::
print(nlp(sequence))
>>> print(nlp(sequence))
[
{'word': 'Hu', 'score': 0.9995632767677307, 'entity': 'I-ORG'},
{'word': '##gging', 'score': 0.9915938973426819, 'entity': 'I-ORG'},
@@ -621,22 +633,25 @@ expected results:
{'word': 'Bridge', 'score': 0.990249514579773, 'entity': 'I-LOC'}
]
Note how the words "Hugging Face" have been identified as an organisation, and "New York City", "DUMBO" and
Note, how the tokens of the sequence "Hugging Face" have been identified as an organisation, and "New York City", "DUMBO" and
"Manhattan Bridge" have been identified as locations.
Here is an example doing named entity recognition using a model and a tokenizer. The process is the following:
Here is an example of doing named entity recognition, using a model and a tokenizer. The process is the following:
- Instantiate a tokenizer and a model from the checkpoint name. The model is identified as a BERT model and
loads it with the weights stored in the checkpoint.
- Define the label list with which the model was trained on.
- Define a sequence with known entities, such as "Hugging Face" as an organisation and "New York City" as a location.
- Split words into tokens so that they can be mapped to the predictions. We use a small hack by firstly completely
encoding and decoding the sequence, so that we're left with a string that contains the special tokens.
- Encode that sequence into IDs (special tokens are added automatically).
- Retrieve the predictions by passing the input to the model and getting the first output. This results in a
distribution over the 9 possible classes for each token. We take the argmax to retrieve the most likely class
for each token.
- Zip together each token with its prediction and print it.
1. Instantiate a tokenizer and a model from the checkpoint name. The model is
identified as a BERT model and loads it with the weights stored in the
checkpoint.
2. Define the label list with which the model was trained on.
3. Define a sequence with known entities, such as "Hugging Face" as an organisation and "New York City" as a location.
4. Split words into tokens so that they can be mapped to predictions. We use a
small hack by, first, completely encoding and decoding the sequence, so that
we're left with a string that contains the special tokens.
5. Encode that sequence into IDs (special tokens are added automatically).
6. Retrieve the predictions by passing the input to the model and getting the
first output. This results in a distribution over the 9 possible classes for
each token. We take the argmax to retrieve the most likely class for each
token.
7. Zip together each token with its prediction and print it.
.. code-block::
@@ -666,7 +681,7 @@ Here is an example doing named entity recognition using a model and a tokenizer.
>>> tokens = tokenizer.tokenize(tokenizer.decode(tokenizer.encode(sequence)))
>>> inputs = tokenizer.encode(sequence, return_tensors="pt")
>>> outputs = model(inputs)[0]
>>> outputs = model(inputs).logits
>>> predictions = torch.argmax(outputs, dim=2)
>>> ## TENSORFLOW CODE
>>> from transformers import TFAutoModelForTokenClassification, AutoTokenizer
@@ -698,8 +713,8 @@ Here is an example doing named entity recognition using a model and a tokenizer.
>>> predictions = tf.argmax(outputs, axis=2)
This outputs a list of each token mapped to their prediction. Differently from the pipeline, here every token has
a prediction as we didn't remove the "0" class which means that no particular entity was found on that token. The
This outputs a list of each token mapped to its corresponding prediction. Differently from the pipeline, here every token has
a prediction as we didn't remove the "0"th class, which means that no particular entity was found on that token. The
following array should be the output:
.. code-block::
@@ -708,15 +723,15 @@ following array should be the output:
[('[CLS]', 'O'), ('Hu', 'I-ORG'), ('##gging', 'I-ORG'), ('Face', 'I-ORG'), ('Inc', 'I-ORG'), ('.', 'O'), ('is', 'O'), ('a', 'O'), ('company', 'O'), ('based', 'O'), ('in', 'O'), ('New', 'I-LOC'), ('York', 'I-LOC'), ('City', 'I-LOC'), ('.', 'O'), ('Its', 'O'), ('headquarters', 'O'), ('are', 'O'), ('in', 'O'), ('D', 'I-LOC'), ('##UM', 'I-LOC'), ('##BO', 'I-LOC'), (',', 'O'), ('therefore', 'O'), ('very', 'O'), ('##c', 'O'), ('##lose', 'O'), ('to', 'O'), ('the', 'O'), ('Manhattan', 'I-LOC'), ('Bridge', 'I-LOC'), ('.', 'O'), ('[SEP]', 'O')]
Summarization
----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
Summarization is the task of summarizing a text / an article into a shorter text.
Summarization is the task of summarizing a document or an article into a shorter text.
An example of a summarization dataset is the CNN / Daily Mail dataset, which consists of long news articles and was created for the task of summarization.
If you would like to fine-tune a model on a summarization task, you may leverage the ``examples/summarization/bart/run_train.sh`` (leveraging pytorch-lightning) script.
If you would like to fine-tune a model on a summarization task, various approaches are described in this
`document <https://github.com/huggingface/transformers/blob/master/examples/seq2seq/README.md>`__.
Here is an example using the pipelines do to summarization.
It leverages a Bart model that was fine-tuned on the CNN / Daily Mail data set.
Here is an example of using the pipelines to do summarization. It leverages a Bart model that was fine-tuned on the CNN / Daily Mail data set.
.. code-block::
@@ -743,8 +758,8 @@ It leverages a Bart model that was fine-tuned on the CNN / Daily Mail data set.
... If convicted, Barrientos faces up to four years in prison. Her next court appearance is scheduled for May 18.
... """
Because the summarization pipeline depends on the ``PretrainedModel.generate()`` method, we can override the default arguments
of ``PretrainedModel.generate()`` directly in the pipeline as is shown for ``max_length`` and ``min_length`` above.
Because the summarization pipeline depends on the ``PretrainedModel.generate()`` method, we can override the default arguments
of ``PretrainedModel.generate()`` directly in the pipeline for ``max_length`` and ``min_length`` as shown below.
This outputs the following summary:
.. code-block::
@@ -752,14 +767,15 @@ This outputs the following summary:
>>> print(summarizer(ARTICLE, max_length=130, min_length=30, do_sample=False))
[{'summary_text': 'Liana Barrientos, 39, is charged with two counts of "offering a false instrument for filing in the first degree" In total, she has been married 10 times, with nine of her marriages occurring between 1999 and 2002. She is believed to still be married to four men.'}]
Here is an example doing summarization using a model and a tokenizer. The process is the following:
Here is an example of doing summarization using a model and a tokenizer. The process is the following:
- Instantiate a tokenizer and a model from the checkpoint name. Summarization is usually done using an encoder-decoder model, such as ``Bart`` or ``T5``.
- Define the article that should be summarizaed.
- Leverage the ``PretrainedModel.generate()`` method.
- Add the T5 specific prefix "summarize: ".
1. Instantiate a tokenizer and a model from the checkpoint name. Summarization is usually done using an encoder-decoder model, such as ``Bart`` or ``T5``.
2. Define the article that should be summarized.
3. Add the T5 specific prefix "summarize: ".
4. Use the ``PretrainedModel.generate()`` method to generate the summary.
In this example we use Google`s T5 model. Even though it was pre-trained only on a multi-task mixed dataset (including CNN / Daily Mail), it yields very good results.
Here Google`s T5 model is used that was only pre-trained on a multi-task mixed data set (including CNN / Daily Mail), but nevertheless yields very good results.
.. code-block::
>>> ## PYTORCH CODE
@@ -782,16 +798,18 @@ Here Google`s T5 model is used that was only pre-trained on a multi-task mixed d
>>> outputs = model.generate(inputs, max_length=150, min_length=40, length_penalty=2.0, num_beams=4, early_stopping=True)
Translation
----------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
Translation is the task of translating a text from one language to another.
An example of a translation dataset is the WMT English to German dataset, which has English sentences as the input data
and German sentences as the target data.
An example of a translation dataset is the WMT English to German dataset, which has sentences in English as the input data
and the corresponding sentences in German as the target data.
If you would like to fine-tune a model on a translation task, various approaches are described in this
`document <https://github.com/huggingface/transformers/blob/master/examples/seq2seq/README.md>`__.
Here is an example using the pipelines do to translation.
It leverages a T5 model that was only pre-trained on a multi-task mixture dataset (including WMT), but yields impressive
translation results nevertheless.
Here is an example of using the pipelines to do translation.
It leverages a T5 model that was only pre-trained on a multi-task mixture dataset (including WMT), yet, yielding impressive
translation results.
.. code-block::
@@ -801,20 +819,15 @@ translation results nevertheless.
>>> print(translator("Hugging Face is a technology company based in New York and Paris", max_length=40))
[{'translation_text': 'Hugging Face ist ein Technologieunternehmen mit Sitz in New York und Paris.'}]
Because the translation pipeline depends on the ``PretrainedModel.generate()`` method, we can override the default arguments
Because the translation pipeline depends on the ``PretrainedModel.generate()`` method, we can override the default arguments
of ``PretrainedModel.generate()`` directly in the pipeline as is shown for ``max_length`` above.
This outputs the following translation into German:
::
Here is an example of doing translation using a model and a tokenizer. The process is the following:
Hugging Face ist ein Technologieunternehmen mit Sitz in New York und Paris.
Here is an example doing translation using a model and a tokenizer. The process is the following:
- Instantiate a tokenizer and a model from the checkpoint name. Summarization is usually done using an encoder-decoder model, such as ``Bart`` or ``T5``.
- Define the article that should be summarizaed.
- Leverage the ``PretrainedModel.generate()`` method.
- Add the T5 specific prefix "translate English to German: "
1. Instantiate a tokenizer and a model from the checkpoint name. Summarization is usually done using an encoder-decoder model, such as ``Bart`` or ``T5``.
2. Define the article that should be summarizaed.
3. Add the T5 specific prefix "translate English to German: "
4. Use the ``PretrainedModel.generate()`` method to perform the translation.
.. code-block::
@@ -826,10 +839,6 @@ Here is an example doing translation using a model and a tokenizer. The process
>>> inputs = tokenizer.encode("translate English to German: Hugging Face is a technology company based in New York and Paris", return_tensors="pt")
>>> outputs = model.generate(inputs, max_length=40, num_beams=4, early_stopping=True)
>>> print(outputs)
tensor([[ 0, 11560, 3896, 8881, 229, 236, 3, 14366, 15377, 181,
11216, 16, 368, 1060, 64, 1919, 5]])
>>> ## TENSORFLOW CODE
>>> from transformers import TFAutoModelWithLMHead, AutoTokenizer
@@ -839,7 +848,9 @@ Here is an example doing translation using a model and a tokenizer. The process
>>> inputs = tokenizer.encode("translate English to German: Hugging Face is a technology company based in New York and Paris", return_tensors="tf")
>>> outputs = model.generate(inputs, max_length=40, num_beams=4, early_stopping=True)
>>> print(outputs)
tf.Tensor(
[[ 0 11560 3896 8881 229 236 3 14366 15377 181 11216 16
368 1060 64 1919 5]], shape=(1, 17), dtype=int32)
As with the pipeline example, we get the same translation:
.. code-block::
>>> print(tokenizer.decode(outputs[0]))
Hugging Face ist ein Technologieunternehmen mit Sitz in New York und Paris.

953
docs/source/testing.rst Normal file
View File

@@ -0,0 +1,953 @@
Testing
=======================================================================================================================
Let's take a look at how 🤗 Transformer models are tested and how you can write new tests and improve the existing ones.
There are 2 test suites in the repository:
1. ``tests`` -- tests for the general API
2. ``examples`` -- tests primarily for various applications that aren't part of the API
How transformers are tested
-----------------------------------------------------------------------------------------------------------------------
1. Once a PR is submitted it gets tested with 9 CircleCi jobs. Every new commit to that PR gets retested. These jobs are defined in this `config file <https://github.com/huggingface/transformers/blob/master/.circleci/config.yml>`__, so that if needed you can reproduce the same environment on your machine.
These CI jobs don't run ``@slow`` tests.
2. There are 3 jobs run by `github actions <https://github.com/huggingface/transformers/actions>`__:
* `torch hub integration <https://github.com/huggingface/transformers/blob/master/.github/workflows/github-torch-hub.yml>`__: checks whether torch hub integration works.
* `self-hosted (push) <https://github.com/huggingface/transformers/blob/master/.github/workflows/self-push.yml>`__: runs fast tests on GPU only on commits on ``master``. It only runs if a commit on ``master`` has updated the code in one of the following folders: ``src``, ``tests``, ``.github`` (to prevent running on added model cards, notebooks, etc.)
* `self-hosted runner <https://github.com/huggingface/transformers/blob/master/.github/workflows/self-scheduled.yml>`__: runs slow tests on ``tests`` and ``examples``:
.. code-block:: bash
RUN_SLOW=1 USE_CUDA=1 pytest tests/
RUN_SLOW=1 USE_CUDA=1 pytest examples/
The results can be observed `here <https://github.com/huggingface/transformers/actions>`__.
Running tests
-----------------------------------------------------------------------------------------------------------------------
Choosing which tests to run
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This document goes into many details of how tests can be run. If after reading everything, you need even more details you will find them `here <https://docs.pytest.org/en/latest/usage.html>`__.
Here are some most useful ways of running tests.
Run all:
.. code-block:: console
pytest
or:
.. code-block:: bash
make test
Note that the latter is defined as:
.. code-block:: bash
python -m pytest -n auto --dist=loadfile -s -v ./tests/
which tells pytest to:
* run as many test processes as they are CPU cores (which could be too many if you don't have a ton of RAM!)
* ensure that all tests from the same file will be run by the same test process
* do not capture output
* run in verbose mode
Getting the list of all tests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All tests of the test suite:
.. code-block:: bash
pytest --collect-only -q
All tests of a given test file:
.. code-block:: bash
pytest tests/test_optimization.py --collect-only -q
Run a specific test module
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To run an individual test module:
.. code-block:: bash
pytest tests/test_logging.py
Run specific tests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Since unittest is used inside most of the tests, to run specific subtests you need to know the name of the unittest class containing those tests. For example, it could be:
.. code-block:: bash
pytest tests/test_optimization.py::OptimizationTest::test_adam_w
Here:
* ``tests/test_optimization.py`` - the file with tests
* ``OptimizationTest`` - the name of the class
* ``test_adam_w`` - the name of the specific test function
If the file contains multiple classes, you can choose to run only tests of a given class. For example:
.. code-block:: bash
pytest tests/test_optimization.py::OptimizationTest
will run all the tests inside that class.
As mentioned earlier you can see what tests are contained inside the ``OptimizationTest`` class by running:
.. code-block:: bash
pytest tests/test_optimization.py::OptimizationTest --collect-only -q
You can run tests by keyword expressions.
To run only tests whose name contains ``adam``:
.. code-block:: bash
pytest -k adam tests/test_optimization.py
To run all tests except those whose name contains ``adam``:
.. code-block:: bash
pytest -k "not adam" tests/test_optimization.py
And you can combine the two patterns in one:
.. code-block:: bash
pytest -k "ada and not adam" tests/test_optimization.py
Run only modified tests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You can run the tests related to the unstaged files or the current branch (according to Git) by using `pytest-picked <https://github.com/anapaulagomes/pytest-picked>`__. This is a great way of quickly testing your changes didn't break anything, since it won't run the tests related to files you didn't touch.
.. code-block:: bash
pip install pytest-picked
.. code-block:: bash
pytest --picked
All tests will be run from files and folders which are modified, but not
yet committed.
Automatically rerun failed tests on source modification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
`pytest-xdist <https://github.com/pytest-dev/pytest-xdist>`__ provides a
very useful feature of detecting all failed tests, and then waiting for
you to modify files and continuously re-rerun those failing tests until
they pass while you fix them. So that you don't need to re start pytest
after you made the fix. This is repeated until all tests pass after
which again a full run is performed.
.. code-block:: bash
pip install pytest-xdist
To enter the mode: ``pytest -f`` or ``pytest --looponfail``
File changes are detected by looking at ``looponfailroots`` root
directories and all of their contents (recursively). If the default for
this value does not work for you, you can change it in your project by
setting a configuration option in ``setup.cfg``:
.. code-block:: ini
[tool:pytest]
looponfailroots = transformers tests
or ``pytest.ini``/``tox.ini`` files:
.. code-block:: ini
[pytest]
looponfailroots = transformers tests
This would lead to only looking for file changes in the respective
directories, specified relatively to the ini-files directory.
`pytest-watch <https://github.com/joeyespo/pytest-watch>`__ is an
alternative implementation of this functionality.
Skip a test module
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If you want to run all test modules, except a few you can exclude them by giving an explicit list of tests to run. For example, to run all except ``test_modeling_*.py`` tests:
.. code-block:: bash
pytest `ls -1 tests/*py | grep -v test_modeling`
Clearing state
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
CI builds and when isolation is important (against speed), cache should
be cleared:
.. code-block:: bash
pytest --cache-clear tests
Running tests in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
As mentioned earlier ``make test`` runs tests in parallel via ``pytest-xdist`` plugin (``-n X`` argument, e.g. ``-n 2`` to run 2 parallel jobs).
``pytest-xdist``'s ``--dist=`` option allows one to control how the tests are grouped. ``--dist=loadfile`` puts the tests located in one file onto the same process.
Since the order of executed tests is different and unpredictable, if
running the test suite with ``pytest-xdist`` produces failures (meaning
we have some undetected coupled tests), use
`pytest-replay <https://github.com/ESSS/pytest-replay>`__ to replay the
tests in the same order, which should help with then somehow reducing
that failing sequence to a minimum.
Test order and repetition
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
It's good to repeat the tests several times, in sequence, randomly, or
in sets, to detect any potential inter-dependency and state-related bugs
(tear down). And the straightforward multiple repetition is just good to
detect some problems that get uncovered by randomness of DL.
Repeat tests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* `pytest-flakefinder <https://github.com/dropbox/pytest-flakefinder>`__:
.. code-block:: bash
pip install pytest-flakefinder
And then run every test multiple times (50 by default):
.. code-block:: bash
pytest --flake-finder --flake-runs=5 tests/test_failing_test.py
.. note::
This plugin doesn't work with ``-n`` flag from ``pytest-xdist``.
.. note::
There is another plugin ``pytest-repeat``, but it doesn't work with ``unittest``.
Run tests in a random order
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. code-block:: bash
pip install pytest-random-order
Important: the presence of ``pytest-random-order`` will automatically
randomize tests, no configuration change or command line options is
required.
As explained earlier this allows detection of coupled tests - where one
test's state affects the state of another. When ``pytest-random-order``
is installed it will print the random seed it used for that session,
e.g:
.. code-block:: bash
pytest tests
[...]
Using --random-order-bucket=module
Using --random-order-seed=573663
So that if the given particular sequence fails, you can reproduce it by
adding that exact seed, e.g.:
.. code-block:: bash
pytest --random-order-seed=573663
[...]
Using --random-order-bucket=module
Using --random-order-seed=573663
It will only reproduce the exact order if you use the exact same list of
tests (or no list at all). Once you start to manually narrowing
down the list you can no longer rely on the seed, but have to list them
manually in the exact order they failed and tell pytest to not randomize
them instead using ``--random-order-bucket=none``, e.g.:
.. code-block:: bash
pytest --random-order-bucket=none tests/test_a.py tests/test_c.py tests/test_b.py
To disable the shuffling for all tests:
.. code-block:: bash
pytest --random-order-bucket=none
By default ``--random-order-bucket=module`` is implied, which will
shuffle the files on the module levels. It can also shuffle on
``class``, ``package``, ``global`` and ``none`` levels. For the complete
details please see its `documentation <https://github.com/jbasko/pytest-random-order>`__.
Another randomization alternative is: ``pytest-randomly`` <https://github.com/pytest-dev/pytest-randomly>`__. This module has a very similar functionality/interface, but it doesn't have the bucket modes available in ``pytest-random-order``. It has the same problem of imposing itself once installed.
Look and feel variations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
pytest-sugar
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
`pytest-sugar <https://github.com/Frozenball/pytest-sugar>`__ is a
plugin that improves the look-n-feel, adds a progressbar, and show tests
that fail and the assert instantly. It gets activated automatically upon
installation.
.. code-block:: bash
pip install pytest-sugar
To run tests without it, run:
.. code-block:: bash
pytest -p no:sugar
or uninstall it.
Report each sub-test name and its progress
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
For a single or a group of tests via ``pytest`` (after
``pip install pytest-pspec``):
.. code-block:: bash
pytest --pspec tests/test_optimization.py
Instantly shows failed tests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
`pytest-instafail <https://github.com/pytest-dev/pytest-instafail>`__
shows failures and errors instantly instead of waiting until the end of
test session.
.. code-block:: bash
pip install pytest-instafail
.. code-block:: bash
pytest --instafail
To GPU or not to GPU
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
On a GPU-enabled setup, to test in CPU-only mode add ``CUDA_VISIBLE_DEVICES=""``:
.. code-block:: bash
CUDA_VISIBLE_DEVICES="" pytest tests/test_logging.py
or if you have multiple gpus, you can tell which one to use in this test session, e.g. to use only the second gpu if you have gpus ``0`` and ``1``, you can run:
.. code-block:: bash
CUDA_VISIBLE_DEVICES="1" pytest tests/test_logging.py
This is handy when you want to run different tasks on different GPUs.
And we have these decorators that require the condition described by the marker.
``
@require_torch
@require_tf
@require_multigpu
@require_non_multigpu
@require_torch_tpu
@require_torch_and_cuda
``
Some decorators like ``@parametrized`` rewrite test names, therefore ``@require_*`` skip decorators have to be listed last for them to work correctly. Here is an example of the correct usage:
.. code-block:: python
@parameterized.expand(...)
@require_multigpu
def test_integration_foo():
There is no problem whatsoever with ``@pytest.mark.parametrize`` (but it only works with non-unittests) - can use it in any order.
This section will be expanded soon once our work in progress on those decorators is finished.
Inside tests:
* How many GPUs are available:
.. code-block:: bash
torch.cuda.device_count()
Output capture
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
During test execution any output sent to ``stdout`` and ``stderr`` is
captured. If a test or a setup method fails, its according captured
output will usually be shown along with the failure traceback.
To disable output capturing and to get the ``stdout`` and ``stderr``
normally, use ``-s`` or ``--capture=no``:
.. code-block:: bash
pytest -s tests/test_logging.py
To send test results to JUnit format output:
.. code-block:: bash
py.test tests --junitxml=result.xml
Color control
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To have no color (e.g., yellow on white background is not readable):
.. code-block:: bash
pytest --color=no tests/test_logging.py
Sending test report to online pastebin service
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Creating a URL for each test failure:
.. code-block:: bash
pytest --pastebin=failed tests/test_logging.py
This will submit test run information to a remote Paste service and
provide a URL for each failure. You may select tests as usual or add for
example -x if you only want to send one particular failure.
Creating a URL for a whole test session log:
.. code-block:: bash
pytest --pastebin=all tests/test_logging.py
Writing tests
-----------------------------------------------------------------------------------------------------------------------
🤗 transformers tests are based on ``unittest``, but run by ``pytest``, so most of the time features from both systems can be used.
You can read `here <https://docs.pytest.org/en/stable/unittest.html>`__ which features are supported, but the important thing to remember is that most ``pytest`` fixtures don't work. Neither parametrization, but we use the module ``parameterized`` that works in a similar way.
Parametrization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Often, there is a need to run the same test multiple times, but with different arguments. It could be done from within the test, but then there is no way of running that test for just one set of arguments.
.. code-block:: python
# test_this1.py
import unittest
from parameterized import parameterized
class TestMathUnitTest(unittest.TestCase):
@parameterized.expand([
("negative", -1.5, -2.0),
("integer", 1, 1.0),
("large fraction", 1.6, 1),
])
def test_floor(self, name, input, expected):
assert_equal(math.floor(input), expected)
Now, by default this test will be run 3 times, each time with the last 3 arguments of ``test_floor`` being assigned the corresponding arguments in the parameter list.
and you could run just the ``negative`` and ``integer`` sets of params with:
.. code-block:: bash
pytest -k "negative and integer" tests/test_mytest.py
or all but ``negative`` sub-tests, with:
.. code-block:: bash
pytest -k "not negative" tests/test_mytest.py
Besides using the ``-k`` filter that was just mentioned, you can find out the exact name of each sub-test and run any or all of them using their exact names.
.. code-block:: bash
pytest test_this1.py --collect-only -q
and it will list:
.. code-block:: bash
test_this1.py::TestMathUnitTest::test_floor_0_negative
test_this1.py::TestMathUnitTest::test_floor_1_integer
test_this1.py::TestMathUnitTest::test_floor_2_large_fraction
So now you can run just 2 specific sub-tests:
.. code-block:: bash
pytest test_this1.py::TestMathUnitTest::test_floor_0_negative test_this1.py::TestMathUnitTest::test_floor_1_integer
The module `parameterized <https://pypi.org/project/parameterized/>`__ which is already in the developer dependencies of ``transformers`` works for both: ``unittests`` and ``pytest`` tests.
If, however, the test is not a ``unittest``, you may use ``pytest.mark.parametrize`` (or you may see it being used in some existing tests, mostly under ``examples``).
Here is the same example, this time using ``pytest``'s ``parametrize`` marker:
.. code-block:: python
# test_this2.py
import pytest
@pytest.mark.parametrize(
"name, input, expected",
[
("negative", -1.5, -2.0),
("integer", 1, 1.0),
("large fraction", 1.6, 1),
],
)
def test_floor(name, input, expected):
assert_equal(math.floor(input), expected)
Same as with ``parameterized``, with ``pytest.mark.parametrize`` you can have a fine control over which sub-tests are run, if the ``-k`` filter doesn't do the job. Except, this parametrization function creates a slightly different set of names for the sub-tests. Here is what they look like:
.. code-block:: bash
pytest test_this2.py --collect-only -q
and it will list:
.. code-block:: bash
test_this2.py::test_floor[integer-1-1.0]
test_this2.py::test_floor[negative--1.5--2.0]
test_this2.py::test_floor[large fraction-1.6-1]
So now you can run just the specific test:
.. code-block:: bash
pytest test_this2.py::test_floor[negative--1.5--2.0] test_this2.py::test_floor[integer-1-1.0]
as in the previous example.
Temporary files and directories
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Using unique temporary files and directories are essential for parallel test running, so that the tests won't overwrite each other's data. Also we want to get the temp files and directories removed at the end of each test that created them. Therefore, using packages like ``tempfile``, which address these needs is essential.
However, when debugging tests, you need to be able to see what goes into the temp file or directory and you want to know it's exact path and not having it randomized on every test re-run.
A helper class :obj:`transformers.test_utils.TestCasePlus` is best used for such purposes. It's a sub-class of :obj:`unittest.TestCase`, so we can easily inherit from it in the test modules.
Here is an example of its usage:
.. code-block:: python
from transformers.testing_utils import TestCasePlus
class ExamplesTests(TestCasePlus):
def test_whatever(self):
tmp_dir = self.get_auto_remove_tmp_dir()
This code creates a unique temporary directory, and sets :obj:`tmp_dir` to its location.
In this and all the following scenarios the temporary directory will be auto-removed at the end of test, unless ``after=False`` is passed to the helper function.
* Create a temporary directory of my choice and delete it at the end - useful for debugging when you want to monitor a specific directory:
.. code-block:: python
def test_whatever(self):
tmp_dir = self.get_auto_remove_tmp_dir(tmp_dir="./tmp/run/test")
* Create a temporary directory of my choice and do not delete it at the end---useful for when you want to look at the temp results:
.. code-block:: python
def test_whatever(self):
tmp_dir = self.get_auto_remove_tmp_dir(tmp_dir="./tmp/run/test", after=False)
* Create a temporary directory of my choice and ensure to delete it right away---useful for when you disabled deletion in the previous test run and want to make sure the that temporary directory is empty before the new test is run:
.. code-block:: python
def test_whatever(self):
tmp_dir = self.get_auto_remove_tmp_dir(tmp_dir="./tmp/run/test", before=True)
.. note::
In order to run the equivalent of ``rm -r`` safely, only subdirs of the project repository checkout are allowed if an explicit obj:`tmp_dir` is used, so that by mistake no ``/tmp`` or similar important part of the filesystem will get nuked. i.e. please always pass paths that start with ``./``.
.. note::
Each test can register multiple temporary directories and they all will get auto-removed, unless requested otherwise.
Skipping tests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is useful when a bug is found and a new test is written, yet the
bug is not fixed yet. In order to be able to commit it to the main
repository we need make sure it's skipped during ``make test``.
Methods:
- A **skip** means that you expect your test to pass only if some
conditions are met, otherwise pytest should skip running the test
altogether. Common examples are skipping windows-only tests on
non-windows platforms, or skipping tests that depend on an external
resource which is not available at the moment (for example a
database).
- A **xfail** means that you expect a test to fail for some reason. A
common example is a test for a feature not yet implemented, or a bug
not yet fixed. When a test passes despite being expected to fail
(marked with pytest.mark.xfail), its an xpass and will be reported
in the test summary.
One of the important differences between the two is that ``skip``
doesn't run the test, and ``xfail`` does. So if the code that's buggy
causes some bad state that will affect other tests, do not use
``xfail``.
Implementation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- Here is how to skip whole test unconditionally:
.. code-block:: python
@unittest.skip("this bug needs to be fixed")
def test_feature_x():
or via pytest:
.. code-block:: python
@pytest.mark.skip(reason="this bug needs to be fixed")
or the ``xfail`` way:
.. code-block:: python
@pytest.mark.xfail
def test_feature_x():
Here is how to skip a test based on some internal check inside the test:
.. code-block:: python
def test_feature_x():
if not has_something():
pytest.skip("unsupported configuration")
or the whole module:
.. code-block:: python
import pytest
if not pytest.config.getoption("--custom-flag"):
pytest.skip("--custom-flag is missing, skipping tests", allow_module_level=True)
or the ``xfail`` way:
.. code-block:: python
def test_feature_x():
pytest.xfail("expected to fail until bug XYZ is fixed")
Here is how to skip all tests in a module if some import is missing:
.. code-block:: python
docutils = pytest.importorskip("docutils", minversion="0.3")
- Skip a test based on a condition:
.. code-block:: python
@pytest.mark.skipif(sys.version_info < (3,6), reason="requires python3.6 or higher")
def test_feature_x():
or:
.. code-block:: python
@unittest.skipIf(torch_device == "cpu", "Can't do half precision")
def test_feature_x():
or skip the whole module:
.. code-block:: python
@pytest.mark.skipif(sys.platform == 'win32', reason="does not run on windows")
class TestClass():
def test_feature_x(self):
More details, example and ways are `here <https://docs.pytest.org/en/latest/skipping.html>`__.
Custom markers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Slow tests
Tests that are too slow (e.g. once downloading huge model files) are marked with:
.. code-block:: python
from transformers.testing_utils import slow
@slow
def test_integration_foo():
To run such tests set ``RUN_SLOW=1`` env var, e.g.:
.. code-block:: bash
RUN_SLOW=1 pytest tests
Some decorators like ``@parametrized`` rewrite test names, therefore ``@slow`` and the rest of the skip decorators ``@require_*`` have to be listed last for them to work correctly. Here is an example of the correct usage:
.. code-block:: python
@parameterized.expand(...)
@slow
def test_integration_foo():
Testing the stdout/stderr output
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In order to test functions that write to ``stdout`` and/or ``stderr``,
the test can access those streams using the ``pytest``'s `capsys
system <https://docs.pytest.org/en/latest/capture.html>`__. Here is how
this is accomplished:
.. code-block:: python
import sys
def print_to_stdout(s): print(s)
def print_to_stderr(s): sys.stderr.write(s)
def test_result_and_stdout(capsys):
msg = "Hello"
print_to_stdout(msg)
print_to_stderr(msg)
out, err = capsys.readouterr() # consume the captured output streams
# optional: if you want to replay the consumed streams:
sys.stdout.write(out)
sys.stderr.write(err)
# test:
assert msg in out
assert msg in err
And, of course, most of the time, ``stderr`` will come as a part of an
exception, so try/except has to be used in such a case:
.. code-block:: python
def raise_exception(msg): raise ValueError(msg)
def test_something_exception():
msg = "Not a good value"
error = ''
try:
raise_exception(msg)
except Exception as e:
error = str(e)
assert msg in error, f"{msg} is in the exception:\n{error}"
Another approach to capturing stdout is via ``contextlib.redirect_stdout``:
.. code-block:: python
from io import StringIO
from contextlib import redirect_stdout
def print_to_stdout(s): print(s)
def test_result_and_stdout():
msg = "Hello"
buffer = StringIO()
with redirect_stdout(buffer):
print_to_stdout(msg)
out = buffer.getvalue()
# optional: if you want to replay the consumed streams:
sys.stdout.write(out)
# test:
assert msg in out
An important potential issue with capturing stdout is that it may
contain ``\r`` characters that in normal ``print`` reset everything that
has been printed so far. There is no problem with ``pytest``, but with
``pytest -s`` these characters get included in the buffer, so to be able
to have the test run with and without ``-s``, you have to make an extra
cleanup to the captured output, using ``re.sub(r'~.*\r', '', buf, 0, re.M)``.
But, then we have a helper context manager wrapper to automatically take
care of it all, regardless of whether it has some ``\r``'s in it or
not, so it's a simple:
.. code-block:: python
from transformers.testing_utils import CaptureStdout
with CaptureStdout() as cs:
function_that_writes_to_stdout()
print(cs.out)
Here is a full test example:
.. code-block:: python
from transformers.testing_utils import CaptureStdout
msg = "Secret message\r"
final = "Hello World"
with CaptureStdout() as cs:
print(msg + final)
assert cs.out == final+"\n", f"captured: {cs.out}, expecting {final}"
If you'd like to capture ``stderr`` use the :obj:`CaptureStderr` class
instead:
.. code-block:: python
from transformers.testing_utils import CaptureStderr
with CaptureStderr() as cs:
function_that_writes_to_stderr()
print(cs.err)
If you need to capture both streams at once, use the parent
:obj:`CaptureStd` class:
.. code-block:: python
from transformers.testing_utils import CaptureStd
with CaptureStd() as cs:
function_that_writes_to_stdout_and_stderr()
print(cs.err, cs.out)
Capturing logger stream
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If you need to validate the output of a logger, you can use :obj:`CaptureLogger`:
.. code-block:: python
from transformers import logging
from transformers.testing_utils import CaptureLogger
msg = "Testing 1, 2, 3"
logging.set_verbosity_info()
logger = logging.get_logger("transformers.tokenization_bart")
with CaptureLogger(logger) as cl:
logger.info(msg)
assert cl.out, msg+"\n"
Testing with environment variables
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If you want to test the impact of environment variables for a specific test you can use a helper decorator ``transformers.testing_utils.mockenv``
.. code-block:: python
from transformers.testing_utils import mockenv
class HfArgumentParserTest(unittest.TestCase):
@mockenv(TRANSFORMERS_VERBOSITY="error")
def test_env_override(self):
env_level_str = os.getenv("TRANSFORMERS_VERBOSITY", None)
Getting reproducible results
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In some situations you may want to remove randomness for your tests. To
get identical reproducable results set, you will need to fix the seed:
.. code-block:: python
seed = 42
# python RNG
import random
random.seed(seed)
# pytorch RNGs
import torch
torch.manual_seed(seed)
torch.backends.cudnn.deterministic = True
if torch.cuda.is_available(): torch.cuda.manual_seed_all(seed)
# numpy RNG
import numpy as np
np.random.seed(seed)
# tf RNG
tf.random.set_seed(seed)
Debugging tests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To start a debugger at the point of the warning, do this:
.. code-block:: bash
pytest tests/test_logging.py -W error::UserWarning --pdb

View File

@@ -1,243 +1,243 @@
Tokenizer summary
-----------------
In this page, we will have a closer look at tokenization. As we saw in
:doc:`the preprocessing tutorial <preprocessing>`, tokenizing a text is splitting it into words or subwords, which then
are converted to ids. The second part is pretty straightforward, here we will focus on the first part. More
specifically, we will look at the three main different kinds of tokenizers used in 🤗 Transformers:
:ref:`Byte-Pair Encoding (BPE) <byte-pair-encoding>`, :ref:`WordPiece <wordpiece>` and
:ref:`SentencePiece <sentencepiece>`, and provide examples of models using each of those.
Note that on each model page, you can look at the documentation of the associated tokenizer to know which of those
algorithms the pretrained model used. For instance, if we look at :class:`~transformers.BertTokenizer`, we can see it's
using :ref:`WordPiece <wordpiece>`.
Introduction to tokenization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Splitting a text in smaller chunks is a task that's harder than it looks, and there are multiple ways of doing it. For
instance, let's look at the sentence "Don't you love 🤗 Transformers? We sure do." A first simple way of tokenizing
this text is just to split it by spaces, which would give:
::
["Don't", "you", "love", "🤗", "Transformers?", "We", "sure", "do."]
This is a nice first step, but if we look at the tokens "Transformers?" or "do.", we can see we can do better. Those
will be different than the tokens "Transformers" and "do" for our model, so we should probably take the punctuation
into account. This would give:
::
["Don", "'", "t", "you", "love", "🤗", "Transformers", "?", "We", "sure", "do", "."]
which is better already. One thing that is annoying though is how it dealt with "Don't". "Don't" stands for do not, so
it should probably be better tokenized as ``["Do", "n't"]``. This is where things start getting more complicated, and
part of the reason each kind of model has its own tokenizer class. Depending on the rules we apply to split our texts
into tokens, we'll get different tokenized versions of the same text. And of course, a given pretrained model won't
perform properly if you don't use the exact same rules as the persons who pretrained it.
`spaCy <https://spacy.io/>`__ and `Moses <http://www.statmt.org/moses/?n=Development.GetStarted>`__ are two popular
rule-based tokenizers. On the text above, they'd output something like:
::
["Do", "n't", "you", "love", "🤗", "Transformers", "?", "We", "sure", "do", "."]
Space/punctuation-tokenization and rule-based tokenization are both examples of word tokenization, which is splitting a
sentence into words. While it's the most intuitive way to separate texts in smaller chunks, it can have a problem when
you have a huge corpus: it usually yields a very big vocabulary (the set of all unique tokens used).
:doc:`Transformer XL <model_doc/transformerxl>` for instance uses space/punctuation-tokenization, and has a vocabulary
size of 267,735!
A huge vocabulary size means a huge embedding matrix at the start of the model, which will cause memory problems.
TransformerXL deals with it by using a special kind of embeddings called adaptive embeddings, but in general,
transformers model rarely have a vocabulary size greater than 50,000, especially if they are trained on a single
language.
So if tokenizing on words is unsatisfactory, we could go on the opposite direction and simply tokenize on characters.
While it's very simple and would save a lot of memory, this doesn't allow the model to learn representations of texts
as meaningful as when using a word tokenization, leading to a loss of performance. So to get the best of both worlds,
all transformers models use a hybrid between word-level and character-level tokenization called subword tokenization.
Subword tokenization
^^^^^^^^^^^^^^^^^^^^
Subword tokenization algorithms rely on the principle that most common words should be left as is, but rare words
should be decomposed in meaningful subword units. For instance "annoyingly" might be considered a rare word and
decomposed as "annoying" and "ly". This is especially useful in agglutinative languages such as Turkish, where you can
form (almost) arbitrarily long complex words by stringing together some subwords.
This allows the model to keep a reasonable vocabulary while still learning useful representations for common words or
subwords. This also gives the ability to the model to process words it has never seen before, by decomposing them into
subwords it knows. For instance, the base :class:`~transformers.BertTokenizer` will tokenize "I have a new GPU!" like
this:
::
>>> from transformers import BertTokenizer
>>> tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
>>> tokenizer.tokenize("I have a new GPU!")
['i', 'have', 'a', 'new', 'gp', '##u', '!']
Since we are considering the uncased model, the sentence was lowercased first. Then all the words were present in the
vocabulary of the tokenizer, except for "gpu", so the tokenizer split it in subwords it knows: "gp" and "##u". The "##"
means that the rest of the token should be attached to the previous one, without space (for when we need to decode
predictions and reverse the tokenization).
Another example is when we use the base :class:`~transformers.XLNetTokenizer` to tokenize our previous text:
::
>>> from transformers import XLNetTokenizer
>>> tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')
>>> tokenizer.tokenize("Don't you love 🤗 Transformers? We sure do.")
['▁Don', "'", 't', '▁you', '▁love', '▁', '🤗', '▁', 'Transform', 'ers', '?', '▁We', '▁sure', '▁do', '.']
We'll get back to the meaning of those '▁' when we look at :ref:`SentencePiece <sentencepiece>` but you can see
Transformers has been split into "Transform" and "ers".
Let's now look at how the different subword tokenization algorithms work. Note that they all rely on some form of
training which is usually done on the corpus the corresponding model will be trained on.
.. _byte-pair-encoding:
Byte-Pair Encoding
~~~~~~~~~~~~~~~~~~
Byte-Pair Encoding was introduced in `this paper <https://arxiv.org/abs/1508.07909>`__. It relies on a pretokenizer
splitting the training data into words, which can be a simple space tokenization
(:doc:`GPT-2 <model_doc/gpt2>` and :doc:`Roberta <model_doc/roberta>` uses this for instance) or a rule-based tokenizer
(:doc:`XLM <model_doc/xlm>` use Moses for most languages, as does :doc:`FlauBERT <model_doc/flaubert>`),
:doc:`GPT <model_doc/gpt>` uses Spacy and ftfy) and, counts the frequency of each word in the training corpus.
It then begins from the list of all characters, and will learn merge rules to form a new token from two symbols in the
vocabulary until it has learned a vocabulary of the desired size (this is a hyperparameter to pick).
Let's say that after the pre-tokenization we have the following words (the number indicating the frequency of each
word):
::
('hug', 10), ('pug', 5), ('pun', 12), ('bun', 4), ('hugs', 5)
Then the base vocabulary is ['b', 'g', 'h', 'n', 'p', 's', 'u'] and all our words are first split by character:
::
('h' 'u' 'g', 10), ('p' 'u' 'g', 5), ('p' 'u' 'n', 12), ('b' 'u' 'n', 4), ('h' 'u' 'g' 's', 5)
We then take each pair of symbols and look at the most frequent. For instance 'hu' is present `10 + 5 = 15` times (10
times in the 10 occurrences of 'hug', 5 times in the 5 occurrences of 'hugs'). The most frequent here is 'ug', present
`10 + 5 + 2 + 5 = 22` times in total. So the first merge rule the tokenizer learns is to group all 'u' and 'g' together
then it adds 'ug' to the vocabulary. Our corpus then becomes
::
('h' 'ug', 10), ('p' 'ug', 5), ('p' 'u' 'n', 12), ('b' 'u' 'n', 4), ('h' 'ug' 's', 5)
and we continue by looking at the next most common pair of symbols. It's 'un', present 16 times, so we merge those two
and add 'un' to the vocabulary. Then it's 'hug' (as 'h' + 'ug'), present 15 times, so we merge those two and add 'hug'
to the vocabulary.
At this stage, the vocabulary is ``['b', 'g', 'h', 'n', 'p', 's', 'u', 'ug', 'un', 'hug']`` and our corpus is
represented as
::
('hug', 10), ('p' 'ug', 5), ('p' 'un', 12), ('b' 'un', 4), ('hug' 's', 5)
If we stop there, the tokenizer can apply the rules it learned to new words (as long as they don't contain characters that
were not in the base vocabulary). For instance 'bug' would be tokenized as ``['b', 'ug']`` but mug would be tokenized as
``['<unk>', 'ug']`` since the 'm' is not in the base vocabulary. This doesn't happen to letters in general (since the
base corpus uses all of them), but to special characters like emojis.
As we said before, the vocabulary size (which is the base vocabulary size + the number of merges) is a hyperparameter
to choose. For instance :doc:`GPT <model_doc/gpt>` has a vocabulary size of 40,478 since they have 478 base characters
and chose to stop the training of the tokenizer at 40,000 merges.
Byte-level BPE
^^^^^^^^^^^^^^
To deal with the fact the base vocabulary needs to get all base characters, which can be quite big if one allows for
all unicode characters, the
`GPT-2 paper <https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf>`__
introduces a clever trick, which is to use bytes as the base vocabulary (which gives a size of 256). With some
additional rules to deal with punctuation, this manages to be able to tokenize every text without needing an unknown
token. For instance, the :doc:`GPT-2 model <model_doc/gpt>` has a vocabulary size of 50,257, which corresponds to the
256 bytes base tokens, a special end-of-text token and the symbols learned with 50,000 merges.
.. _wordpiece:
WordPiece
=========
WordPiece is the subword tokenization algorithm used for :doc:`BERT <model_doc/bert>` (as well as
:doc:`DistilBERT <model_doc/distilbert>` and :doc:`Electra <model_doc/electra>`) and was outlined in
`this paper <https://static.googleusercontent.com/media/research.google.com/ja//pubs/archive/37842.pdf>`__. It relies
on the same base as BPE, which is to initialize the vocabulary to every character present in the corpus and
progressively learn a given number of merge rules, the difference is that it doesn't choose the pair that is the most
frequent but the one that will maximize the likelihood on the corpus once merged.
What does this mean? Well, in the previous example, it means we would only merge 'u' and 'g' if the probability of
having 'ug' divided by the probability of having 'u' then 'g' is greater than for any other pair of symbols. It's
subtly different from what BPE does in the sense that it evaluates what it "loses" by merging two symbols and makes
sure it's `worth it`.
.. _unigram:
Unigram
=======
Unigram is a subword tokenization algorithm introduced in `this paper <https://arxiv.org/pdf/1804.10959.pdf>`__.
Instead of starting with a group of base symbols and learning merges with some rule, like BPE or WordPiece, it starts
from a large vocabulary (for instance, all pretokenized words and the most common substrings) that it will trim down
progressively. It's not used directly for any of the pretrained models in the library, but it's used in conjunction
with :ref:`SentencePiece <sentencepiece>`.
More specifically, at a given step, unigram computes a loss from the corpus we have and the current vocabulary, then,
for each subword, evaluate how much the loss would augment if the subword was removed from the vocabulary. It then
sorts the subwords by this quantity (that represents how worse the loss becomes if the token is removed) and removes
all the worst p tokens (for instance p could be 10% or 20%). It then repeats the process until the vocabulary has
reached the desired size, always keeping the base characters (to be able to tokenize any word written with them, like
BPE or WordPiece).
Contrary to BPE and WordPiece that work out rules in a certain order that you can then apply in the same order when
tokenizing new text, Unigram will have several ways of tokenizing a new text. For instance, if it ends up with the
vocabulary
::
['b', 'g', 'h', 'n', 'p', 's', 'u', 'ug', 'un', 'hug']
we had before, it could tokenize "hugs" as ``['hug', 's']``, ``['h', 'ug', 's']`` or ``['h', 'u', 'g', 's']``. So which
one choose? On top of saving the vocabulary, the trained tokenizer will save the probability of each token in the
training corpus. You can then give a probability to each tokenization (which is the product of the probabilities of the
tokens forming it) and pick the most likely one (or if you want to apply some data augmentation, you could sample one
of the tokenization according to their probabilities).
Those probabilities are what are used to define the loss that trains the tokenizer: if our corpus consists of the
words :math:`x_{1}, \dots, x_{N}` and if for the word :math:`x_{i}` we note :math:`S(x_{i})` the set of all possible
tokenizations of :math:`x_{i}` (with the current vocabulary), then the loss is defined as
.. math::
\mathcal{L} = -\sum_{i=1}^{N} \log \left ( \sum_{x \in S(x_{i})} p(x) \right )
.. _sentencepiece:
SentencePiece
=============
All the methods we have been looking at so far required some from of pretrokenization, which has a central problem: not
all languages use spaces to separate words. This is a problem :doc:`XLM <model_doc/xlm>` solves by using specific
pretokenizers for each of those languages (in this case, Chinese, Japanese and Thai). To solve this problem,
SentencePiece (introduced in `this paper <https://arxiv.org/pdf/1808.06226.pdf>`__) treats the input as a raw stream,
includes the space in the set of characters to use, then uses BPE or unigram to construct the appropriate vocabulary.
That's why in the example we saw before using :class:`~transformers.XLNetTokenizer` (which uses SentencePiece), we had
some '▁' characters, that represent spaces. Decoding a tokenized text is then super easy: we just have to concatenate
all of them together and replace those '▁' by spaces.
All transformers models in the library that use SentencePiece use it with unigram. Examples of models using it are
:doc:`ALBERT <model_doc/albert>`, :doc:`XLNet <model_doc/xlnet>` or the :doc:`Marian framework <model_doc/marian>`.
Tokenizer summary
-----------------------------------------------------------------------------------------------------------------------
In this page, we will have a closer look at tokenization. As we saw in
:doc:`the preprocessing tutorial <preprocessing>`, tokenizing a text is splitting it into words or subwords, which then
are converted to ids. The second part is pretty straightforward, here we will focus on the first part. More
specifically, we will look at the three main different kinds of tokenizers used in 🤗 Transformers:
:ref:`Byte-Pair Encoding (BPE) <byte-pair-encoding>`, :ref:`WordPiece <wordpiece>` and
:ref:`SentencePiece <sentencepiece>`, and provide examples of models using each of those.
Note that on each model page, you can look at the documentation of the associated tokenizer to know which of those
algorithms the pretrained model used. For instance, if we look at :class:`~transformers.BertTokenizer`, we can see it's
using :ref:`WordPiece <wordpiece>`.
Introduction to tokenization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Splitting a text in smaller chunks is a task that's harder than it looks, and there are multiple ways of doing it. For
instance, let's look at the sentence "Don't you love 🤗 Transformers? We sure do." A first simple way of tokenizing
this text is just to split it by spaces, which would give:
.. code-block::
["Don't", "you", "love", "🤗", "Transformers?", "We", "sure", "do."]
This is a nice first step, but if we look at the tokens "Transformers?" or "do.", we can see we can do better. Those
will be different than the tokens "Transformers" and "do" for our model, so we should probably take the punctuation
into account. This would give:
.. code-block::
["Don", "'", "t", "you", "love", "🤗", "Transformers", "?", "We", "sure", "do", "."]
which is better already. One thing that is annoying though is how it dealt with "Don't". "Don't" stands for do not, so
it should probably be better tokenized as ``["Do", "n't"]``. This is where things start getting more complicated, and
part of the reason each kind of model has its own tokenizer class. Depending on the rules we apply to split our texts
into tokens, we'll get different tokenized versions of the same text. And of course, a given pretrained model won't
perform properly if you don't use the exact same rules as the persons who pretrained it.
`spaCy <https://spacy.io/>`__ and `Moses <http://www.statmt.org/moses/?n=Development.GetStarted>`__ are two popular
rule-based tokenizers. On the text above, they'd output something like:
.. code-block::
["Do", "n't", "you", "love", "🤗", "Transformers", "?", "We", "sure", "do", "."]
Space/punctuation-tokenization and rule-based tokenization are both examples of word tokenization, which is splitting a
sentence into words. While it's the most intuitive way to separate texts in smaller chunks, it can have a problem when
you have a huge corpus: it usually yields a very big vocabulary (the set of all unique tokens used).
:doc:`Transformer XL <model_doc/transformerxl>` for instance uses space/punctuation-tokenization, and has a vocabulary
size of 267,735!
A huge vocabulary size means a huge embedding matrix at the start of the model, which will cause memory problems.
TransformerXL deals with it by using a special kind of embeddings called adaptive embeddings, but in general,
transformers models rarely have a vocabulary size greater than 50,000, especially if they are trained on a single
language.
So if tokenizing on words is unsatisfactory, we could go on the opposite direction and simply tokenize on characters.
While it's very simple and would save a lot of memory, this doesn't allow the model to learn representations of texts
as meaningful as when using a word tokenization, leading to a loss of performance. So to get the best of both worlds,
all transformers models use a hybrid between word-level and character-level tokenization called subword tokenization.
Subword tokenization
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Subword tokenization algorithms rely on the principle that most common words should be left as is, but rare words
should be decomposed in meaningful subword units. For instance "annoyingly" might be considered a rare word and
decomposed as "annoying" and "ly". This is especially useful in agglutinative languages such as Turkish, where you can
form (almost) arbitrarily long complex words by stringing together some subwords.
This allows the model to keep a reasonable vocabulary while still learning useful representations for common words or
subwords. This also enables the model to process words it has never seen before, by decomposing them into
subwords it knows. For instance, the base :class:`~transformers.BertTokenizer` will tokenize "I have a new GPU!" like
this:
.. code-block::
>>> from transformers import BertTokenizer
>>> tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
>>> tokenizer.tokenize("I have a new GPU!")
['i', 'have', 'a', 'new', 'gp', '##u', '!']
Since we are considering the uncased model, the sentence was lowercased first. Then all the words were present in the
vocabulary of the tokenizer, except for "gpu", so the tokenizer split it in subwords it knows: "gp" and "##u". The "##"
means that the rest of the token should be attached to the previous one, without space (for when we need to decode
predictions and reverse the tokenization).
Another example is when we use the base :class:`~transformers.XLNetTokenizer` to tokenize our previous text:
.. code-block::
>>> from transformers import XLNetTokenizer
>>> tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')
>>> tokenizer.tokenize("Don't you love 🤗 Transformers? We sure do.")
['▁Don', "'", 't', '▁you', '▁love', '▁', '🤗', '▁', 'Transform', 'ers', '?', '▁We', '▁sure', '▁do', '.']
We'll get back to the meaning of those '▁' when we look at :ref:`SentencePiece <sentencepiece>` but you can see
Transformers has been split into "Transform" and "ers".
Let's now look at how the different subword tokenization algorithms work. Note that they all rely on some form of
training which is usually done on the corpus the corresponding model will be trained on.
.. _byte-pair-encoding:
Byte-Pair Encoding
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Byte-Pair Encoding was introduced in `this paper <https://arxiv.org/abs/1508.07909>`__. It relies on a pretokenizer
splitting the training data into words, which can be a simple space tokenization
(:doc:`GPT-2 <model_doc/gpt2>` and :doc:`Roberta <model_doc/roberta>` uses this for instance) or a rule-based tokenizer
(:doc:`XLM <model_doc/xlm>` use Moses for most languages, as does :doc:`FlauBERT <model_doc/flaubert>`),
:doc:`GPT <model_doc/gpt>` uses Spacy and ftfy, and counts the frequency of each word in the training corpus.
It then begins from the list of all characters, and will learn merge rules to form a new token from two symbols in the
vocabulary until it has learned a vocabulary of the desired size (this is a hyperparameter to pick).
Let's say that after the pre-tokenization we have the following words (the number indicating the frequency of each
word):
.. code-block::
('hug', 10), ('pug', 5), ('pun', 12), ('bun', 4), ('hugs', 5)
Then the base vocabulary is ['b', 'g', 'h', 'n', 'p', 's', 'u'] and all our words are first split by character:
.. code-block::
('h' 'u' 'g', 10), ('p' 'u' 'g', 5), ('p' 'u' 'n', 12), ('b' 'u' 'n', 4), ('h' 'u' 'g' 's', 5)
We then take each pair of symbols and look at the most frequent. For instance 'hu' is present `10 + 5 = 15` times (10
times in the 10 occurrences of 'hug', 5 times in the 5 occurrences of 'hugs'). The most frequent here is 'ug', present
`10 + 5 + 5 = 20` times in total. So the first merge rule the tokenizer learns is to group all 'u' and 'g' together
then it adds 'ug' to the vocabulary. Our corpus then becomes
.. code-block::
('h' 'ug', 10), ('p' 'ug', 5), ('p' 'u' 'n', 12), ('b' 'u' 'n', 4), ('h' 'ug' 's', 5)
and we continue by looking at the next most common pair of symbols. It's 'un', present 16 times, so we merge those two
and add 'un' to the vocabulary. Then it's 'hug' (as 'h' + 'ug'), present 15 times, so we merge those two and add 'hug'
to the vocabulary.
At this stage, the vocabulary is ``['b', 'g', 'h', 'n', 'p', 's', 'u', 'ug', 'un', 'hug']`` and our corpus is
represented as
.. code-block::
('hug', 10), ('p' 'ug', 5), ('p' 'un', 12), ('b' 'un', 4), ('hug' 's', 5)
If we stop there, the tokenizer can apply the rules it learned to new words (as long as they don't contain characters that
were not in the base vocabulary). For instance 'bug' would be tokenized as ``['b', 'ug']`` but mug would be tokenized as
``['<unk>', 'ug']`` since the 'm' is not in the base vocabulary. This doesn't happen to letters in general (since the
base corpus uses all of them), but to special characters like emojis.
As we said before, the vocabulary size (which is the base vocabulary size + the number of merges) is a hyperparameter
to choose. For instance :doc:`GPT <model_doc/gpt>` has a vocabulary size of 40,478 since they have 478 base characters
and chose to stop the training of the tokenizer at 40,000 merges.
Byte-level BPE
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
To deal with the fact the base vocabulary needs to get all base characters, which can be quite big if one allows for
all unicode characters, the
`GPT-2 paper <https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf>`__
introduces a clever trick, which is to use bytes as the base vocabulary (which gives a size of 256). With some
additional rules to deal with punctuation, this manages to be able to tokenize every text without needing an unknown
token. For instance, the :doc:`GPT-2 model <model_doc/gpt>` has a vocabulary size of 50,257, which corresponds to the
256 bytes base tokens, a special end-of-text token and the symbols learned with 50,000 merges.
.. _wordpiece:
WordPiece
=======================================================================================================================
WordPiece is the subword tokenization algorithm used for :doc:`BERT <model_doc/bert>` (as well as
:doc:`DistilBERT <model_doc/distilbert>` and :doc:`Electra <model_doc/electra>`) and was outlined in
`this paper <https://static.googleusercontent.com/media/research.google.com/ja//pubs/archive/37842.pdf>`__. It relies
on the same base as BPE, which is to initialize the vocabulary to every character present in the corpus and
progressively learn a given number of merge rules, the difference is that it doesn't choose the pair that is the most
frequent but the one that will maximize the likelihood on the corpus once merged.
What does this mean? Well, in the previous example, it means we would only merge 'u' and 'g' if the probability of
having 'ug' divided by the probability of having 'u' then 'g' is greater than for any other pair of symbols. It's
subtly different from what BPE does in the sense that it evaluates what it "loses" by merging two symbols and makes
sure it's `worth it`.
.. _unigram:
Unigram
=======================================================================================================================
Unigram is a subword tokenization algorithm introduced in `this paper <https://arxiv.org/pdf/1804.10959.pdf>`__.
Instead of starting with a group of base symbols and learning merges with some rule, like BPE or WordPiece, it starts
from a large vocabulary (for instance, all pretokenized words and the most common substrings) that it will trim down
progressively. It's not used directly for any of the pretrained models in the library, but it's used in conjunction
with :ref:`SentencePiece <sentencepiece>`.
More specifically, at a given step, unigram computes a loss from the corpus we have and the current vocabulary, then,
for each subword, evaluate how much the loss would augment if the subword was removed from the vocabulary. It then
sorts the subwords by this quantity (that represents how worse the loss becomes if the token is removed) and removes
all the worst p tokens (for instance p could be 10% or 20%). It then repeats the process until the vocabulary has
reached the desired size, always keeping the base characters (to be able to tokenize any word written with them, like
BPE or WordPiece).
Contrary to BPE and WordPiece that work out rules in a certain order that you can then apply in the same order when
tokenizing new text, Unigram will have several ways of tokenizing a new text. For instance, if it ends up with the
vocabulary
.. code-block::
['b', 'g', 'h', 'n', 'p', 's', 'u', 'ug', 'un', 'hug']
we had before, it could tokenize "hugs" as ``['hug', 's']``, ``['h', 'ug', 's']`` or ``['h', 'u', 'g', 's']``. So which
one choose? On top of saving the vocabulary, the trained tokenizer will save the probability of each token in the
training corpus. You can then give a probability to each tokenization (which is the product of the probabilities of the
tokens forming it) and pick the most likely one (or if you want to apply some data augmentation, you could sample one
of the tokenization according to their probabilities).
Those probabilities define the loss that trains the tokenizer: if our corpus consists of the
words :math:`x_{1}, \dots, x_{N}` and if for the word :math:`x_{i}` we note :math:`S(x_{i})` the set of all possible
tokenizations of :math:`x_{i}` (with the current vocabulary), then the loss is defined as
.. math::
\mathcal{L} = -\sum_{i=1}^{N} \log \left ( \sum_{x \in S(x_{i})} p(x) \right )
.. _sentencepiece:
SentencePiece
=======================================================================================================================
All the methods we have been looking at so far required some form of pretokenization, which has a central problem: not
all languages use spaces to separate words. This is a problem :doc:`XLM <model_doc/xlm>` solves by using specific
pretokenizers for each of those languages (in this case, Chinese, Japanese and Thai). To solve this problem,
SentencePiece (introduced in `this paper <https://arxiv.org/pdf/1808.06226.pdf>`__) treats the input as a raw stream,
includes the space in the set of characters to use, then uses BPE or unigram to construct the appropriate vocabulary.
That's why in the example we saw before using :class:`~transformers.XLNetTokenizer` (which uses SentencePiece), we had
the '▁' character, that represents space. Decoding a tokenized text is then super easy: we just have to concatenate
all of them together and replace '▁' with space.
All transformers models in the library that use SentencePiece use it with unigram. Examples of models using it are
:doc:`ALBERT <model_doc/albert>`, :doc:`XLNet <model_doc/xlnet>` or the :doc:`Marian framework <model_doc/marian>`.

View File

@@ -1,135 +0,0 @@
TorchScript
================================================
.. note::
This is the very beginning of our experiments with TorchScript and we are still exploring its capabilities
with variable-input-size models. It is a focus of interest to us and we will deepen our analysis in upcoming
releases, with more code examples, a more flexible implementation, and benchmarks comparing python-based codes
with compiled TorchScript.
According to Pytorch's documentation: "TorchScript is a way to create serializable and optimizable models from PyTorch code".
Pytorch's two modules `JIT and TRACE <https://pytorch.org/docs/stable/jit.html>`_ allow the developer to export
their model to be re-used in other programs, such as efficiency-oriented C++ programs.
We have provided an interface that allows the export of 🤗 Transformers models to TorchScript so that they can
be reused in a different environment than a Pytorch-based python program. Here we explain how to use our models so that
they can be exported, and what to be mindful of when using these models with TorchScript.
Exporting a model needs two things:
* dummy inputs to execute a model forward pass.
* the model needs to be instantiated with the ``torchscript`` flag.
These necessities imply several things developers should be careful about. These are detailed below.
Implications
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TorchScript flag and tied weights
------------------------------------------------
This flag is necessary because most of the language models in this repository have tied weights between their
``Embedding`` layer and their ``Decoding`` layer. TorchScript does not allow the export of models that have tied weights,
it is therefore necessary to untie the weights beforehand.
This implies that models instantiated with the ``torchscript`` flag have their ``Embedding`` layer and ``Decoding`` layer
separate, which means that they should not be trained down the line. Training would de-synchronize the two layers,
leading to unexpected results.
This is not the case for models that do not have a Language Model head, as those do not have tied weights. These models
can be safely exported without the ``torchscript`` flag.
Dummy inputs and standard lengths
------------------------------------------------
The dummy inputs are used to do a model forward pass. While the inputs' values are propagating through the layers,
Pytorch keeps track of the different operations executed on each tensor. These recorded operations are then used
to create the "trace" of the model.
The trace is created relatively to the inputs' dimensions. It is therefore constrained by the dimensions of the dummy
input, and will not work for any other sequence length or batch size. When trying with a different size, an error such
as:
``The expanded size of the tensor (3) must match the existing size (7) at non-singleton dimension 2``
will be raised. It is therefore recommended to trace the model with a dummy input size at least as large as the largest
input that will be fed to the model during inference. Padding can be performed to fill the missing values. As the model
will have been traced with a large input size however, the dimensions of the different matrix will be large as well,
resulting in more calculations.
It is recommended to be careful of the total number of operations done on each input and to follow performance closely
when exporting varying sequence-length models.
Using TorchScript in Python
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Below are examples of using the Python to save, load models as well as how to use the trace for inference.
Saving a model
------------------------------------------------
This snippet shows how to use TorchScript to export a ``BertModel``. Here the ``BertModel`` is instantiated
according to a ``BertConfig`` class and then saved to disk under the filename ``traced_bert.pt``
.. code-block:: python
from transformers import BertModel, BertTokenizer, BertConfig
import torch
enc = BertTokenizer.from_pretrained("bert-base-uncased")
# Tokenizing input text
text = "[CLS] Who was Jim Henson ? [SEP] Jim Henson was a puppeteer [SEP]"
tokenized_text = enc.tokenize(text)
# Masking one of the input tokens
masked_index = 8
tokenized_text[masked_index] = '[MASK]'
indexed_tokens = enc.convert_tokens_to_ids(tokenized_text)
segments_ids = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
# Creating a dummy input
tokens_tensor = torch.tensor([indexed_tokens])
segments_tensors = torch.tensor([segments_ids])
dummy_input = [tokens_tensor, segments_tensors]
# Initializing the model with the torchscript flag
# Flag set to True even though it is not necessary as this model does not have an LM Head.
config = BertConfig(vocab_size_or_config_json_file=32000, hidden_size=768,
num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072, torchscript=True)
# Instantiating the model
model = BertModel(config)
# The model needs to be in evaluation mode
model.eval()
# If you are instantiating the model with `from_pretrained` you can also easily set the TorchScript flag
model = BertModel.from_pretrained("bert-base-uncased", torchscript=True)
# Creating the trace
traced_model = torch.jit.trace(model, [tokens_tensor, segments_tensors])
torch.jit.save(traced_model, "traced_bert.pt")
Loading a model
------------------------------------------------
This snippet shows how to load the ``BertModel`` that was previously saved to disk under the name ``traced_bert.pt``.
We are re-using the previously initialised ``dummy_input``.
.. code-block:: python
loaded_model = torch.jit.load("traced_model.pt")
loaded_model.eval()
all_encoder_layers, pooled_output = loaded_model(dummy_input)
Using a traced model for inference
------------------------------------------------
Using the traced model for inference is as simple as using its ``__call__`` dunder method:
.. code-block:: python
traced_model(tokens_tensor, segments_tensors)

View File

@@ -1,5 +1,5 @@
Training and fine-tuning
========================
=======================================================================================================================
Model classes in 🤗 Transformers are designed to be compatible with native
PyTorch and TensorFlow 2 and can be used seemlessly with either. In this
@@ -16,15 +16,15 @@ TF2, and focus specifically on the nuances and tools for training models in
Sections:
* :ref:`pytorch`
* :ref:`tensorflow`
* :ref:`trainer`
* :ref:`additional-resources`
- :ref:`pytorch`
- :ref:`tensorflow`
- :ref:`trainer`
- :ref:`additional-resources`
.. _pytorch:
Fine-tuning in native PyTorch
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Model classes in 🤗 Transformers that don't begin with ``TF`` are
`PyTorch Modules <https://pytorch.org/docs/master/generated/torch.nn.Module.html>`_,
@@ -39,7 +39,7 @@ of the specified model are used to initialize the model. The
library also includes a number of task-specific final layers or 'heads' whose
weights are instantiated randomly when not present in the specified
pre-trained model. For example, instantiating a model with
``BertForSequenceClassification.from_pretrained('bert-base-uncased', num_classes=2)``
``BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)``
will create a BERT model instance with encoder weights copied from the
``bert-base-uncased`` model and a randomly initialized sequence
classification head on top of the encoder with an output size of 2. Models
@@ -49,7 +49,7 @@ put it in train mode.
.. code-block:: python
from transformers import BertForSequenceClassification
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', return_dict=True)
model.train()
This is useful because it allows us to make use of the pre-trained BERT
@@ -99,7 +99,7 @@ backwards pass and update the weights:
labels = torch.tensor([1,0]).unsqueeze(0)
outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
loss = outputs[0]
loss = outputs.loss
loss.backward()
optimizer.step()
@@ -111,7 +111,7 @@ The following is equivalent to the previous example:
from torch.nn import functional as F
labels = torch.tensor([1,0]).unsqueeze(0)
outputs = model(input_ids, attention_mask=attention_mask)
loss = F.cross_entropy(labels, outputs[0])
loss = F.cross_entropy(labels, outputs.logitd)
loss.backward()
optimizer.step()
@@ -131,7 +131,6 @@ Then all we have to do is call ``scheduler.step()`` after ``optimizer.step()``.
.. code-block:: python
...
loss.backward()
optimizer.step()
scheduler.step()
@@ -142,7 +141,7 @@ with features like mixed precision and easy tensorboard logging.
Freezing the encoder
--------------------
-----------------------------------------------------------------------------------------------------------------------
In some cases, you might be interested in keeping the weights of the
pre-trained encoder frozen and optimizing only the weights of the head
@@ -151,7 +150,7 @@ the encoder parameters, which can be accessed with the ``base_model``
submodule on any task-specific model in the library:
.. code-block:: python
for param in model.base_model.parameters():
param.requires_grad = False
@@ -159,7 +158,7 @@ submodule on any task-specific model in the library:
.. _tensorflow:
Fine-tuning in native TensorFlow 2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Models can also be trained natively in TensorFlow 2. Just as with PyTorch,
TensorFlow models can be instantiated with
@@ -182,6 +181,7 @@ the pretrained tokenizer name.
.. code-block:: python
from transformers import BertTokenizer, glue_convert_examples_to_features
import tensorflow as tf
import tensorflow_datasets as tfds
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
data = tfds.load('glue/mrpc')
@@ -191,7 +191,7 @@ the pretrained tokenizer name.
The model can then be compiled and trained as any Keras model:
.. code-block:: python
optimizer = tf.keras.optimizers.Adam(learning_rate=3e-5)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer=optimizer, loss=loss)
@@ -210,7 +210,7 @@ can even save the model and then reload it as a PyTorch model (or vice-versa):
.. _trainer:
Trainer
^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
We also provide a simple but feature-complete training and evaluation
interface through :func:`~transformers.Trainer` and
@@ -272,7 +272,7 @@ optimize.
:func:`~transformers.Trainer` uses a built-in default function to collate
batches and prepare them to be fed into the model. If needed, you can also
use the ``data_collator`` argument to pass your own collator function which
takes in the data in the format provides by your dataset and returns a
takes in the data in the format provided by your dataset and returns a
batch ready to be fed into the model. Note that
:func:`~transformers.TFTrainer` expects the passed datasets to be dataset
objects from ``tensorflow_datasets``.
@@ -282,7 +282,7 @@ your own ``compute_metrics`` function and pass it to the trainer.
.. code-block:: python
from sklearn.metrics import precision_recall_fscore_support
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
def compute_metrics(pred):
labels = pred.label_ids
@@ -303,21 +303,16 @@ launching tensorboard in your specified ``logging_dir`` directory.
.. _additional-resources:
Additional resources
^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* `A lightweight colab demo
<https://colab.research.google.com/drive/1-JIJlao4dI-Ilww_NnTc0rxtp-ymgDgM?usp=sharing>`_
which uses ``Trainer`` for IMDb sentiment classification.
- `A lightweight colab demo <https://colab.research.google.com/drive/1-JIJlao4dI-Ilww_NnTc0rxtp-ymgDgM?usp=sharing>`_
which uses ``Trainer`` for IMDb sentiment classification.
* `🤗 Transformers Examples <https://github.com/huggingface/transformers/tree/master/examples>`_
including scripts for training and fine-tuning on GLUE, SQuAD, and
several other tasks.
- `🤗 Transformers Examples <https://github.com/huggingface/transformers/tree/master/examples>`_
including scripts for training and fine-tuning on GLUE, SQuAD, and several other tasks.
* `How to train a language model
<https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/01_how_to_train.ipynb>`_,
a detailed colab notebook which uses ``Trainer`` to train a masked
language model from scratch on Esperanto.
- `How to train a language model <https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/01_how_to_train.ipynb>`_,
a detailed colab notebook which uses ``Trainer`` to train a masked language model from scratch on Esperanto.
* `🤗 Transformers Notebooks <./notebooks.html>`_ which contain dozens
of example notebooks from the community for training and using
🤗 Transformers on a variety of tasks.
- `🤗 Transformers Notebooks <notebooks.html>`_ which contain dozens of example notebooks from the community for
training and using 🤗 Transformers on a variety of tasks.

View File

@@ -1,7 +1,7 @@
# Examples
Version 2.9 of 🤗 Transformers introduces a new [`Trainer`](https://github.com/huggingface/transformers/blob/master/src/transformers/trainer.py) class for PyTorch, and its equivalent [`TFTrainer`](https://github.com/huggingface/transformers/blob/master/src/transformers/trainer_tf.py) for TF 2.
Running the examples requires PyTorch 1.3.1+ or TensorFlow 2.1+.
Running the examples requires PyTorch 1.3.1+ or TensorFlow 2.2+.
Here is the list of all our examples:
- **grouped by task** (all official examples work for multiple models)
@@ -21,13 +21,13 @@ This is still a work-in-progress in particular documentation is still sparse
| [**`text-classification`**](https://github.com/huggingface/transformers/tree/master/examples/text-classification) | GLUE, XNLI | ✅ | ✅ | ✅ | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/trainer/01_text_classification.ipynb)
| [**`token-classification`**](https://github.com/huggingface/transformers/tree/master/examples/token-classification) | CoNLL NER | ✅ | ✅ | ✅ | -
| [**`multiple-choice`**](https://github.com/huggingface/transformers/tree/master/examples/multiple-choice) | SWAG, RACE, ARC | ✅ | ✅ | - | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ViktorAlm/notebooks/blob/master/MPC_GPU_Demo_for_TF_and_PT.ipynb)
| [**`question-answering`**](https://github.com/huggingface/transformers/tree/master/examples/question-answering) | SQuAD | - | ✅ | - | -
| [**`question-answering`**](https://github.com/huggingface/transformers/tree/master/examples/question-answering) | SQuAD | | ✅ | - | -
| [**`text-generation`**](https://github.com/huggingface/transformers/tree/master/examples/text-generation) | - | n/a | n/a | n/a | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/02_how_to_generate.ipynb)
| [**`distillation`**](https://github.com/huggingface/transformers/tree/master/examples/distillation) | All | - | - | - | -
| [**`summarization`**](https://github.com/huggingface/transformers/tree/master/examples/seq2seq) | CNN/Daily Mail | - | - | ✅ | -
| [**`translation`**](https://github.com/huggingface/transformers/tree/master/examples/seq2seq) | WMT | - | - | ✅ | -
| [**`bertology`**](https://github.com/huggingface/transformers/tree/master/examples/bertology) | - | - | - | - | -
| [**`adversarial`**](https://github.com/huggingface/transformers/tree/master/examples/adversarial) | HANS | ✅ | - | - | -
| [**`distillation`**](https://github.com/huggingface/transformers/tree/master/examples/distillation) | All | - | - | - | -
| [**`summarization`**](https://github.com/huggingface/transformers/tree/master/examples/seq2seq) | CNN/Daily Mail | | - | ✅ | -
| [**`translation`**](https://github.com/huggingface/transformers/tree/master/examples/seq2seq) | WMT | | - | ✅ | -
| [**`bertology`**](https://github.com/huggingface/transformers/tree/master/examples/bertology) | - | - | - | - | -
| [**`adversarial`**](https://github.com/huggingface/transformers/tree/master/examples/adversarial) | HANS | ✅ | - | - | -
<br>
@@ -78,3 +78,50 @@ python examples/xla_spawn.py --num_cores 8 \
```
Feedback and more use cases and benchmarks involving TPUs are welcome, please share with the community.
## Logging & Experiment tracking
You can easily log and monitor your runs code. The following are currently supported:
* [TensorBoard](https://www.tensorflow.org/tensorboard)
* [Weights & Biases](https://docs.wandb.com/library/integrations/huggingface)
* [Comet ML](https://www.comet.ml/docs/python-sdk/huggingface/)
### Weights & Biases
To use Weights & Biases, install the wandb package with:
```bash
pip install wandb
```
Then log in the command line:
```bash
wandb login
```
If you are in Jupyter or Colab, you should login with:
```python
import wandb
wandb.login()
```
Whenever you use `Trainer` or `TFTrainer` classes, your losses, evaluation metrics, model topology and gradients (for `Trainer` only) will automatically be logged.
When using 🤗 Transformers with PyTorch Lightning, runs can be tracked through `WandbLogger`. Refer to related [documentation & examples](https://docs.wandb.com/library/integrations/lightning).
### Comet.ml
To use `comet_ml`, install the Python package with:
```bash
pip install comet_ml
```
or if in a Conda environment:
```bash
conda install -c comet_ml -c anaconda -c conda-forge comet_ml
```

View File

@@ -20,8 +20,8 @@ from dataclasses import dataclass
from typing import List, Optional, Union
import tqdm
from filelock import FileLock
from filelock import FileLock
from transformers import (
BartTokenizer,
BartTokenizerFast,
@@ -112,7 +112,10 @@ if is_torch_available():
cached_features_file = os.path.join(
data_dir,
"cached_{}_{}_{}_{}".format(
"dev" if evaluate else "train", tokenizer.__class__.__name__, str(max_seq_length), task,
"dev" if evaluate else "train",
tokenizer.__class__.__name__,
str(max_seq_length),
task,
),
)
label_list = processor.get_labels()
@@ -255,7 +258,11 @@ class HansProcessor(DataProcessor):
return self._create_examples(self._read_tsv(os.path.join(data_dir, "heuristics_evaluation_set.txt")), "dev")
def get_labels(self):
"""See base class."""
"""See base class.
Note that we follow the standard three labels for MNLI
(see :class:`~transformers.data.processors.utils.MnliProcessor`)
but the HANS evaluation groups `contradiction` and `neutral` into `non-entailment` (label 0) while
`entailment` is label 1."""
return ["contradiction", "entailment", "neutral"]
def _create_examples(self, lines, set_type):
@@ -268,13 +275,16 @@ class HansProcessor(DataProcessor):
text_a = line[5]
text_b = line[6]
pairID = line[7][2:] if line[7].startswith("ex") else line[7]
label = line[-1]
label = line[0]
examples.append(InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label, pairID=pairID))
return examples
def hans_convert_examples_to_features(
examples: List[InputExample], label_list: List[str], max_length: int, tokenizer: PreTrainedTokenizer,
examples: List[InputExample],
label_list: List[str],
max_length: int,
tokenizer: PreTrainedTokenizer,
):
"""
Loads a data file into a list of ``InputFeatures``

View File

@@ -0,0 +1,10 @@
# 🤗 Benchmark results
Here, you can find a list of the different benchmark results created by the community.
If you would like to list benchmark results on your favorite models of the [model hub](https://huggingface.co/models) here, please open a Pull Request and add it below.
| Benchmark description | Results | Environment info | Author |
|:----------|:-------------|:-------------|------:|
| PyTorch Benchmark on inference for `bert-base-cased` |[memory](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/inference_memory.csv) | [env](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/env.csv) | [Partick von Platen](https://github.com/patrickvonplaten) |
| PyTorch Benchmark on inference for `bert-base-cased` |[time](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/inference_time.csv) | [env](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/env.csv) | [Partick von Platen](https://github.com/patrickvonplaten) |

View File

@@ -20,7 +20,9 @@ class PlotArguments:
Arguments pertaining to which model/config/tokenizer we are going to fine-tune, or train from scratch.
"""
csv_file: str = field(metadata={"help": "The csv file to plot."},)
csv_file: str = field(
metadata={"help": "The csv file to plot."},
)
plot_along_batch: bool = field(
default=False,
metadata={"help": "Whether to plot along batch size or sequence lengh. Defaults to sequence length."},
@@ -30,7 +32,8 @@ class PlotArguments:
metadata={"help": "Whether the csv file has time results or memory results. Defaults to memory results."},
)
no_log_scale: bool = field(
default=False, metadata={"help": "Disable logarithmic scale when plotting"},
default=False,
metadata={"help": "Disable logarithmic scale when plotting"},
)
is_train: bool = field(
default=False,
@@ -39,7 +42,8 @@ class PlotArguments:
},
)
figure_png_file: Optional[str] = field(
default=None, metadata={"help": "Filename under which the plot will be saved. If unused no plot is saved."},
default=None,
metadata={"help": "Filename under which the plot will be saved. If unused no plot is saved."},
)
short_model_names: Optional[List[str]] = list_field(
default=None, metadata={"help": "List of model names that are used instead of the ones in the csv file."}

View File

@@ -20,7 +20,25 @@ from transformers import HfArgumentParser, PyTorchBenchmark, PyTorchBenchmarkArg
def main():
parser = HfArgumentParser(PyTorchBenchmarkArguments)
benchmark_args = parser.parse_args_into_dataclasses()[0]
try:
benchmark_args = parser.parse_args_into_dataclasses()[0]
except ValueError as e:
arg_error_msg = "Arg --no_{0} is no longer used, please use --no-{0} instead."
begin_error_msg = " ".join(str(e).split(" ")[:-1])
full_error_msg = ""
depreciated_args = eval(str(e).split(" ")[-1])
wrong_args = []
for arg in depreciated_args:
# arg[2:] removes '--'
if arg[2:] in PyTorchBenchmarkArguments.deprecated_args:
# arg[5:] removes '--no_'
full_error_msg += arg_error_msg.format(arg[5:])
else:
wrong_args.append(arg)
if len(wrong_args) > 0:
full_error_msg = full_error_msg + begin_error_msg + str(wrong_args)
raise ValueError(full_error_msg)
benchmark = PyTorchBenchmark(args=benchmark_args)
benchmark.run()

View File

@@ -22,6 +22,24 @@ def main():
parser = HfArgumentParser(TensorFlowBenchmarkArguments)
benchmark_args = parser.parse_args_into_dataclasses()[0]
benchmark = TensorFlowBenchmark(args=benchmark_args)
try:
benchmark_args = parser.parse_args_into_dataclasses()[0]
except ValueError as e:
arg_error_msg = "Arg --no_{0} is no longer used, please use --no-{0} instead."
begin_error_msg = " ".join(str(e).split(" ")[:-1])
full_error_msg = ""
depreciated_args = eval(str(e).split(" ")[-1])
wrong_args = []
for arg in depreciated_args:
# arg[2:] removes '--'
if arg[2:] in TensorFlowBenchmark.deprecated_args:
# arg[5:] removes '--no_'
full_error_msg += arg_error_msg.format(arg[5:])
else:
wrong_args.append(arg)
if len(wrong_args) > 0:
full_error_msg = full_error_msg + begin_error_msg + str(wrong_args)
raise ValueError(full_error_msg)
benchmark.run()

View File

@@ -101,30 +101,30 @@ class AlbertModelWithPabee(AlbertModel):
regression=False,
):
r"""
Return:
:obj:`tuple(torch.FloatTensor)` comprising various elements depending on the configuration (:class:`~transformers.AlbertConfig`) and inputs:
last_hidden_state (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`):
Sequence of hidden-states at the output of the last layer of the model.
pooler_output (:obj:`torch.FloatTensor`: of shape :obj:`(batch_size, hidden_size)`):
Last layer hidden-state of the first token of the sequence (classification token)
further processed by a Linear layer and a Tanh activation function. The Linear
layer weights are trained from the next sentence prediction (classification)
objective during pre-training.
Return:
:obj:`tuple(torch.FloatTensor)` comprising various elements depending on the configuration (:class:`~transformers.AlbertConfig`) and inputs:
last_hidden_state (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`):
Sequence of hidden-states at the output of the last layer of the model.
pooler_output (:obj:`torch.FloatTensor`: of shape :obj:`(batch_size, hidden_size)`):
Last layer hidden-state of the first token of the sequence (classification token)
further processed by a Linear layer and a Tanh activation function. The Linear
layer weights are trained from the next sentence prediction (classification)
objective during pre-training.
This output is usually *not* a good summary
of the semantic content of the input, you're often better with averaging or pooling
the sequence of hidden-states for the whole input sequence.
hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``config.output_hidden_states=True``):
Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings + one for the output of each layer)
of shape :obj:`(batch_size, sequence_length, hidden_size)`.
This output is usually *not* a good summary
of the semantic content of the input, you're often better with averaging or pooling
the sequence of hidden-states for the whole input sequence.
hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``config.output_hidden_states=True``):
Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings + one for the output of each layer)
of shape :obj:`(batch_size, sequence_length, hidden_size)`.
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``config.output_attentions=True``):
Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
:obj:`(batch_size, num_heads, sequence_length, sequence_length)`.
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``config.output_attentions=True``):
Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
:obj:`(batch_size, num_heads, sequence_length, sequence_length)`.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention
heads.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention
heads.
"""
if input_ids is not None and inputs_embeds is not None:
@@ -157,7 +157,10 @@ class AlbertModelWithPabee(AlbertModel):
res = []
for i in range(self.config.num_hidden_layers):
encoder_outputs = self.encoder.adaptive_forward(
encoder_outputs, current_layer=i, attention_mask=extended_attention_mask, head_mask=head_mask,
encoder_outputs,
current_layer=i,
attention_mask=extended_attention_mask,
head_mask=head_mask,
)
pooled_output = self.pooler_activation(self.pooler(encoder_outputs[0][:, 0]))
@@ -174,7 +177,10 @@ class AlbertModelWithPabee(AlbertModel):
for i in range(self.config.num_hidden_layers):
calculated_layer_num += 1
encoder_outputs = self.encoder.adaptive_forward(
encoder_outputs, current_layer=i, attention_mask=extended_attention_mask, head_mask=head_mask,
encoder_outputs,
current_layer=i,
attention_mask=extended_attention_mask,
head_mask=head_mask,
)
pooled_output = self.pooler_activation(self.pooler(encoder_outputs[0][:, 0]))
@@ -236,42 +242,42 @@ class AlbertForSequenceClassificationWithPabee(AlbertPreTrainedModel):
labels=None,
):
r"""
labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`, defaults to :obj:`None`):
Labels for computing the sequence classification/regression loss.
Indices should be in ``[0, ..., config.num_labels - 1]``.
If ``config.num_labels == 1`` a regression loss is computed (Mean-Square loss),
If ``config.num_labels > 1`` a classification loss is computed (Cross-Entropy).
labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`):
Labels for computing the sequence classification/regression loss.
Indices should be in ``[0, ..., config.num_labels - 1]``.
If ``config.num_labels == 1`` a regression loss is computed (Mean-Square loss),
If ``config.num_labels > 1`` a classification loss is computed (Cross-Entropy).
Returns:
:obj:`tuple(torch.FloatTensor)` comprising various elements depending on the configuration (:class:`~transformers.AlbertConfig`) and inputs:
loss: (`optional`, returned when ``labels`` is provided) ``torch.FloatTensor`` of shape ``(1,)``:
Classification (or regression if config.num_labels==1) loss.
logits ``torch.FloatTensor`` of shape ``(batch_size, config.num_labels)``
Classification (or regression if config.num_labels==1) scores (before SoftMax).
hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``config.output_hidden_states=True``):
Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings + one for the output of each layer)
of shape :obj:`(batch_size, sequence_length, hidden_size)`.
Returns:
:obj:`tuple(torch.FloatTensor)` comprising various elements depending on the configuration (:class:`~transformers.AlbertConfig`) and inputs:
loss: (`optional`, returned when ``labels`` is provided) ``torch.FloatTensor`` of shape ``(1,)``:
Classification (or regression if config.num_labels==1) loss.
logits ``torch.FloatTensor`` of shape ``(batch_size, config.num_labels)``
Classification (or regression if config.num_labels==1) scores (before SoftMax).
hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``config.output_hidden_states=True``):
Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings + one for the output of each layer)
of shape :obj:`(batch_size, sequence_length, hidden_size)`.
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``config.output_attentions=True``):
Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
:obj:`(batch_size, num_heads, sequence_length, sequence_length)`.
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``config.output_attentions=True``):
Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
:obj:`(batch_size, num_heads, sequence_length, sequence_length)`.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention
heads.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention
heads.
Examples::
Examples::
from transformers import AlbertTokenizer
from pabee import AlbertForSequenceClassificationWithPabee
import torch
from transformers import AlbertTokenizer
from pabee import AlbertForSequenceClassificationWithPabee
import torch
tokenizer = AlbertTokenizer.from_pretrained('albert-base-v2')
model = AlbertForSequenceClassificationWithPabee.from_pretrained('albert-base-v2')
input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute")).unsqueeze(0) # Batch size 1
labels = torch.tensor([1]).unsqueeze(0) # Batch size 1
outputs = model(input_ids, labels=labels)
loss, logits = outputs[:2]
tokenizer = AlbertTokenizer.from_pretrained('albert-base-v2')
model = AlbertForSequenceClassificationWithPabee.from_pretrained('albert-base-v2')
input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute")).unsqueeze(0) # Batch size 1
labels = torch.tensor([1]).unsqueeze(0) # Batch size 1
outputs = model(input_ids, labels=labels)
loss, logits = outputs[:2]
"""

Some files were not shown because too many files have changed in this diff Show More