Compare commits

...

937 Commits

Author SHA1 Message Date
Lysandre
96d1cfb13d Patch release: v4.8.2
Some checks failed
Release - Conda / build_and_package (push) Has been cancelled
2021-06-30 14:18:21 +02:00
Sylvain Gugger
7d42ddda89 Add option to save on each training node (#12421)
* Add option to save on each training node

* Apply suggestions from code review

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Address review comments

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2021-06-30 12:47:22 +02:00
Jabin Huang
22bb717c04 fix ids_to_tokens naming error in tokenizer of deberta v2 (#12412)
Co-authored-by: Jipeng Huang <jihuan@microsoft.com>
2021-06-30 12:47:15 +02:00
NielsRogge
2fcc976045 Rename detr targets to labels (#12280)
* Rename target to labels in DetrFeatureExtractor

* Update DetrFeatureExtractor tests accordingly

* Improve docs of DetrFeatureExtractor

* Improve docs

* Make style
2021-06-30 12:47:04 +02:00
Sylvain Gugger
136617224b Release: v4.8.1
Some checks failed
Release - Conda / build_and_package (push) Has been cancelled
2021-06-24 10:12:11 -04:00
Lysandre Debut
c0073b66ec Fix torchscript tests (#12336)
* Fix torchscript tests

* Better test

* Remove bogus print
2021-06-24 15:53:07 +02:00
Richard Liaw
0b752bf9da try-this (#12338)
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2021-06-24 15:53:00 +02:00
Sylvain Gugger
fb711f22d6 Fix default to logging_dir lost in merge conflict 2021-06-24 09:01:22 +02:00
Sylvain Gugger
055f86fd88 Release: v4.8.0 2021-06-24 09:01:00 +02:00
Patrick von Platen
468cda20f2 [Flax T5] Fix weight initialization and fix docs (#12327)
Some checks failed
Release - Conda / build_and_package (push) Has been cancelled
* finish t5 flax fixes

* improve naming
2021-06-23 17:39:21 +01:00
Sylvain Gugger
12a4457c56 Pin good version of huggingface_hub 2021-06-23 12:30:15 -04:00
Michael Benayoun
986ac03e37 changed modeling_fx_utils.py to utils/fx.py for clarity (#12326)
Co-authored-by: Michael Benayoun <michael@huggingface.co>
2021-06-23 18:16:24 +02:00
Lysandre
941b4442ba Temporarily revert the fill-mask improvements. 2021-06-23 17:46:24 +02:00
Lysandre Debut
4bdff2cdbe Conda build (#12323) 2021-06-23 11:07:07 -04:00
Sylvain Gugger
9eda6b52e2 Add all XxxPreTrainedModel to the main init (#12314)
* Add all XxxPreTrainedModel to the main init

* Add to template

* Add to template bis

* Add FlaxT5
2021-06-23 10:40:54 -04:00
Sylvain Gugger
53c60babe4 Clean push to hub API (#12187)
* Clean push to hub API

* Create working dir if it does not exist

* Different tweak

* New API + all models + test Flax

* Adds the Trainer clean up

* Update src/transformers/file_utils.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address review comments

* (nit) output types

* No need to set clone_from when folder exists

* Update src/transformers/trainer.py

Co-authored-by: Julien Chaumond <julien@huggingface.co>

* Add generated_from_trainer tag

* Update to new version

* Fixes

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-06-23 10:11:19 -04:00
chenht2010
625f512d5e [TFWav2Vec2] Fix docs (#12283)
* fix error

* make style check happy

Co-authored-by: chenhaitao <chenhaitao@qiyi.com>
2021-06-23 14:51:31 +01:00
Patrick von Platen
44739c8180 [Flax/JAX] Add how to propose projects markdown (#12311)
* fix_torch_device_generate_test

* remove @

* finish

* make style
2021-06-23 14:50:35 +01:00
Lysandre Debut
ef3dceff4a Add mention of the huggingface_hub methods for offline mode (#12320) 2021-06-23 09:45:30 -04:00
Vasudev Gupta
e98233dde1 Flax T5 (#12150)
* copy pytorch-t5

* init

* boom boom

* forward pass same

* make generation work

* add more tests

* make test work

* finish normal tests

* make fix-copies

* finish quality

* correct slow example

* correct slow test

* version table

* upload models

* Update tests/test_modeling_flax_t5.py

* correct incorrectly deleted line

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
2021-06-23 13:13:32 +01:00
David Fan
7d4cfa3b47 Rewrite ProphetNet to adapt converting ONNX friendly (#11981)
* Rewrite

* [ONNX] rewrite
2021-06-23 11:34:18 +01:00
Suraj Patil
c0fe3c9a7a Flax summarization script (#12230)
* add summrization script

* fix arguments, preprocessing, metrics

* add generation and metrics

* auto model, prediction loop

* prettify

* label smoothing

* adress Sylvain and Patricks suggestions

* dynamically import shift_tokens_right

* fix shift_tokens_right_fn call
2021-06-23 15:49:30 +05:30
Daniel Stancl
26a2e36595 Add output in a dictionary for TF generate method (#12139)
* Add output args to greedy search

* Fix critical typo + make style quality

* Handle generate_beam_search

* Add dict_specific tests and fix the placement of encoder outputs

* Add  specific outputs

* Update doc

* Fix typo

* Adjust handling encoder_outputs + Fix generating for T5

* Fix generate for RAG

* Fix handling ouptut_attentions when target_mapping is not None

Take care of situations when target_mapping is provided
as there are 2-tuple of attentions

Change from:
if inputs["output_attentions"]:
    attentions = tuple(tf.transpose(t, perm(2, 3, 0, 1)) for t in attentions)

to:
if inputs["output_attentions"]:
    if inputs["target_mapping"] is not None:
        # when target_mapping is provided, there are 2-tuple of attentions
         attentions = tuple(
             tuple(tf.transpose(attn_stream, perm=(2, 3, 0, 1)) for attn_stream in t) for t in attentions
        )
    else:
        attentions = tuple(tf.transpose(t, perm=(2, 3, 0, 1)) for t in attentions)

* Rename kwargs to model_kwargs

* make style quality

* Move imports in test_modeling_tf_common.py

Move ModelOutput-related imports in test_modeling_tf_common.py
into the `is_tf_available():` statement.

* Rewrite nested if-statements

* Fix added tests
2021-06-23 10:52:11 +01:00
Nicolas Patry
d4be498441 Optimizing away the fill-mask pipeline. (#12113)
* Optimizing away the `fill-mask` pipeline.

- Don't send anything to the tokenizer unless needed. Vocab check is
much faster
- Keep BC by sending data to the tokenizer when needed. User handling warning messages will see performance benefits again
- Make `targets` and `top_k` work together better `top_k` cannot be
higher than `len(targets)` but can be smaller still.
- Actually simplify the `target_ids` in case of duplicate (it can happen
because we're parsing raw strings)
- Removed useless code to fail on empty strings. It works only if empty
string is in first position, moved to ignoring them instead.
- Changed the related tests as only the tests would fail correctly
(having incorrect value in first position)

* Make tests compatible for 2 different vocabs... (at the price of a
warning).

Co-authored-by: @EtaoinWu

* ValueError working globally

* Update src/transformers/pipelines/fill_mask.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* `tokenizer.vocab` -> `tokenizer.get_vocab()` for more compatiblity +
fallback.

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-06-23 10:38:04 +02:00
Kevin Canwen Xu
037e466b10 Add CodeCarbon Integration (#12304)
* Add optional dependency

* Add CodeCarbon integration

* Add CodeCarbon integration

* Add CodeCarbon integration

* typo
2021-06-23 14:53:09 +08:00
Stas Bekman
bfd5da8e28 [docs] performance (#12258)
* initial performance document

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* rewrites based on suggestions

* 8x multiple is for AMP only

* add contribute section

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-06-22 15:34:19 -07:00
Sylvain Gugger
1562c04e41 FlaxBartPretrainedModel -> FlaxBartPreTrainedModel (#12313) 2021-06-22 16:37:05 -04:00
Stas Bekman
ebe5413589 [trainer] 2 bug fixes and a rename (#12309)
* bug fixes and a rename

* add extended DDP test
2021-06-22 11:13:23 -07:00
Patrick von Platen
64029abe4c [Flax] Main doc for event orga (#12305)
* fix_torch_device_generate_test

* remove @

* push

* finish

* some typos

* add more info on communication

* add suggestions
2021-06-22 18:02:52 +01:00
Kilian Kluge
032d56a435 Fix and improve documentation for LEDForConditionalGeneration (#12303)
* Replace conditional generation example (fixes #12268)

* Replace model in summarization example with finetuned checkpoint, adapt example text

* Fix typo in new summarization example

* Fix docstring formatting, add missing import statement to example
2021-06-22 09:58:13 -04:00
Suraj Patil
1498eb9888 add FlaxAutoModelForImageClassification in main init (#12298) 2021-06-22 18:26:05 +05:30
Stefan Schweter
2affeb2905 trainer_tf: adjust wandb installation command (#12291) 2021-06-22 08:47:31 -04:00
Hamid Shojanazeri
af6e01c5bc Fix for the issue of device-id getting hardcoded for token_type_ids during Tracing [WIP] (#11252)
* registering a buffer for token_type_ids, to pass the error of device-id getting hardcoded when tracing

* sytle format

* adding persistent flag to the resgitered buffers that prevent from adding them to the state_dict and addresses the Backward compatibility issue

* adding the try catch to the fix as persistent flag is only available from PT >1.6

* adding version check

* added the condition to only use the token_type_ids buffer when its autogenerated not passed by user

* adding comments and making the conidtion where token_type_ids are None to use the registered buffer

* taking out position-embeddding from the if block

* adding comments

* handling the case if buffer for position_ids was not registered

* reverted the changes on position_ids, fix the issue with size of token_type_ids buffer, moved the modification for generated token_type_ids to Bertmodel, instead of Embeddings

* reverting the token_type_ids in case of None to the previous version

* reverting changes on position_ids adding back the if block

* changes added by running make fix-copies

* changes added by running make fix-copies and added the import version as it was getting used

* changes added by running make fix-copies

* changes added by running make fix-copies

* fixing the import format

* fixing the import format

* modified to use temp tensor for trimed and expanded token_type_ids buffer

* changes made by fix-copies after temp tensor modifications

* changes made by fix-copies after temp tensor modifications

* changes made by fix-copies after temp tensor modifications

* clean up

* clean up

* clean up

* clean up

* Nit

* Nit

* Nit

* modified according to support device conversion on traced models

* modified according to support device conversion on traced models

* modified according to support device conversion on traced models

* modified according to support device conversion on traced models

* changes based on latest in master

* Adapt templates

* Add version import

Co-authored-by: Ubuntu <ubuntu@ip-172-31-32-81.us-west-2.compute.internal>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-06-22 05:21:30 -04:00
Stas Bekman
0d97ba8a98 [tests] multiple improvements (#12294)
* [tests] multiple improvements

* cleanup

* style

* todo to investigate

* fix
2021-06-21 19:51:36 -07:00
Stas Bekman
dad414d5f9 [trainer + examples] set log level from CLI (#12276)
* set log level from CLI

* add log_level_replica + test + extended docs

* cleanup

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* rename datasets objects to allow datasets module

* improve the doc

* style

* doc improve

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-21 19:30:50 -07:00
Stas Bekman
a4ed074d4b reset report_to to none, avoid deprecation warning (#12293) 2021-06-21 16:50:12 -07:00
Patrick von Platen
7ef309ca10 [Flax] Add jax flax to env command (#12251)
* fix_torch_device_generate_test

* remove @

* add commands for flax/jax
2021-06-21 17:12:12 +01:00
Matt
e3cb7a0b60 Tensorflow QA example (#12252)
* New Tensorflow QA example!

* Style pass

* Updating README.md for the new example

* flake8 fixes

* Update examples/tensorflow/question-answering/README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-21 16:37:28 +01:00
Patrick von Platen
4e9a6796c7 [Flax] Fix flax test save pretrained (#12256)
* fix_torch_device_generate_test

* remove @

* fix flax save pretrained test
2021-06-21 16:37:13 +01:00
Stas Bekman
b75b5605c9 [DeepSpeed] don't ignore --adafactor (#12257) 2021-06-21 08:17:00 -07:00
Suraj Patil
eb881674f2 [Flax] [WIP] allow loading head model with base model weights (#12255)
* boom boom

* remove flax clip example

* allow loading head model with base model weights

* add test

* fix imports

* disable save, load test for clip

* add test_save_load_to_base
2021-06-21 15:56:42 +01:00
Suraj Patil
8d5b7f36e5 [FlaxClip] fix test from/save pretrained test (#12284)
* boom boom

* remove flax clip example

* fix from_save_pretrained
2021-06-21 15:54:34 +01:00
Vishal Burman
b53bc55ba9 Fix for making student ProphetNet for Seq2Seq Distillation (#12130)
* make_student.py: fix to make student ProphetNet

* reformat
2021-06-21 09:36:44 -04:00
Lysandre Debut
b76850a808 Better CI feedback (#12279)
* Better run ID

* Only part of CI

* Revert "Only part of CI"

This reverts commit 29f7f248d21e0f5792e0670ba8705b31ad8967b7.
2021-06-21 02:52:12 -04:00
Lysandre
30a5521c0b Fix the scheduled CI 2021-06-21 08:27:25 +02:00
Stas Bekman
2e5dbdf2db [t5 doc] make the example work out of the box (#12239)
* [run_clm.py] restore caching

* style

* [t5 doc] make the example work out of the box

This PR expands the training example to include the correct model type for the example to work, e.g. with `T5Model` this example will break.

* Update docs/source/model_doc/t5.rst

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* expand the other example

Co-authored-by: Suraj Patil <surajp815@gmail.com>
2021-06-18 10:00:19 -07:00
Xa9aX ツ
f3558bbcfd Depreciate pythonic Mish and support PyTorch 1.9 version of Mish (#12240)
* Moved Mish to Torch 1.9 version

* Run black formatting
2021-06-18 09:13:45 -04:00
Suraj Patil
47a9768334 [FlaxBart] few small fixes (#12247)
* boom boom

* remove flax clip example

* few small fixes
2021-06-18 10:29:42 +01:00
Suraj Patil
f74655cd9b [Flax] FlaxAutoModelForSeq2SeqLM (#12228)
* add FlaxAutoModelForSeq2SeqLM
2021-06-18 13:20:09 +05:30
Bhavitvya Malik
e43e11260f update desc for map in all examples (#12226)
* update desc for map in all examples

* added plm

* suggestions
2021-06-17 15:37:31 -04:00
Sylvain Gugger
adb70eda4d AutoTokenizer: infer the class from the tokenizer config if possible (#12208)
* AutoTokenizer: infer the class from the tokenizer config if possible

* Add tests

* Update src/transformers/models/auto/tokenization_auto.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-06-17 12:39:22 -04:00
Lysandre
0daadc1919 Docs for v4.8.0 2021-06-17 18:17:42 +02:00
Lysandre
7a6c9fab8e Release: v4.7.0
Some checks failed
Release - Conda / build_and_package (push) Has been cancelled
2021-06-17 17:57:42 +02:00
Stas Bekman
d6ea91c96a fix pt-1.9.0 add_ deprecation (#12217)
* fix pt-1.9.0 add_ deprecation

* add () for clarity

* Trigger CI

* require_version(torch
2021-06-17 08:53:59 -07:00
Lysandre Debut
3a960c4857 Support for torch 1.9.0 (#12224)
* Support for torch 1.9.0

* Torch scatter for 1.9.0

* Github Actions run on 1.9.0
2021-06-17 11:29:01 -04:00
Sylvain Gugger
afdd9e3663 Add link to the course (#12229) 2021-06-17 11:14:53 -04:00
NielsRogge
29b0aef871 Improve detr (#12147)
* Remove unused variables

* Improve docs

* Fix docs of segmentation masks

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-06-17 10:37:54 -04:00
Lysandre Debut
b56848c8c8 Pipeline update & tests (#12207) 2021-06-17 09:41:16 +02:00
Bhadresh Savani
700cee3446 [Docs] fixed broken link (#12205)
* fixed broken link

* Update docs/source/benchmarks.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/benchmarks.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-16 15:14:53 -04:00
Sylvain Gugger
255a17a089 Use yaml to create metadata (#12185)
* Use yaml to create metadata

* Fix typo

* Remove pin
2021-06-16 13:17:45 -04:00
Nicolas Patry
15ef0dc5c6 Enabling AutoTokenizer for HubertConfig. (#12198) 2021-06-16 15:28:46 +01:00
Philipp Schmid
afa414d060 updated DLC images and sample notebooks (#12191) 2021-06-16 07:24:00 -04:00
Patrick von Platen
ccca510276 Hubert (#11889)
* fix_torch_device_generate_test

* remove @

* add hubert

* add first test file

* more docs

* fix bugs

* fix bug

* finish

* finish

* finish docstring

* fix

* fix

* finalize

* add to ignored

* finish

* Apply suggestions from code review

* correct naming

* finish

* fix auto config

* finish

* correct convert script

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Suraj Patil <surajp815@gmail.com>

* apply suggestions lysandre & suraj

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
2021-06-16 12:14:12 +01:00
Patrick von Platen
c3c39f7e84 [Flax] Add Beam Search (#12131)
* fix_torch_device_generate_test

* remove @

* push new logit processors

* add processors

* save first working version

* save intermediate

* finish

* make style

* make fix-copies

* finish

* Update tests/test_modeling_flax_bart.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Suraj Patil <surajp815@gmail.com>

Co-authored-by: Patrick von Platen <patrick@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
2021-06-16 09:43:54 +01:00
Sylvain Gugger
802ffaff0d Temporarily deactivate torchhub test (#12184) 2021-06-15 16:16:51 -04:00
Lysandre Debut
52c7ca0488 Temporarily deactivate torch-scatter while we wait for new release (#12181)
* Temporarily deactivate torch-scatter while we wait for new release

* torch-1.8.1 binary for scatter

* Revert to 1.8.0

* Pin torch dependency

* torchaudio and torchvision
2021-06-15 16:03:58 -04:00
Sylvain Gugger
7d7ceca396 Model card defaults (#12122)
* [WIP] Model card defaults

* finetuned_from default value

* Add all mappings to the mapping file

* Be more defensive on finetuned_from arg

* Add default task tag

* Separate tags from tasks

* Edge case for dataset

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-06-15 16:01:37 -04:00
Stas Bekman
6e7cc5cc51 [testing] ensure concurrent pytest workers use a unique port for torch.dist (#12166)
* ensure concurrent pytest workers use a unique port for torch.distributed.launch

* reword
2021-06-15 11:12:59 -07:00
Amog Kamsetty
b9d66f4c4b Ray Tune Integration Updates (#12134)
* fix

* fixes

* add back to scheduled tests

* formatting

* Update integrations.py
2021-06-15 14:11:29 -04:00
Kilian Kluge
a79585bbf9 Update AutoModel classes in summarization example (#12178)
- Convert use of deprecated AutoModelWithLMHead to AutoModelForSeq2SeqLM
- Add newly required `truncation=True` to `tokenizer.encode` with `max_length`

This silences all warnings.
2021-06-15 10:36:10 -04:00
Sylvain Gugger
d6c929e200 Merge remote-tracking branch 'origin/master' 2021-06-15 09:37:46 -04:00
Sylvain Gugger
a8694b8850 Adjust banner width 2021-06-15 09:37:15 -04:00
kumapo
955b2b97a6 Enable add_prefix_space if model_type is roberta or gpt2 (#12116) 2021-06-15 09:33:21 -04:00
Sylvain Gugger
60b1d6b45b Add course banner (#12157)
* Add course banner

* Update course banner
2021-06-15 09:25:49 -04:00
Lysandre Debut
d07b540a37 Have dummy processors have a from_pretrained method (#12145) 2021-06-15 08:39:05 -04:00
Avital Oliver
9b393240a2 Use a released version of optax rather than installing from Git. (#12173)
Use a released version of optax rather than installing from Git
2021-06-15 16:42:51 +05:30
Patrick von Platen
9bc9e59869 [Flax generate] Add params to generate (#12171)
* fix_torch_device_generate_test

* remove @

* add params as input

* finish
2021-06-15 11:50:12 +01:00
Sylvain Gugger
a55dc157e3 Add video links to the documentation (#12162) 2021-06-15 06:37:37 -04:00
Stas Bekman
040283170c consistent nn. and nn.functional: part 5 docs (#12161) 2021-06-14 13:34:32 -07:00
Stas Bekman
88e84186e5 [style] consistent nn. and nn.functional: part 4 examples (#12156)
* consistent nn. and nn.functional: p4 examples

* restore
2021-06-14 12:28:24 -07:00
Stas Bekman
372ab9cd6d [style] consistent nn. and nn.functional: part 3 tests (#12155)
* consistent nn. and nn.functional: p3 templates

* restore
2021-06-14 12:18:22 -07:00
Vasudev Gupta
d9c0d08f9a Flax Big Bird (#11967)
* add flax bert

* bert -> bigbird

* original_full ported

* add debugger

* init block sparse

* fix copies ; gelu_fast -> gelu_new

* block sparse port

* fix block sparse

* block sparse working

* all ckpts working

* fix-copies

* make quality

* init tests

* temporary fix for FlaxBigBirdForMultipleChoice

* skip test_attention_outputs

* fix

* gelu_fast -> gelu_new ; fix multiple choice model

* remove nsp

* fix sequence classifier

* fix

* make quality

* make fix-copies

* finish

* Delete debugger.ipynb

* Update src/transformers/models/big_bird/modeling_flax_big_bird.py

* make style

* finish

* bye bye jit flax tests

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-06-14 20:01:03 +01:00
Stas Bekman
a156da9a23 consistent nn. and nn.functional: p2 templates (#12153) 2021-06-14 11:41:24 -07:00
Patrick von Platen
007be9e402 [Flax] Fix flax pt equivalence tests (#12154)
* fix_torch_device_generate_test

* remove @

* upload
2021-06-14 19:19:10 +01:00
Will Rice
d438eee030 Adding TFWav2Vec2Model (#11617)
* [WIP] Add TFWav2Vec2Model

Work in progress for adding a tensorflow version of Wav2Vec2

* feedback changes

* small fix

* Test Feedback Round 1

* Add SpecAugment and CTC Loss

* correct spec augment mask creation

* docstring and correct copyright

* correct bugs

* remove bogus file

* finish tests correction

* del unnecessary layers

* Update src/transformers/models/wav2vec2/modeling_tf_wav2vec2.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* make style

* correct final bug

* Feedback Changes

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-06-14 18:58:54 +01:00
Stas Bekman
1ed2ebf60d [style] consistent nn. and nn.functional (#12124)
* consistent nn. and nn.functional

* fix glitch

* fix glitch #2
2021-06-14 09:44:28 -07:00
Stas Bekman
ff7c81687a [optim] implement AdafactorSchedule (#12123)
* implement AdafactorSchedule

* typo

* fix

* Update src/transformers/optimization.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-14 09:43:48 -07:00
Suraj Patil
fe3576488a fix error message (#12148) 2021-06-14 14:12:18 +01:00
Kumar Abhishek
9de62cfbce [lm examples] Replicate --config_overrides addition to other LM examples (#12135)
* [lm examples] Replicate --config_overrides addition to other LM examples

* Removing no trainer files changes

* Update README

Co-authored-by: Kumar Abhishek <kabhishek@expedia.com>
2021-06-14 08:12:22 -04:00
Nicholas Broad
cd7961b632 Use text_column_name variable instead of "text" (#12132)
* Use text_column_name variable instead of "text"

`text_column_name` was already defined above where I made the changes and it was also used below where I made changes.

This is a very minor change. If a dataset does not use "text" as the column name, then the `tokenize_function` will now use whatever column is assigned to `text_column_name`. `text_column_name` is just the first column name if "text" is not a column name. It makes the function a little more robust, though I would assume that 90% + of datasets use "text" anyway.

* black formatting

* make style

Co-authored-by: Nicholas Broad <nicholas@nmbroad.com>
2021-06-14 08:11:13 -04:00
Sylvain Gugger
b8ab541340 Don't log anything before logging is setup in examples (#12121)
* Don't log anything before logging is setup in examples

* Last example
2021-06-14 08:03:33 -04:00
Patrick von Platen
7566fefa69 [Flax] Add links to google colabs (#12146)
* fix_torch_device_generate_test

* remove @

* add colab links
2021-06-14 11:00:29 +01:00
SaulLu
476ba679dd Feature to use the PreTrainedTokenizerFast class as a stand-alone tokenizer (#11810)
* feature for tokenizer without slow/legacy version

* format

* modify common test

* add tests

* add PreTrainedTokenizerFast to AutoTokenizer

* format

* change tokenizer common test in order to be able to run test without a slow version

* update tokenizer fast test in order to use `rust_tokenizer_class` attribute instead of `tokenizer_class`

* add autokenizer test

* replace  `if self.tokenizer_class is not None` with ` if self.tokenizer_class is None`

* remove obsolete change in comment

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/tokenization_utils_fast.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* change `get_main_tokenizer` into `get_tokenizers`

* clarify `get_tokenizers` method

* homogenize with `test_slow_tokenizer` and `test_rust_tokenizer`

* add `test_rust_tokenizer = False` to tokenizer which don't define a fast version

* `test_rust_tokenizer = False` for BertJapaneseTokenizer

* `test_rust_tokenizer = False` for BertJapaneseCharacterTokenizationTest

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-14 11:58:44 +02:00
Daniel Stancl
4a51b1dd9b FlaxBart (#11537)
* Start working on FlaxBart

* Create modeling_flax_bart.py

* Write FlaxBartAttention

* Add FlaxBartEncoderLayer

* Add FlaxBartDecoderLayer and some typing

* Add helepr function for FlaxBart

* shift_tokens_right

* _make_causal_mask

* _expand_mask

* Add PositionalEmbedding and fix init_std naming

* Add FlaxBartPretrainedModel

* Add FlaxBartEncoder

* Add FlaxBartEncoder

* Add FlaxBartEncoder among modules to be imported

* YET WE CANNOT INITIALIZE THAT!! :(

* Make BartEncoder working

Change BartEncoder to instance of nn.Module so far

* Add FlaxBartDecoder

* Add FlaxBartModel

* TODO to make model run -> Prepapre model inputs

* Resolve padding

* Add FlaxBartModel

* Add FlaxBartModel into importable modules

* Remove FlaxBartEncoder and FlaxBartDecoder from importable modules

* make style; not properly working

* make style; make quality not pass due to some import I left

* Remove TODO for padding_idx in nn.Embed so far

* Add FlaxBartForConditionalGeneration

* Incorporate Flax model output classes, i.e. return_dict

* Add another models and incorporate use_cache arg

* Add FlaxBartForSequenceClassification and FlaxBartForQuestionAnswering

* Incorporate use_cache arg from PyTorch implementation

* Add all necessary Flax output utils

* Add FlaxBartForCausalLM; not working yet'

* Add minor improvements; still lacks some functionality

* Update docs, src and tests

* Add support of FlaxBart to docs/source

* Fix some bugs in FlaxBart souce code

* Add some neccessary tests for FlaxBart models - jit_compilation not passing

* Fix tests and add test_head_masking

* Fix tests for @jax.jit computation

* Add test_head_masking

* Migrate FlaxBart tests from jax.numpy to numpy

* Remove FlaxBartForCausalLM

* Clean repo

* fix bart model weight structure

* Fix FlaxBartForSequenceClassification

Slicing is not possible to use below jit, therefore, selecting sentence
representation from hidden_states must be changed.

* Allow FlaxBartForSequenceClassification for testing pt_flax equivalence

* Allow testing for FlaxBartForQA for pt_flax equivalence

* Add a comment to FlaxBartForSequenceClassification + change noise from 1e-3 to 1e-6

* remove past_key_values

* remove inputs_mebeds and make input_ids required

* add position ids

* re-write attention layer

* fix dataclass

* fix pos embeds and attention output

* fix pos embeds

* expose encode method

* expose decode method

* move docstring to top

* add cache for causal attn layer

* remove head masking for now

* s2s greedy search first pass

* boom boom

* fix typos

* fix greedy generate for bart

* use encoder, decoder layers instead of num_hidden_layers

* handle encoder_outputs

* cleanup

* simplify decoding

* more clean-up

* typos

* Change header + add {decoder_,}position_ids into 2 models

* add BartConfig

* fix existing tests

* add encode, decode methods

* Fix shift_tokens_right for JIT compilation + clarify one condition

* fix decode

* encoder => encode

* simplify generate

* add tests for encode and decode

* style

* add tests for cache

* fix equivalence tests

* sample generate now works with seq2seq

* generation tests

* initialize dense layers

* docstring and cleanup

* quality

* remove get/set input_embeddings

* address Patricks suggestions

* decode for every model, remove encoder_outputs from call

* update tests accordingly

* decode returns only decoder outputs and logits

* fix arguments

* doc encode, decode methods

* correct base_model_prefix

* fix test for seq classif model

* fix docs

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
2021-06-14 15:16:08 +05:30
Suraj Patil
d36fce8237 add readme for flax clm (#12111)
* add readme for flax clm

* use section link for tokenizer

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* update metrics

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-06-14 15:03:55 +05:30
Patrick von Platen
16c0efca2c Add mlm pretraining xla torch readme (#12011)
* fix_torch_device_generate_test

* remove @

* upload

* Apply suggestions from code review

* Apply suggestions from code review

* Apply suggestions from code review

* Update examples/flax/language-modeling/README.md

* add more info

* finish

* fix

Co-authored-by: Patrick von Platen <patrick@huggingface.co>
2021-06-14 10:31:21 +01:00
Guido Novati
ecd6efe7cb Fix megatron_gpt2 attention block's causal mask (#12007)
* Fix megatron_gpt2 attention block's causal mask.

* compatibility with checkpoints created with recent versions of Megatron-LM

* added integration test for the released Megatron-GPT2 model

* code style changes

* added option to megatron conversion script to read from config file

Co-authored-by: Guido Novati <gnovati@nvidia.com>
2021-06-14 04:57:55 -04:00
Jonathan Chang
783b0dd589 Fix t5 error message (#12136)
* Fix t5 error message

* Fix again
2021-06-13 12:02:57 +01:00
Lysandre Debut
3b1f5caff2 Add from_pretrained to dummy timm objects (#12097)
* Add from_pretrained to dummy timm

* Fix at the source

* Update utils/check_dummies.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Missing pretrained dummies

* Style

Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-11 12:27:10 -04:00
Suraj Patil
15b498f3b8 Flax CLM script (#12023)
* first draft

* max_seq_length => block_size

* fix arg names

* fix typos

* fix loss calculation

* add max examples, fix  train eval steps, metrics

* optimizer mask

* fix perpelexity, metric logging

* fix logging

* data_collator = > data_loader

* refactor loss_fn

* support single GPU

* pass distributed to write_metric

* fix jitting

* fix single device training

* fix single device metrics

* close inner progress bars once finished

* add overwrite_cache arg

* ifx dataset caching issue

* add more logs

* few small fixes,

* address nicholas suggestions

* fix docstr

* address patricks suggestions

* make flake happy

* pass new new_dropout_rng to apply_gradients

* reset train metrics after every epoc

* remove distributed logis, small fixes
2021-06-11 15:16:20 +05:30
Patrick von Platen
e47765d884 Fix head masking generate tests (#12110)
* fix_torch_device_generate_test

* remove @

* fix tests
2021-06-11 04:04:07 -04:00
Bhavitvya Malik
d2753dcbec add relevant description to tqdm in examples (#11927)
* add relevant `desc` in examples

* require_version datasets>=1.8.0
2021-06-10 15:59:55 -04:00
Jayendra
9a9314f6d9 Flax VisionTransformer (#11951)
* adding vit for flax

* added test for Flax-vit and some bug-fixes

* overrided methods where variable changes were necessary for flax_vit test

* added FlaxViTForImageClassification for test

* Update src/transformers/models/vit/modeling_flax_vit.py

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* made changes suggested in PR

* Adding jax-vit models for autoimport

* swapping num_channels and height,width dimension

* fixing the docstring for torch-like inputs for VIT

* add model to main init

* add docs

* doc, fix-copies

* docstrings

* small test fixes

* fix docs

* fix docstr

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* style

Co-authored-by: jayendra <jayendra@infocusp.in>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-06-10 21:17:13 +05:30
Daniel Stancl
0eaeae2e36 Fix a condition in test_generate_with_head_masking (#11911)
* Fix a condition in test_generate_with_head_masking

* Fix usage of head_mask in bigbirg_pegasus

* Fix head masking for speech2text

* Resolve copy mismatch + drop unwanted print statement

* Fix the condition
2021-06-10 15:28:07 +01:00
Matt
bebbdd0fc9 Appending label2id and id2label to models to ensure inference works properly (#12102) 2021-06-10 15:25:04 +01:00
Matt
4cda08decb Minor style edits 2021-06-10 15:10:57 +01:00
Matt
7f08dbd10a Update README.md to cover the TF GLUE example. 2021-06-10 14:33:42 +01:00
Sylvain Gugger
d72e5a3a6d Fix quality 2021-06-10 09:27:11 -04:00
Matt
73a532651a New TF GLUE example (#12028)
* Pushing partially-complete new GLUE example

* First draft of the new TF GLUE example! Needs a little more testing to be sure but it's almost ready.

* Fix to the fit() call

* Bugfixes, making sure TPU and multi-GPU support is ready

* Remove logger line that depends on Pytorch

* Style pass

* Deleting old TF GLUE example

* Include label2id and id2label in the saved model config

* Don't clobber the existing model.config.label2id

* Style fixes

* Update examples/tensorflow/text-classification/run_glue.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-10 14:14:37 +01:00
Tobias Norlund
9d2cee8b48 CLIPFeatureExtractor should resize images with kept aspect ratio (#11994)
* Resize with kept aspect ratio

* Fixed failed test

* Overload center_crop and resize methods instead

* resize should handle non-PIL images

* update slow test

* Tensor => tensor

Co-authored-by: patil-suraj <surajp815@gmail.com>
2021-06-10 18:40:41 +05:30
kumapo
472a867626 Add text_column_name and label_column_name to run_ner and run_ner_no_trainer args (#12083)
* Add text_column_name and label_column_name to run_ner args

* Minor fix: grouping for text and label column name
2021-06-10 08:03:20 -04:00
Patrick von Platen
bc6f51e539 [Wav2Vec2ForPretraining] Correct checkpoints wav2vec2 & fix tests (#12089)
* fix_torch_device_generate_test

* remove @

* fix tests
2021-06-09 20:41:59 +01:00
Stas Bekman
61e191987d rm require_version_examples (#12088) 2021-06-09 11:02:52 -07:00
Suraj Patil
d1500d9151 pass decay_mask fn to optimizer (#12087) 2021-06-09 18:49:27 +01:00
Anton Lozhkov
d472bd7b18 Wav2Vec2 Pretraining (#11306)
* Working quantizer forward

* Working quantizer forward

* Clean up unused model parts, test reproducibility

* Working quantizer forward

* Clean up unused model parts, test reproducibility

* Remove custom outputs from the shared ones

* correct conversion

* correct bug

* add first pretrain script

* save intermediate

* static shapes

* save intermediate

* finish first pretrain script version

* more refactor

* remove wanddb

* refactor more

* improve test

* correct perplexity compute bug

* finish model implementation

* add to docs

* finish docs

* finish pretraining script

* finish pretraining script

* remove wandb

* finish PR for merge

* finish config

* finish

* make deepspeed work

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* apply suggestions

* fix flaky test

Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-09 18:40:56 +01:00
Stas Bekman
b1a8aa94f0 [test] support more than 2 gpus (#12074)
* support more than 2 gpus

* style
2021-06-09 09:23:47 -07:00
NielsRogge
d3eacbb829 Add DETR (#11653)
* Squash all commits of modeling_detr_v7 branch into one

* Improve docs

* Fix tests

* Style

* Improve docs some more and fix most tests

* Fix slow tests of ViT, DeiT and DETR

* Improve replacement of batch norm

* Restructure timm backbone forward

* Make DetrForSegmentation support any timm backbone

* Fix name of output

* Address most comments by @LysandreJik

* Give better names for variables

* Conditional imports + timm in setup.py

* Address additional comments by @sgugger

* Make style, add require_timm and require_vision to testsé

* Remove train_backbone attribute of DetrConfig, add methods to freeze/unfreeze backbone

* Add png files to fixtures

* Fix type hint

* Add timm to workflows

* Add `BatchNorm2d` to the weight initialization

* Fix retain_grad test

* Replace model checkpoints by Facebook namespace

* Fix name of checkpoint in test

* Add user-friendly message when scipy is not available

* Address most comments by @patrickvonplaten

* Remove return_intermediate_layers attribute of DetrConfig and simplify Joiner

* Better initialization

* Scipy is necessary to get sklearn metrics

* Rename TimmBackbone to DetrTimmConvEncoder and rename DetrJoiner to DetrConvModel

* Make style

* Improve docs and add 2 community notebooks

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-06-09 11:51:13 -04:00
Stas Bekman
d14e0af274 sync LayerDrop for Wav2Vec2Encoder + tests (#12076) 2021-06-09 13:21:03 +01:00
Koichi Yasuoka
82a2b76c95 Update run_ner.py with id2label config (#12001) 2021-06-09 07:27:05 -04:00
Stas Bekman
0e82f0cbc2 typo 2021-06-08 12:55:17 -07:00
Stas Bekman
11d86d3de4 [Deepspeed Wav2vec2] integration (#11638)
* wip

* wip - but working with https://github.com/microsoft/DeepSpeed/pull/1044

* cleanup

* workaround

* working 5/8 modes

* solve fp32 distributed zero3

* style

* sync

* sync

* rework

* deprecation

* cleanup

* https://github.com/microsoft/DeepSpeed/pull/1044 pr was merged

* clean up

* add a guide

* more prose

* more prose

* fix

* more prose

* sub_group_size was too big

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* refactor

* bug fix

* make the true check explicit

* new deepspeed release

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-08 12:32:03 -07:00
Stas Bekman
32290d87f6 [Deepspeed] various fixes (#12058)
* replace deprecated config

* sub_group_size was too big

* complete deprecation removal
2021-06-08 08:36:15 -07:00
Sylvain Gugger
fd6902838a Properly indent block_size (#12070) 2021-06-08 10:27:02 -04:00
cdleong
49bee0aea4 Add torch to requirements.txt in language-modeling (#12040)
* Add torch to requirements.txt in language-modeling

* Update examples/pytorch/language-modeling/requirements.txt

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-08 09:02:35 -04:00
Mario Šaško
f5eec0d8e9 Replace legacy tensor.Tensor with torch.tensor/torch.empty (#12027)
* Replace legacy torch.Tensor constructor with torch.{tensor, empty}

* Remove torch.Tensor in examples
2021-06-08 13:58:38 +01:00
Shamane Siri
e33085d648 updated the original RAG implementation to be compatible with latest Pytorch-Lightning (#11806)
* updated the original RAG implementation to be compatible with the latest PL version

* updated the requirements.txt file

* execute make style

* code quality test

* code quality

* conflix resolved in requirement.txt

* code quality

* changed the MyDDP class name to CustomDDP
2021-06-08 13:42:49 +01:00
NielsRogge
70f88eeccc Fix tapas issue (#12063)
* Fix scatter function to be compatible with torch-scatter 2.7.0

* Allow test again
2021-06-08 05:22:31 -04:00
NielsRogge
e56e3140dd Fix integration tests (#12066) 2021-06-08 05:21:38 -04:00
Stas Bekman
4abc6dd690 skip failing test (#12059) 2021-06-07 20:48:41 -07:00
Russell Klopfer
e363e1d936 adds metric prefix. (#12057)
* adds metric prefix.

* update tests to include prefix
2021-06-07 22:34:10 -04:00
Peter Izsak
8994c1e472 Add optional grouped parsers description to HfArgumentParser (#12042)
* Adding optional argument group to HfArgumentParser

* Minor

* remove whitespace

* Minor styling
2021-06-07 11:47:12 -04:00
Nicolas Patry
2056f26e85 Extend pipelines for automodel tupels (#12025)
* fix_torch_device_generate_test

* remove @

* finish

* refactor

* add test

* fix test

* Attempt at simplification.

* Small fix.

* Fixing non existing AutoModel for TF.

* Naming.

* Remove extra condition.

Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
2021-06-07 17:41:27 +02:00
François Lagunas
f8bd8c6c7e Fixes bug that appears when using QA bert and distilation. (#12026)
* Fixing bug that appears when using distilation (and potentially other uses).
During backward pass Pytorch complains with:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
This happens because the QA model code modifies the start_positions and end_positions input tensors, using clamp_ function: as a consequence the teacher and the student both modifies the inputs, and backward pass fails.

* Fixing all models QA clamp_ bug.
2021-06-07 11:21:59 -04:00
Patrick von Platen
59f75d538b [JAX] Bump jax lib (#12053)
* fix_torch_device_generate_test

* remove @

* bump up jax lib
2021-06-07 13:04:18 +01:00
Suraj Patil
185122ef22 fix docs of past_key_values (#12049) 2021-06-07 15:24:03 +05:30
Philip May
3857f2b4e3 fix deberta 2 tokenizer integration test (#12017) 2021-06-07 04:55:55 -04:00
Shiva Pundir
20b6f3b80c Fixed Typo in modeling_bart.py (#12035)
* Fixed Typo in modeling_bart.py - Issue #11895

* Fixed Typo in modeling_bart.py
2021-06-07 11:44:25 +05:30
Stas Bekman
1f335aef3b [TrainerArguments] format and sort __repr__, add __str__ (#12018)
* format and sort __repr__, add __str__

* typo

* use __str__ directly

* alias __repr__ = __str__
2021-06-04 09:39:38 -07:00
Stas Bekman
2c73b93099 [Deepspeed] Assert on mismatches between ds and hf args (#12021)
* wip

* add mismatch validation + test

* renames

* Update docs/source/main_classes/deepspeed.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* renames

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-04 08:58:23 -07:00
Patrick von Platen
242ec31aa5 [Flax] Refactor MLM (#12013)
* fix_torch_device_generate_test

* remove @

* finish refactor

Co-authored-by: Patrick von Platen <patrick@huggingface.co>
2021-06-03 16:31:32 +01:00
Nicholas Vadivelu
4674061b2a Fix weight decay masking in run_flax_glue.py (#11964)
* Fix weight decay masking in `run_flax_glue.py`

Issues with the previous implementation:
- The `dict` from `traverse_util.flatten_dict` has keys which are tuples of strings, not one long string with the path separated by periods.
- `optax.masked` applies the transformation wherever the mask is True, so the masks are flipped.
- Flax's LayerNorm calls the scale parameter `scale` not `weight`

* Fix formatting with black

* adapt results

Co-authored-by: Patrick von Platen <patrick@huggingface.co>
2021-06-03 11:35:26 +01:00
Stas Bekman
61c5063491 [deepspeed] add nvme test skip rule (#11997)
* add nvme skip rule

* fix
2021-06-02 12:06:37 -07:00
Stas Bekman
640318befa [deepspeed] Move code and doc into standalone files (#11984)
* move code and docs

* style

* moved

* restore
2021-06-02 09:56:00 -07:00
Kou Yong Kang
d6d747cb28 Update return introduction (#11976)
Make it clear that the `forward` method now returns a dict instead of tuple.

Fix style
2021-06-02 12:53:09 -04:00
Stas Bekman
d406a2729a [docs] fix xref to PreTrainedModel.generate (#11049)
* fix xref to generate

* do the same for search methods

* style

* style
2021-06-02 09:21:05 -07:00
Gunjan Chhablani
123b597f5d Fix examples (#11990) 2021-06-02 10:12:52 -04:00
Gunjan Chhablani
88ca6a231d VisualBERT (#10534)
* Init VisualBERT

* Add cookie-cutter, Config, and Embeddings

* Add preliminary Model

* Add Bert analogous classes

* Add basic code for NLVR, VQA, Flickr

* Update Init

* Fix VisualBert Downstream Models

* Rename classifier to cls

* Comment position_ids buffer

* Remove sentence image predictor output

* Update output dicts

* Remove unnecessary files

* Fix Auto Modeling

* Fix transformers init

* Add conversion script

* Add conversion script

* Fix docs

* Update visualbert modelling

* Update configuration

* Style fixes

* Add model and integration tests

* Add all tests

* Update model mapping

* Add simple detector from original repository

* Update docs and configs

* Fix style

* Fix style

* Update docs

* Fix style

* Fix import issues in style

* Fix style

* Add changes from review

* Fix style

* Fix style

* Update docs

* Fix style

* Fix style

* Update docs/source/model_doc/visual_bert.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/visual_bert/modeling_visual_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update tests/test_modeling_visual_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/visual_bert/modeling_visual_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/visual_bert/modeling_visual_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/visual_bert/modeling_visual_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Add changes from review

* Remove convert run script

* Add changes from review

* Update src/transformers/models/visual_bert/modeling_visual_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/visual_bert/modeling_visual_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/visual_bert/modeling_visual_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/visual_bert/modeling_visual_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/visual_bert/modeling_visual_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Add changes from review

* Add changes from review

* Add visual embedding example in docs

* Fix "copied from" comments

* Add changes from review

* Fix error, style, checkpoints

* Update docs

* Fix integration tests

* Fix style

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-02 18:13:08 +05:30
Patrick von Platen
43f46aa7fd [RAG] Fix rag from pretrained question encoder generator behavior (#11962)
* fix_torch_device_generate_test

* remove @

* fix rag from pretrained loading

* add test

* uplaod

* finish
2021-06-02 09:17:14 +01:00
dependabot[bot]
6db3a87de2 Bump urllib3 from 1.25.8 to 1.26.5 in /examples/research_projects/lxmert (#11983)
Bumps [urllib3](https://github.com/urllib3/urllib3) from 1.25.8 to 1.26.5.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/1.25.8...1.26.5)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-06-02 03:40:20 -04:00
Stas Bekman
4ba203d9d3 [Trainer] add train loss and flops metrics reports (#11980)
* add train loss and flops metrics reports

* consistency

* add train_loss to skip keys

* restore on_train_end call timing
2021-06-01 15:58:31 -07:00
Stas Bekman
7ec596ecda [DeepSpeed] decouple DeepSpeedConfigHF from Trainer (#11966)
* decouple DeepSpeedConfigHF from Trainer

* add LoggingLevel ctx manager; add new test

* cleanup

* add docs

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* implemented suggested renames

* formatter workaround

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-01 13:24:52 -07:00
Alberto Villa
1c3ab3e5d6 Typo in usage example, changed to device instead of torch_device (#11979) 2021-06-01 14:58:49 -04:00
Patrick von Platen
47a98fc4cb ByT5 model (#11971)
* allow tf to use uneven num of layers

* add tokenizer

* finish docs

* finish docs

* Apply suggestions from code review

* include in index

* finish

* Update docs/source/model_doc/byt5.rst

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* apply sylvais suggestions

* make style

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
2021-06-01 19:07:37 +01:00
Jeoung-Minju
1eb58b4560 typo correction (#11973)
* typo correction

* type corrections
2021-06-01 12:24:59 -04:00
Stas Bekman
79712e7e7a [deepspeed] docs (#11940)
* deepspeed docs

* cleanup

* cleanup
2021-06-01 09:21:21 -07:00
Lysandre
985d708842 Run the integration tests on schedule tests instead of master tests 2021-06-01 15:58:31 +02:00
Volodymyr Byno
9996558bff Neptune.ai integration (#11937)
An option that turns on neptune.ai logging
--report_to 'neptune'

Additional ENV variables:
	NEPTUNE_PROJECT
	NEPTUNE_API_TOKEN
	NEPTUNE_RUN_NAME (optional)
	NEPTUNE_STOP_TIMEOUT (optional)
2021-06-01 09:40:52 -04:00
Lysandre Debut
ae6ce28f31 Authorize args when instantiating an AutoModel (#11956) 2021-06-01 09:27:54 -04:00
Philip May
fcad801825 Add regression tests for slow sentencepiece tokenizers. (#11737)
* add test_vocab_size for sentencepiece tok.

* add test_get_vocab for sentencepiece tok.

* add test_convert_token_and_id for sentencepiece tok.

* add test_tokenize_and_convert_tokens_to_string for all tok.

* improve test_tokenize_and_convert_tokens_to_string for sp. tok.

* add common tokenizer integration tests
- for albert
- for barthez

* add tokenizer integration tests to bert gen.

* add most tokenizer integration tests

* fix camembert tokenizer integration test

* add tokenizer integration test to marian

* add tokenizer integration test to reformer

* add typing and doc to tokenizer_integration_test_util

* fix tokenizer integration test of reformer

* improve test_sentencepiece_tokenize_and_convert_tokens_to_string

* empty commit to trigger CI

* fix tokenizer integration test of reformer

* remove code not needed anymore

* empty commit to trigger CI

* empty commit to trigger CI
2021-06-01 09:24:39 -04:00
Josh Tanner
c3d958b2c0 reinitialize wandb config for each hyperparameter search run (#11945) 2021-06-01 09:18:33 -04:00
Riccardo Bassani
99dbbdb91e bugfixes training_args.py (#11922)
modified according to:
https://pytorch.org/xla/release/1.8.1/_modules/torch_xla/core/xla_model.html
2021-06-01 09:04:51 -04:00
Fan Zhang
7e73601f32 modify qa-trainer (#11872)
* modify qa-trainer

* fix flax model
2021-06-01 08:28:41 -04:00
Shamane Siri
9ec0f01b6c RAG-2nd2end-revamp (#11893)
* initial

* code quality test

* code quality

* added test functions in test_modeling_rag.py and test_retrieval_rag.py to test end2end retreiver

* minor change in test_modeling_rag

* fixed tests

* Update examples/research_projects/rag-end2end-retriever/README.md

typo corrected as suggested by lhoestq

Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>

* Update examples/research_projects/rag-end2end-retriever/finetune_rag.py

type change suggested by lhoestq

Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>

* Update src/transformers/models/rag/retrieval_rag.py

Adding this change as mentioned by lhoestq.

Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>

* completed the minor changes suggested by the reviewers

Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
2021-06-01 07:32:26 +01:00
Suraj Patil
ad25fd62bd Add FlaxCLIP (#11883)
* add flax CLIP

* default input_shape

* add tests

* fix test

* fix name

* fix docs

* fix shapes

* attend at least 1 token

* flax conv to torch conv

* return floats

* fix equivalence tests

* fix import

* return attention_weights and update tests

* fix dosctrings

* address patricks comments

* input_shape arg

* add tests for get_image_features and get_text_features methods

* fix tests
2021-06-01 09:44:31 +05:30
Philip May
cfca638acb Add MT5ForConditionalGeneration as supported arch. to summarization README (#11961)
* Add MT5ForConditionalGeneration as supported arch.

* Update README.md
2021-05-31 21:24:33 +05:30
Nicholas Vadivelu
1ab147d648 Remove redundant nn.log_softmax in run_flax_glue.py (#11920)
* Remove redundant `nn.log_softmax` in `run_flax_glue.py`

`optax.softmax_cross_entropy` expects unnormalized logits, and so it already calls `nn.log_softmax`, so I believe it is not needed here. `nn.log_softmax` is idempotent so mathematically it shouldn't have made a difference.

* Remove unused 'flax.linen' import
2021-05-31 15:29:04 +01:00
Philip May
fb60c309c6 fix assert (#11935) 2021-05-31 04:02:10 -04:00
Lysandre
04a9709c27 Remove datasets submodule 2021-05-31 09:18:49 +02:00
Lysandre Debut
8d171628fe Test optuna and ray (#11924) 2021-05-28 07:52:01 -04:00
Jayendra
af1a10bff4 [Flax] Return Attention from BERT, ELECTRA, RoBERTa and GPT2 (#11918)
* Added logic to return attention from flax-bert model and added test cases to check that

* Added new line at the end of file to test_modeling_flax_common.py

* fixing code style

* Fixing Roberta and Elextra models too from cpoying bert

* Added temporary hack to not run test_attention_outputs for FlaxGPT2

* Returning attention weights from GPT2 and changed the tests accordingly.

* last fixes

* bump flax dependency

Co-authored-by: jayendra <jayendra@infocusp.in>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-05-28 16:16:56 +05:30
Bhadresh Savani
e1205e478a Added Sequence Classification class in GPTNeo (#11906)
* seq classification changes

* fix tests
2021-05-28 06:27:02 -04:00
Nicolas Patry
80d712fac6 Adding new argument max_new_tokens for generate. (#11476)
* Adding new argument `max_new_tokens` for generate.

This is a proposal to add a new argument `max_new_tokens` to `generate`.
This include a `MaxNewTokensCriteria` that enables callers that don't
know about the token length ahead (like pipelines callers) to manage
more easily the length of their generated output.

* Adding a test for the user warning when both`max_length` and
`max_new_tokens` are used together.

* Removed redundant `no_grad`.
2021-05-27 14:22:58 +02:00
Josh Tanner
2dd6fb2585 Update deepspeed config to reflect hyperparameter search parameters (#11896)
* rebuild deepspeed config for hyperparameter search

* reformat code to fix style issues
2021-05-27 07:53:33 -04:00
Patrick von Platen
42fe0dc23e Add Emotion Speech Noteboook (#11900) 2021-05-27 10:46:10 +01:00
Patrick von Platen
996a315e76 Flax Generate (#11777)
* fix_torch_device_generate_test

* remove @

* add

* indexing

* correct a couple of tests

* fix tests

* add logits processor

* finish top_k, top_p, temp

* add docs

* correct flax prng key default

* improve generate

* add generation docs

* add docs

* make style

* revert model outputs change

* make style

* correct typo

* fix tests

* fix slow test

* add raise

* finish generation

Co-authored-by: Patrick von Platen <patrick@huggingface.co>
2021-05-27 00:18:17 +01:00
Avital Oliver
2df546918e Link official Cloud TPU JAX docs (#11892) 2021-05-26 15:44:40 -04:00
joerenner
1530384e5b changing find_batch_size to work with tokenizer outputs (#11890)
* changing find_batch_size to work with tokenizer outputs

trainer_pt_utils.find_batch_size does not recognize the batch size of BatchEncoding objects. This can cause an error when a trainer relies on find_batch_size to report the number of observed examples in the evaluation loop.

* Trigger CI

Co-authored-by: jrenner <joseph.renner@inria.fr>
2021-05-26 11:59:06 -04:00
Patrick von Platen
d5a72b6e19 [Flax] Allow dataclasses to be jitted (#11886)
* fix_torch_device_generate_test

* remove @

* change dataclasses to flax ones

* fix typo

* fix jitted tests

* fix bert & electra
2021-05-26 15:01:13 +01:00
talkhaldi
e6126e1932 Correcting comments in T5Stack to reflect correct tuple order (#11330)
* Correcting comments to reflect correct tuple order

In order to match the actual order (line 513 and 516, and as accessed in 968), I've changed the order mentioned in comments L962 and L966-967.

* Update modeling_t5.py

Updating another comment as well

* Removing extra space

* Fixing style and quality

* style & quality

* Update src/transformers/models/t5/modeling_t5.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-05-26 14:07:23 +01:00
Daniel Stancl
0b93358447 Fix usage of head masks by TF encoder-decoder models' generate() function (#11775)
* Fix Bart

* Fix Blenderbot{,_small}

* Fix LED

* Fix Marian

* Fix MBart

* Fix Pegasus

* Fix T5

* Add test for generation with head_mask

* Add a common TF test

* Override a test for the LED model as head masking is not yet properly implemented

* Remove all head_masks from input preparation for LED

* Drop masking for T5 as it needs a bit of refactor
2021-05-26 14:02:44 +01:00
francescorubbo
0b0a598452 Ensure input tensor are on device. (#11874)
The feature extractor does not create tensors on the appropriate device,
so we call `ensure_tensor_on_device` before feeding the processed inputs
to the model.
2021-05-26 04:19:37 -04:00
Ahmet Akkoç
a9c797f93d [Wav2Vec2ForCTC] example typo fixed (#11878) 2021-05-25 17:06:14 -04:00
Stas Bekman
1b6530104d [Examples] create model with custom config on the fly (#11798)
* create custom model on the flight

* better wording

* add update_from_string

* cleanup

* cleanup

* Update src/transformers/configuration_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* more bool options

* style

* fix logger

* add test

* add the doc

* assert on conflict of options

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-05-25 10:40:49 -07:00
Stas Bekman
6287c929c1 [lm examples] fix overflow in perplexity calc (#11855)
* fix overflow in perplexity calc

* use inf

* fix
2021-05-25 08:11:26 -07:00
Patrick von Platen
7630c11f32 [Wav2Vec2] SpecAugment Fast (#11764)
* first try

* finish
2021-05-25 13:59:52 +01:00
Sylvain Gugger
f086652b16 Add option to log only once in multinode training (#11819)
* Add option to long only once in multinode training

* Use an alternate property
2021-05-25 08:03:43 -04:00
Wang Ran (汪然)
b8344a274f typo (#11858) 2021-05-25 04:23:46 -04:00
Shiro T
f9880f62ad fixed a small typo in the doc (#11856) 2021-05-25 04:18:55 -04:00
Lysandre Debut
6da129cb31 Enable memory metrics in tests that need it (#11859) 2021-05-25 04:06:19 -04:00
Lysandre Debut
db0b2477cc Add some tests to the slow suite #11860 2021-05-25 04:06:06 -04:00
Sylvain Gugger
afe479adb5 [Trainer] Report both steps and num samples per second (#11818)
* [Trainer] Report both steps and num samples per second

* Fix batch number

* Update src/transformers/trainer_utils.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Address review comments

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2021-05-24 19:51:42 -04:00
Nick Lane-Smith
eaab9397cd Fix two typos in docs (#11852)
* typo2

* fix typo
2021-05-24 14:26:02 -04:00
Teven
8a2a3a25af Fix flos single node (#11844)
* fixing flos bug/typo in non-distributed setting

* storing flos every logging_interval
2021-05-24 20:15:52 +02:00
Sylvain Gugger
adb785b0fe Switch mem metrics flag (#11851)
* Switch mem metrics flag

* Update src/transformers/training_args.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2021-05-24 13:30:39 -04:00
Sylvain Gugger
fcdb85e9d2 Fix reference to XLNet (#11846) 2021-05-24 09:26:40 -04:00
Patrick von Platen
f580604157 [Flax] Fix PyTorch import error (#11839)
* fix_torch_device_generate_test

* remove @

* change pytorch import to flax import
2021-05-24 10:41:10 +01:00
Lysandre Debut
0cbddfb190 Replace double occurrences as the last step (#11367) 2021-05-24 03:38:59 -04:00
ctheodoris
73fde1defe Faster list concat for trainer_pt_utils.get_length_grouped_indices() (#11825)
get_length_grouped_indices() in LengthGroupedSampler and DistributedLengthGroupedSampler
is prohibitively slow for large number of megabatches (in test case takes hours for ~270k
megabatches with 100 items each) due to slow list concatenation with sum(megabatches, []).

Resolves: #11795

Co-authored-by: ctheodoris <cvtheodo@ds.dfci.harvard.edu>
2021-05-22 10:27:20 -04:00
Patrick von Platen
da22245ed9 Add flax text class colab (#11824)
* fix_torch_device_generate_test

* remove @

* add flax glue link
2021-05-21 23:11:58 +01:00
Stas Bekman
a26f4d6208 [Deepspeed] support zero.Init in from_config (#11805)
* support zero.Init in from_config

* no need for eval test
2021-05-21 09:07:46 -07:00
Patrick von Platen
82335185fe [Flax] Small fixes in run_flax_glue.py (#11820)
* fix_torch_device_generate_test

* remove @

* correct best seed for flax fine-tuning

Co-authored-by: Patrick von Platen <patrick@huggingface.co>
2021-05-21 16:52:23 +01:00
Sylvain Gugger
b8697bc622 Avoid TensorFlow import in Trainer 2021-05-21 09:23:31 -04:00
yujun
e2c1dd0966 fix roformer config doc (#11813) 2021-05-21 08:06:11 -04:00
Lysandre Debut
1b652295c5 Patch recursive import (#11812) 2021-05-21 06:50:01 -04:00
Patrick von Platen
bd9871657b [Flax] Align GLUE training script with mlm training script (#11778)
* speed up flax glue

* remove unnecessary line

* remove folder

* remove run in loop

Co-authored-by: Patrick von Platen <patrick@huggingface.co>
2021-05-21 09:36:56 +01:00
Keren Fuentes
223943872e Fix failing test on Windows Platform (#11589)
* add separator for windows

* fixes test_is_copy_consistent on Windows

* fixing writing encoding issue on extended test (for Windows)

* resolving comments
2021-05-20 19:54:23 -04:00
Michael Benayoun
f4a0d6ff86 A cleaner and more scalable implementation of symbolic tracing (#11763)
Cleaner and more scalable implementation of symbolic tracing with torch.fx, and provides support for new architectures:
- ALBERT
- DistilBERT
- MobileBERT
- MegatronBERT
- GPT2
- GPT Neo

Co-authored-by: Michael Benayoun <michael@huggingface.co>
2021-05-20 18:02:29 +02:00
Sylvain Gugger
469384a777 Fix regression in regression (#11785)
* Fix regression in regression

* Add test
2021-05-20 09:55:13 -04:00
Sylvain Gugger
5ad5cc7198 Fix pattern in conf.py (#11784) 2021-05-20 09:30:31 -04:00
yujun
206f06f2dd Add new model RoFormer (use rotary position embedding ) (#11684)
* add roformer

* Update docs/source/model_doc/roformer.rst

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Update docs/source/model_doc/roformer.rst

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* update

* add TFRoFormerSinusoidalPositionalEmbedding and fix TFMarianSinusoidalPositionalEmbedding

* update docs

* make style and make quality

* roback

* unchanged

* rm copies from , this is a error in TFMarianSinusoidalPositionalEmbedding

* update Copyright year

* move # Add modeling imports here to the correct position

* max_position_embeddings can be set to 1536

* # Copied from transformers.models.bert.modeling_bert.BertOutput with Bert->RoFormer

* # Copied from transformers.models.bert.modeling_bert.BertLayer.__init__ with Bert->RoFormer

* update tokenization_roformer

* make style

* add staticmethod apply_rotary_position_embeddings

* add TF staticmethod apply_rotary_position_embeddings

* update torch apply_rotary_position_embeddings

* fix tf apply_rotary_position_embeddings error

* make style

* add pytorch RoFormerSelfAttentionRotaryPositionEmbeddingTest

* add TF rotary_position_embeddings test

* update test_modeling_rofomer

* Update docs/source/model_doc/roformer.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/__init__.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/__init__.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/__init__.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/__init__.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/roformer/convert_roformer_original_tf_checkpoint_to_pytorch.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/roformer/modeling_roformer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/roformer/modeling_roformer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/roformer/modeling_tf_roformer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* refact roformer tokenizer

* add RoFormerTokenizerFast

* add RoFormerTokenizationTest

* add require_jieba

* update Copyright

* update tokenizer & add copy from

* add option rotary_value

* use rust jieba

* use rjieba

* use rust jieba

* fix test_alignement_methods

* slice normalized_string is too slow

* add config.embedding_size when embedding_size!=hidden_size

* fix pickle tokenizer

* Update docs/source/model_doc/roformer.rst

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* make style and make quality

Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-05-20 08:00:34 -04:00
Lysandre Debut
075fdab4fe Deprecate commands from the transformers-cli that are in the hf-cli (#11779) 2021-05-20 03:16:03 -04:00
Albert Villanova del Moral
2582e59a57 Add DOI badge to README (#11771) 2021-05-19 09:48:56 -04:00
Patrick von Platen
00440e350f [Flax MLM] Refactor run mlm with optax (#11745)
* refactor

* update

* update

* update

* refactor run mlm

* finalize

* refactor more

* fix typo

* update

* finish refactor

* modify run mlm

* Apply suggestions from code review

* Apply suggestions from code review

* Apply suggestions from code review

* small fixes

* upload

* upload

* finish run mlm script

Co-authored-by: Patrick von Platen <patrick@huggingface.co>
2021-05-19 12:00:58 +01:00
Patrick von Platen
43891be19b [T5 failing CI] Fix generate test (#11770)
* fix_torch_device_generate_test

* remove @
2021-05-19 05:31:17 -04:00
Daniel Stancl
680d181ce8 Fix usage of head masks by PT encoder-decoder models' generate() function (#11621)
* Add missing head masking for generate() function

* Add head_mask, decoder_head_mask and cross_attn_head_mask
into prepare_inputs_for_generation for generate() function
for multiple encoder-decoder models.

* Add test_genereate_with_head_masking

* [WIP] Update the new test and handle special cases

* make style

* Omit ProphetNet test so far

* make fix-copies
2021-05-19 00:44:53 +01:00
Suraj Patil
ca33278fdb FlaxGPT2 (#11556)
* flax gpt2

* combine masks

* handle shared embeds

* add causal LM sample

* style

* add tests

* style

* fix imports, docs, quality

* don't use cache

* add cache

* add cache 1st version

* make use cache work

* start adding test for generation

* finish generation loop compilation

* rewrite test

* finish

* update

* update

* apply sylvains suggestions

* update

* refactor

* fix typo

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-05-18 22:50:51 +01:00
Tomy Hsieh
eb3e072a3b Fix a small error in summarization example (#11762) 2021-05-18 14:38:36 -04:00
Avital Oliver
77f9bd18af Add Flax Examples and Cloud TPU README (#11753)
* Add Flax Examples README

* Apply suggestions from code review

* Update examples/flax/README.md

* add nice table

* fix

* fix

* apply suggestions

* upload

* finish flax readme.md

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-05-18 17:45:16 +01:00
Philipp Schmid
04e25c6286 add dataset_name to data_args and added accuracy metric (#11760)
* add `dataset_name` to data_args and added accuracy metric

* added documentation for dataset_name

* spelling correction
2021-05-18 16:27:29 +02:00
Vyom Pathak
fd3b12e8c3 Fixed: Better names for nlp variables in pipelines' tests and docs. (#11752)
* Fixed: Better names for nlp variables in pipelines' tests and docs.

* Fixed: Better variable names
2021-05-18 09:47:28 -04:00
Patrick von Platen
cebb96f53a Add more subsections to main doc (#11758)
* add headers to main doc

* Apply suggestions from code review

* update

* upload
2021-05-18 14:38:56 +01:00
Tommy Chiang
da7e73b721 Fix incorrect newline in #11650 (#11757) 2021-05-18 15:28:13 +02:00
Sylvain Gugger
a515caa331 Fix checkpoint deletion (#11748) 2021-05-18 07:42:39 -04:00
Nicolas Patry
b88e0e016d [TokenClassification] Label realignment for subword aggregation (#11680)
* [TokenClassification] Label realignment for subword aggregation

Tentative to replace https://github.com/huggingface/transformers/pull/11622/files

- Added `AggregationStrategy`
- `ignore_subwords` and `grouped_entities` arguments are now fused
  into `aggregation_strategy`. It makes more sense anyway because
  `ignore_subwords=True` with `grouped_entities=False` did not have a
  meaning anyway.
- Added 2 new ways to aggregate which are MAX, and AVERAGE
- AVERAGE requires a bit more information than the others, for now this
case is slightly specific, we should keep that in mind for future
changes.
- Testing has been modified to reflect new argument, and to check the
correct deprecation and the new aggregation_strategy.
- Put the testing argument and testing results for aggregation_strategy,
close together, so that readers can understand what is supposed to
happen.
- `aggregate` is now only tested on a small model as it does not mean
anything to test it globally for all models.
- Previous tests are unchanged in desired output.
- Added a new test case that showcases better the difference between the
  FIRST, MAX and AVERAGE strategies.

* Wrong framework.

* Addressing three issues.

1- Tags might not follow B-, I- convention, so any tag should work now
(assumed as B-TAG)
2- Fixed an issue with average that leads to a substantial code change.
3- The testing suite was not checking for the "index" key for "none"
strategy. This is now fixed.

The issue is that "O" could not be chosen by AVERAGE strategy because
those tokens were filtered out beforehand, so their relative scores were
not counted in the average. Now filtering on
ignore_labels will happen at the very end of the pipeline fixing
that issue.
It's a bit hard to make sure this stays like that because we do
not have a end-to-end test for that behavior

* Formatting.

* Adding formatting to code + cleaner handling of B-, I- tags.

Co-authored-by: Francesco Rubbo <rubbo.francesco@gmail.com>
Co-authored-by: elk-cloner <rezakakhki.rk@gmail.com>

* Typo.

Co-authored-by: Francesco Rubbo <rubbo.francesco@gmail.com>
Co-authored-by: elk-cloner <rezakakhki.rk@gmail.com>
2021-05-18 09:53:20 +02:00
Patrick von Platen
c73e35323d push (#11750) 2021-05-17 19:54:33 +01:00
Sylvain Gugger
936b57158a Use new evaluation loop in TrainerQA (#11746) 2021-05-17 10:10:13 -04:00
Patrick von Platen
73893fc771 [BigBird Pegasus] Make tests faster (#11744)
* improve tests

* remove bogus file

* make style

Co-authored-by: Patrick von Platen <patrick@huggingface.co>
2021-05-17 06:30:53 -04:00
Michael Benayoun
a0531c8a24 fixed shape issue for T5 tracing (#11742)
Co-authored-by: Michael Benayoun <michael@huggingface.co>
2021-05-17 06:17:31 -04:00
Julien Chaumond
0fc56df5fb Add visual + link to Premium Support webpage (#11740)
* Update README.md

* Update index.rst
2021-05-17 05:28:56 -04:00
Julien Chaumond
2f88bd9c4c Remove tapas model card (#11739) 2021-05-17 04:42:37 -04:00
Marc van Zee
726e953d44 Improvements to Flax finetuning script (#11727)
* Add Cloud details to README

* Flax script and readme updates

* Some simplifications of Flax script
2021-05-17 09:26:33 +01:00
Michael Benayoun
86d5fb0b36 Experimental symbolic tracing feature with torch.fx for BERT, ELECTRA and T5 (#11475)
Symbolic tracing feature for BERT, ELECTRA and T5

Co-authored-by: Michael Benayoun <michael@huggingface.co>
Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-05-14 20:57:30 +02:00
Marc van Zee
94a2348706 Add Cloud details to README (#11706)
* Add Cloud details to README

* Flax script and readme updates
2021-05-14 14:51:25 +01:00
Patrick von Platen
113eaa7575 correct example script (#11726) 2021-05-14 12:02:57 +01:00
Oyvind Tafjord
bd3b599c12 Fix T5 beam search using parallelize (#11717) 2021-05-14 10:44:03 +01:00
Volodymyr Byno
218d552f30 Fix loading the best model on the last stage of training (#11718) 2021-05-13 16:11:12 -04:00
Sylvain Gugger
252082001d Fix v4.6.0 doc 2021-05-13 10:45:28 -04:00
Sylvain Gugger
cbbf49f644 Fix doc deployment 2021-05-13 10:34:14 -04:00
lexhuismans
91cf29153b [T5] Add 3D attention mask to T5 model (2) (#9643) (#11197)
* Add 3D attention mask to T5 model (#9643)

Added code for 3D attention mask in T5 model. Similar to BERT model.

* Add test for 3D attention mask

Added test for 3D attention mask: test_decoder_model_past_with_3d_attn_mask()
3D attention mask of the shape [Batch_size, Seq_length, Seq_length] both for
attention mask and decoder attention mask. Test is passing.
2021-05-13 12:02:27 +01:00
Vasudev Gupta
6ee1a4fd3e add everything (#11651) 2021-05-13 11:51:30 +01:00
Patrick von Platen
57b6a80de8 [Flax] Fix BERT initialization & token_type_ids default (#11695)
* fix some stuff

* fix roberta & electra as well

* del run bug

Co-authored-by: Patrick von Platen <patrick@huggingface.co>
2021-05-13 10:58:19 +01:00
Lysandre Debut
daf0d6a97b Fix gpt-2 warnings (#11709) 2021-05-13 03:35:44 -04:00
Philip May
37ed3ab719 Enable option for subword regularization in more tokenizers. (#11417)
* improve slow class tok usage at xlm rob

* add subword regularization for barthez

* improve barthez tok. test

* fix tokenizer tests

* add subword regularization for camembert

* add subword regularization for deberta v2 tokenizer

* add more doc to deberta v2 tokenizer

* add subword regularization for speech to text tok.

* fix sp_model_kwargs type in speech 2 text tok.

* add subword regularization for M2M100 tok.

* add more concrete type hints

* fix tests for m2m100 and s2t tok.

* add missing Any import

* fix syntax error in m2m100 tok.

* fix unpickle of m2m100 and s2t tok.

* fix test of m2m100 and s2t tok.

* improve unpickle of deberta v2 tok.

* add test for pickle of barthez & camembert

* fix pickle of barthez & camembert

* add test for deberta v2 tok. pickle

* fix m2m100 tok. pickle

* fix s2t tok. pickle

* add subword regularization to albert tok.

* refactor subword reg. test into TokenizerTesterMixin

improve albert tok. test

remove sample argument form albert tok.

check subword reg. using TokenizerTesterMixin

improve tok. tests

improve xlm roberta tok. tests

improve xlm roberta tok. tests

* add subword regularization for big bird t.

* improve xlm roberta tok. test

* add subword regularization for mbart50 tok.

* add subword regularization for pegasus tok.

* add subword regularization for reformer tok.

* add subword regularization for T5 tok.

* fix t5 tok. test formatting

* add subword regularization for xlm_proph. tok.

* add subword regularization for xlnet tok.

* add subword regularization for gert_gen tok.

* add typing to tokenizers

* add typing to xlm rob. tok

* add subword regularization for marian tok.

* add reverse tok. test

* fix marian tok test

* fix marian tok test

* fix casing in tok. tests

* fix style of tok. common test

* fix deberta v2 tok test

* add type annotations to tok. tests

* add type annotations to tok. __init__

* add typing to kokenizer

* add type annotations to tok. __init__

* don't specify the default when it's None

* fix barthez tok. doc

* move sentencepiece tok. tests to TokenizerTesterMixin

* fix unused imports

* fix albert tok. test

* add comment to sentencepiece test options

* fix Any import at big bird tok.

* fix Any import at xlm prophetnet tok.

* empty commit to trigger CI
2021-05-13 02:44:55 -04:00
NielsRogge
fa84540e98 Vit deit fixes (#11309)
* Improve docs of DeiT and ViT, add community notebook

* Add gitignore for test_samples

* Add notebook with Trainer

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-05-12 11:46:02 -04:00
Lysandre
d77eb0cf92 Docs for v4.7.0.dev0 2021-05-12 17:08:35 +02:00
Lysandre
64e78564a5 Release: v4.6.0
Some checks failed
Release - Conda / build_and_package (push) Has been cancelled
2021-05-12 17:03:03 +02:00
Patrick von Platen
fd6204b2a7 [Lazy init] Force fall back to slow init for composite models (#11705)
* fix encoder-decoder & RAG

* finalize

* Update src/transformers/models/encoder_decoder/modeling_encoder_decoder.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/rag/modeling_rag.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Patrick von Platen <patrick@huggingface.co>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-05-12 10:52:54 -04:00
Suraj Patil
5c1cda9d3c fix example in config doc (#11696) 2021-05-12 09:48:52 -04:00
Philip May
77f4c46b50 remove defaults to None if optional (#11703) 2021-05-12 09:11:10 -04:00
Marc van Zee
6797cdc077 Updates README and fixes bug (#11701) 2021-05-12 13:52:52 +01:00
Suraj Patil
f063c56d94 Fix clip docs (#11694)
* fix doc url

* fix example
2021-05-12 15:28:30 +05:30
Suraj Patil
8719afa1ad CLIP (#11445)
* begin second draft

* fix import, style

* add loss

* fix embeds, logits_scale, and projection

* fix imports

* add conversion script

* add feature_extractor and processor

* style

* add tests for tokenizer, extractor and processor

* add vision model tests

* add weight init

* add more tests

* fix save_load  test

* model output, dosstrings, causal mask

* config doc

* add clip model tests

* return dict

* bigin integration test

* add integration tests

* fix-copies

* fix init

* Clip => CLIP

* fix module name

* docs

* fix doc

* output_dim => projection_dim

* fix checkpoint names

* remoe fast tokenizer file

* fix conversion script

* fix tests, quality

* put causal mask on device

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix attribute test

* style

* address sylvains comments

* style

* fix docstrings

* add qucik_gelu in activations, docstrings

* clean-up attention test

* fix act fun

* fix config

* fix torchscript tests

* even batch_size

* remove comment

* fix ouput tu_tuple

* fix save load tests

* fix add tokens test

* add fast tokenizer

* update copyright

* new processor API

* fix docs

* docstrings

* docs

* fix doc

* fix doc

* fix tokenizer

* fix import in doc example

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* check types of config

* valhalla => openai

* load image using url

* fix test

* typo

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-05-12 13:48:15 +05:30
Marc van Zee
4ce6bcc310 Adds Flax BERT finetuning example on GLUE (#11564)
* Adds Flax BERT finetuning example

* fix traced jax tensor type

* Use Optax losses and learning schedulers

* Add 1GPU training results

* merge into master & make style

* fix input

* del file

* Fix bug in loss and add torch runs

* finish bert flax fine-tune

* Update examples/flax/text-classification/README.md

* Update examples/flax/text-classification/run_flax_glue.py

* add requirements

* finalize

* finalize

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
2021-05-11 19:02:59 +01:00
Sylvain Gugger
f13f1f8fb8 Test checkpointing (#11682)
* Add test and see where CI is unhappy

* Load with strict=False
2021-05-11 12:02:48 -04:00
Julien Plu
d9b286272c Fix TF Roberta for mixed precision training (#11675) 2021-05-11 12:01:03 -04:00
Sylvain Gugger
a135f59536 Auto modelcard (#11599)
* Autogenerate model cards from the Trainer

* ModelCard deprecated

* Fix test

* Style

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Address review comments

* Quality

* With all metadata

* Metadata

* Post-merge conflict mess

* Data args and all examples

* Default license and languages when possible

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-05-11 11:30:34 -04:00
Matt
b3429ab678 Grammar and style edits for the frontpage README (#11679)
* Grammar and style edits for the frontpage README

* Going all-in on em-dashes because you only live once

* Update README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-05-11 15:49:34 +01:00
nxznm
901153c61e Fix docstring of description about input_ids (#11672) 2021-05-11 08:12:02 -04:00
Jonathan Chang
64232bc0df Add --text_column to run_summarization_no_trainer (#11673) 2021-05-11 07:58:38 -04:00
Julien Plu
024cd19bb7 Add MacOS TF version (#11674)
Co-authored-by: Julien Plu <jplu@argos.local>
2021-05-11 05:42:21 -04:00
Pavel Soriano
9120ae7d66 Fixes NoneType exception when topk is larger than one coupled with a small context in the Question-Answering pipeline (#11628)
* added fix to decode function. added test to qa pipeline tests

* completed topk docstring

* fixed formatting with black

* applied style_doc to fix line length
2021-05-10 13:28:10 -04:00
Patrick von Platen
dcb0e61430 push (#11667) 2021-05-10 17:38:17 +01:00
Sylvain Gugger
05a930671f Save scaler state dict when checkpointing (#11663) 2021-05-10 10:58:30 -04:00
Matt
ef8d32c5ea Fix suggested by @bhadreshpsavani (#11660) 2021-05-10 14:28:04 +01:00
Vasudev Gupta
575c979144 Update community.md (#11654) 2021-05-10 09:48:21 +01:00
Tanmay Laud
f7f872955d Big Bird Fast Tokenizer implementation (#11075)
* Added Big Bird Fast Tokenizer initial file

* style fixes

* flake fixes

* Added big bird fast tokenizer to init files

* Added big bird fast to Auto tokenization

* fix styles

* minor quality fixes

* Added initial test code

* Fix SpmConverter when precompiled_charsmap doesn't exist

* fixed post processor

* minor style fix

* minor fix input names

* Actually fix identity normalization

* style

* Added token type ids to fast tokenizer

* style

* flake fix

* fix copies

Co-authored-by: Anthony MOI <m.anthony.moi@gmail.com>
2021-05-10 03:01:23 -04:00
Bhavitvya Malik
80da304a0f updated user permissions based on umask (#11119)
* updated user permissions based on umask

* updated user permissions based on umask

* changes as per suggestions

* minor changes
2021-05-10 02:45:29 -04:00
Quentin Lhoest
1a0b41781d Update requirements.txt (#11634) 2021-05-10 11:19:52 +05:30
NielsRogge
f785c51692 Update code example (#11631)
* Update code example

* Code review
2021-05-10 11:18:43 +05:30
Tommy Chiang
7e406f4a65 [Examples] Fix invalid links after reorg (#11650) 2021-05-10 11:16:48 +05:30
Tommy Chiang
f2ffcaf49f [Examples] Check key exists in datasets first (#11503) 2021-05-09 15:42:38 -04:00
Stas Bekman
ba0d50f214 [examples] fix sys.path in conftest.py (#11636)
* restore conftest.py

* fix conftest and make copies

* remove unneeded parts

* remove unwanted files
2021-05-07 14:44:22 -07:00
Stas Bekman
cd9b8d7efe [self-push CI] sync with self-scheduled (#11637)
forgot to add the missing `libaio-dev` to this workflow
2021-05-07 14:06:33 -07:00
Lysandre Debut
da37eb8e43 Reduce to 1 worker and set timeout for GPU TF tests (#11633) 2021-05-07 11:55:20 -04:00
Lysandre Debut
39084ca663 Add the ImageClassificationPipeline (#11598)
* Add the ImageClassificationPipeline

* Code review

Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>

* Have `load_image` at the module level

Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
2021-05-07 08:08:40 -04:00
Patrick von Platen
e7bff0aabe make fix copy (#11627) 2021-05-07 07:48:51 -04:00
Vasudev Gupta
dc3f6758cf Add BigBirdPegasus (#10991)
* init bigbird pegasus

* add debugging nb ; update config

* init conversion

* update conversion script

* complete conversion script

* init forward()

* complete forward()

* add tokenizer

* add some slow tests

* commit current

* fix copies

* add docs

* add conversion script for bigbird-roberta-summarization

* remove TODO

* small fixups

* correct tokenizer

* add bigbird core for now

* fix config

* fix more

* revert pegasus-tokenizer back

* make style

* everything working for pubmed; yayygit status

* complete tests finally

* remove bigbird pegasus tok

* correct tokenizer

* correct tests

* add tokenizer files

* finish make style

* fix test

* update

* make style

* fix tok utils base file

* make fix-copies

* clean a bit

* small update

* fix some suggestions

* add to readme

* fix a bit, clean tests

* fix more tests

* Update src/transformers/__init__.py

* Update src/transformers/__init__.py

* make fix-copies

* complete attn switching, auto-padding left

* make style

* fix auto-padding test

* make style

* fix batched attention tests

* put tolerance at 1e-1 for stand-alone decoder test

* fix docs

* fix tests

* correct slow tokenizer conversion

* Apply suggestions from code review

Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* complete remaining suggestions

* fix test

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-05-07 09:27:43 +02:00
Jonathan Chang
6f40e31766 Fix comment in run_clm_no_trainer.py (#11624) 2021-05-07 12:32:30 +05:30
Sylvain Gugger
33fd83bc01 Fix RNG saves in distributed mode. (#11620)
* Fix RNG saves in distributed mode.

* Update src/transformers/trainer.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2021-05-06 17:14:12 -04:00
Stas Bekman
619200cc42 [cuda ext tests] fixing tests (#11619)
* fixing tests

* cleanup
2021-05-06 13:35:28 -07:00
Patrick von Platen
44c5621db0 fix tests (#11615) 2021-05-06 20:42:51 +02:00
Sylvain Gugger
7eee950ac3 Re-styling in seq2seq attention (#11613) 2021-05-06 14:24:19 -04:00
Eldar Kurtic
cf409e5594 Fix docstring typo (#11611) 2021-05-06 17:09:28 +05:30
Vipul Raheja
f594090a93 fix typo in command (#11605) 2021-05-06 12:32:54 +05:30
Lysandre Debut
079557c1c5 Fix Python version (#11607) 2021-05-06 02:50:11 -04:00
baeseongsu
c1780ce7a4 fix head_mask for albert encoder part(AlbertTransformer) (#11596)
* fix head mask for albert encoder part

* fix head_mask for albert encoder part
2021-05-06 02:18:02 -04:00
Mats Sjöberg
864c1dfe34 Accept tensorflow-rocm package when checking TF availability (#11595) 2021-05-05 14:44:29 -04:00
Patrick von Platen
3e3e41ae20 Pytorch - Lazy initialization of models (#11471)
* lazy_init_weights

* remove ipdb

* save int

* add necessary code

* remove unnecessary utils

* Update src/transformers/models/t5/modeling_t5.py

* clean

* add tests

* correct

* finish tests

* finish tests

* fix some more tests

* fix xlnet & transfo-xl

* fix more tests

* make sure tests are independent

* fix tests more

* finist tests

* final touches

* Update src/transformers/modeling_utils.py

* Apply suggestions from code review

* Update src/transformers/modeling_utils.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Update src/transformers/modeling_utils.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* clean tests

* give arg positive name

* add more mock weights to xlnet

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2021-05-05 17:22:20 +02:00
Lysandre
8fa8e19429 Skip Funnel test 2021-05-05 12:38:01 +02:00
Deepali
83e59d8e0b add importlib_metadata and huggingface_hub as dependency in the conda recipe (#11591)
* add importlib_metadata as dependency (#11490)

Co-authored-by: Deepali Chourasia <deepch23@us.ibm.com>

* add huggingface_hub dependency

Co-authored-by: Deepali Chourasia <deepch23@us.ibm.com>
2021-05-05 03:36:18 -04:00
Stas Bekman
bf0dfa98d3 copies need to be fixed too (#11585) 2021-05-05 03:35:15 -04:00
Stas Bekman
c065025c47 [trainer] document resume randomness (#11588)
* document resume randomness

* fix link

* reword

* fix

* reword

* style
2021-05-04 14:17:11 -07:00
Sylvain Gugger
6b241e0e3b Reproducible checkpoint (#11582)
* Set generator in dataloader

* Use generator in all random samplers

* Checkpoint all RNG states

* Final version

* Quality

* Test

* Address review comments

* Quality

* Remove debug util

* Add python and numpy RNGs

* Split states in different files in distributed

* Quality

* local_rank for TPUs

* Only use generator when accepted

* Add test

* Set seed to avoid flakiness

* Make test less flaky

* Quality
2021-05-04 16:20:56 -04:00
Patrick Fernandes
0afe4a90f9 [Flax] Add Electra models (#11426)
* add electra model to flax

* Remove Electra Next Sentence Prediction model added by mistake

* fix parameter sharing and loosen equality threshold

* fix styling issues

* add mistaken removen imports

* fix electra table

* Add FlaxElectra to automodels and fixe docs

* fix issues pointed out the PR

* fix flax electra to comply with latest changes

* remove stale class

* add copied from

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-05-04 20:56:09 +02:00
Philipp Schmid
226e74b610 Removes SageMakerTrainer code but keeps class as wrapper (#11587)
* removed all old code

* make quality
2021-05-04 14:31:18 -04:00
Patrick von Platen
084a187da3 [FlaxRoberta] Add FlaxRobertaModels & adapt run_mlm_flax.py (#11470)
* add flax roberta

* make style

* correct initialiazation

* modify model to save weights

* fix copied from

* fix copied from

* correct some more code

* add more roberta models

* Apply suggestions from code review

* merge from master

* finish

* finish docs

Co-authored-by: Patrick von Platen <patrick@huggingface.co>
2021-05-04 19:57:59 +02:00
Sylvain Gugger
2ce0fb84cc Make quality scripts work when one backend is missing. (#11573)
* Make quality scripts work when one backend is missing.

* Check env variable is properly set

* Add default

* With print statements

* Fix typo

* Set env variable

* Remove debug code
2021-05-04 09:53:44 -04:00
Lysandre Debut
09b0bcfea9 Enable added tokens (#11325)
* Fix tests

* Reorganize

* Update tests/test_modeling_mobilebert.py

* Remove unnecessary addition
2021-05-04 08:13:57 -04:00
abhishek thakur
c40c7e213b Add multi-class, multi-label and regression to transformers (#11012)
* add to  bert

* review comments

* Update src/transformers/configuration_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/configuration_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* self.config.problem_type

* fix style

* fix

* fin

* fix

* update doc

* fix

* test

* Test more problem types

* Update src/transformers/configuration_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix

* remove

* fix

* quality

* make fix-copies

* remove test

Co-authored-by: abhishek thakur <abhishekkrthakur@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-05-04 02:23:40 -04:00
Stas Bekman
7c622482e8 fix resize_token_embeddings (#11572) 2021-05-03 13:12:06 -07:00
Sylvain Gugger
fe82b1bfa0 Update training tutorial (#11533)
* Update training tutorial

* Apply suggestions from code review

Co-authored-by: Hamel Husain <hamelsmu@github.com>

* Address review comments

* Update docs/source/training.rst

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* More review comments

* Last review comments

Co-authored-by: Hamel Husain <hamelsmu@github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-05-03 13:18:46 -04:00
Sylvain Gugger
f4c9a7e62e Accumulate opt state dict on do_rank 0 (#11481) 2021-05-03 13:18:27 -04:00
Nicolas Patry
1e8e06862f Fixes a useless warning. (#11566)
Fixes #11525
2021-05-03 18:48:13 +02:00
Sylvain Gugger
87dd1a00ef Fix metric computation in run_glue_no_trainer (#11569) 2021-05-03 11:42:55 -04:00
Muktan
a721a5eefd [Wav2vec2] Fixed tokenization mistakes while adding single-char tokens to tokenizer (#11538)
* Fixed tokenization mistakes while adding single-char tokens to tokenizer

* Added tests and Removed unnecessary comments.

* finalize wav2vec2 tok

* add more aggressive tests

* Apply suggestions from code review

* fix useless import

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-05-03 17:19:12 +02:00
NielsRogge
f3cf8ae7b3 Add LUKE (#11223)
* Rebase with master

* Minor bug fix in docs

* Copy files from adding_luke_v2 and improve docs

* change the default value of use_entity_aware_attention to True

* remove word_hidden_states

* fix head models

* fix tests

* fix the conversion script

* add integration tests for the pretrained large model

* improve docstring

* Improve docs, make style

* fix _init_weights for pytorch 1.8

* improve docs

* fix tokenizer to construct entity sequence with [MASK] entity when entities=None

* Make fix-copies

* Make style & quality

* Bug fixes

* Add LukeTokenizer to init

* Address most comments by @patil-suraj and @LysandreJik

* rename _compute_extended_attention_mask to get_extended_attention_mask

* add comments to LukeSelfAttention

* fix the documentation of the tokenizer

* address comments by @patil-suraj, @LysandreJik, and @sgugger

* improve docs

* Make style, quality and fix-copies

* Improve docs

* fix docs

* add "entity_span_classification" task

* update example code for LukeForEntitySpanClassification

* improve docs

* improve docs

* improve the code example in luke.rst

* rename the classification layer in LukeForEntityClassification from typing to classifier

* add bias to the classifier in LukeForEntitySpanClassification

* update docs to use fine-tuned hub models in code examples of the head models

* update the example sentences

* Make style & quality

* Add require_torch to tokenizer tests

* Add require_torch to tokenizer tests

* Address comments by @sgugger and add community notebooks

* Make fix-copies

Co-authored-by: Ikuya Yamada <ikuya@ikuya.net>
2021-05-03 09:07:29 -04:00
Frederik Bode
6a11e4c2ad fix the mlm longformer example by changing [MASK] to <mask> (#11559) 2021-05-03 12:43:30 +01:00
Lysandre Debut
1c86157d9d Remove datasets submodule. (#11563) 2021-05-03 06:02:33 -04:00
Patrick von Platen
c448c01f25 [Wav2Vec2] Fix convert (#11562)
* push

* small change

* correct other typo
2021-05-03 11:53:30 +02:00
Suraj Patil
623281aa12 [Flax BERT/Roberta] few small fixes (#11558)
* small fixes

* style
2021-05-03 10:35:06 +02:00
lewtun
a5d2967bd8 Fix examples in M2M100 docstrings (#11540)
Replaces `tok` with `tokenizer` so examples can run with copy-paste
2021-05-03 10:56:31 +05:30
jingyihe
980208650a Fixed docs for the shape of scores in generate() (#10057)
* Fixed the doc for the shape of return scores tuples in generation_utils.py.

* Fix the output shape of `scores` for `DecoderOnlyOutput`.

* style fix
2021-05-02 10:10:47 +02:00
Stas Bekman
4e7bf94e72 [DeepSpeed] fp32 support (#11499)
* prep for deepspeed==0.3.16

* new version

* too soon

* support and test fp32 mode

* troubleshooting doc start

* workaround no longer needed

* add fp32 doc

* style

* cleanup, add tf32 note

* clarify

* release was made
2021-04-30 12:51:48 -07:00
Stas Bekman
282f3ac3ef [debug utils] activation/weights underflow/overflow detector (#11274)
* sync

* add activation overflow debug utility

* cleanup

* document detect_overflow

* import torch

* add deprecation warning

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* convert to rst, add note

* add class

* fix docs

* improve the doc

* rework to dump a lot more info about each frame

* complete expansion

* cleanup

* format

* cleanup

* doesn't have to be transformers

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* wrap long line

* style

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-04-30 11:15:46 -07:00
Hamel Husain
804c2974d5 Improve task summary docs (#11513)
* fix task summary docs

* refactor to use model.config.id2label instead of list

* fix nit

* Update docs/source/task_summary.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-04-30 09:06:47 -04:00
Sylvain Gugger
bc80f8bc37 Add Stas and Suraj as authors (#11526) 2021-04-30 09:03:13 -04:00
Bhadresh Savani
84326a28f8 [Examples] Added support for test-file in QA examples with no trainer (#11510)
* added support for test-file

* fixed typo

* added suggested changes

* reformatted code

* modifed files

* fix post processing error

* Trigger CI

* removed extra lines
2021-04-30 09:02:50 -04:00
Lysandre Debut
af0692a2ca Run model templates on master (#11527) 2021-04-30 08:47:12 -04:00
Suraj Patil
57c8e822f7 reszie token embeds (#11524) 2021-04-30 08:47:01 -04:00
Matt
20d6931e32 Update TF text classification example (#11496)
Big refactor, fixes and multi-GPU/TPU support
2021-04-30 13:45:33 +01:00
bonniehyeon
8b945ef03e Fix do_eval default value in training_args.py (#11511)
* Fix do_eval default value in training_args.py

* Update PULL_REQUEST_TEMPLATE.md
2021-04-30 08:35:12 -04:00
Takuya Makino
c2cd02ac62 Accepts BatchEncoding in LengthSampler (#11431) 2021-04-30 08:27:46 -04:00
Shubham Sanghavi
30ede8994e Implement Fast Tokenization for Deberta (#11387) 2021-04-30 08:08:15 -04:00
Nicolas Patry
db9dd09cf9 Adding AutomaticSpeechRecognitionPipeline. (#11337)
* Adding `AutomaticSpeechRecognitionPipeline`.

- Because we added everything to enable this pipeline, we probably
should add it to `transformers`.
- This PR tries to limit the scope and focuses only on the pipeline part
(what should go in, and out).
- The tests are very specific for S2T and Wav2vec2 to make sure both
architectures are supported by the pipeline. We don't use the mixin for
tests right now, because that requires more work in the `pipeline`
function (will be done in a follow up PR).
- Unsure about the "helper" function `ffmpeg_read`. It makes a lot of
  sense from a user perspective, it does not add any additional
dependencies (as in hard dependency, because users can always use their
own load mechanism). Meanwhile, it feels slightly clunky to have so much
optional preprocessing.
- The pipeline is not done to support streaming audio right now.

Future work:

- Add `automatic-speech-recognition` as a `task`. And add the
FeatureExtractor.from_pretrained within `pipeline` function.
- Add small models within tests
- Add the Mixin to tests.
- Make the logic between ForCTC vs ForConditionalGeneration better.

* Update tests/test_pipelines_automatic_speech_recognition.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Adding docs + main import + type checking + LICENSE.

* Doc style !.

* Fixing TYPE_HINT.

* Specifying waveform shape in the docs.

* Adding asserts + specify in the documentation the shape of the input
np.ndarray.

* Update src/transformers/pipelines/automatic_speech_recognition.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Adding require to tests + move the `feature_extractor` doc.

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-04-30 11:54:08 +02:00
CeShine Lee
76116f479b T5 Gradient Checkpointing (#11353)
* Implement gradient checkpoinging for T5Stack

* A bit more robust type checking

* Add `gradient_checkpointing` to T5Config

* Formatting

* Set requires_grad only when training

* None return value will only cause problems when training

* Change the output tuple according to `use_cache`

* Enable gradient checkpointing for the decoder

Squashed commit of the following:

commit 658bdd0bd1215353a8770f558bda2ea69a0ad0c7
Author: Ceshine Lee <shuanck@gmail.com>
Date:   Sat Apr 24 14:08:17 2021 +0800

    Only set `require_grad` for gradient checkpointing

commit acaeee6b2e675045fb28ce2176444c1d63e908bd
Author: Ceshine Lee <shuanck@gmail.com>
Date:   Sat Apr 24 13:59:35 2021 +0800

    Make gradient checkpointing work with the decoder

* Formatting
2021-04-30 14:13:55 +05:30
Manuel Romero
58c789e3d2 Update README.md (#11489)
Add link to code
2021-04-30 04:29:59 -04:00
Patrick von Platen
022a1e9e67 make style (#11520) 2021-04-30 09:54:58 +02:00
Philip May
e0db8276a6 add sp_model_kwargs to unpickle of xlm roberta tok (#11430)
add test for pickle

simplify test

fix test code style

add missing pickle import

fix test

fix test

fix test
2021-04-30 03:44:58 -04:00
Frederik Bode
b43e3f93ac correct the dimension comment of matrix multiplication (#11494)
Co-authored-by: Frederik Bode <frederik@paperbox.ai>
2021-04-30 09:42:13 +02:00
Lysandre Debut
f37f2adb68 Pin HuggingFace Hub dependency (#11502) 2021-04-30 02:57:50 -04:00
Lysandre
60d5bda4fd Patch notification service 2021-04-30 08:56:18 +02:00
Sylvain Gugger
b29eb247d3 Split checkpoint from model_name_or_path in examples (#11492)
* Split checkpoint from model_name_or_path in examples

* Address review comments

* Address review comments
2021-04-29 18:33:47 -04:00
Michael Benayoun
d6ec54ba36 solved coefficient issue for the TF version of gelu_fast (#11514)
Co-authored-by: Michael Benayoun <michael@huggingface.co>
2021-04-29 21:47:26 +02:00
Sylvain Gugger
ad1f7bef13 Reformat to make code clearer in tokenizer call (#11497)
* Reformat to make code clearer

* Reformat to make code clearer
2021-04-29 07:51:09 -04:00
Patrick von Platen
f748bd4242 [Flax] Add docstrings & model outputs (#11498)
* add attentions & hidden states

* add model outputs + docs

* finish docs

* finish tests

* finish impl

* del @

* finish

* finish

* correct test

* apply sylvains suggestions

* Update src/transformers/models/bert/modeling_flax_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* simplify more

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-04-29 12:04:51 +02:00
Hamel Husain
3f6add8bab fix #1149 (#11493) 2021-04-28 11:16:41 -04:00
Hamel Husain
c0eb218a55 Update PreTrainedTokenizerBase to check/handle batch length for text_pair parameter (#11486)
* Update tokenization_utils_base.py

* add assertion

* check batch len

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* add error message

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-04-28 10:11:17 -04:00
Sylvain Gugger
2d27900b5d Update min versions in README and add Flax (#11472)
* Update min versions in README and add Flax

* Adapt index
2021-04-28 09:10:06 -04:00
Suraj Patil
8d43c71a1c fix docs for decoder_input_ids (#11466)
* fix docs for decoder_input_ids

* revert the changes for bart and mbart
2021-04-27 19:36:36 +05:30
Hamel Husain
7ceff67e1a Finish Making Quick Tour respect the model object (#11467)
* finish quicktour

* fix import

* fix print

* explain config default better

* Update docs/source/quicktour.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-04-27 10:04:12 -04:00
Hamel Husain
88ac60f7b5 update QuickTour docs to reflect model output object (#11462)
* update docs to reflect model output object

* run make style`
2021-04-26 22:18:37 -04:00
Ashwin Geet D'Sa
741d48f5c7 Remove max length beam scorer (#11378)
* removed max_len

* removed max_length from BeamSearchScorer

* correct max length

* finish

* del vim

* finish & add test

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-04-27 00:28:40 +02:00
Stas Bekman
bc2571e61c [Deepspeed] ZeRO-Infinity integration plus config revamp (#11418)
* adding Z-inf

* revamp config process

* up version requirement

* wip

* massive rewrite

* cleanup

* cleanup

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* consistent json commas

* act on suggestions

* leave this feature for 0.3.16

* style

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-04-26 10:40:32 -07:00
Jaimeen Ahn
0661abc545 Variable Correction for Consistency in Distillation Example (#11444)
As the error comes from the inconsistency of variable meaning number of gpus in parser and its actual usage in the train.py script, 'gpus' and 'n_gpu' respectively,  the correction makes the example work
2021-04-26 13:30:48 -04:00
Bhadresh Savani
1d30ec95c7 [Examples] Fixes inconsistency around eval vs val and predict vs test (#11380)
* added changes for uniformity

* modified files

* corrected typo

* fixed qa scripts

* fix typos

* fixed predict typo in qa no trainer

* fixed test file

* reverted trainer changes

* reverted trainer changes in custom exmaples

* updated readme

* added changes in deepspeed test

* added changes for predict and eval
2021-04-26 09:24:31 -07:00
Sylvain Gugger
7959d83599 Give each test a different repo name (#11453) 2021-04-26 11:52:23 -04:00
Sylvain Gugger
b03b2a653d Style 2021-04-26 11:45:04 -04:00
Stas Bekman
ce11318e7e make sure to test against the local checkout (#11437) 2021-04-26 08:42:43 -07:00
Stas Bekman
a753cafdc0 [docs] fix invalid class name (#11438)
* fix invalid class name

* proper ref

* proper ref
2021-04-26 08:37:32 -07:00
Kostas Stathoulopoulos
6715e3b6a1 Clarify description of the is_split_into_words argument (#11449)
* Improve documentation for is_split_into_words argument

* Change description wording
2021-04-26 11:29:36 -04:00
Sylvain Gugger
ab2cabb964 Pass along seed to DistributedSampler (#11406)
* Pass along seed to DistributedSampler

* Add seed to DistributedLengthGroupedSampler
2021-04-26 10:26:52 -04:00
LSinev
b24ead87e1 fix some typos in docs, comments, logging/errors (#11432) 2021-04-26 09:14:25 -04:00
Amine Abdaoui
e3e70f9551 docs(examples): fix link to TPU launcher script (#11427) 2021-04-26 09:08:43 -04:00
Sylvain Gugger
d7633a4e46 Add basic support for FP16 in SageMaker model parallelism (#11407)
* Add FP16 support for SageMaker MP

* Add print debugs

* Squeeze

* Remove debug statements

* Add defensive check

* Typo
2021-04-26 08:55:14 -04:00
Daniel Stancl
38a716cd41 TF BART models - Add cross_attentions to model output and fix cross-attention head masking (#10699)
* Add cross_attn_head_mask to BART

* Fix cross_attentions in TFBart-like models

* This commit enables returning of `cross_attentions`
for TFBart-like models

* It also fixes attention head masking in cross-attenion module

* Update TF model templates

* Fix missing , in TF model templates

* Fix typo: congig -> config
2021-04-26 14:16:21 +02:00
Sylvain Gugger
4bd6b54fa4 Pin black to 21.4b0 2021-04-26 08:12:54 -04:00
Sylvain Gugger
c1625b3261 With style 2021-04-26 08:07:29 -04:00
Sylvain Gugger
4b72cfd958 Pin black to 20.8.b1 2021-04-26 08:06:50 -04:00
Patrick von Platen
32dbb2d954 make style (#11442) 2021-04-26 13:50:34 +02:00
Vasudev Gupta
04ab2ca639 add pooling layer support (#11439) 2021-04-26 09:05:53 +02:00
abiolaTresor
30f065890e updating the checkpoint for GPT2ForSequence Classification to one with classification head (#11434) 2021-04-26 10:28:51 +05:30
cronoik
35cd8eed88 EncoderDecoderConfigs should not create new objects (#11300)
* removes the creation of separate config objects and uses the existing ones instead+overwrite resize_token_embeddings from parent class because it is not working for the EncoderDecoderModel

* rollback to current version of the huggingface master branch

* reworked version that ties the encoder and decoder config of the parent encoderdecoder instance

* overwrite of resize_token_embeddings throws an error now

* review comment suggestion

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* implemented warning in case encoderdecoder is created with differing configs of encoderdecoderconfig and decoderconfig or encoderconfig

* added test to avoid diverging configs of wrapper class and wrapped classes

* Update src/transformers/models/encoder_decoder/modeling_encoder_decoder.py

* make style

Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-04-25 11:45:46 +02:00
Daniel Stancl
f45cb66bf6 Add head_mask, decoder_head_mask, cross_head_mask to ProphetNet (#9964)
* Add head_mask & decoder_head_mask + some corrections

* Fix head masking for N-grams

* Enable test_headmasking for encoder and decod

* Fix one typo regarding in modeling_propgetnet.py

* Enable test_headmasking for ProphetNetStandaloneDecoderModelTest
and ProphetNetStandaloneEncoderModelTest in test_modeling_prophetnet.py

* make style

* Fix cross_head_mask

* Fix attention head mask naming

* `cross_head_mask` -> `cross_attn_head_mask`

* `cross_layer_head_mask` -> `cross_attn_layer_head_mask`

* Still need to merge #10605 to master to pass the tests
2021-04-25 11:06:16 +02:00
Sylvain Gugger
52166f672e Style 2021-04-23 20:40:17 -04:00
cronoik
9cac4fab07 documentation linked to the parent class PreTrainedTokenizerFast but it should be the slow tokenizer (#11410) 2021-04-23 20:19:15 -04:00
Sylvain Gugger
b7fc043fce Merge branch 'master' of github.com:huggingface/transformers 2021-04-23 18:47:55 -04:00
Sylvain Gugger
81a6c7cd39 Use 3 workers for torch tests 2021-04-23 18:47:46 -04:00
Philip May
195bfd118a Enable option for subword regularization in XLMRobertaTokenizer (#11149)
* enable subword regularization.

* fix tokenizer storage

* fix docstring formatting

* Update src/transformers/models/xlm_roberta/tokenization_xlm_roberta.py

Co-authored-by: Stefan Schweter <stefan@schweter.it>

* fix docstring formatting

* add test for subword regularization tokenizer

* improve comments of test

* add sp_model_kwargs

* reformat docstring to match the style

* add some more documentation

* Update src/transformers/models/xlm_roberta/tokenization_xlm_roberta.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* improve docstring

* empty commit to trigger CI

* Update src/transformers/models/xlm_roberta/tokenization_xlm_roberta.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix docstring formatting for sphinx

Co-authored-by: Stefan Schweter <stefan@schweter.it>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-04-23 17:52:31 -04:00
Sylvain Gugger
1ef152eb48 Default to accuracy metric (#11405) 2021-04-23 14:49:59 -04:00
Daniel Stancl
e3ff165aa5 Fix cross-attention head mask for Torch encoder-decoder models (#10605)
* Fix cross-attention head mask for Torch BART models

* Fix head masking for cross-attention module for the following
models: BART, Blenderbot, Blenderbot_small, M2M_100, Marian, MBart,
Pegasus

* Enable test_headmasking for M2M_100 model

* Fix cross_head_mask for FSMT, LED and T5

* This commit fixes `head_mask` for cross-attention modules
in the following models: FSMT, LED, T5

* It also contains some smaller changes in doc so that
it is be perfectly clear the shape of `cross_head_mask`
is the same as of `decoder_head_mask`

* Update template

* Fix template for BartForCausalLM

* Fix cross_head_mask for Speech2Text models

* Fix cross_head_mask in templates

* Fix args order in BartForCausalLM template

* Fix doc in BART templates

* Make more explicit naming

* `cross_head_mask` -> `cross_attn_head_mask`

* `cross_layer_head_mask` -> `cross_attn_layer_head_mask`

* Fix doc

* make style quality

* Fix speech2text docstring
2021-04-23 18:58:06 +02:00
Sylvain Gugger
ca6b80cadb Wrong branch Sylvain... 2021-04-23 12:46:54 -04:00
Sylvain Gugger
3951fc55ee Try to trigger failure more 2021-04-23 12:44:54 -04:00
Sylvain Gugger
bd41a0f74d Style 2021-04-23 12:32:37 -04:00
Nicola De Cao
1811883e80 Fixing bug in generation (#11297)
When passing `inputs_embeds` and not `input_ids=None` the generation function fails because `input_ids` is created but the function but it should not.
2021-04-23 18:24:26 +02:00
Kiran R
5c00918681 added support for exporting of t5 to onnx with past_key_values (#10651) 2021-04-23 18:14:20 +02:00
Patrick von Platen
50f4539b82 push (#11400) 2021-04-23 15:36:27 +02:00
Sylvain Gugger
bf2e0cf70b Trainer push to hub (#11328)
* Initial support for upload to hub

* push -> upload

* Fixes + examples

* Fix torchhub test

* Torchhub test I hate you

* push_model_to_hub -> push_to_hub

* Apply mixin to other pretrained models

* Remove ABC inheritance

* Add tests

* Typo

* Run tests

* Install git-lfs

* Change approach

* Add push_to_hub to all

* Staging test suite

* Typo

* Maybe like this?

* More deps

* Cache

* Adapt name

* Quality

* MOAR tests

* Put it in testing_utils

* Docs + torchhub last hope

* Styling

* Wrong method

* Typos

* Update src/transformers/file_utils.py

Co-authored-by: Julien Chaumond <julien@huggingface.co>

* Address review comments

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-04-23 09:17:37 -04:00
Teven
7bc86bea68 Fixed trainer total_flos relaoding in distributed mode (#11383)
* Fixed trainer total_flos relaoding in distributed mode

* logging flos at the end of training
2021-04-23 07:53:33 -04:00
Patrick von Platen
74e84f1fa6 make blenderbot test slow (#11395) 2021-04-23 07:49:09 -04:00
Yoshitomo Matsubara
c3d6f33918 fixed typos (#11391) 2021-04-23 07:48:42 -04:00
Max Del
a90d3f1862 Fix typo in text (#11396) 2021-04-23 07:37:19 -04:00
Patrick von Platen
2dc2d79ac7 correct conversion (#11394) 2021-04-23 11:59:34 +02:00
Patrick von Platen
b48cf7124c correct typo (#11393) 2021-04-23 11:34:59 +02:00
Patrick von Platen
8c9b5fcbaf [Flax] Big FlaxBert Refactor (#11364)
* improve flax

* refactor

* typos

* Update src/transformers/modeling_flax_utils.py

* Apply suggestions from code review

* Update src/transformers/modeling_flax_utils.py

* fix typo

* improve error tolerance

* typo

* correct nasty saving bug

* fix from pretrained

* correct tree map

* add note

* correct weight tying
2021-04-23 09:53:09 +02:00
Sylvain Gugger
3ed5e97ba0 Fix Trainer with remove_unused_columns=False (#11382)
* Fix Trainer with remove_unused_columns=False

* Typo
2021-04-22 11:16:24 -04:00
PenutChen
0f3ad1507e Fix typo (#11369) 2021-04-22 10:10:16 -04:00
Matt
2617396094 Correctly cast num_train_epochs to int (#11379) 2021-04-22 13:49:59 +01:00
Takuya Makino
881945c0b5 Add space (#11373) 2021-04-22 17:48:58 +05:30
johnson7788
5b5e4ca366 [run_translation.py] fix typo (#11372)
fix typo

Co-authored-by: johnson <johnson@github.com>
2021-04-22 17:47:11 +05:30
Patrick von Platen
58d8795d74 [Flax] Correct typo (#11374)
* finish

* fix copy
2021-04-22 13:11:44 +02:00
Patrick von Platen
880154d2e1 [Wav2Vec2] Fix special tokens for Wav2Vec2 tokenizer (#11349)
* fix wav2vec2 tok

* up
2021-04-22 12:23:08 +02:00
Sylvain Gugger
6f14eab50b Add in torchhub 2021-04-21 19:17:29 -04:00
Sylvain Gugger
ff26f8ee3a Add huggingface_hub dep for #11328 2021-04-21 19:12:58 -04:00
wlhgtc
5e04d70868 Fix token_type_ids error for big_bird model. (#11355)
* MOD: fit chinese wwm to new datasets

* MOD: move wwm to new folder

* MOD: formate code

* Styling

* MOD add param and recover trainer

* MOD: add token_type_ids method for big bird

* MOD: format code

* MOD: format code

Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
2021-04-21 19:37:57 +02:00
Stas Bekman
5aaf5aac0b [contributing doc] explain/link to good first issue (#11346)
* explain/link to good first issue

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-04-21 10:10:11 -07:00
Matt
6fe79e57d7 Move old TF text classification script to legacy (#11361)
And update README to explain the work-in-progress!
2021-04-21 17:36:18 +01:00
Patrick von Platen
50595a3336 Remove boiler plate code (#11340)
* remove boiler plate code

* adapt roberta

* correct docs

* finish refactor
2021-04-21 18:34:38 +02:00
Matt
ac588594e2 Merge new TF example script (#11360)
First of the new and more idiomatic TF examples!
2021-04-21 17:04:55 +01:00
Stas Bekman
9f72e8f4e1 [testing doc] bring doc up to date (#11359)
* bring doc up to date

* fix
2021-04-21 08:51:00 -07:00
lewtun
41f3133a3a Extract metric_key_prefix during NotebookProgressCallback.on_evaluate (#11347)
* Pass metric_key_prefix as kwarg to on_evaluate

* Replace eval_loss with metric_key_prefix_loss

* Default to "eval" if metric_key_prefix not in kwargs

* Add kwargs to CallbackHandler.on_evaluate signature

* Revert "Add kwargs to CallbackHandler.on_evaluate signature"

This reverts commit 8d4c85ed512f558f7579d36771e907b3379947b7.

* Revert "Pass metric_key_prefix as kwarg to on_evaluate"

This reverts commit 7766bfe2718601230ae593d37b1317bd53cfc075.

* Extract metric_key_prefix from metrics
2021-04-21 11:12:09 -04:00
Sylvain Gugger
dabeb15292 Examples reorg (#11350)
* Base move

* Examples reorganization

* Update references

* Put back test data

* Move conftest

* More fixes

* Move test data to test fixtures

* Update path

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address review comments and clean

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-04-21 11:11:20 -04:00
Stas Bekman
ca7ff64f5b [deepspeed] fix resume from checkpoint (#11352)
This PR fixes a bug that most likely somehow got exposed (not caused) by https://github.com/huggingface/transformers/pull/11318 - surprisingly the same test worked just fine before that other PR.
2021-04-21 07:48:15 -07:00
Sylvain Gugger
74712e22f3 Honor contributors to models (#11329)
* Honor contributors to models

* Fix typo

* Address review comments

* Add more authors
2021-04-21 09:47:27 -04:00
Nicolas Patry
aad95c7cde Removed max_length from being mandatory within generate. (#11314)
* Removed `max_length` from being mandatory within `generate`.

- Moving on to fully using `StoppingCriteria` for `greedy` and `sample`
modes.
- `max_length` still used for `beam_search` and `group_beam_search`
(Follow up PR)
- Fixes a bug with MaxLengthStoppingCriteria (we should stop as soon a
we hit the max_length, the comparison needs to be or equal, that affects
the tests).
- Added options to use `logits_processor` and `stopping_criteria`
directly within `generate` function (so some users can define their own
`logits_processor` and `stopping_criteria`).
- Modified the backward compat tests to make sure we issue a warning.

* Fix `max_length` argument in `generate`.

* Moving validate to being functional.

- Renamed `smax_length` to `stoppping_max_length`.

* Removing `logits_processor` and `stopping_criteria` from `generate`
arguments.

* Deepcopy.

* Fix global variable name.
2021-04-21 11:56:45 +02:00
Yusuke Mori
95dab34d55 Add an error message that fires when Reformer is not in training mode, but one runs .backward() (#11117) 2021-04-21 00:23:37 +02:00
Sylvain Gugger
f1b938fda8 Update to use datasets remove_cloumns method (#11343)
* Update to use datasets remove_cloumns method

* Quality
2021-04-20 14:12:01 -04:00
Suraj Patil
cfd2eaa8cf [GPTNeo] create local attention mask ones (#11335)
* create local attention mask ones

* remove old method, address patricks comment
2021-04-20 18:37:44 +05:30
Patrick von Platen
f464f10a2c [Generate] Remove outdated code (#11331)
* remove update function

* update

* refactor more

* refactor
2021-04-20 15:16:02 +03:00
rajvi-k
bfd83c17a7 Added translation example script (#11196)
* initial changes

* modified evaluation

* updated evaluation

* updated evaluation on text translation example script

* added translation example script

* Formatted translation example script

* Reformatted translation example

* Fixed evaluation bug and added support for other tokenisers

* Fixed evaluation bug and added support for other tokenisers

* Added translation example script

* Formatted summarization example script

* Removed typos from summarization example script
2021-04-20 07:18:47 -04:00
Sylvain Gugger
c0328a6c26 Load checkpoint without re-creating the model (#11318) 2021-04-19 20:31:29 -04:00
Sylvain Gugger
95037a169f [Trainer] Add a progress bar for batches skipped (#11324) 2021-04-19 19:04:52 -04:00
Stas Bekman
95ffbe1686 [Trainer] fix the placement on device with fp16_full_eval (#11322)
* fix the placement on device with fp16_full_eval

* deepspeed never goes on device
2021-04-19 11:55:33 -07:00
TAE YOUNGDON
3981ce3dd2 modify double considering special tokens in language_modeling.py (#11275)
* Update language_modeling.py

in "class TextDatasetForNextSentencePrediction(Dataset)", double considering "self.tokenizer.num_special_tokens_to_add(pair=True)" 

so, i remove self.block_size, and add parameter for "def create_examples_from_document". like "class LineByLineWithSOPTextDataset" do

* Update language_modeling.py
2021-04-19 11:24:43 -04:00
e
5a34d8d982 move device statements outside if statements (#11292) 2021-04-19 08:25:40 -04:00
Sylvain Gugger
d9c62047a8 Trainer support for IterableDataset for evaluation and predict (#11286)
* Bulk of the work

* Polish and tests

* Update QA Trainer

* Avoid breaking the predict method

* Deprecation warnings

* Store real eval dataloder

* Get eval dataset reference before wrap
2021-04-16 16:01:58 -04:00
Lysandre
e783ea7304 Fix failing workflows 2021-04-16 08:09:51 -04:00
Nicolas Patry
92970c0cb9 Enabling multilingual models for translation pipelines. (#10536)
* [WIP] Enabling multilingual models for translation pipelines.

* decoder_input_ids -> forced_bos_token_id

* Improve docstring.

* Rebase

* Fixing 2 bugs

- Type token_ids coming from `_parse_and_tokenize`
- Wrong index from tgt_lang.

* Fixing black version.

* Adding tests for _build_translation_inputs and add them for all
tokenizers.

* Mbart actually puts the lang code at the end.

* Fixing m2m100.

* Adding TF support to `deep_round`.

* Update src/transformers/pipelines/text2text_generation.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Adding one line comment.

* Fixing M2M100 `_build_translation_input_ids`, and fix the call site.

* Fixing tests + deep_round -> nested_simplify

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-04-16 11:31:35 +02:00
Lysandre Debut
5254220e7f Workflow fixes (#11270) 2021-04-15 23:21:17 -04:00
Stas Bekman
dfc6dd8584 update dependency_versions_table (#11273)
missed this updating when bumped the version.
2021-04-15 19:10:29 -07:00
Sylvain Gugger
2550b41aa2 Tokenizer fast save (#11234)
* Save fast tokenizers in both formats

* Fix for HerBERT

* Proper fix

* Properly test new behavior
2021-04-15 09:32:32 -04:00
Sylvain Gugger
6e1ee47b36 Support for set_epoch (#11258) 2021-04-15 07:36:32 -04:00
Nicolas Patry
c3fcba3219 Adding pipeline task aliases. (#11247)
* Adding task aliases and adding `token-classification` and
`text-classification` tasks.

* Cleaning docstring.
2021-04-15 09:51:24 +02:00
Sylvain Gugger
aaaed56ffc Trainer iterable dataset (#11254)
* IterableDatasetShard

* Test and integration in Trainer

* Update src/transformers/trainer_pt_utils.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Style

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-04-14 17:02:26 -04:00
Stas Bekman
83206ca6a8 [deepspeed] test on one node 2 gpus max (#11237)
* test on one node 2 gpus max

* fix the other place

* refactor

* fix

* cleanup

* more exact version
2021-04-14 11:06:59 -07:00
Sylvain Gugger
25e1af36e0 Fix #10128 (#11248) 2021-04-14 11:47:54 -04:00
Stas Bekman
63ca402380 [troubleshooting] add 2 points of reference to the offline mode (#11236)
* add 2 points of reference to the offline mode

* link the new doc

* add error message

* Update src/transformers/modeling_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* style

* rename

* Trigger CI

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-04-14 08:39:23 -07:00
Yusuke Mori
075e821d1d Add prefix to examples in model_doc rst (#11226)
* Add prefix to examples in model_doc rst

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-04-14 10:58:55 -04:00
Thomas Wood
4670b57ce9 Fix dimention misspellings. (#11238)
* Update modeling_gpt_neo.py

dimention -> dimension

* Update configuration_speech_to_text.py

dimention -> dimension
2021-04-14 10:39:37 -04:00
Sudharsan S T
f25444cb22 Close open files to suppress ResourceWarning (#11240)
Co-authored-by: Sudharsan Thirumalai <sudharsan.t@sprinklr.com>
2021-04-14 10:31:04 -04:00
Lysandre Debut
7fe5aaa8b0 Stale bot updated (#10562)
* Updated stale bot

* Specify issue number

* Remove particular handling of assignees

* Unleash the stalebot

* Remove debug branch
2021-04-14 10:24:31 -04:00
Joel Stremmel
9337c6c668 make embeddings plural in warning message (#11228) 2021-04-14 10:13:25 -04:00
Nithin Holla
653076ca30 Save the Wav2Vec2 processor before training starts (#10910)
Co-authored-by: nithin19 <nithin@amberscript.com>
2021-04-14 14:52:06 +03:00
Stas Bekman
3d339ee659 [Deepspeed] zero3 tests band aid (#11235)
* temp band-aid

* style
2021-04-13 17:58:09 -04:00
Lysandre Debut
1ad7b0398c Run CI on deepspeed and fairscale (#11172)
* Run CI on deepspeed and fairscale

* Test it on this branch :)

* Rename

* Update the CI image
2021-04-13 15:47:06 -04:00
Sylvain Gugger
f38cd4373f Indent code block in the documentation (#11233)
* Indent code block

* Indent code blocks version 2

* Quality
2021-04-13 15:36:36 -04:00
Sylvain Gugger
9d8e8a8703 Avoid using no_sync on SageMaker DP (#11229) 2021-04-13 15:34:00 -04:00
Philipp Schmid
9fa2995993 added cache_dir=model_args.cache_dir to all example with cache_dir arg (#11220) 2021-04-13 18:35:18 +02:00
Sylvain Gugger
3312e96bfb Doc check: a bit of clean up (#11224) 2021-04-13 12:14:25 -04:00
Suraj Patil
edca520d0f Refactor GPT2 (#11225)
* refactor GPT2

* fix mlp and head pruning

* address Sylvains comments

* apply suggestion from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-04-13 21:15:24 +05:30
Sylvain Gugger
893e51a53f Document v4.5.1 2021-04-13 11:28:17 -04:00
Sylvain Gugger
81009b7a5c Replace error by warning when loading an architecture in another (#11207)
* Replace error by warning when loading an architecture in another

* Style

* Style again

* Add a test

* Adapt old test
2021-04-13 10:33:52 -04:00
Yusuke Mori
22fa0a6004 Add documentation for BertJapanese (#11219)
* Start writing BERT-Japanese doc

* Fix typo, Update toctree

* Modify model file to use comment for document, Add examples

* Clean bert_japanese by make style

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Split a big code block into two

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Add prefix >>> to all lines in code blocks

* Clean bert_japanese by make fixup

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-04-13 09:49:15 -04:00
Suraj Patil
896d7be974 fix docstrings (#11221) 2021-04-13 08:58:08 -04:00
Lysandre Debut
823df93955 Fix GPT-2 warnings (#11213)
* Fix GPT-2 warnings

* Update src/transformers/models/gpt2/modeling_gpt2.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2021-04-13 08:53:03 -04:00
Lysandre Debut
0cd89d8c83 Add Matt as the TensorFlow reference (#11212) 2021-04-13 08:52:30 -04:00
Ceyda Cinarel
7c205bf40c wav2vec2 converter: create the proper vocab.json while converting fairseq wav2vec2 finetuned model (#11041)
* add vocab while converting wav2vec2 original finetuned model

* check save directory exists

* return_attention_mask fix

* quality
2021-04-13 15:54:33 +05:30
calpt
d49d3cf6d6 Use MSELoss in (M)BartForSequenceClassification (#11178) 2021-04-13 15:24:46 +05:30
Philipp Schmid
f243a5ec0d Sagemaker test docs update for framework upgrade (#11206)
* increased train_runtime for model parallelism

* added documentation for framework upgrade
2021-04-12 19:08:33 -04:00
Lysandre Debut
74d7c24d8d Import torch.utils.checkpoint in ProphetNet (#11214) 2021-04-12 18:56:17 -04:00
cronoik
38a10c6b52 Replaced which with who (#11183) 2021-04-12 18:08:28 -04:00
NielsRogge
9f1260971f Add DeiT (PyTorch) (#11056)
* First draft of deit

* More improvements

* Remove DeiTTokenizerFast from init

* Conversion script works

* Add DeiT to ViT conversion script

* Add tests, add head model, add support for deit in vit conversion script

* Update model checkpoint names

* Update image_mean and image_std, set resample to bicubic

* Improve docs

* Docs improvements

* Add DeiTForImageClassificationWithTeacher to init

* Address comments by @sgugger

* Improve feature extractors

* Make fix-copies

* Minor fixes

* Address comments by @patil-suraj

* All models uploaded

* Fix tests

* Remove labels argument from DeiTForImageClassificationWithTeacher

* Fix-copies, style and quality

* Fix tests

* Fix typo

* Multiple docs improvements

* More docs fixes
2021-04-12 18:07:10 -04:00
Takuya Makino
cb251ba619 Fix typo (#11188) 2021-04-12 17:35:32 -04:00
fghuman
0c6fcd3034 Added documentation for data collator. (#10941)
* Added documentation for data collator.

* Update docs/source/data_collator.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Added documentation for data collator.

* Added documentation for the data collator.

* Merge branch 'doc_DataCollator' of C:\Users\mahii\PycharmProjects\transformers with conflicts.

* Update documentation for the data collator.

* Update documentation for the data collator.

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Amna <A.A.Ahmad@student.tudelft.nl>
2021-04-12 11:59:46 -04:00
Masatoshi TSUCHIYA
ef102c4886 model_path should be ignored as the checkpoint path (#11157)
* model_path is refered as the path of the trainer, and should be ignored as the checkpoint path.

* Improved according to Sgugger's comment.
2021-04-12 09:06:41 -04:00
Sylvain Gugger
623cd6aef9 Fix style 2021-04-12 08:14:29 -04:00
cronoik
a99f7f5c75 Minor typos fixed (#11182) 2021-04-12 07:55:40 -04:00
Sylvain Gugger
26212c14e5 Reactivate Megatron tests an use less workers 2021-04-09 18:09:53 -04:00
Lysandre
716120cbd6 Fix Typo 2021-04-09 17:46:52 -04:00
Philipp Schmid
6f90c29eaa added json dump and extraction of train run time (#11167)
* added json dump and extraction of train run time

* make style happy
2021-04-09 15:18:00 -04:00
Stas Bekman
07f0bb691d [examples run_clm] fix _LazyModule hasher error (#11168)
* fix _LazyModule hasher error

* reword
2021-04-09 11:39:12 -07:00
Suraj Patil
c161dd56df [examples/translation] support mBART-50 and M2M100 fine-tuning (#11170)
* keep a list of multilingual tokenizers

* add forced_bos_token argument
2021-04-09 23:58:42 +05:30
Kevin Canwen Xu
fb41f9f50c Add a special tokenizer for CPM model (#11068)
* Add a special tokenizer for CPM model

* make style

* fix

* Add docs

* styles

* cpm doc

* fix ci

* fix the overview

* add test

* make style

* typo

* Custom tokenizer flag

* Add REAMDE.md

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-04-10 02:07:47 +08:00
Sylvain Gugger
45fc8c7951 Make get_special_tokens_mask consider all tokens (#11163) 2021-04-09 11:57:44 -04:00
Saviour Owolabi
6060746570 Update README.md (#11161)
Corrected a typo ('Downlowd' to 'Download')
2021-04-09 11:52:21 -04:00
Keisuke Hirota
b9b60c1630 Fix LogitsProcessor documentation (#11130)
* Change duplicated LogitsProcessor to LogitsWarper in LogitsProcessorList document

* Write more detailed information about LogitsProcessor's scores argument

* apply suggestion from review

* style

Co-authored-by: Suraj Patil <surajp815@gmail.com>
2021-04-09 12:39:44 +05:30
Niklas Muennighoff
8b78a32be1 [Community notebooks] Add Wav2Vec notebook for creating captions for YT Clips (#11142)
* Add Wav2Vec Inference notebook

* Update docs/source/community.md

Co-authored-by: Suraj Patil <surajp815@gmail.com>
2021-04-09 12:10:37 +05:30
Stas Bekman
0311ba2153 typo (#11152)
* typo

* style
2021-04-08 19:47:31 -07:00
Sylvain Gugger
269c9638df Merge branch 'master' of github.com:huggingface/transformers 2021-04-08 21:14:56 -04:00
Sylvain Gugger
d31c7b104e Skip Megatron tests for now 2021-04-08 21:14:43 -04:00
Stas Bekman
c2e0fd5283 [setup] make fairscale and deepspeed setup extras (#11151)
* make fairscale and deepspeed setup extras

* fix default

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* no reason not to ask for the good version

* update the CIs

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-04-08 15:46:54 -07:00
Sylvain Gugger
ba8b1f4754 Add support for multiple models for one config in auto classes (#11150)
* Add support for multiple models for one config in auto classes

* Use get_values everywhere

* Prettier doc
2021-04-08 18:41:36 -04:00
Stas Bekman
97ccf67bb3 [setup] extras[docs] must include 'all' (#11148)
* extras[doc] must include 'all'

* fix

* better

* regroup
2021-04-08 18:10:44 -04:00
Stas Bekman
66446909b2 [tests] relocate core integration tests (#11146)
* relocate core integration tests

* add sys.path context manager

* cleanup

* try

* try2

* fix path

* doc

* style

* add dep

* add 2 more deps
2021-04-08 13:13:17 -07:00
Andrea Cappelli
6c40e49712 Run mlm pad to multiple for fp16 (#11128)
* Add mlm collator pad to multiple option (#10627)

* Use padding to 8x in run mlm (#10627)
2021-04-08 16:12:49 -04:00
Sylvain Gugger
dfed4ec263 Don't duplicate logs in TensorBoard and handle --use_env (#11141) 2021-04-08 16:12:36 -04:00
Philipp Schmid
9c9b8e707b Updates SageMaker docs for updating DLCs (#11140) 2021-04-08 16:05:53 -04:00
Lysandre Debut
ba2cf5f90d Add fairscale and deepspeed back to the CI (#11147)
* Add fairscale and deepspeed back to the CI

* Add deepspeed to single GPU tests
2021-04-08 11:36:45 -07:00
Stas Bekman
1ed24afe91 [trainer] solve "scheduler before optimizer step" warning (#11144)
* solve "scheduler before optimizer step" warning

* style

* correct the state evaluation test
2021-04-08 11:28:48 -07:00
Julien Demouth
02ec02d6d3 Add nvidia megatron models (#10911)
* Add support for NVIDIA Megatron models

* Add support for NVIDIA Megatron GPT2 and BERT

Add the megatron_gpt2 model. That model reuses the existing GPT2 model. This
commit includes a script to convert a Megatron-GPT2 checkpoint downloaded
from NVIDIA GPU Cloud. See examples/megatron-models/README.md for details.

Add the megatron_bert model. That model is implemented as a modification of
the existing BERT model in Transformers. This commit includes a script to
convert a Megatron-BERT checkpoint downloaded from NVIDIA GPU Cloud. See
examples/megatron-models/README.md for details.

* Update src/transformers/models/megatron_bert/configuration_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/configuration_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/configuration_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Remove model.half in tests + add "# Copied ..."

Remove the model.half() instruction which makes tests fail on the CPU.

Add a comment "# Copied ..." before many classes in the model to enable automatic
tracking in CI between the new Megatron classes and the original Bert ones.

* Fix issues

* Fix Flax/TF tests

* Fix copyright

* Update src/transformers/models/megatron_bert/configuration_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/configuration_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update docs/source/model_doc/megatron_bert.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/model_doc/megatron_gpt2.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/__init__.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Resolve most of 'sgugger' comments

* Fix conversion issue + Run make fix-copies/quality/docs

* Apply suggestions from code review

* Causal LM & merge

* Fix init

* Add CausalLM to last auto class

Co-authored-by: Julien Demouth <jdemouth@nvidia.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-04-08 14:09:11 -04:00
Stas Bekman
c6d664849b [DeepSpeed] ZeRO Stage 3 (#10753)
* synced gpus

* fix

* fix

* need to use t5-small for quality tests

* notes

* complete merge

* fix a disappearing std stream problem

* start zero3 tests

* wip

* tune params

* sorting out the pre-trained model loading

* reworking generate loop wip

* wip

* style

* fix tests

* split the tests

* refactor tests

* wip

* parameterized

* fix

* workout the resume from non-ds checkpoint pass + test

* cleanup

* remove no longer needed code

* split getter/setter functions

* complete the docs

* suggestions

* gpus and their compute capabilities link

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* style

* remove invalid paramgd

* automatically configure zero3 params that rely on hidden size

* make _get_resized_embeddings zero3-aware

* add test exercising resize_token_embeddings()

* add docstring

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-04-08 09:53:01 -07:00
Stas Bekman
acc851e1ff [run_clm] clarify why we get the tokenizer warning on long input (#11145)
* clarify why we get the warning here

* Update examples/language-modeling/run_clm.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* wording

* style

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-04-08 09:46:28 -07:00
Yusuke Mori
5bf5d50c8d Typo fix of the name of BertLMHeadModel in BERT doc (#11133) 2021-04-08 08:22:58 -04:00
Jannis Born
f8e90d6fb9 Fix typing error in Trainer class (prediction_step) (#11138)
* fix: docstrings in prediction_step

* ci: Satisfy line length requirements

* ci: character length requirements
2021-04-08 08:22:25 -04:00
Sylvain Gugger
ffe0761777 Fix and refactor check_repo (#11127) 2021-04-07 17:56:21 -04:00
Philipp Schmid
3fd7eee18f Adds use_auth_token with pipelines (#11123)
* added model_kwargs to infer_framework_from_model

* added model_kwargs to tokenizer

* added use_auth_token as named parameter

* added dynamic get for use_auth_token
2021-04-07 20:32:59 +02:00
Stas Bekman
1c15128312 [versions] handle version requirement ranges (#11110)
* handle version requirement ranges

* add mixed requirement test

* cleanup
2021-04-07 09:09:38 -07:00
Vasudev Gupta
7442801df5 fix tests (#11109) 2021-04-07 10:07:26 -04:00
Lysandre Debut
c0d97cee13 Adds a note to resize the token embedding matrix when adding special … (#11120)
* Adds a note to resize the token embedding matrix when adding special tokens

* Remove superfluous space
2021-04-07 10:06:45 -04:00
Sylvain Gugger
02f7c2fe66 Some styling of the training table in Notebooks (#11118) 2021-04-07 10:00:33 -04:00
Sylvain Gugger
11505fa139 Dummies multi backend (#11100)
* Replaces requires_xxx by one generic method

* Quality and update check_dummies

* Fix inits check

* Post-merge cleanup
2021-04-07 09:56:40 -04:00
Stas Bekman
424419f549 [examples] fix white space (#11099)
these get concatenated without whitespace, so fix it
2021-04-07 09:20:58 -04:00
Stas Bekman
c9035e4537 fix: The 'warn' method is deprecated (#11105)
* The 'warn' method is deprecated

* fix test
2021-04-07 09:20:06 -04:00
Leo Gao
247bed3857 GPTNeo: handle padded wte (#11079)
* GPTNeo: handle padded wte

* Switch to config.vocab_size

* apply review suggestion

Co-authored-by: Suraj Patil <surajp815@gmail.com>
2021-04-07 17:35:20 +05:30
cronoik
083ad7d46c dead link fixed (#11103) 2021-04-07 07:50:47 -04:00
Sylvain Gugger
fd338abdeb Style 2021-04-06 19:54:13 -04:00
SHYAM SUNDER KUMAR
aef4cf8c52 accelerate question answering examples with no trainer (#11091)
* accelerate question answering examples with no trainer

* removed train and eval flags also fixed fill np array function

* Update examples/question-answering/run_qa_beam_search_no_trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update examples/question-answering/run_qa_no_trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-04-06 19:35:21 -04:00
Sylvain Gugger
403d530eec Auto feature extractor (#11097)
* AutoFeatureExtractor

* Init and first tests

* Tests

* Damn you gitignore

* Quality

* Defensive test for when not all backends are here

* Use pattern for Speech2Text models
2021-04-06 19:20:08 -04:00
Stas Bekman
520198f56f [doc] gpt-neo (#11098)
make the example work
2021-04-06 16:42:06 -04:00
Lysandre
9853c5dd58 Development on v4.6.0dev0 2021-04-06 12:53:25 -04:00
Lysandre
4906a29f7f Release v4.5.0
Some checks failed
Release - Conda / build_and_package (push) Has been cancelled
2021-04-06 12:37:47 -04:00
Suraj Patil
2a8115f083 [WIP] GPT Neo cleanup (#10985)
* better names

* add attention mixin

* all slow tests in one class

* make helper methods static so we can test

* add local attention tests

* better names

* doc

* apply review suggestions
2021-04-06 12:24:15 -04:00
Philipp Schmid
76800fb8e6 added new merged Trainer test (#11090) 2021-04-06 15:12:21 +02:00
Philipp Schmid
b219d6b5a5 added social thumbnail for docs (#11083) 2021-04-06 14:56:18 +02:00
Sylvain Gugger
6c1bee7d89 Link to new blog 2021-04-06 08:55:40 -04:00
Stas Bekman
f7328de46d HF emoji unicode doesn't work in console (#11081)
It doesn't look like using 🤗 is a great idea for printing to console. See attachment.

This PR proposes to replace 🤗 with "HuggingFace" for an exception message.

@LysandreJik
2021-04-06 08:03:00 -04:00
Hemil Desai
6ab7d1a429 Add Readme for language modeling scripts with accelerate (#11073) 2021-04-05 20:56:12 -04:00
Sylvain Gugger
2199608ca6 Make a base init in FeatureExtractionMixin (#11074) 2021-04-05 18:02:28 -04:00
Sylvain Gugger
04ceee7d24 Fix distributed gather for tuples of tensors of varying sizes (#11071) 2021-04-05 16:21:49 -04:00
Sylvain Gugger
f05a8a0c5e Document common config attributes (#11070) 2021-04-05 15:29:01 -04:00
Sylvain Gugger
090e3e6896 Add center_crop to ImageFeatureExtractoMixin (#11066) 2021-04-05 15:28:51 -04:00
konstin
abb7430003 Replace pkg_resources with importlib_metadata (#11061)
* Replace pkg_resources with importlib_metadata

Fixes #10964. The other reason for this change is that pkg_resources has been [deprecated](8fe85c22ce) in favor of importlib_metadata.

* Reduce to a single importlib_metadata import switch

* Trigger CI

Co-authored-by: Stas Bekman <stas@stason.org>
2021-04-05 12:12:19 -07:00
Hemil Desai
b51b87c41d Add examples/language_modeling/run_clm_no_trainer.py (#11026)
* Initial draft for clm no trainer

* Remove unwanted args

* Fix bug

* Update examples/language-modeling/run_clm_no_trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-04-05 12:27:52 -04:00
Amala Deshmukh
e1c02e018c Add example for registering callbacks with trainers (#10928)
* Add example for callback registry

Resolves: #9036

* Update callback registry documentation

* Added comments for other ways to register callback
2021-04-05 12:27:23 -04:00
Lysandre Debut
9f4e0c23d6 Documentation about loading a fast tokenizer within Transformers (#11029)
* Documentation about loading a fast tokenizer within Transformers

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* style

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-04-05 10:51:16 -04:00
Sylvain Gugger
6c25f5228e Refactor AutoModel classes and add Flax Auto classes (#11027)
* Refactor AutoModel classes and add Flax Auto classes

* Add new objects to the init

* Fix hubconf and sort models

* Fix TF tests

* Missing coma

* Update src/transformers/models/auto/auto_factory.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Fix init

* Fix dummies

* Other init to fix

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-04-05 10:11:28 -04:00
Lysandre Debut
eb3479e7cf Some models have no tokenizers (#11064) 2021-04-05 09:37:49 -04:00
Lysandre Debut
773e4c7263 Remove unnecessary space (#11060) 2021-04-05 09:36:20 -04:00
Lysandre Debut
ef62f038fd Pin docutils (#11062)
* Pin docutils

* Versions table
2021-04-05 09:35:21 -04:00
Eren Şahin
6e31014110 [doc] update code-block rendering (#11053)
double : prevents code-block section to be rendered, so made it single :
2021-04-05 09:06:07 -04:00
Stas Bekman
3d39226a51 s|Pretrained|PreTrained| (#11048) 2021-04-04 18:08:42 -07:00
Sylvain Gugger
b0d49fd536 Add a script to check inits are consistent (#11024) 2021-04-04 20:41:34 -04:00
versis
335c0ca35c fixed typo: logging instead of logger (#11025) 2021-04-02 09:22:22 -04:00
Philipp Schmid
34e1bec649 added new notebook and merge of trainer (#11015)
* added new notebook and merge of trainer

* Update docs/source/sagemaker.md

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-04-01 23:13:47 +02:00
Julien Chaumond
e8da77d181 [doc] no more bucket 2021-04-01 14:25:47 -04:00
Joe Davison
f4ad3d8cea minor typo fix
*negative* log-likelihood
2021-04-01 11:58:37 -06:00
cronoik
57c1749efa DebertaTokenizer Rework closes #10258 (#10703)
* closes #10258

* typo

* reworked deberta test

* implemented the comments from BigBird01 regarding sequence pair encoding of deberta

* Update style

* VOCAB_FILES_NAMES is now a oneliner as suggested by @sgugger

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* added #fmt: on as requested by @sgugger

* Style

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-04-01 13:53:53 -04:00
NielsRogge
30677dc743 Add Vision Transformer and ViTFeatureExtractor (#10950)
* Squash all commits into one

* Update ViTFeatureExtractor to use image_utils instead of torchvision

* Remove torchvision and add Pillow

* Small docs improvement

* Address most comments by @sgugger

* Fix tests

* Clean up conversion script

* Pooler first draft

* Fix quality

* Improve conversion script

* Make style and quality

* Make fix-copies

* Minor docs improvements

* Should use fix-copies instead of manual handling

* Revert "Should use fix-copies instead of manual handling"

This reverts commit fd4e591bce4496d41406425c82606a8fdaf8a50b.

* Place ViT in alphabetical order

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-04-01 11:16:05 -04:00
cchen-dialpad
af6732225c Improve the speed of adding tokens from added_tokens.json (#10780)
* use bisect to add one token to unique_no_split_tokens

* fix style
2021-04-01 08:56:12 -04:00
Josh
c301c26370 Fix Adafactor documentation (recommend correct settings) (#10526)
* Update optimization.py

Fix documentation to reflect optimal settings for Adafactor

* update and expand on the recommendations

* style

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* flip scale_parameter to True for the 2nd recommendatoin

Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-03-31 21:03:38 -07:00
Hemil Desai
838f83d84c Add examples/language_modeling/run_mlm_no_trainer.py (#11001)
* Add initial script for finetuning MLM models with accelerate

* Add evaluation metric calculation

* Fix bugs

* Use no_grad on evaluation

* update script docstring

* Update examples/language-modeling/run_mlm_no_trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* PR feedback

* Fix CI failure

* Update examples/language-modeling/run_mlm_no_trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-03-31 18:49:45 -04:00
JohnnyC08
455f81711f Update training_args.py (#11000)
In the group by length documentation length is misspelled as legnth
2021-03-31 18:28:07 -04:00
Patrick von Platen
01068abdb9 add blog to docs (#10997) 2021-03-31 18:36:00 +03:00
Sylvain Gugger
cd56f3fe7e Merge trainers (#10975)
* Replace is_sagemaker_distributed_available

* Merge SageMakerTrainer into Trainer

* Test with shorter condition

* Put back deleted line

* Deprecate SageMakerTrainer and SageMakerTrainingArguments

* Apply suggestions from code review

Co-authored-by: Philipp Schmid <32632186+philschmid@users.noreply.github.com>

Co-authored-by: Philipp Schmid <32632186+philschmid@users.noreply.github.com>
2021-03-31 10:01:30 -04:00
Patrick von Platen
b6dddda4d2 add notebook (#10995) 2021-03-31 17:00:56 +03:00
Sylvain Gugger
acc3bd9d2a Enforce string-formatting with f-strings (#10980)
* First third

* Styling and fix mistake

* Quality

* All the rest

* Treat %s and %d

* typo

* Missing )

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-03-31 10:00:27 -04:00
Sylvain Gugger
d0b3797a3b Add more metadata to the user agent (#10972)
* Add more metadata to the user agent

* Fix typo

* Use DISABLE_TELEMETRY

* Address review comments

* Use global env

* Add clean envs on circle CI
2021-03-31 09:36:07 -04:00
Suraj Patil
a8549bdd82 fix example in config (#10993) 2021-03-31 17:38:57 +05:30
Lysandre Debut
a96edb85c9 GPT Neo configuration needs to be set to use GPT2 tokenizer (#10992) 2021-03-31 08:03:20 -04:00
Lysandre Debut
bf0840accc Fix the checkpoint for I-BERT (#10994) 2021-03-31 08:02:51 -04:00
Philipp Schmid
ced7284a60 Sagemaker test fix (#10987)
* wrong makefile command

* ddp test fix
2021-03-31 07:44:22 -04:00
WybeKoper
645f45c462 Fixed some typos and removed legacy url (#10989)
* Fixed typos

* Removed legacy colab notebook from readme

Co-authored-by: WybeKoper <WybeKoper@users.noreply.github.com>
2021-03-31 16:53:15 +05:30
Patrick von Platen
e87505f3a1 [Flax] Add other BERT classes (#10977)
* add first code structures

* add all bert models

* add to init and docs

* correct docs

* make style
2021-03-31 09:45:58 +03:00
Yih-Dar
e031162a6b fix md file to avoid evaluation crash (#10962) 2021-03-30 21:26:22 +03:00
Philipp Schmid
3e09d813aa [examples/s2s] added py7zr dep (#10971)
* added py7zr

* comment out check_min for sagemaker test

* added min version again
2021-03-30 23:17:12 +05:30
Nicolas Patry
c32b432a67 Fixed a bug where the pipeline.framework would actually contain (#10970)
a fully qualified model.

We simply forgot to change the call for this one when this landed:
https://github.com/huggingface/transformers/pull/10888

It's odd that tests didn't catch that. Should we add some ?
(It's a pretty edgy test case, but it does run within the API).
2021-03-30 13:26:35 -04:00
Philipp Schmid
e3c8443f08 improved sagemaker documentation for git_config and examples (#10966)
* improved branch usage

* fixed grammar and comma
2021-03-30 18:00:52 +02:00
Suraj Patil
83d38c9ff3 GPT Neo few fixes (#10968)
* fix checkpoint names

* auto model

* fix doc
2021-03-30 11:15:55 -04:00
Patrick von Platen
7772ddb473 fix big bird gpu test (#10967) 2021-03-30 17:03:48 +03:00
Suraj Patil
860264379f GPT Neo (#10848)
* lets begin

* boom boom

* fix out proj in attn

* fix attention

* fix local attention

* add tokenizer

* fix imports

* autotokenizer

* fix checkpoint name

* cleanup

* more clean-up

* more cleanup

* output attentions

* fix attn mask creation

* fix imports

* config doc

* add tests

* add slow tests

* quality

* add conversion script

* copyright

* typo

* another bites the dust

* fix attention tests

* doc

* add embed init in convert function

* fix copies

* remove tokenizer

* enable caching

* address review comments

* improve config and create attn layer list internally

* more consistent naming

* init hf config from mesh-tf config json file

* remove neo tokenizer from doc

* handle attention_mask in local attn layer

* attn_layers => attention_layers

* add tokenizer_class in config

* fix docstring

* raise if len of attention_layers is not same as num_layers

* remove tokenizer_class from config

* more consistent naming

* fix doc

* fix checkpoint names

* fp16 compat

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-03-30 09:42:30 -04:00
Philipp Schmid
a04eb8d369 Fix summarization notebook link (#10959) 2021-03-30 08:28:58 -04:00
Patrick von Platen
8780caa388 [WIP][Flax] Add general conversion script (#10809)
* save intermediate

* finish first version

* delete some more

* improve import

* fix roberta

* Update src/transformers/modeling_flax_pytorch_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_flax_pytorch_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* small corrections

* apply all comments

* fix deterministic

* make fix-copies

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-03-30 12:13:59 +03:00
Philipp Schmid
604c085087 Sagemaker test (#10925)
* init

* first working test

* added todo for setup.py

* working test for single node multi node ddp and smd

* added tensorflow single node test

* added directory for pytorch and tensorflow due to different requirements.txt

* added directory for pytorch and tensorflow

* added comment for run_glue until it is available

* added output_dir to it

* smaller dataset to make test running faster

* adjust HP and script

* adjusted parameter for tensorflow

* refactored test scripts

* adjusted make file

* init

* first working test

* added todo for setup.py

* working test for single node multi node ddp and smd

* added tensorflow single node test

* added directory for pytorch and tensorflow due to different requirements.txt

* added directory for pytorch and tensorflow

* added comment for run_glue until it is available

* added output_dir to it

* smaller dataset to make test running faster

* adjust HP and script

* adjusted parameter for tensorflow

* refactored test scripts

* adjusted make file

* updated dlc container

* commented in all tests

* added both ecr images

* added new master branches

* debug

* added new datasets version

* init

* strange rebase bug

* removed changes

* changed min version for tests to work

* updated DLC

* added model parallel test

* removed test files

* removed test files

* tested with ned dlc

* added correct sagemaker sdk version

* adjust DLCs for official one

* reworked tests

* quality

* removed default profile added documentation to it

* added step in release for sagemaker tests

* reverted version for example script removed duplicated script and added install from master to requirements.txt

* removed mistaken .DS_Stores from mac

* fixed tests

* added Sylvains feedback

* make style

* added lysandre's feedback
2021-03-30 08:28:02 +02:00
Vasudev Gupta
6dfd027279 BigBird (#10183)
* init bigbird

* model.__init__ working, conversion script ready, config updated

* add conversion script

* BigBirdEmbeddings working :)

* slightly update conversion script

* BigBirdAttention working :) ; some bug in layer.output.dense

* add debugger-notebook

* forward() working for BigBirdModel :) ; replaced gelu with gelu_fast

* tf code adapted to torch till rand_attn in bigbird_block_sparse_attention ; till now everything working :)

* BigBirdModel working in block-sparse attention mode :)

* add BigBirdForPreTraining

* small fix

* add tokenizer for BigBirdModel

* fix config & hence modeling

* fix base prefix

* init testing

* init tokenizer test

* pos_embed must be absolute, attn_type=original_full when add_cross_attn=True , nsp loss is optional in BigBirdForPreTraining, add assert statements

* remove position_embedding_type arg

* complete normal tests

* add comments to block sparse attention

* add attn_probs for sliding & global tokens

* create fn for block sparse attn mask creation

* add special tests

* restore pos embed arg

* minor fix

* attn probs update

* make big bird fully gpu friendly

* fix tests

* remove pruning

* correct tokenzier & minor fixes

* update conversion script , remove norm_type

* tokenizer-inference test add

* remove extra comments

* add docs

* save intermediate

* finish trivia_qa conversion

* small update to forward

* correct qa and layer

* better error message

* BigBird QA ready

* fix rebased

* add triva-qa debugger notebook

* qa setup

* fixed till embeddings

* some issue in q/k/v_layer

* fix bug in conversion-script

* fixed till self-attn

* qa fixed except layer norm

* add qa end2end test

* fix gradient ckpting ; other qa test

* speed-up big bird a bit

* hub_id=google

* clean up

* make quality

* speed up einsum with bmm

* finish perf improvements for big bird

* remove wav2vec2 tok

* fix tokenizer

* include docs

* correct docs

* add helper to auto pad block size

* make style

* remove fast tokenizer for now

* fix some

* add pad test

* finish

* fix some bugs

* fix another bug

* fix buffer tokens

* fix comment and merge from master

* add comments

* make style

* commit some suggestions

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Fix typos

* fix some more suggestions

* add another patch

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix copies

* another path

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* update

* update nit suggestions

* make style

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-03-30 08:51:34 +03:00
Sylvain Gugger
700229f8a4 Fixes in the templates (#10951)
* Fixes in the templates

* Define in all cases

* Dimensionality -> Dimension

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-03-29 17:36:13 -04:00
Stas Bekman
05c966f24b [vulnerability] dep fix (#10954)
Fixes https://github.com/huggingface/transformers/security/dependabot/examples/research_projects/lxmert/requirements.txt/Pygments/open

@LysandreJik
2021-03-29 17:25:47 -04:00
Stas Bekman
fb7fca718a [trainer metrics] fix cpu mem metrics; reformat runtime metric (#10937)
* fix cpu mem metrics; reformat runtime metric

* adjust dependency

* extend docs

* soft dependency

* cleanup

* fix the runtime metric issue

* restore

* move docs, cross reference from 2 places, improve

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-03-29 13:47:02 -07:00
Daniel Stancl
5057213bcc Add examples/multiple-choice/run_swag_no_trainer.py (#10934)
* Initial commit

* Another bunch of updates

* make style quliaty + delete debug arg from bash script

* Use compue_metrics func

* Do a few fixes

* Add copyright

* Fix typos
2021-03-29 16:41:09 -04:00
pcuenca
ae6b6963ad Allow use of pre-computed lengths when grouping by length. (#10953)
A new argument `length_column_name` has been added to
`TrainingArguments`, with default value `"length"`. If this column
exists and `group_by_length` is `True`, the train sampler will use
it for grouping rather than computing it before training starts.

This is an optimization that allows the user to prepare data for fast
processing, preventing sequential access to the dataset as described in
issue #10909.
2021-03-29 15:44:19 -04:00
Sylvain Gugger
4002f95eb6 Remove duplicate code 2021-03-29 15:27:12 -04:00
Daniel Stancl
d7b50ce469 Add examples/run_ner_no_trainer.py (#10902)
* Add NER example with accelerate library

* This commit contains the first (yet really unfinished)
version of a script for showing how to train HuggingFace model
with their new accelerate library.

* Fix metric calculation

* make style quality

* mv ner_no_trainer to token-classification dir

* Delete --debug flag from running script

* hf_datasets -> raw_datasets

* Make a few slight adjustments

* Add an informative comment + rewrite a help comment

* Change header

* Fix a few things

* Enforce to use fast tokenizers only

* DataCollatorWithPadding -> DataCollatorForTokenClassification

* Change bash script: python3 -> accelerate launch

* make style

* Add a few missing things (see below)

* Add a max-lenghth padding to predictions and labels to
enable accelerate gather functionality

* Add PyTorch no trainer example to the example README.md

* Remove --do-train from args as being redundant for now

* DataCollatorWithPadding -> DataCollatorForTokenClassification

* Remove some obsolete args.do_train conditions from the script

* Delete --do_train from bash running script

* Delete use_slow_tokenizer from args

* Add unintentionally removed flag --label_all_tokens

* Delete --debug flag from running script
2021-03-29 15:11:23 -04:00
Sylvain Gugger
06a6fea782 Instantiate model only once in pipeline (#10888)
* Instantiate model only once in pipeline

* Remove documentation of deprecated method

* Add FutureWarning

* Update src/transformers/pipelines/base.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-03-29 10:39:14 -04:00
Masatoshi Suzuki
cc2366bbb9 Ignore not initialized NO_CONFIG_TOKENIZERs (#10936) 2021-03-29 10:26:15 -04:00
WybeKoper
ddea8771c6 Updated colab links in readme of examples (#10932)
Co-authored-by: WybeKoper <WybeKoper@users.noreply.github.com>
2021-03-29 08:47:09 -04:00
Guillaume Filion
b3544e4cc5 Return global attentions (see #7514) (#10906) 2021-03-29 15:00:23 +03:00
Bhadresh Savani
4f21e1ddd6 fixed finename (#10939) 2021-03-28 09:48:12 -07:00
Sylvain Gugger
b0595d33c1 Add ImageFeatureExtractionMixin (#10905)
* Add ImageFeatureExtractionMixin

* Add dummy vision objects

* Add require_vision

* Add tests

* Fix test
2021-03-26 11:23:56 -04:00
Stas Bekman
3c27d246e5 [vulnerability] fix dependency (#10914)
this PR fixes https://github.com/huggingface/transformers/security/dependabot/examples/research_projects/lxmert/requirements.txt/PyYAML/open
2021-03-26 09:06:11 -04:00
Tomy Hsieh
4b2b50aa7b Rename NLP library to Datasets library (#10920)
* Rename NLP library to Datasets library

* Update github template

* Fix styling
2021-03-26 08:07:59 -04:00
lexhuismans
86c6f8a8b1 Fix comment (#10886) 2021-03-25 21:23:56 +03:00
Sylvain Gugger
9856c9213d Reorder init imports 2021-03-25 12:51:43 -04:00
Sylvain Gugger
e70068a719 Fix typo 2021-03-25 12:40:25 -04:00
Sylvain Gugger
f183a7a3c3 Sort init imports 2021-03-25 12:38:54 -04:00
Amir Tahmasbi
4684bfc757 Layout lm tf 2 (#10636)
* Added embeddings layer

* Added layoutlm layers, main model, maskedlm and token classification classes

* Added model classes to tf auto models

* Added model to PT to TF conversion script

* Added model to doc README

* Added tests

* Removed unused imports

* Added layoutlm model, test, and doc for sequence classification, and fix imports in __init__.py

* Made tests pass!

* Fixed typos in imports and docs

* Fixed a typo in embeddings layer

* Removed imports

* Fixed formatting issues, imports, tests

* Added layoutlm layers, main model, maskedlm and token classification classes

* Added model classes to tf auto models

* Added model to PT to TF conversion script

* Removed unused imports

* Added layoutlm model, test, and doc for sequence classification, and fix imports in __init__.py

* Made tests pass!

* Fixed typos in imports and docs

* Removed imports

* Fixed small formatting issues

* Removed duplicates import from main __init__.py

* Chnaged deafult arg to true for adding  pooling layer to tf layoutlm

* Fixed formatting issues

* Style

* Added copied from to classes copied from bert

* Fixed doc strings examples to work with layoutlm inputs

* Removed PyTorch reference in doc strings example

* Added integration tests

* Cleaned up initialization file

* Updated model checkpoint identifiers

* Fixed imports

Co-authored-by: Amir Tahmasbi <amir@ehsai.ca>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-03-25 12:32:38 -04:00
Philipp Schmid
1a3e0c4fe6 make local setup more clearer and added missing links (#10899) 2021-03-25 09:01:31 -04:00
Jethro Kuan
5f1491d3b3 run_glue_no_trainer: datasets -> raw_datasets (#10898)
Use the correct variable (raw_datasets) instead of the module (datasets)
where appropriate.
2021-03-25 08:28:17 -04:00
Sidd Karamcheti
1c06240e1b Update training args ignore_skip_data -> ignore_data_skip (#10891) 2021-03-24 16:44:51 -04:00
Sylvain Gugger
3b20e910b4 Remove version warning in pretrained BART models (#10890)
* Remove version warning in pretrained BART models

* Put it at the base model
2021-03-24 15:21:40 -04:00
Lysandre Debut
3c12e3c1c4 Fix overflowing bad word ids (#10889)
* Removes overflowing bad word IDs

* Raise warning
2021-03-24 15:13:56 -04:00
Eliza Szczechla
1f5ea9e04a Add notebook on fine-tuning Bart (#10883)
Co-authored-by: Eliza <eliza@habanero.tiger.com.pl>
2021-03-24 11:03:37 -04:00
imzhengzx
f81077fcf3 error type of tokenizer in __init__ definition (#10879)
the orignal code in line 246 is
```
tokenizer: Optional["PreTrainedTokenizerBase"] = None,
```

it should be
```
tokenizer: Optional[PreTrainedTokenizerBase] = None,
```
2021-03-24 11:00:14 -04:00
Sylvain Gugger
1aed2b908e Add new notebook links in the docs (#10876) 2021-03-24 09:45:08 -04:00
Sylvain Gugger
a735f727cc Fix test_trainer_distributed (#10875) 2021-03-23 19:03:06 -04:00
Philipp Schmid
8c297cdb30 Sm trainer smp init fix (#10870)
* rewrote is_sagemaker_model_parallel_available

* added is_sagemaker_model_parallel_available to SageMakerTrainer

* removed unnecessary mp_parameters as TrainingArguments

* make style happy

* added mp_parameters again to parse mp-specific args.
2021-03-23 20:07:55 +01:00
RafaelWO
d4d4447d53 fixed prefix_allowed_tokens_fn docstring in generate() (#10862) 2021-03-23 13:48:22 -04:00
Bhadresh Savani
7ef40120a0 [Examples] Added predict stage and Updated Example Template (#10868)
* added predict stage

* added test keyword in exception message

* removed example specific saving predictions

* fixed f-string error

* removed extra line

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2021-03-23 10:37:59 -07:00
Stas Bekman
fb2b89840b [file_utils] import refactor (#10859)
* import refactor

* fix the fallback
2021-03-23 09:41:41 -07:00
Lysandre
3f48b2bc3e Update stable docs 2021-03-23 11:01:16 -04:00
Philipp Schmid
77ffd5edd5 Amazon SageMaker Documentation (#10867)
* added finished documentation

* changed version from 1.6 to 1.6.0 for distributed

* updated versions

* updated urls
2021-03-23 10:56:44 -04:00
Sylvain Gugger
bf1f43fbd7 Update the example template for a no Trainer option (#10865) 2021-03-23 10:02:39 -04:00
Marta Maślankowska
2eb596f085 Fix p_mask cls token masking in qa pipeline (#10863) 2021-03-23 09:08:39 -04:00
Bhadresh Savani
eb330e8904 fixed typo (#10861) 2021-03-23 08:15:28 -04:00
Stas Bekman
e21f89f64c fix nan in full-fp16 label_smoothing eval (#10815) 2021-03-22 19:23:24 -07:00
Sylvain Gugger
b5b957a65c Make convert_to_onnx runable as script again (#10857) 2021-03-22 22:16:39 -04:00
Patrick von Platen
77bf3fe787 [Generate] Add save mode logits processor to remove nans and infs if necessary (#10769)
* push

* finish

* finish

* make fix copies

* change name
2021-03-23 01:00:05 +03:00
Eliza Szczechla
9f8fa4e973 Use DataCollatorForSeq2Seq in run_summarization in all cases (#10856)
Co-authored-by: Eliza <eliza@habanero.tiger.com.pl>
2021-03-22 15:05:39 -04:00
Ruan Chaves
a8d4d6776d Modify the Trainer class to handle simultaneous execution of Ray Tune and Weights & Biases (#10823)
* Modify the _hp_search_setup method on the Trainer class to handle the wandb argument passed by Ray Tune to model config.

* Reformat single quotes as double quotes.
2021-03-22 14:04:51 -04:00
Boris Dayma
125ccead71 feat(wandb): logging and configuration improvements (#10826)
* feat: ensure unique artifact id

* feat: allow manual init

* fix: simplify reinit logic

* fix: no dropped value + immediate commits

* fix: wandb use in sagemaker

* docs: improve documenation and formatting

* fix: typos

* docs: improve formatting
2021-03-22 10:45:17 -04:00
Sidd Karamcheti
b230181d41 Add simple one character fix so that on_step_begin and on_step_end are called at the right times (#10839) 2021-03-22 09:15:39 -04:00
Stas Bekman
24ab5b08a3 [makefile] autogenerate target (#10814)
* autogenerate target

* clarify comment
2021-03-22 09:14:22 -04:00
Sebastian Olsson
2c6684239f Correct AutoConfig call docstrings (#10822) 2021-03-22 09:12:44 -04:00
Stas Bekman
8fb4671811 [vulnerability] in example deps fix (#10817)
Takes care of:
https://github.com/huggingface/transformers/security/dependabot/examples/research_projects/lxmert/requirements.txt/jinja2/open

@LysandreJik

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-03-22 09:05:24 -04:00
dependabot[bot]
dbfe379514 Bump jinja2 from 2.11.2 to 2.11.3 in /examples/research_projects/lxmert (#10818)
Bumps [jinja2](https://github.com/pallets/jinja) from 2.11.2 to 2.11.3.
- [Release notes](https://github.com/pallets/jinja/releases)
- [Changelog](https://github.com/pallets/jinja/blob/master/CHANGES.rst)
- [Commits](https://github.com/pallets/jinja/compare/2.11.2...2.11.3)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-03-22 08:54:50 -04:00
Qiushi Pan
29904a967b Update FINE_TUNE_XLSR_WAV2VEC2.md (#10849)
Fix typo.
2021-03-22 07:58:59 -04:00
Patrick von Platen
0f226f78ce push (#10846) 2021-03-22 10:32:21 +03:00
Suraj Patil
82b8d8c7b0 Update FINE_TUNE_XLSR_WAV2VEC2.md 2021-03-21 22:47:09 +05:30
Patrick von Platen
af6125ffdb Update FINE_TUNE_XLSR_WAV2VEC2.md 2021-03-21 12:31:33 +03:00
Patrick von Platen
5aaf6e1460 small improvements for wav2vec2 info script (#10829) 2021-03-21 11:41:44 +03:00
Eric Lam
be87b84276 Add new community notebook - wav2vec2 with GPT (#10794)
* Add new community notebook - wav2vec2 with GPT

* Update:community.md, new nb add
* feat: notebook of wav2vec xlsr ctc decoding with gpt logit adjustment
* Update: Wav2vec2 CTC decoding with gpt2 adjustment

* Update docs/source/community.md

Co-authored-by: Suraj Patil <surajp815@gmail.com>
2021-03-21 13:29:53 +05:30
Suraj Patil
68b55885ed add doc for Local machine (#10828) 2021-03-21 13:25:34 +05:30
Sylvain Gugger
21e86f99e6 Sort init import (#10801)
* Initial script

* Add script to properly sort imports in init.

* Add to the CI

* Update utils/custom_init_isort.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Separate scripts that change content from quality

* Move class_mapping_update to style_checks

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-03-19 16:17:13 -04:00
Julien Chaumond
1438c487df wav2vec doc tweaks (#10808)
* wording/typos tweaks

* Make model upload instructions simpler
2021-03-19 12:48:54 -04:00
Patrick von Platen
b9570a813c Update FINE_TUNE_XLSR_WAV2VEC2.md 2021-03-19 19:45:28 +03:00
Philipp Schmid
f2b744f690 Add transformers id to hub requests (#10811)
* add uuid.hext to user_agent

* add log

* changed order of it

* renamed as session id

* renamed variable

* reverted naming of the const
2021-03-19 16:26:32 +01:00
Sylvain Gugger
946400fb68 Expand a bit the presentation of examples (#10799)
* Expand a bit the presentation of examples

* Apply suggestions from code review

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Address review comments

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2021-03-19 10:06:08 -04:00
Bhadresh Savani
fd1d9f1ab8 [Example] Updating Question Answering examples for Predict Stage (#10792)
* added prediction stage and eval fix

* style correction

* removed extra lines
2021-03-19 09:42:17 -04:00
Patrick von Platen
e8968bd03a [XLSR-Wav2Vec2 Info doc] Add a couple of lines (#10806)
* finish

* fix

* fix

* fix

* fix
2021-03-19 12:52:54 +03:00
Théo Matussière
117dba9948 fix backend tokenizer args override: key mismatch (#10686)
* fix backend tokenizer args override: key mismatch

* no touching the docs

* fix mpnet

* add mpnet to test

* fix test

Co-authored-by: theo <theo@matussie.re>
2021-03-18 22:13:45 -04:00
Stas Bekman
427ea3fecb addressing vulnerability report in research project deps (#10802)
Following up on a security alert:
https://github.com/huggingface/transformers/security/dependabot/examples/research_projects/lxmert/requirements.txt/Pillow/open
2021-03-18 22:02:10 -04:00
Patrick von Platen
2ae678229f Update FINE_TUNE_XLSR_WAV2VEC2.md 2021-03-19 00:29:20 +03:00
Patrick von Platen
68a3215949 Update FINE_TUNE_XLSR_WAV2VEC2.md 2021-03-19 00:27:40 +03:00
Patrick von Platen
03df3fbcb4 Update FINE_TUNE_XLSR_WAV2VEC2.md 2021-03-19 00:26:49 +03:00
Patrick von Platen
e84adbed40 Add XLSR-Wav2Vec2 Fine-Tuning README.md (#10786)
* upload

* upload fine-tuning script

* improve

* adapt

* Apply suggestions from code review

* correct

* upload

* finalize

* remove @

* correct typos
2021-03-19 00:22:43 +03:00
Sylvain Gugger
dcebe254fa Document v4.4.2 2021-03-18 15:19:25 -04:00
Sylvain Gugger
008672e6e5 Fix distributed evaluation (#10795)
* Fix distributed evaluation

* Use logger
2021-03-18 13:12:04 -04:00
Stas Bekman
9352b5151a [examples/seq2seq/README.md] fix t5 examples (#10734)
* [examples/seq2seq] fix t5 examples

This PR:
* fixes T5 examples to include `--source_prefix` - it's **not** optional. If you give it a try you will see that you get 10x worse bleu scores w/o it. w/ `27.6849`, w/ `2.374`
* added a normal translation example w/o the peculiarities of MBart and T5
* reduces the default max samples to 50 so it's much faster to test quickly

summarization seems to be broken for t5 score-wise: https://github.com/huggingface/transformers/issues/10733

@sgugger

* specify explicitly the t5 models requiring the special handling

* one more

* update the t5 summarization example to use cnn_dailymail

* move max*samples into the top level README.md

* better wording

* better wording
2021-03-18 09:55:39 -07:00
Vimarsh Chaturvedi
094afa515d from_pretrained: check that the pretrained model is for the right model architecture (#10586)
* Added check to ensure model name passed to from_pretrained and model are the same

* Added test to check from_pretrained throws assert error when passed an incompatiable model name

* Modified assert in from_pretrained with f-strings. Modified test to ensure desired assert message is being generated

* Added check to ensure config and model has model_type

* Fix FlauBERT heads

Co-authored-by: vimarsh chaturvedi <vimarsh chaturvedi>
Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-03-18 12:51:42 -04:00
Julien Chaumond
4f3e93cfaf [file_utils] do not gobble certain kinds of requests.ConnectionError (#10235)
* do not gobble certain kinds of requests.ConnectionError

* Apply review comments

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-03-18 12:37:45 -04:00
James Thomin
ce9724e1bd Fix bug in input check for LengthGroupSampler (#10783)
This commit fixes a bug in the LengthGroupSampler where if
model_input_name is not set, the default value is None instead of
"input_ids"
2021-03-18 10:25:57 -04:00
Suraj Patil
5f19c07a70 add run_common_voice script (#10767)
* add initial script

* finish script

* add shell script example

* accept chars_to_ignor as cl arg

* align the script with other example scripts

* add torchaudio dep
2021-03-18 17:21:16 +05:30
Mohamed El-Geish
af8afdc88d wav2vec2: support datasets other than LibriSpeech (#10581)
* wav2vec2: support datasets other than LibriSpeech

* Formatting run_asr.py to pass code quality test

* bundled orthography options and added verbose logs

* fixing a typo in timit fine-tuning script

* update comment for clarity

* resize_lm_head and load custom vocab from file

* adding a max_duration_in_seconds filter

* do not assign `duration_filter` lambda, use a def

* log untransliterated text as well

* fix base model for arabic

* fix duration filter when target_sr is not set

* drop duration_in_seconds when unneeded

* script for wav2vec2-large-lv60-timit-asr

* fix for "tha" in arabic corpus (huggingface#10581)

* adding more options to work with common_voice

* PR feedback (huggingface#10581)

* small README change
2021-03-18 10:20:26 +03:00
Patrick von Platen
0b98ca368f [Flax] Adapt Flax models to new structure (#9484)
* Create modeling_flax_eletra with code copied from modeling_flax_bert

* Add ElectraForMaskedLM and ElectraForPretraining

* Add modeling test for Flax electra and fix naming and arg in Flax Electra model

* Add documentation

* Fix code style

* Create modeling_flax_eletra with code copied from modeling_flax_bert

* Add ElectraForMaskedLM and ElectraForPretraining

* Add modeling test for Flax electra and fix naming and arg in Flax Electra model

* Add documentation

* Fix code style

* Fix code quality

* Adjust tol in assert_almost_equal due to very small difference between model output, ranging 0.0010 - 0.0016

* Remove redundant ElectraPooler

* save intermediate

* adapt

* correct bert flax design

* adapt roberta as well

* finish roberta flax

* finish

* apply suggestions

* apply suggestions

Co-authored-by: Chris Nguyen <anhtu2687@gmail.com>
2021-03-18 09:44:17 +03:00
Funtowicz Morgan
5c0bf39782 Add support for detecting intel-tensorflow version (#10781)
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
2021-03-18 01:25:47 +01:00
Mansi Mane
0282e24eef Smmp batch not divisible by microbatches fix (#10778)
* Added debug prints

* Added config

* Added prints

* Added prints

* Added extra samples to SequentialDistributedSampler

* Added extra samples to SequentialDistributedSampler

Updated SequentialDistributedSampler call

* Added deubg prints

* Removed extra prints

* Making predicitons and labels multiple of batchsize

* updated number of microbatches

* Removed extra prints

* Made start_remainder similar to DistributedSamplerWithLoop

* Minor spacing update

* Added debug prints

Added config

Added prints

Added prints

* Added extra samples to SequentialDistributedSampler

Updated SequentialDistributedSampler call

Added extra samples to SequentialDistributedSampler

Added deubg prints

Removed extra prints

Making predicitons and labels multiple of batchsize

updated number of microbatches

Removed extra prints

Squashing redundant commits

* Made start_remainder similar to DistributedSamplerWithLoop

Minor spacing update

Made start_remainder similar to DistributedSamplerWithLoop

* Test and styling

* Rename test

Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
2021-03-17 19:18:11 -04:00
Sylvain Gugger
40b049c701 Check copies blackify (#10775)
* Apply black before checking copies

* Fix for class methods

* Deal with lonely brackets

* Remove debug and add forward changes

* Separate copies and fix test

* Add black as a test dependency
2021-03-17 18:11:20 -04:00
Stas Bekman
393739194e [examples] document resuming (#10776)
* document resuming in examples

* fix

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* put trainer code last, adjust notes

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-03-17 12:48:35 -07:00
Stas Bekman
85a114ef47 [Issue template] need to update/extend who to tag (#10728)
* [Issue template] need to update/extend who to tag

1. need to update who to tag for `tensorflow`
2. also requesting to add someone to tag for models hub issues - perhaps separate sub-entries for UI and code - e.g. I don't know who to tag for broken models: https://github.com/huggingface/transformers/issues/10726

Thanks.

* model hub instructions

* s/jplu/LysandreJik/
2021-03-17 11:33:14 -07:00
Stas Bekman
3318c246f3 make failure to find a resume checkpoint fatal + tests (#10777) 2021-03-17 11:16:37 -07:00
Stas Bekman
cd8c93f701 [DeepSpeed] improve checkpoint loading code plus tests (#10760)
* deepspeed checkpoint loading code plus tests

* style

* style
2021-03-17 10:22:58 -07:00
Stas Bekman
01c7fb04be [DeepSpeed] simplify init (#10762) 2021-03-17 10:21:03 -07:00
Patrick von Platen
0486ccdd3d small improvements (#10773) 2021-03-17 18:10:17 +03:00
Sylvain Gugger
d7e0d59bb7 Fix URLs 2021-03-17 11:03:43 -04:00
Stas Bekman
8715d20c97 [doc] [testing] extend the pytest -k section with more examples (#10761)
* [doc] [testing] extend -k section

This PR adds more examples on using `pytest -k` - I always forget that I want to use `-k A OR B` when I want several tests - I keep trying AND and it doesn't match any.

* style
2021-03-17 09:23:38 -04:00
Patrick von Platen
f20d75a13f up (#10771) 2021-03-17 16:15:14 +03:00
Cheng Li
c83fbc5f2d [Deepspeed] Allow HF optimizer and scheduler to be passed to deepspeed (#10464)
* pass hf optimizer and scheduler to deepspeed if not specified in ds config

* pass hf optimizer and scheduler to deepspeed if not specified in ds config

* update

* make init_deepspeed support config dict

* fix docstring formatting

* clean up trainer's comments

* add new tests

* fix type

* composit argparse doesn't work

* style

* add a new test, rename others

* document new functionality

* complete tests, add docs

* style

* correct level

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* add new methods to the doc

* must tell DS we are using a non-native optimizer

* add protection against cpu_offload + HF optimizer combo

* fix the cli overrides

* sync docs + tests

* restore AdamW

* better docs

* need new version

* no longer needed

* remove outdate information

* refactor duplicated code

Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-03-16 15:51:09 -07:00
Lysandre Debut
c23248443c Patches full import failure when sentencepiece is not installed (#10752)
* Patches full import failure when sentencepiece is not installed

* Dummies :)
2021-03-16 15:58:20 -04:00
Lysandre
73fe40898d Docs for v4.4.1 2021-03-16 15:41:49 -04:00
Lysandre Debut
2097aa1826 Patches the full import failure and adds a test (#10750)
* Patches the full import failure and adds a test

* Add comment
2021-03-16 15:37:52 -04:00
Lysandre
1b5ce1e63b Development on v4.5.0dev0 2021-03-16 11:41:15 -04:00
Lysandre
c988db5af2 Release v4.4.0
Some checks failed
Release - Conda / build_and_package (push) Has been cancelled
2021-03-16 11:33:35 -04:00
Sylvain Gugger
5c02b97ca2 Fix URLs from #10744 (#10748) 2021-03-16 11:31:29 -04:00
Sylvain Gugger
a0a027c2ed Add DistributedSamplerWithLoop (#10746)
* Add DistributedSamplerWithLoop

* Fix typo

* Test and small fix
2021-03-16 11:22:39 -04:00
Lysandre Debut
1449222217 Fix DeBERTa + Conversational pipeline slow tests (#10743)
* Fix DeBERTa-v2 variable assignment

* Fix conversational pipeline test
2021-03-16 11:18:20 -04:00
Suraj Patil
d3d388b934 fix M2M100 example (#10745) 2021-03-16 20:20:00 +05:30
Sylvain Gugger
b5492582d0 Remove old links to CDN (#10744) 2021-03-16 10:48:53 -04:00
Lysandre Debut
5dcc08f1df Fix S2T example (#10741) 2021-03-16 08:55:07 -04:00
Sylvain Gugger
813d730c46 Release utils (#10735)
* Examples version update

* Refactor a bit

* All version updates

* Fixes

* README cleanup

* Post-release/patch

* Fixes

* More fixes

* Tests

* More fixes

* Moar fixes

* Make commands and update setup

* Replace spaces with weird tabs

* Fix test

* Style
2021-03-16 08:41:47 -04:00
Patrick von Platen
9f8619c6aa Flax testing should not run the full torch test suite (#10725)
* make flax tests pytorch independent

* fix typo

* finish

* improve circle ci

* fix return tensors

* correct flax test

* re-add sentencepiece

* last tokenizer fixes

* finish maybe now
2021-03-16 08:05:37 +03:00
Russell Klopfer
87d685b8a9 independent training / eval with local files (#10710)
* independent training / eval with local files

* remove redundant assert
2021-03-15 19:35:26 -04:00
Sylvain Gugger
4c379daf64 Add minimum version check in examples (#10724)
* Add minimum version check in examples

* Style

* No need for new line maybe?

* Add helpful comment
2021-03-15 19:29:54 -04:00
Joe Davison
966ba081c9 zero-shot pipeline multi_class -> multi_label (#10727) 2021-03-15 16:02:46 -06:00
Lysandre Debut
58f672e65c Tests run on Docker (#10681)
* Tests run on Docker

Co-authored-by: Morgan <funtowiczmo@gmail.com>

* Comments from code review

* Reply to itself

* Dependencies

Co-authored-by: Morgan <funtowiczmo@gmail.com>
2021-03-15 17:28:01 -04:00
MikeG112
d41dd5359b [Wav2Vec2] Fix documentation inaccuracy (#10694)
* Update super class reference

* Update default value reference

* Update src/transformers/models/wav2vec2/feature_extraction_wav2vec2.py

* Fix format style

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-03-15 20:11:17 +03:00
Sylvain Gugger
f5c097fc4d Fix backward compatibility with EvaluationStrategy (#10718) 2021-03-15 10:20:38 -04:00
Patrick von Platen
d9e693e1d0 make wav2vec2 test deterministic (#10714) 2021-03-15 09:50:05 -04:00
Sylvain Gugger
6bef764506 Multiple fixes in SageMakerTrainer (#10687)
* Handle save differently

* Missing imports

* Fix typo

* Adapt to recent changes in save_pretrained

* Forgotten brackets

* Optimizer load

* Fix world size

* Deal wth None

* Remove needless self
2021-03-15 09:28:15 -04:00
Adam Pocock
3f1714f8a7 Adding required flags to non-default arguments in hf_argparser (#10688)
* Adding required flags to non-default arguments.

Signed-off-by: Adam Pocock <adam.pocock@oracle.com>

* make style fix.

* Update src/transformers/hf_argparser.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-03-15 09:27:55 -04:00
Théo Matussière
6f840990a7 split seq2seq script into summarization & translation (#10611)
* split seq2seq script, update docs

* needless diff

* fix readme

* remove test diff

* s/summarization/translation

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* cr

* fix arguments & better mbart/t5 refs

* copyright

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* reword readme

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* s/summarization/translation

* short script names

* fix tests

* fix isort, include mbart doc

* delete old script, update tests

* automate source prefix

* automate source prefix for translation

* s/translation/trans

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* fix script name (short version)

* typos

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* exact parameter

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* remove superfluous source_prefix calls in docs

* rename scripts & warn for source prefix

* black

* flake8

Co-authored-by: theo <theo@matussie.re>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2021-03-15 09:11:42 -04:00
Igor Shalyminov
505494a86f GPT2DoubleHeadsModel made parallelizable (#10658)
* GPT2DoubleHeadsModel made parallelizeable

* GPT2DoubleHeadsModel added as parallelizeable onto the GPT2 test suite
2021-03-15 09:10:44 -04:00
Sylvain Gugger
e12d6f513e Distributed barrier before loading model (#10685) 2021-03-15 08:28:15 -04:00
Sylvain Gugger
339fc51acc fix styling 2021-03-15 07:59:35 -04:00
cronoik
4c41c6622c Wrong link to super class (#10709)
Documentation was referring to slow tokenizer class while it should be the fast tokenizer.
2021-03-15 07:39:10 -04:00
Suraj Patil
fcf10214e0 enable loading Mbart50Tokenizer with AutoTokenizer (#10690)
* enable auto tokenizer for mbart50 tokenizers

* fix imports
2021-03-15 16:20:37 +05:30
Patrick von Platen
bd8f6cafd4 make rag tests smaller (#10679) 2021-03-15 10:07:12 +03:00
Stas Bekman
4c32f9f26e AdamW is now supported by default (#9624) 2021-03-12 13:40:07 -08:00
ymfa
fa35cda91e Pass encoder outputs into GenerationMixin (#10599)
* Pass encoder_outputs into generate()

* Remove an if-statement

* Reformat

* Minimize changes to generate()

* Comment on input_ids
2021-03-12 21:43:11 +05:30
PaulLerner
00cad2e5c1 fix: #10628 expanduser path in TrainingArguments (#10660)
* fix: #10628 expanduser path in TrainingArguments

* docs: explain why we expand paths in TrainingArguments

* Style

Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
2021-03-12 09:18:19 -05:00
Sylvain Gugger
e8246f78f9 Add auto_wrap option in fairscale integration (#10673)
* Add auto_wrap option in fairscale integration

* Style
2021-03-12 07:50:20 -05:00
Lysandre Debut
184ef8ecd0 TensorFlow tests: having from_pt set to True requires torch to be installed. (#10664)
* TF model exists for Blenderbot 400M

* Marian

* RAG
2021-03-12 14:16:40 +03:00
Nicolas Patry
543d0549f8 Adding new parameter to generate: max_time. (#9846)
* [WIP] Adding new parameter to `generate`:  `max_time`.

Generation by tokens number is sometimes a bit clunky because we don't
know how many tokens are good enough or even how many tokens are in
the payload (for pipelines users for instance). This leads to hard
to understand behavior.

This PR proposes a new argument `max_time` which is a float of seconds
for the allowed time for `generate` to run on.
Ideally combinations of `max_tokens=None`, `max_time=2` could be used to
generate as many tokens as possible within time budget.

NB: Another possible approach consists of passing a callback to `generate`
  putting the caller in charge of the actual decision of when to stop
  generating tokens. It opens the door to 'which args should we pass'
  to this callback. It's hard to imagine other use-cases for this
  early stopping behavior than time (that are not already covered by
  parameters of generate)

* Revamp with StoppingCriteria

* Removing deprecated mentions.

* Forgot arguments to stopping criteria.

* Readding max_length it's not just used as a stopping criteria.

* Default value for `stopping_criteria`.

* Address @patrickvonplaten comments.

- More docstrings
- Actual doc
- Include in global namespace
- Remove TF work.

* Put back `max_length` (deprecation different PR).

* Doc quality.

* Fixing old behavior without `stopping_criteria` but with `max_length`.

Making sure we don't break that in the future.

* Adding more tests for possible inconsistencies between

`max_length` and `stopping_criteria`.

* Fixing the torch imports.
2021-03-12 10:11:50 +01:00
Lysandre Debut
ea46e3fa9c Adjust loss difference (#10669) 2021-03-12 09:09:46 +03:00
Benjamin Fineran
c526bde319 fix typing error for HfArgumentParser for Optional[bool] (#10672)
* fix typing error for TrainingArguments Optional[bool]

* updating equality check for Optional[bool]
2021-03-11 17:42:54 -05:00
Sylvain Gugger
fa1a8d102f Tentative fix for HFArgumentParser in Python 3.8 2021-03-11 14:44:29 -05:00
WybeKoper
2f8485199c Fix broken link (#10656)
* Fixed broken link

* fixed max length violation

Co-authored-by: WybeKoper <WybeKoper@users.noreply.github.com>
2021-03-11 14:29:02 -05:00
jeswan
a01ea31b5c Add DeBERTa to MODEL_FOR_PRETRAINING_MAPPING (#10668)
* add deberta to pretraining mapping

* add deberta_v2 to PRETRAINING_MAPPING
2021-03-11 13:56:47 -05:00
Lysandre Debut
9fbb4cdc80 Specify minimum version for sacrebleu (#10662) 2021-03-11 13:45:06 -05:00
Sylvain Gugger
fda703a553 Fix integration slow tests (#10670)
* PoC

* Fix slow tests for the PT1.8 Embedding problem
2021-03-11 13:43:53 -05:00
Funtowicz Morgan
3ab6820370 Onnx fix test (#10663)
* Allow to pass kwargs to model's from_pretrained when using pipeline.

* Disable the use of past_keys_values for GPT2 when exporting to ONNX.

* style

* Remove comment.

* Appease the documentation gods

* Fix style

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-03-11 13:38:29 -05:00
Lysandre Debut
a637ae00c4 Fixes Pegasus tokenization tests (#10671) 2021-03-11 13:35:50 -05:00
Lysandre Debut
7e4428749c Conversion to tensors requires padding (#10661) 2021-03-11 12:58:15 -05:00
Lysandre Debut
2adc8c926a W2v2 test require torch (#10665)
* Adds a @require_torch to a test that requires it

* Tokenizer too

* Style
2021-03-11 12:56:12 -05:00
Suraj Patil
055ed78f52 [S2T] fix example in docs (#10667) 2021-03-11 22:43:37 +05:30
Sylvain Gugger
89693e170d Remove special treatment for custom vocab files (#10637)
* Remove special path for custom vocab files

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Expand error message

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-03-11 11:11:56 -05:00
Lysandre Debut
6d9e11a193 S2S + M2M100 should be available in tokenization_auto (#10657)
* S2S + M2M100 should be available in tokenization_auto

* Requires sentencepiece

* SentencePiece for S2T as well :)
2021-03-11 09:53:36 -05:00
Patrick von Platen
602d63f05c [XLSR-Wav2Vec2] Add multi-lingual Wav2Vec2 models (#10648)
* add conversion script

* add wav2vec2 xslr models

* finish

* Update docs/source/model_doc/xlsr_wav2vec2.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-03-11 17:44:18 +03:00
Sylvain Gugger
63c295ac05 Ensure metric results are JSON-serializable (#10632) 2021-03-11 09:00:23 -05:00
ArvidYin
27d9e05ce2 Update README.md (#10647)
correct spell error: 'nether'
2021-03-11 08:58:04 -05:00
Lysandre Debut
053f0197b8 merge_file -> merges_file (#10653) 2021-03-11 08:34:08 -05:00
Sylvain Gugger
26a33cfd8c Document Trainer limitation on custom models (#10635) 2021-03-10 14:58:22 -05:00
Philipp Schmid
49c61a4ae7 Extend trainer logging for sm (#10633)
* renamed logging to hf_logging

* changed logging from hf_logging to logging and loggin to native_logging

* removed everything trying to fix import Trainer error

* adding imports again

* added custom add_handler function to logging.py

* make style

* added remove_handler

* added another conditional to assert
2021-03-10 20:53:49 +01:00
Sylvain Gugger
1aa9c13f70 Fix GPU tests with speech 2021-03-10 12:51:06 -05:00
Sylvain Gugger
2295d783d5 Copy tokenizer files in each of their repo (#10624)
* Move tokenizer files in each repo

* Fix mBART50 tests

* Fix mBART tests

* Fix Marian tests

* Update templates
2021-03-10 11:26:23 -05:00
Suraj Patil
d26b37e744 Speech2TextTransformer (#10175)
* s2t

* fix config

* conversion script

* fix import

* add tokenizer

* fix tok init

* fix tokenizer

* first version working

* fix embeds

* fix lm head

* remove extra heads

* fix convert script

* handle encoder attn mask

* style

* better enc attn mask

* override _prepare_attention_mask_for_generation

* handle attn_maks in encoder and decoder

* input_ids => input_features

* enable use_cache

* remove old code

* expand embeddings if needed

* remove logits bias

* masked_lm_loss => loss

* hack tokenizer to support feature processing

* fix model_input_names

* style

* fix error message

* doc

* remove inputs_embeds

* remove input_embeds

* remove unnecessary docstring

* quality

* SpeechToText => Speech2Text

* style

* remove shared_embeds

* subsample => conv

* remove Speech2TextTransformerDecoderWrapper

* update output_lengths formula

* fix table

* remove max_position_embeddings

* update conversion scripts

* add possibility to do upper case for now

* add FeatureExtractor and Processor

* add tests for extractor

* require_torch_audio => require_torchaudio

* add processor test

* update import

* remove classification head

* attention mask is now 1D

* update docstrings

* attention mask should be of type long

* handle attention mask from generate

* alwyas return attention_mask

* fix test

* style

* doc

* Speech2TextTransformer => Speech2Text

* Speech2TextTransformerConfig => Speech2TextConfig

* remove dummy_inputs

* nit

* style

* multilinguial tok

* fix tokenizer

* add tgt_lang setter

* save lang_codes

* fix tokenizer

* add forced_bos_token_id to tokenizer

* apply review suggestions

* add torchaudio to extra deps

* add speech deps to CI

* fix dep

* add libsndfile to ci

* libsndfile1

* add speech to extras all

* libsndfile1 -> libsndfile1

* libsndfile

* libsndfile1-dev

* apt update

* add sudo to install

* update deps table

* install libsndfile1-dev on CI

* tuple to list

* init conv layer

* add model tests

* quality

* add integration tests

* skip_special_tokens

* add speech_to_text_transformer in toctree

* fix tokenizer

* fix fp16 tests

* add tokenizer tests

* fix copyright

* input_values => input_features

* doc

* add model in readme

* doc

* change checkpoint names

* fix copyright

* fix code example

* add max_model_input_sizes in tokenizer

* fix integration tests

* add do_lower_case to tokenizer

* remove clamp trick

* fix "Add modeling imports here"

* fix copyrights

* fix tests

* SpeechToTextTransformer => SpeechToText

* fix naming

* fix table formatting

* fix typo

* style

* fix typos

* remove speech dep from extras[testing]

* fix copies

* rename doc file,

* put imports under is_torch_available

* run feat extract tests when torch is available

* dummy objects for processor and extractor

* fix imports in tests

* fix import in modeling test

* fxi imports

* fix torch import

* fix imports again

* fix positional embeddings

* fix typo in import

* adapt new extractor refactor

* style

* fix torchscript test

* doc

* doc

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* fix docs, copied from, style

* fix docstring

* handle imports

* remove speech from all extra deps

* remove s2t from seq2seq lm mapping

* better names

* skip training tests

* add install instructions

* List => Tuple

* doc

* fix conversion script

* fix urls

* add instruction for libsndfile

* fix fp16 test

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-03-10 21:42:04 +05:30
Sylvain Gugger
efb5c0a453 Add new GLUE example with no Trainer. (#10555)
* Add new GLUE example with no Trainer.

* Style

* Address review comments
2021-03-10 09:29:19 -05:00
Suraj Patil
44f64132a5 remove final_logits_bias (#10606) 2021-03-10 09:52:31 +05:30
Allen Wang
6f52fce673 Fixes an issue in text-classification where MNLI eval/test datasets are not being preprocessed. (#10621)
* Fix MNLI tests

* Linter fix
2021-03-09 22:13:45 -05:00
Sylvain Gugger
72d9e039f9 Fix tests of TrainerCallback (#10615)
* Fix tests of TrainerCallback

* Update tests/test_trainer_callback.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-03-09 16:25:32 -05:00
Sylvain Gugger
0d909f6bd8 Fairscale FSDP fix model save (#10596)
* Hotfix fairscale FSDP

* Evaluation works

* Save on process zero
2021-03-09 14:42:07 -05:00
Bhadresh Savani
ac17f71159 added max_sample args and metrics changes (#10602) 2021-03-09 12:06:56 -05:00
Philipp Schmid
c19c811a2d Trigger add sm information (#10610)
* added sm to ua

* update id

* removed id

* removed comments

* added env variable

* changed variable name

* make quality happy

* added sguggers feedback

* make styling happy and remove brackets

* added sm to ua

* update id

* removed id

* removed comments

* added env variable

* changed variable name

* make quality happy

* added sguggers feedback

* make styling happy and remove brackets
2021-03-09 17:31:45 +01:00
Suraj Patil
20c10258a4 layerdrop 0 (#10604) 2021-03-09 17:35:07 +03:00
Lysandre
95ab06778c Update cache version for github actions 2021-03-09 07:10:58 -05:00
Patrick von Platen
9a06b6b11b [FeatureExtractorSavingUtils] Refactor PretrainedFeatureExtractor (#10594)
* save first version

* finish refactor

* finish refactor

* correct naming

* correct naming

* shorter names

* Update src/transformers/feature_extraction_common_utils.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* change name

* finish

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-03-09 12:16:59 +03:00
Stas Bekman
b6a28e9ac9 [docs] How to solve "Title level inconsistent" sphinx error (#10600)
* How to solve: Title level inconsistent

* list chars
2021-03-08 20:16:33 -08:00
Lysandre Debut
546cbe7e9e Speedup tf tests (#10601)
* Pipeline tests should be slow

* Temporarily mark some tests as slow

* Temporarily mark Barthez tests as slow
2021-03-08 21:44:07 -05:00
Ratthachat (Jung)
696e8a4365 Add TFRag (#9002)
* Create modeling_tf_dpr.py

* Add TFDPR

* Add back TFPegasus, TFMarian, TFMBart, TFBlenderBot

last commit accidentally deleted these 4 lines, so I recover them back

* Add TFDPR

* Add TFDPR

* clean up some comments, add TF input-style doc string

* Add TFDPR

* Make return_dict=False as default

* Fix return_dict bug (in .from_pretrained)

* Add get_input_embeddings()

* Create test_modeling_tf_dpr.py

The current version is already passed all 27 tests!
Please see the test run at : 
https://colab.research.google.com/drive/1czS_m9zy5k-iSJbzA_DP1k1xAAC_sdkf?usp=sharing

* fix quality

* delete init weights

* run fix copies

* fix repo consis

* del config_class, load_tf_weights

They shoud be 'pytorch only'

* add config_class back

after removing it, test failed ... so totally only removing "use_tf_weights = None" on Lysandre suggestion

* newline after .. note::

* import tf, np (Necessary for ModelIntegrationTest)

* slow_test from_pretrained with from_pt=True

At the moment we don't have TF weights (since we don't have official official TF model)
Previously, I did not run slow test, so I missed this bug

* Add simple TFDPRModelIntegrationTest

Note that this is just a test that TF and Pytorch gives approx. the same output.
However, I could not test with the official DPR repo's output yet

* upload correct tf model

* remove position_ids as missing keys

* create modeling_tf_rag

* add tests for tf

* add tf tests

* revert wrong pt commit

* further refactor

* further refactor

* refactor

* Update modeling_tf_rag.py

- input_processing
- fix prepare_input_for_generation (mostly fix generate bug)
- bring back from_pretrained hack in order to test generate

* delete colab pieces of code

* Show case of greedy "generate"

Temporarily change from beam_search test to greedy_search test to show case that TF and PT do get equivalent output.

* cosmetic update

* correct typos

* update

* push some progress

* make easy check

* fix rag save from pretrained

* Update src/transformers/modeling_tf_utils.py

* remove commented out lines

* delete unnecessary lines

* add simple test case for nq_checkpoint

Add nq_checkpoint test to show that current version without hack still fails

* temporarily put ugly hack back again

* Add TFRagSequenceForGeneration!!

* __init__.py , import TFRagSequenceForGeneration

* Add TFRagSequence tests!

* rag init.py - add TFRagSequenceForGeneration

* fix from_pretrained

* fix prepare_inputs_for_generation

* Beam search for RagToken!

* minor clean up

* add tf.cast in TFRagModel

* More tf.cast

* Add all remaining tests (still have issues)

* delete all T5 related

* make style

* fix load weight prefix

* fix bart

* fix return_dict for tf_rag

make all tests pass .. Hooray

* fix some tests

* fix code quality

* fix qualtiy check

* finish tests tf rag

* add tf rag to docs

* remove TFT5 from docstring

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* remove TFT5 from docstring

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Delete outdated comments

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* improve doc strings

* add generative model classes

* fix adjust token logic

* refactor generate for TFRag

* using shape_list, not _get_shape

Co-authored-by: Julien Plu <plu.julien@gmail.com>

* axis=[1]->axis=1

* delete NEED_HELP comment

* improve readability

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* improve readability

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* improve readability

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Indicating model is in a developing state in docstrings

As suggested by Julien

* small last changes

* apply sylvains suggestions

* finish tf rag

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: patrickvonplaten <patrick@huggingface.co>
Co-authored-by: Julien Plu <plu.julien@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-03-09 00:49:51 +03:00
Sylvain Gugger
3ced9b3eb9 Check layer types for Optimizer construction (#10598)
* Check layer types for Optimizer construction

* Duplicate class
2021-03-08 16:40:11 -05:00
Sylvain Gugger
821d518e03 Revert "Tests"
This reverts commit b35e7b68ca.
2021-03-08 16:05:55 -05:00
Sylvain Gugger
4196bfeda0 Revert "Style"
This reverts commit a8ec52efc2.
2021-03-08 16:05:52 -05:00
Sylvain Gugger
a8ec52efc2 Style 2021-03-08 16:04:46 -05:00
Sylvain Gugger
b35e7b68ca Tests 2021-03-08 16:04:30 -05:00
Stas Bekman
f284089ec4 [examples tests on multigpu] resolving require_torch_non_multi_gpu_but_fix_me (#10561)
* batch 1

* this is tpu

* deebert attempt

* the rest
2021-03-08 11:11:40 -08:00
Bhadresh Savani
dfd16af832 Added max_sample_ arguments (#10551)
* reverted changes of logging and saving metrics

* added max_sample arguments

* fixed code

* white space diff

* reformetting code

* reformatted code
2021-03-08 13:57:10 -05:00
Stas Bekman
917f104502 [examples tests] various fixes (#10584)
* fix sharded ddp enum

* test fixes

* stronger validation + apex breaks other tests
2021-03-08 10:28:44 -08:00
Stas Bekman
6f84531e61 offline mode for firewalled envs (part 2) (#10569)
* more readable test

* add all the missing places

* one more nltk

* better exception check

* revert
2021-03-08 08:52:20 -08:00
Sylvain Gugger
5469369480 Fix version control with anchors (#10595)
* Fix version control with anchors

* Simplify
2021-03-08 10:19:22 -05:00
Stas Bekman
f882966004 fix double wrapping + test (#10583) 2021-03-08 10:15:55 -05:00
Mehrad Moradshahi
b880508440 tokenization_marian.py: use current_spm for decoding (#10357)
* Fix Marian decoding

Tokenizer's decode and batch_decode now accepts a new argument (use_source_tokenizer) which indicates whether the source spm should be used to decode ids. This is useful for Marian models specificallly when decoding source input ids.

* Adapt docstrings

Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
2021-03-08 08:14:31 -05:00
Lysandre
8fd7eb34e2 Correct YAML 2021-03-08 07:13:49 -05:00
Lysandre Debut
89b8d4f568 Enable torch 1.8.0 on GPU CI (#10593)
* Enable torch 1.8.0 in GPU CI

* Disable torch-scatter
2021-03-08 07:11:43 -05:00
Suraj Patil
2a737bffef [M2M100] fix positional embeddings (#10590)
* fix tests

* emb should be a parameter

* fix positional embeddings

* fix make_weights

* don't save pos embeds

* add comment to describe the clamping
2021-03-08 16:06:19 +05:30
Oren Amsalem
d59464db6b fix BART Summarization example in doc (#10582) 2021-03-08 15:45:06 +05:30
Eunhyuk Shin
3b583d02d6 Fix typo in docstring for pipeline (#10591) 2021-03-08 15:40:03 +05:30
Stas Bekman
e6ce636e02 fix nltk lookup (#10585) 2021-03-07 22:09:58 -08:00
Yu
9dd054fba2 fix tf doc bug (#10570) 2021-03-07 22:31:50 -05:00
Suraj Patil
f6e74a63ca Add m2m100 (#10236)
* m2m_100

* no layernorm_embedding

* sinusoidal positional embeddings

* update pos embeddings

* add default config values

* tokenizer

* add conversion script

* fix config

* fix pos embed

* remove _float_tensor

* update tokenizer

* update lang codes

* handle lang codes

* fix pos embeds

* fix spm key

* put embedding weights on device

* remove qa and seq classification heads

* fix convert script

* lang codes pn one line

* fix embeds

* fix tokenizer

* fix tokenizer

* add fast tokenizer

* style

* M2M100MT => M2M100

* fix copyright, style

* tokenizer converter

* vocab file

* remove fast tokenizer

* fix embeds

* fix tokenizer

* fix tests

* add tokenizer tests

* add integration test

* quality

* fix model name

* fix test

* doc

* doc

* fix doc

* add copied from statements

* fix tokenizer tests

* apply review suggestions

* fix urls

* fix shift_tokens_right

* apply review suggestions

* fix

* fix doc

* add lang code to id

* remove unused function

* update checkpoint names

* fix copy

* fix tokenizer

* fix checkpoint names

* fix merge issue

* style
2021-03-06 22:14:16 +05:30
Lysandre
fd01104435 Temporarily disable stale bot 2021-03-06 00:21:50 -05:00
Stas Bekman
88a951e3cc offline mode for firewalled envs (#10407)
* offline mode start

* add specific values

* fix fallback

* add test

* better values check and range

* test that actually works

* document the offline mode

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* more strict check

* cleaner test

* pt-only test

* style

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-03-05 17:27:48 -08:00
Daniel Hug
90ecc29656 Refactoring checkpoint names for multiple models (#10527)
* Refactor checkpoint name in ALBERT and ALBERT_tf

* Refactor checkpoint name in BART and BART_tf

* Refactor checkpoint name in BERT generation

* Refactor checkpoint name in Blenderbot_tf

* Refactor checkpoint name in Blenderbot_small_tf

* Refactor checkpoint name in ConvBERT AND CONVBERT_TF

* Refactor checkpoint name in CTRL AND CTRL_TF

* Refactor checkpoint name in DistilBERT AND DistilBERT_TF

* Refactor checkpoint name in DistilBERT redo

* Refactor checkpoint name in Electra and Electra_tf

* Refactor checkpoint name in FlauBERT and FlauBERT_tf

* Refactor checkpoint name in FSMT

* Refactor checkpoint name in GPT2 and GPT2_tf

* Refactor checkpoint name in IBERT

* Refactor checkpoint name in LED and LED_tf

* Refactor checkpoint name in Longformer and Longformer_tf

* Refactor checkpoint name in Lxmert and Lxmert_tf

* Refactor checkpoint name in Marian_tf

* Refactor checkpoint name in MBART and MBART_tf

* Refactor checkpoint name in MobileBERT and MobileBERT_tf

* Refactor checkpoint name in mpnet and mpnet_tf

* Refactor checkpoint name in openai and openai_tf

* Refactor checkpoint name in pegasus_tf

* Refactor checkpoint name in reformer

* Refactor checkpoint name in Roberta and Roberta_tf

* Refactor checkpoint name in SqueezeBert

* Refactor checkpoint name in Transformer_xl and Transformer_xl_tf

* Refactor checkpoint name in XLM and XLM_tf

* Refactor checkpoint name in XLNET and XLNET_tf

* Refactor checkpoint name in BERT_tf

* run make tests, style, quality, fixup
2021-03-05 18:06:55 -05:00
Lysandre Debut
defe9e20fe Stale Bot (#10509)
* Add stale bot to Github Actions

* Update message

* Message for assignee

* Update scripts/stale.py

* Uncomment & stop testing
2021-03-05 16:41:50 -05:00
Sylvain Gugger
7da995c00c Fix embeddings for PyTorch 1.8 (#10549)
* Fix embeddings for PyTorch 1.8

* Try with PyTorch 1.8.0

* Fix embeddings init

* Fix copies

* Typo

* More typos
2021-03-05 16:18:48 -05:00
Chen Liang
3e056c1003 Typo correction. (#10531)
DEBERTA_PRETRAINED_MODEL_ARCHIVE_LIST => DEBERTA_V2_PRETRAINED_MODEL_ARCHIVE_LIST in line 31.
2021-03-05 15:27:09 -05:00
Joakim Warholm
9f8bc87cbe fixed dead link in trainer doc (#10554) 2021-03-05 14:56:37 -05:00
Lysandre Debut
6b58e15507 Fix torch 1.8.0 segmentation fault (#10546)
* Only run one test

* Patch segfault

* Fix summarization pipeline

* Ready for merge
2021-03-05 12:10:19 -05:00
Patrick von Platen
395ffcd757 fix run seq2seq (#10547) 2021-03-05 18:17:12 +03:00
Nicolas Patry
54e55b52d4 Fixing conversation test for torch 1.8 (#10545) 2021-03-05 09:24:14 -05:00
Lysandre
dc9aaa3848 Pin torch to 1.7.1 in tests while we resolve issues 2021-03-05 07:57:35 -05:00
lewtun
12b66215cf Fix example of custom Trainer to reflect signature of compute_loss (#10537) 2021-03-05 07:44:53 -05:00
Lysandre
093b88f4e9 Update scatter to use torch 1.8.0 2021-03-05 07:31:51 -05:00
Patrick von Platen
c503a1c15e [ProphetNet] Bart-like Refactor (#10501)
* first step to refactor

* make all fast tests pass

* make all slow tests pass

* save intermediate

* correct cache

* finish PR

* make fp16 work
2021-03-04 23:27:12 +03:00
Sylvain Gugger
6290169eb3 Rework TPU checkpointing in Trainer (#10504)
* Rework TPU checkpointing in Trainer

* Wraps the barrier in a dist test

* Address review comments

* Remove line
2021-03-04 11:46:11 -05:00
Philipp Schmid
805c5200dc Removes overwrites for output_dir (#10521)
* removed overwrites

* remove default value for output_dir

* adjusted typing
2021-03-04 17:12:37 +01:00
Sylvain Gugger
a5bd40b75c Not always consider a local model a checkpoint in run_glue (#10517) 2021-03-04 11:11:39 -05:00
Sylvain Gugger
745ea78dcc Revert "Not always consider a local model a checkpoint in run_glue"
This reverts commit f3660613bc.
2021-03-04 09:45:18 -05:00
Sylvain Gugger
f3660613bc Not always consider a local model a checkpoint in run_glue 2021-03-04 09:44:02 -05:00
Sylvain Gugger
948b730f97 Remove unsupported methods from ModelOutput doc (#10505) 2021-03-03 14:55:18 -05:00
Sylvain Gugger
b70f441b72 Smp grad accum (#10488)
* Fix gradient accumulation for SM Model Parallelism

* Style and divide loss by grad accum steps
2021-03-03 12:13:29 -05:00
felixgwu
d064fb5647 Fix the bug in constructing the all_hidden_states of DeBERTa v2 (#10466)
* fix all_hidden_states

* use output_states instead of next_kv
2021-03-03 12:05:21 -05:00
Stas Bekman
188574ac50 remap MODEL_FOR_QUESTION_ANSWERING_MAPPING classes to names auto-generated file (#10487)
* remap classes to strings

* missing new util

* style

* doc

* move the autogenerated file

* Trigger CI
2021-03-03 08:54:00 -08:00
Sylvain Gugger
801ff969ce Refactor checkpoint name in BERT and MobileBERT (#10424)
* Refactor checkpoint name in BERT and MobileBERT

* Add option to check copies

* Add QuestionAnswering

* Add last models

* Make black happy
2021-03-03 11:21:17 -05:00
Jeff Yang
39f70a4058 feat(docs): navigate with left/right arrow keys (#10481)
* feat(docs): navigate with left/right arrow keys

* fix: add missing comma
2021-03-03 11:17:12 -05:00
Patrick von Platen
2d2ed2cc18 [T5] Fix speed degradation bug t5 (#10496)
* fix speed degradation bug t5

* fix for all models

* fix code quality
2021-03-03 12:42:41 +03:00
WybeKoper
5dc303e281 Fixed minor spelling mistakes (#10489)
Co-authored-by: WybeKoper <WybeKoper@users.noreply.github.com>
2021-03-03 14:17:25 +05:30
Mehrad Moradshahi
1750e62900 Generate can return cross-attention weights too (#10493) 2021-03-03 13:57:02 +05:30
Martin Schmitt
b013842244 Changed num_beams to num_beams // num_beam_groups when initialising PrefixConstrainedLogitsProcessor in _get_logits_processor to fix compatibility issue when constrained decoding is used together with grouped beam search (#10475) 2021-03-02 10:41:54 +03:00
Lysandre Debut
0c2325198f Add I-BERT to README (#10462) 2021-03-01 12:12:31 -05:00
Lysandre Debut
9248e27037 Remove Anthony from the bug reports in Transformers 2021-03-01 10:23:40 -05:00
Suraj Patil
a106bde5a7 [Wav2Vec2FeatureExtractor] smal fixes (#10455)
* smal fixes

* don't check for None
2021-03-01 20:19:52 +05:30
Patrick von Platen
11655fafdd remove feature extraction config (#10457) 2021-03-01 12:30:12 +03:00
Patrick von Platen
0234de8418 Add Fine-Tuning for Wav2Vec2 (#10145)
* add encode labels function to tokenizer

* start adding finetuning

* init dropout

* upload

* correct convert script

* apply changes

* fix second typo

* make first dummy training run

* adapt convert script

* push confg for comparison

* remove conf

* finish training

* adapt data collator

* add research folder

* update according to fairseq feedback

* some minor corrections

* refactor masking indices a bit

* some minor changes

* clean tokenizer

* finish clean-up

* remove previous logic

* update run script

* correct training

* finish changes

* finish model

* correct bug

* fix training a bit more

* add some tests

* finish gradient checkpointing

* finish example

* correct gradient checkpointing

* improve tokenization method

* revert changes in tokenizer

* revert general change

* adapt fine-tuning

* update

* save intermediate test

* Update README.md

* finish finetuning

* delete conversion script

* Update src/transformers/models/wav2vec2/configuration_wav2vec2.py

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* finish wav2vec2 script

* finish wav2vec2 fine-tuning

* finalize test

* correct test

* adapt tests

* finish

* remove test file

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-03-01 12:13:17 +03:00
Patrick von Platen
3c733f3208 Update ibert.rst (#10445) 2021-02-28 19:03:49 +03:00
Darigov Research
aeba4f95bb Adds terms to Glossary (#10443)
* feat: Adds three definitions to glossary from @cronoik

Needed a definition for transformer which in turn needed 2 more definitions

To do with issue https://github.com/huggingface/transformers/issues/9078

* fix: Adjusts definition of neural network to make it easier to read
2021-02-28 08:27:54 -05:00
Tanmay Garg
256482ac92 Introduce save_strategy training argument (#10286)
* Introduce save_strategy training argument

* deprecate EvaluationStrategy

* collapse EvaluationStrategy and LoggingStrategy into a single
  IntervalStrategy enum

* modify tests to use modified enum
2021-02-27 19:34:22 -05:00
Bhadresh Savani
aca6288ff4 updated logging and saving metrics (#10436)
* updated logging and saving metrics

* space removal
2021-02-27 09:53:44 -08:00
Stas Bekman
f52a15897b [run_seq2seq.py] restore functionality: saving to test_generations.txt (#10428)
This PR restores the original functionality that for some reason was modified.

Fixes: https://github.com/huggingface/transformers/issues/10381

@sgugger
2021-02-27 08:21:50 -08:00
Lysandre Debut
311b7048c5 Fix conda-build (#10431) 2021-02-26 20:20:30 -05:00
Stas Bekman
ee04b69822 [examples] better model example (#10427)
* refactors

* typo
2021-02-26 17:01:01 -08:00
Amog Kamsetty
a85eb616f7 Ray Tune Integration Bug Fixes (#10406)
* fixes

* update resources

* formatting

* remove import

* add log statement

* use fstring

* add period

* Update src/transformers/integrations.py
2021-02-26 19:06:08 -05:00
Kai Fricke
98569d4ba2 Add Ray Tune hyperparameter search integration test (#10414) 2021-02-26 10:18:33 -05:00
Patrick von Platen
d03695f3a2 [LED] Correct Docs (#10419)
* correct docs

* correct tf model docs as well
2021-02-26 17:53:28 +03:00
Mansi Mane
7fc686efb1 Sagemaker Model Parallel tensoboard writing fix (#10403)
* Added tb fix

* Removed local rank condition

* Updated reference to args
2021-02-26 08:04:55 -05:00
Julien Chaumond
83d2d55c94 [ci, flax] non-existing models are unlikely to pass tests (#10409)
😂
2021-02-26 12:35:36 +03:00
Sylvain Gugger
17b6e0d474 Fix run_glue evaluation when model has a label correspondence (#10401) 2021-02-25 15:30:38 -05:00
Sylvain Gugger
26f8b2cb10 Make Barthez tokenizer tests a bit faster (#10399)
* Make Barthez tokenizer tests a bit faster

* Quality
2021-02-25 11:42:25 -05:00
Andrea Bacciu
b040e6efc1 Fix None in add_token_positions - issue #10210 (#10374)
* Fix None in add_token_positions - issue #10210

Fix None in add_token_positions related to the issue #10210

* add_token_positions fix None values in end_positions vector

add_token_positions fix None in end_positions vector as proposed by @joeddav
2021-02-25 09:18:33 -07:00
Sylvain Gugger
9d14be5c20 Add support for ZeRO-2/3 and ZeRO-offload in fairscale (#10354)
* Ass support for ZeRO-2/3 and ZeRO-offload in fairscale

* Quality

* Rework from review comments

* Add doc

* Apply suggestions from code review

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Address review comments

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2021-02-25 11:07:53 -05:00
Lysandre Debut
88cc26dcd1 Ignore unexpected weights from PT conversion (#10397) 2021-02-25 10:42:27 -05:00
Sehoon Kim
63645b3b11 I-BERT model support (#10153)
* IBertConfig, IBertTokentizer added

* IBert Model names moified

* tokenizer bugfix

* embedding -> QuantEmbedding

* quant utils added

* quant_mode added to configuration

* QuantAct added, Embedding layer + QuantAct addition

* QuantAct added

* unused path removed, QKV quantized

* self attention layer all quantized, except softmax

* temporarl commit

* all liner layers quantized

* quant_utils bugfix

* bugfix: requantization missing

* IntGELU added

* IntSoftmax added

* LayerNorm implemented

* LayerNorm implemented all

* names changed: roberta->ibert

* config not inherit from ROberta

* No support for CausalLM

* static quantization added, quantize_model.py removed

* import modules uncommented

* copyrights fixed

* minor bugfix

* quant_modules, quant_utils merged as one file

* import * fixed

* unused runfile removed

* make style run

* configutration.py docstring fixed

* refactoring: comments removed, function name fixed

* unused dependency removed

* typo fixed

* comments(Copied from), assertion string added

* refactoring: super(..) -> super(), etc.

* refactoring

* refarctoring

* make style

* refactoring

* cuda -> to(x.device)

* weight initialization removed

* QuantLinear set_param removed

* QuantEmbedding set_param removed

* IntLayerNorm set_param removed

* assert string added

* assertion error message fixed

* is_decoder removed

* enc-dec arguments/functions removed

* Converter removed

* quant_modules docstring fixed

* conver_slow_tokenizer rolled back

* quant_utils docstring fixed

* unused aruments e.g. use_cache removed from config

* weight initialization condition fixed

* x_min, x_max initialized with small values to avoid div-zero exceptions

* testing code for ibert

* test emb, linear, gelu, softmax added

* test ln and act added

* style reformatted

* force_dequant added

* error tests overrided

* make style

* Style + Docs

* force dequant tests added

* Fix fast tokenizer in init

* Fix doc

* Remove space

* docstring, IBertConfig, chunk_size

* test_modeling_ibert refactoring

* quant_modules.py refactoring

* e2e integration test added

* tokenizers removed

* IBertConfig added to tokenizer_auto.py

* bugfix

* fix docs & test

* fix style num 2

* final fixes

Co-authored-by: Sehoon Kim <sehoonkim@berkeley.edu>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-02-25 10:06:42 -05:00
Patrick von Platen
cb38ffcc5e [PretrainedFeatureExtractor] + Wav2Vec2FeatureExtractor, Wav2Vec2Processor, Wav2Vec2Tokenizer (#10324)
* push to show

* small improvement

* small improvement

* Update src/transformers/feature_extraction_utils.py

* Update src/transformers/feature_extraction_utils.py

* implement base

* add common tests

* make all tests pass for wav2vec2

* make padding work & add more tests

* finalize feature extractor utils

* add call method to feature extraction

* finalize feature processor

* finish tokenizer

* finish general processor design

* finish tests

* typo

* remove bogus file

* finish docstring

* add docs

* finish docs

* small fix

* correct docs

* save intermediate

* load changes

* apply changes

* apply changes to doc

* change tests

* apply surajs recommend

* final changes

* Apply suggestions from code review

* fix typo

* fix import

* correct docstring
2021-02-25 17:42:46 +03:00
abhishek thakur
9dc7825744 Remove unused variable in example for Q&A (#10392) 2021-02-25 09:18:47 -05:00
mingruimingrui
894db6701e Bugfix: Removal of padding_idx in BartLearnedPositionalEmbedding (#10200)
* Assumption of padding_idx <2 might not stand

* Use offset instead of 2

* Fix with black

* Change behavior to warning instead for backward compatibility.

* Fix with black

* Remove warning

* Make padding_idx non-required

* padding_idx fix for blenderbot

* padding_idx fix for blenderbot_small

* padding_idx fix for led

* padding_idx fix for mbart

* Remove extra whitespaces

* padding_idx fix for template

* Fix padding_idx passed to nn.Embedding mistake

* Fixed padding_idx passed to positional embedding in template

* Remove padding_idx from pytorch learned positional embeddings

* Remove accidentally added quotes

* Remove padding_idx from tf learned positional embeddings

* Remove zeroing of weights in __init__

Co-authored-by: Wang Ming Rui <mingrui.wang@C02CJTUYMD6M.local>
2021-02-25 14:33:13 +03:00
Lysandre Debut
55fe80d084 Only run model templates tests once (#10388) 2021-02-24 19:48:00 -05:00
Lysandre Debut
22bd047e91 Run GA on every push even on forks (#10383) 2021-02-24 19:23:39 -05:00
Lysandre
3591844306 v4.3.3 docs 2021-02-24 15:19:01 -05:00
Stas Bekman
bdbb2c756b [trainer] move secondary methods into a separate file (#10363)
* move secondary methods into a separate file

* cleanup

* style
2021-02-24 08:32:52 -08:00
Poedator
5f2a3d721c fix deprecated ref to tokenizer.max_len (#10220)
This is to fix deprecated reference to `tokenizer.max_len` with `tokenizer.model_max_length` - similar to [issue 8739](https://github.com/huggingface/transformers/issues/8739) and [PR 8604](https://github.com/huggingface/transformers/pull/8604). 
Example [here](https://colab.research.google.com/gist/poedator/f8776349e5c625ce287fc6fcd312fa1e/tokenizer-max_len-error-in-transformers_glue.ipynb). The error happens when `glue_convert_examples_to_features` is called without `max_length` parameter specified. In that case line 119 with wrong reference gets called. This simple fix should  do it.
2021-02-24 09:01:28 -05:00
Julien Plu
cdcdd5f03a Rework casts (#10274) 2021-02-24 08:38:29 -05:00
abhishek thakur
2d458b2c7d ConvBERT fix torch <> tf weights conversion (#10314)
* convbert conversion test

* fin

* fin

* fin

* clean up tf<->pt conversion

* remove from_pt

Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
2021-02-24 14:55:34 +03:00
Stas Bekman
3437d12134 [Trainer/Deepspeed] handle get_last_lr() before first step() (#10362)
* handle get_last_lr() before first step()

* abstract away the lr getting logic

* cleanup

* add test

* move to utils
2021-02-23 17:42:25 -08:00
Julien Chaumond
4a1ab7cb6c [bert-base-german-cased] cp to hardcoded urls (#10353) 2021-02-23 12:30:47 -05:00
Akmal
23e87c27be Fix broken examples/seq2seq/README.md markdown (#10344) 2021-02-23 10:49:25 -05:00
Lysandre
83f890ddd1 Easier self-scheduled debugging 2021-02-23 08:53:55 -05:00
Sylvain Gugger
461e8cacf9 Fix evaluation with label smoothing in Trainer (#10338) 2021-02-22 16:39:02 -05:00
Stas Bekman
622a8c5995 [trainer] add Trainer methods for metrics logging and saving (#10266)
* make logging and saving trainer built-in

* Update src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-02-22 13:02:53 -08:00
Tanmay Garg
94d8767ba3 Loading from last checkpoint functionality in Trainer.train (#10334)
Enhance resume_from_checkpoint argument of Trainer.train to accept
bool type. If True given, last saved checkpoint in self.args.output_dir
will be loaded. (#10280)
2021-02-22 15:33:00 -05:00
Stas Bekman
eab0afc19c [Trainer] implement gradient_accumulation_steps support in DeepSpeed integration (#10310)
* implement gradient_accumulation_steps support in DeepSpeed integration

* typo

* cleanup

* cleanup
2021-02-22 11:15:59 -08:00
Stas Bekman
f991daed18 defensive programming + expand/correct README (#10295) 2021-02-22 10:58:50 -08:00
Sylvain Gugger
9e147d31f6 Deprecate prepare_seq2seq_batch (#10287)
* Deprecate prepare_seq2seq_batch

* Fix last tests

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Suraj Patil <surajp815@gmail.com>

* More review comments

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
2021-02-22 12:36:16 -05:00
Lysandre Debut
e73a3e1891 Add note to resize token embeddings matrix when adding new tokens to voc (#10331) 2021-02-22 09:48:20 -05:00
Julien Plu
19e737b93e Making TF Longformer-like models compliant with AMP (#10233)
* AMP

* Add LED

* Apply style

* Fix longformer
2021-02-22 15:41:56 +01:00
Lysandre Debut
cd8c4c3fc2 DeBERTa-v2 fixes (#10328)
Co-authored-by: Pengcheng He <penhe@microsoft.com>

Co-authored-by: Pengcheng He <penhe@microsoft.com>
2021-02-22 07:45:18 -05:00
tagucci
88605f37a6 fix typo in conversion script (#10316)
* fix typo in conversion script

* style

Co-authored-by: Stas Bekman <stas@stason.org>
2021-02-21 07:54:27 -08:00
Stas Bekman
cdd31b4de4 don't fail when there are no zombies (#10308) 2021-02-20 13:28:43 -08:00
Sylvain Gugger
a2e379743c Fix style 2021-02-20 15:46:54 -05:00
cronoik
a0dfc2d30f fixes #10303 (#10304) 2021-02-20 15:21:33 -05:00
Pengcheng He
9a7e63729f Integrate DeBERTa v2(the 1.5B model surpassed human performance on Su… (#10018)
* Integrate DeBERTa v2(the 1.5B model surpassed human performance on SuperGLUE); Add DeBERTa v2 900M,1.5B models;

* DeBERTa-v2

* Fix v2 model loading issue (#10129)

* Doc members

* Update src/transformers/models/deberta/modeling_deberta.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Address Sylvain's comments

* Address Patrick's comments

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Style

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-02-19 18:34:44 -05:00
Sylvain Gugger
f6e53e3c2b Fix example links in the task summary (#10291) 2021-02-19 18:04:15 -05:00
Julien Plu
536aee99bb Move the TF NER example (#10276) 2021-02-19 16:06:13 -05:00
Joe Davison
cbadb5243c Zero shot distillation script cuda patch (#10284) 2021-02-19 14:06:57 -05:00
Stas Bekman
f1299f5038 Kill any run-away pytest processes (#10281) 2021-02-19 13:36:37 -05:00
Tanmay Garg
709c86b5a9 Introduce logging_strategy training argument (#10267) (#10267)
Introduce logging_strategy training argument
in TrainingArguments and TFTrainingArguments. (#9838)
2021-02-19 11:49:22 -05:00
Julien Plu
34df26ec3a Making TF OpenAI GPT model compliant with AMP and XLA (#10261)
* Fix AMP and XLA

* Remove useless var
2021-02-19 09:33:25 -05:00
Julien Plu
3e116ed331 Making TF TransfoXL model compliant with AMP (#10264)
* Fix AMP

* Apply style

* Remove unused import
2021-02-19 06:58:07 -05:00
Julien Plu
86caeb7636 Fix XLA and AMP (#10262) 2021-02-19 06:57:16 -05:00
Julien Plu
3d72d47f09 Making TF MPNet model compliant with XLA (#10260)
* Fix XLA

* Rework cast

* Apply style
2021-02-19 06:56:41 -05:00
Julien Plu
fb56bf2584 Making TF MobileBert model compliant with AMP (#10259)
* Fix AMP

* Trigger CI

* Rework cast
2021-02-19 06:55:25 -05:00
Julien Plu
2fc6284f04 Making TF Lxmert model compliant with AMP (#10257)
* Fix AMP

* Rework cast

* Apply style
2021-02-19 06:54:14 -05:00
Stas Bekman
d27b28d958 [ISSUES.md] propose using google colab to reproduce problems (#10270)
* propose using google colab to reproduce problems

* Update ISSUES.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-02-18 17:15:51 -08:00
Stas Bekman
4eddc459a9 [trainer] implement support for full fp16 in evaluation/predict (#10268)
* implement --fp16_full_eval

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* style

* add test

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-02-18 17:02:35 -08:00
Stas Bekman
d9a81fc0c5 fix func signature (#10271) 2021-02-18 16:44:42 -08:00
Joe Davison
c6fe17557e Script for distilling zero-shot classifier to more efficient student (#10244)
* add zero-shot distillation script

* readme wordsmithing

* clean up code

* add multi-gpu teacher inference
plus tidying up more code

* add use_fast_tokenizer arg

* update results in readme

* more readme wordsmithing

* style

* Add handle to readme

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* fix code block

* add error+docs about distributed & tpu

* add @sgugger format requests

* xla -> tpu

* support fp16 for teacher preds

* no checkpoint by default

* add demo colab link

* add model sharing prompt + model link

* correct resulting acc of example

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-02-18 17:08:45 -05:00
Stas Bekman
97e688bc22 [Trainer] memory tracker metrics (#10225)
* memory tracker metrics

* go back to eval for somewhat consistency

* handle no-gpu case

* deal with stackable eval calls

* restore callback order

* style

* simplify the API

* add test

* docs

* consistently use eval_ prefix

* improve docs

* Update src/transformers/trainer_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* rename method

* style

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-02-18 09:27:32 -08:00
Tanmay Garg
d7f38c5d1d Introduce warmup_ratio training argument (#10229)
Introduce warmup_ratio training argument in both
TrainingArguments and TFTrainingArguments classes (#6673)
2021-02-18 12:23:33 -05:00
Julien Plu
2acae50a0c Reduce the time spent for the TF slow tests (#10152)
* rework savedmodel slow test

* Improve savedmodel tests

* Remove useless content
2021-02-18 15:52:57 +01:00
Julien Plu
14ed3b978e Fix AMP (#10216) 2021-02-18 06:29:43 -05:00
Julien Plu
bdf1669e3f Making TF GPT2 compliant with XLA and AMP (#10230)
* Fix XLA and AMP

* Fix AMP and XLA

* Apply style

* Apply Patrick's comment
2021-02-18 09:36:01 +01:00
Stas Bekman
5da7c78ed8 update to new script; notebook notes (#10241) 2021-02-17 15:58:08 -08:00
Stas Bekman
dee876ceff [trainer] refactor place_model_on_device logic, add deepspeed (#10243)
* refactor place_model_on_device logic, add deepspeed

* doc

* style
2021-02-17 15:52:36 -08:00
Stas Bekman
d1eb88f42d [CI] 2 fixes (#10248)
* fix invalid port

* missing requirements
2021-02-17 14:12:39 -08:00
Julien Plu
7246785a67 Make TF CTRL compliant with XLA and AMP (#10209)
* Fix XLA and AMP

* Apply style

* Remove useless cast
2021-02-17 18:54:15 +01:00
Julien Plu
fdb2351ebb Making TF XLM-like models XLA and AMP compliant (#10211)
* Fix Flaubert and XLM

* Remove useless cast

* Tiny fix

* Tiny fix
2021-02-17 18:02:48 +01:00
Julien Plu
83d803ba02 Making TF BART-like models XLA and AMP compliant (#10191)
* Update BART

* Update Blenderbot

* Update BlenderbotSmall

* Update Marian

* Update MBart

* Update MBart

* Update Pegasus

* Update template

* Fix Marian and Pegasus

* Apply style

* Default initializer

* Default initializer

* Default initializer

* Remove int32 casts

* Fix template

* Remove more cast
2021-02-17 17:48:56 +01:00
Daniel Stancl
8d79e5ca49 Fix head masking for TFT5 (#9877)
* Fix head_mask and decoder_head_mask in TFT5 models

* Enable test_headmasking both fot TFT5 tester
and TFT5EncoderOnly tester

Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
2021-02-17 19:00:09 +03:00
Lysandre Debut
4b91965731 Factor out methods (#10215) 2021-02-17 09:53:43 -05:00
Stas Bekman
e94d63f6cb [trainer] fix ignored columns logger (#10219)
* [trainer] fix ignored columns logger

This PR fixes a confusing log entry that says:
```
The following columns in the evaluation set don't have a corresponding argument in `T5ForConditionalGeneration.forward` and have been ignored: .
```
when everything is in order.

* Update src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-02-16 13:35:39 -08:00
Joe Davison
4210cd96fc fix add_token_positions fn (#10217) 2021-02-16 14:00:05 -05:00
Sylvain Gugger
7169d1ea7b Store FLOS as floats to avoid overflow. (#10213) 2021-02-16 11:15:15 -05:00
Zhang Cheng
df1b0fb54d set tgt_lang of MBart Tokenizer for summarization (#10205) 2021-02-16 09:39:37 -05:00
Julien Plu
5c2d66a2f5 Unlock XLA test for convbert (#10207) 2021-02-16 07:59:41 -05:00
Suraj Patil
1c8c2d9ab3 [WIP][examples/seq2seq] move old s2s scripts to legacy (#10136)
* move old s2s scripts to legacy

* add the tests back

* proper rename

* restore

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-02-15 10:48:02 -08:00
Stas Bekman
96897a3535 make the sub-group of tests run always (#10196) 2021-02-15 13:01:35 -05:00
Lysandre Debut
8cbd0bd137 Specify dataset dtype (#10195)
Co-authored-by: Quentin Lhoest <lhoest.q@gmail.com>

Co-authored-by: Quentin Lhoest <lhoest.q@gmail.com>
2021-02-15 12:57:17 -05:00
Stas Bekman
0b1f552a24 fix run_seq2seq.py; porting trainer tests to it (#10162)
* fix run_seq2seq.py; porting DeepSpeed tests to it

* unrefactor

* defensive programming

* defensive programming 2

* port the rest of the trainer tests

* style

* a cleaner scripts dir finder

* cleanup
2021-02-15 09:12:17 -08:00
Julien Plu
31b0560ab4 Add AMP for Albert (#10141) 2021-02-15 17:18:33 +01:00
Suraj Patil
6fc940ed09 Add mBART-50 (#10154)
* add tokenizer for mBART-50

* update tokenizers

* make src_lang and tgt_lang optional

* update tokenizer test

* add setter

* update docs

* update conversion script

* update docs

* update conversion script

* update tokenizer

* update test

* update docs

* doc

* address Sylvain's suggestions

* fix test

* fix formatting

* nits
2021-02-15 20:58:54 +05:30
Julien Plu
570218878a Fix TF template (#10189)
* Fix template

* Update Seq2Seq tests
2021-02-15 09:21:57 -05:00
Suraj Patil
2a5c990038 fix RagTokenizer (#10167) 2021-02-15 19:48:12 +05:30
Julien Plu
c8d3fa0dfd Check TF ops for ONNX compliance (#10025)
* Add check-ops script

* Finish to implement check_tf_ops and start the test

* Make the test mandatory only for BERT

* Update tf_ops folder

* Remove useless classes

* Add the ONNX test for GPT2 and BART

* Add a onnxruntime slow test + better opset flexibility

* Fix test + apply style

* fix tests

* Switch min opset from 12 to 10

* Update src/transformers/file_utils.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Fix GPT2

* Remove extra shape_list usage

* Fix GPT2

* Address Morgan's comments

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-02-15 07:55:10 -05:00
Lysandre Debut
93bd2f7099 Add new model to labels that should not stale (#10187) 2021-02-15 06:31:29 -05:00
Nicolas Patry
900daec24e Fixing NER pipeline for list inputs. (#10184)
Fixes #10168
2021-02-15 06:22:45 -05:00
Sylvain Gugger
587197dcd2 Fix datasets set_format (#10178) 2021-02-15 05:49:07 -05:00
Stas Bekman
8fae93ca19 [t5 tokenizer] add info logs (#9897)
* save fast tokenizer + add info logs

* fix tests

* remove the saving of fast tokenizer
2021-02-13 09:10:22 -05:00
Sylvain Gugger
803498318c [Doc] Fix version control in internal pages (#10124) 2021-02-13 08:52:30 -05:00
Manuel Romero
698c9e2dbd Fix typo in comment (#10156) 2021-02-13 08:26:25 -05:00
Manuel Romero
c969366870 Fix typo in comments (#10157) 2021-02-13 08:26:01 -05:00
Nicolas Patry
c9837a0d27 Conversion from slow to fast for BPE spm vocabs contained an error. (#10120)
* Conversion from slow to fast for BPE spm vocabs contained an error.

- There is only 1 test currently (tokenizers + slow) that used the modified path
and it's reformer, which does not contain any ids modification so the
bug was silent for now.
- The real issue is that vocab variable was overloaded by
SentencePieceExtractor, leading to Slow specific vocab oddities to be
completely ignored
- The bug was reported here https://github.com/huggingface/transformers/issues/9518
- Ran the complete tokenization test suite with slow without error
(`RUN_SLOW=1 pytest -sv tests/test_tokenization_*`)

* Remove rebase error.

* Adding the fixture.
2021-02-13 08:24:53 -05:00
Lysandre Debut
dd3a7f9641 Revert propagation (#10171) 2021-02-13 08:19:56 -05:00
Julien Chaumond
641f418e10 [hf_api] delete deprecated methods and tests (2) 2021-02-12 21:46:17 +01:00
Julien Chaumond
eed31db948 [hf_api] delete deprecated methods and tests (#10159)
* [hf_api] delete deprecated methods and tests

cc @lhoestq

* Update test_hf_api.py
2021-02-12 15:35:06 -05:00
Mohamed Al Salti
1321356bdf Fix typo in GPT2DoubleHeadsModel docs (#10148)
* Fix typo

* apply suggestion

Co-authored-by: Suraj Patil <surajp815@gmail.com>
2021-02-12 22:48:39 +05:30
Suraj Patil
f51188cbe7 [examples/run_s2s] remove task_specific_params and update rouge computation (#10133)
* fix rouge metrics and task specific params

* fix typo

* round metrics

* typo

* remove task_specific_params
2021-02-12 17:18:21 +05:30
Sylvain Gugger
31245775e5 Add SageMakerTrainer for model paralellism (#10122)
* Refactor things out of main train

* Store signature

* Add SageMakerTrainer

* Init + Copyright

* Address review comments
2021-02-11 18:44:18 -05:00
Stas Bekman
b54cb0bd82 [DeepSpeed in notebooks] Jupyter + Colab (#10130)
* init devices/setup explicitly

* docs + test

* simplify

* cleanup

* cleanup

* cleanup

* correct the required dist setup

* derive local_rank from env LOCAL_RANK
2021-02-11 14:02:05 -08:00
Sylvain Gugger
6710d1d5ef Typo fix 2021-02-11 15:12:35 -05:00
Patrick von Platen
8e13b73593 Update README.md 2021-02-11 18:35:27 +03:00
Patrick von Platen
d6b4f48ecb Update ADD_BIG_BIRD.md 2021-02-11 18:34:17 +03:00
Patrick von Platen
495c157d6f [Wav2Vec2] Improve Tokenizer & Model for batched inference (#10117)
* save intermediate

* finish batch the same as fairseq

* add normalization

* fix batched input

* add better comment

* Update src/transformers/models/wav2vec2/modeling_wav2vec2.py

* add nice docstring

* add tokenizer tests

* make all slow tests pass

* finish PR

* correct import
2021-02-11 15:40:54 +03:00
Tanmay Thakur
2f3b5f4dcc Add new community notebook - Blenderbot (#10126)
* Update:community.md, new nb add

* feat: updated grammar on  nb description

* Update: Train summarizer for BlenderBotSmall
2021-02-11 12:53:40 +03:00
Qbiwan
8dcfaea08d Update run_xnli.py to use Datasets library (#9829)
* remove xnli_compute_metrics, add load_dataset, load_metric, set_seed,metric.compute,load_metric

* fix

* fix

* fix

* push

* fix

* everything works

* fix init

* fix

* special treatment for sepconv1d

* style

* 🙏🏽

* add doc and cleanup


* fix doc

* fix doc again

* fix doc again

* Apply suggestions from code review

* make style

* Proposal that should work

* Remove needless code

* Fix test

* Apply suggestions from code review

* remove xnli_compute_metrics, add load_dataset, load_metric, set_seed,metric.compute,load_metric

* amend README

* removed data_args.task_name and replaced with task_name = "xnli"; use split function to load train and validation dataset separately; remove __post_init__; remove flag --task_name from README.

* removed dict task_to_keys, use str "xnli" instead of variable task_name, change preprocess_function to use examples["premise"], examples["hypothesis"] directly, remove sentence1_key and sentence2_key, change compute_metrics function to cater only to accuracy metric, add condition for train_langauge is None when using dataset.load_dataset()

* removed `torch.distributed.barrier()` and `import torch` as `from_pretrained` is able to do the work; amend README
2021-02-11 10:27:23 +05:30
Stas Bekman
77b862847b [DeepSpeed] restore memory for evaluation (#10114)
* free up memory at the end of train

* rework tests

* consistent formatting

* correction
2021-02-10 09:09:48 -08:00
Suraj Patil
c130e67dce remove adjust_logits_during_generation method (#10087)
* add forced logits processors

* delete adjust_logits method

* add forced_eos_token_id argument in config

* add tests for forced logits processors

* update gen utils tests

* add forced option to tf generate

* remove adjust_logits method from tf models

* update adjust_logits for marian

* delete _force_token_id_to_be_generated method

* style

* import warnings

* pass max_length to _get_logits_processor

* set forced_eos_token_id to None

* set forced attributes in conf utils

* typo

* fix rag generate

* add forced_eos_token_id in rag config

* remove force_bos_token_to_be_generated from BartConfig

* remove _force_token_ids_generation from FSMT

* nit

* fix negative constant

* apply suggestions from code review
2021-02-10 22:39:09 +05:30
Julien Plu
22a32cf485 Fix TF LED/Longformer attentions computation (#10007)
* Fix test

* Remove commented test

* Fix name

* Apply style

* Fix check copies

* Remove prints

* Restore boolean

* Fix reshape
2021-02-10 10:58:37 -05:00
Lysandre Debut
0d8e554d42 Line endings should be LF across repo and not CRLF (#10119) 2021-02-10 10:50:00 -05:00
Stas Bekman
937f67074d add deepspeed fairscale (#10116) 2021-02-10 03:12:27 -05:00
Stas Bekman
d478257d9b [CI] build docs faster (#10115)
I assume the CI machine should have at least 4 cores, so let's build docs faster
2021-02-10 03:02:39 -05:00
Stas Bekman
7c07a47dfb [DeepSpeed docs] new information (#9610)
* how to specify a specific gpu

* new paper

* expand on buffer sizes

* style

* where to find config examples

* specific example

* small updates
2021-02-09 22:16:20 -08:00
Anthony MOI
1fbaa3c117 Fix tokenizers training in notebook (#10110) 2021-02-09 21:48:22 -05:00
Shiva Zamani
85395e4901 Remove speed metrics from default compute objective (#10107) 2021-02-09 19:03:02 -05:00
Boris Dayma
7c7962ba89 doc: update W&B related doc (#10086)
* doc: update W&B related doc

* doc(wandb): mention report_to

* doc(wandb): commit suggestion

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* doc(wandb): fix typo

* doc(wandb): remove WANDB_DISABLED

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-02-09 14:47:52 -05:00
abhishek thakur
480a9d6ba0 Fix TFConvBertModelIntegrationTest::test_inference_masked_lm Test (#10104) 2021-02-09 20:22:54 +01:00
Sylvain Gugger
0c3d23dff7 Add patch releases to the doc 2021-02-09 14:17:09 -05:00
Suraj Patil
3e0c62b611 [RAG] fix generate (#10094)
* fix rag generate and tests

* put back adjust_logits_during_generation

* tests are okay

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-02-09 21:57:38 +03:00
Patrick von Platen
226973a9c5 fix import (#10103) 2021-02-09 21:43:41 +03:00
Patrick von Platen
4cda2d73ef Update ADD_BIG_BIRD.md 2021-02-09 19:58:35 +03:00
Julien Plu
b82fe7d258 Replace strided slice with tf.expand_dims (#10078)
* Replace tf.newaxis -> tf.expand_dims

* Fix tests

* Fix tests

* Use reshape when a tensors needs a double expand

* Fix GPT2

* Fix GPT2
2021-02-09 11:48:28 -05:00
Daniel Stancl
e7381c4596 Add head_mask and decoder_head_mask to TF LED (#9988)
* Add head masking to TF LED

* Add head_mask to Longformer + one doc piece to LED

* Fix integration tests
2021-02-09 11:45:18 -05:00
Sylvain Gugger
77c0ce8c0c Fix some edge cases in report_to and add deprecation warnings (#10100) 2021-02-09 10:38:12 -05:00
Lysandre Debut
78f4a0e7e5 Logging propagation (#10092)
* Enable propagation by default

* Document enable/disable default handler
2021-02-09 10:27:49 -05:00
Suraj Patil
63fddcf69c [examples/s2s] add test set predictions (#10085)
* add do_predict, pass eval_beams durig eval

* update help

* apply suggestions from code review
2021-02-09 20:41:41 +05:30
Julien Plu
c6d5e56595 Fix naming (#10095) 2021-02-09 06:10:31 -05:00
abhishek thakur
4ed763779e Fix example in Wav2Vec2 documentation (#10096)
* Fix example in Wav2Vec2 documentation

* fix style
2021-02-09 06:07:56 -05:00
Lysandre
bf1a06a437 Docs for v4.3.1 release 2021-02-09 10:02:50 +01:00
Patrick von Platen
b972125ced Deprecate Wav2Vec2ForMaskedLM and add Wav2Vec2ForCTC (#10089)
* add wav2vec2CTC and deprecate for maskedlm

* remove from docs
2021-02-09 03:49:02 -05:00
Lysandre
ba542ffb49 Fix deployment script 2021-02-09 08:43:00 +01:00
sandip
263fac71a2 Integration test for electra model (#10073) 2021-02-08 15:42:25 -05:00
Stas Bekman
781220acab transition to new tests dir (#10080) 2021-02-08 12:41:52 -08:00
demSd
84acf0c7bb remove token_type_ids from TokenizerBertGeneration output (#10070) 2021-02-08 13:05:32 -05:00
Juan Cruz-Benito
e4bf9910dc Removing run_pl_glue.py from text classification docs, include run_xnli.py & run_tf_text_classification.py (#10066)
* Removing run_pl_glue.py from seq classification docs

* Adding run_tf_text_classification.py

* Using :prefix_link: to refer local files

* Applying "make style" to the branch

* Update docs/source/task_summary.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Removing last underscores

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-02-08 13:04:21 -05:00
Lysandre
0dd579c9cf Docs for v4.3.0 2021-02-08 18:53:24 +01:00
Stas Bekman
322037e842 [trainer] deepspeed bug fixes and tests (#10039)
* deepspeed bug fixes and tests

* manual wrap?
2021-02-08 09:44:02 -08:00
Anthony MOI
f285e4c3ad Update tokenizers requirement (#10077) 2021-02-08 12:27:26 -05:00
noise-field
ddaafd78fb Fix mlflow param overflow clean (#10071)
* Unify logging with f-strings

* Get limits from MLflow rather than hardcode

* Add a check for parameter length overflow

Also constants are marked as internal

* Don't stop run in on_train_end

This causes bad behaviour when there is a seprarte validation step:
validation gets recorded as separate run.

* Fix style
2021-02-08 11:58:02 -05:00
Olivier
ece6c51458 [s2s examples] Replace -100 token ids with the tokenizer pad_id for compute_metrics (#10046)
* replace -100 token ids with the tokenizer pad_id for compute_metrics

* fixed typo for label_ids
2021-02-08 10:08:16 -05:00
Lysandre Debut
c9df1b1d53 Model templates (#10072) 2021-02-08 09:07:02 -05:00
demSd
3b7e612a5e Implementing the test integration of BertGeneration (#9990)
* claiming this issue

* Integration test for BertGeneration(Encoder and Decoder)

* fix code quality
2021-02-08 08:22:19 -05:00
Julien Plu
cdd8659231 Fix TF template (#10069)
* Fix template

* Fix template
2021-02-08 08:10:50 -05:00
Patrick von Platen
9e795eac88 fix bert2bert test (#10063) 2021-02-08 16:04:28 +03:00
Julien Plu
31563e056d Restore TF embeddings and attention layers to their previous version (#9890)
* Refacto BERT

* Restore all the concerned models

* Remove print

* Update template

* Apply Sylvain's and Morgan's comments

* Fix cast

* Put the cast inside call

* Remove cond in ebds

* Fix funnel

* Restore previous dot product (attention_scores) computation

* Add ConvBERT and BART

* Make all the S2S models ONNX compliant

* Fix test

* Fix check copies
2021-02-08 14:36:30 +03:00
Julien Plu
8bb52bd240 Disable temporarily too slow tests (Longformer/LED) (#10062)
* Disable temporarily too slow tests

* Fix style

* Fix template
2021-02-08 12:32:31 +01:00
Nicolas Patry
b1aa4982cd Cleaning up ConversationalPipeline to support more than DialoGPT. (#10002)
* Cleaning up `ConversationalPipeline` to support more than DialoGPT.

Currently ConversationalPipeline was heavily biased towards DialoGPT
,which is the default model for this pipeline.

This PR proposes changes to put back the modifications specific to
DialoGPT into tokenizer-specific behavior wherever possible, by
creating `_build_conversation_input_ids` function that takes
conversation as input, and returns a list of ints corresponding
to the tokens. It feels natural to put here because all models
have probably different strategies to build input_ids from the
full conversation and it's the tokenizer's job to transform strings
into tokens (and vice-versa)

If `_build_conversation_input_ids` is missing, previous behavior is
used so we don't break anything so far (except for blenderbot where it's a fix).

This PR also contains a fix for too long inputs. There used
to be dead code for trying to limit the size of incoming input.
The introduced fixed is that we limit
within `_build_conversation_input_ids` to `tokenizer.model_max_length`.
It corresponds to the intent of the removed dead code and is actually
better because it corresponds to `model_max_length` which is different
from `max_length` (which is a default parameter for `generate`).

- Removed `history` logic from the Conversation as it's not relevant
anymore because tokenization logic has been moved to tokenizer.
And tokenizer cannot save any cache, and conversation cannot know
what is relevant or not.
Also it's not usable from `blenderbot` because the input_ids are
not append only (EOS tokens is always at the end).

- Added `iter_texts` method on `Conversation` because all
the code was literred with some form of this iteration of
past/generated_responses.

* Removing torch mention in types.

* Adding type checking to `_build_conversation_input_ids`.

* Fixing import in strings.
2021-02-08 14:29:07 +03:00
Lysandre Debut
ae37ceacbd Fix typo (#10064) 2021-02-08 06:02:05 -05:00
Patrick von Platen
9a0399e18d fix bart tests (#10060) 2021-02-08 13:25:09 +03:00
Sylvain Gugger
b01483faa0 Truncate max length if needed in all examples (#10034) 2021-02-08 05:03:55 -05:00
Sylvain Gugger
45aaf5f7ab A few fixes in the documentation (#10033) 2021-02-08 05:02:01 -05:00
Sylvain Gugger
04fd783cc5 Check copies match full class/function names (#10030) 2021-02-08 04:58:25 -05:00
Lysandre Debut
d51302cca0 Fix slow dpr test (#10059)
* Correct cast to device

* Comment back the slow test
2021-02-08 04:43:25 -05:00
sandip
12e44af5d3 Integration test for FlauBert (#10022) 2021-02-08 04:36:50 -05:00
Stas Bekman
24db8cc329 Can't mix --fp16 and --device cpu (#10041) 2021-02-07 17:54:20 -08:00
Stas Bekman
769948fad2 json to jsonlines, and doc, and typo (#10043) 2021-02-07 17:51:34 -08:00
Stas Bekman
8ea412a86f [examples] make run scripts executable (#10037)
* make executable

* make executable

* same for the template

* cleanup
2021-02-05 15:51:18 -08:00
Suraj Patil
1cd16512dc [examples/seq2seq] support label smoothing (#9844)
* add prepare_decoder_input_ids_from_labels in s2s models

* support lbl smoothing and enc/emb freezing

* fix freezing

* use pad_token_id from config

* remove embed freezing and add warning

* prepare decoder_input_ids inside DataCollatorForSeq2Seq
2021-02-05 23:21:57 +05:30
Patrick von Platen
b9720dd6f2 Bump minimum Jax requirement to 2.8.0 (#10027)
* Bump minimum Jax requirement to 2.8.0

* update table
2021-02-05 16:20:26 +03:00
Patrick von Platen
89be094e29 [Templates] Add template "call-for-model" markdown and "call-for-big-bird" markdown (#9921)
* add big bird

* change teacher to mentor

* add proposal template

* adapt template

* delete old template

* correct some links

* finish template

* create big bird from template

* add big bird

* improve boxes

* finish boxes

* add pointers for BigBird

* finish big bird

* up

* up

* up

* up

* apply lysandres and sylvains suggestions

* delete bogus file

* correct markdown

* try different style

* try different style

* finalize
2021-02-05 15:47:54 +03:00
Lysandre Debut
4bbad604eb Clarify QA pipeline output based on character (#10021)
* Clarify QA pipeline output based on character

* Style
2021-02-05 05:40:30 -05:00
Lysandre
ad2c431097 Update doc deployment script path 2021-02-05 11:18:59 +01:00
Lysandre
95a5f271e5 Update doc deployment script 2021-02-05 11:10:29 +01:00
Sylvain Gugger
3be965c5db Update doc for pre-release (#10014)
* Update doc for pre-release

* Use stable as default

* Use the right commit :facepalms:
2021-02-04 16:52:27 -05:00
Sylvain Gugger
ba607db180 Bump version 2021-02-04 16:23:05 -05:00
1030 changed files with 150676 additions and 23302 deletions

View File

@@ -3,7 +3,6 @@ orbs:
gcp-gke: circleci/gcp-gke@1.0.4
go: circleci/go@1.3.0
# TPU REFERENCES
references:
checkout_ml_testing: &checkout_ml_testing
@@ -69,6 +68,8 @@ jobs:
- image: circleci/python:3.6
environment:
OMP_NUM_THREADS: 1
RUN_PT_TF_CROSS_TESTS: yes
TRANSFORMERS_IS_CI: yes
resource_class: xlarge
parallelism: 1
steps:
@@ -77,14 +78,45 @@ jobs:
keys:
- v0.4-torch_and_tf-{{ checksum "setup.py" }}
- v0.4-{{ checksum "setup.py" }}
- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev
- run: pip install --upgrade pip
- run: pip install .[sklearn,tf-cpu,torch,testing,sentencepiece]
- run: pip install tapas torch-scatter -f https://pytorch-geometric.com/whl/torch-1.7.0+cpu.html
- run: pip install .[sklearn,tf-cpu,torch,testing,sentencepiece,speech,vision]
- run: pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.9.0+cpu.html
- save_cache:
key: v0.4-{{ checksum "setup.py" }}
paths:
- '~/.cache/pip'
- run: RUN_PT_TF_CROSS_TESTS=1 python -m pytest -n 8 --dist=loadfile -rA -s --make-reports=tests_torch_and_tf ./tests/ -m is_pt_tf_cross_test --durations=0 | tee tests_output.txt
- run: python -m pytest -n 8 --dist=loadfile -rA -s --make-reports=tests_torch_and_tf ./tests/ -m is_pt_tf_cross_test --durations=0 | tee tests_output.txt
- store_artifacts:
path: ~/transformers/tests_output.txt
- store_artifacts:
path: ~/transformers/reports
run_tests_torch_and_flax:
working_directory: ~/transformers
docker:
- image: circleci/python:3.6
environment:
OMP_NUM_THREADS: 1
RUN_PT_FLAX_CROSS_TESTS: yes
TRANSFORMERS_IS_CI: yes
resource_class: xlarge
parallelism: 1
steps:
- checkout
- restore_cache:
keys:
- v0.4-torch_and_flax-{{ checksum "setup.py" }}
- v0.4-{{ checksum "setup.py" }}
- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev
- run: pip install --upgrade pip
- run: pip install .[sklearn,flax,torch,testing,sentencepiece,speech,vision]
- run: pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.9.0+cpu.html
- save_cache:
key: v0.4-{{ checksum "setup.py" }}
paths:
- '~/.cache/pip'
- run: python -m pytest -n 8 --dist=loadfile -rA -s --make-reports=tests_torch_and_flax ./tests/ -m is_pt_flax_cross_test --durations=0 | tee tests_output.txt
- store_artifacts:
path: ~/transformers/tests_output.txt
- store_artifacts:
@@ -96,6 +128,7 @@ jobs:
- image: circleci/python:3.7
environment:
OMP_NUM_THREADS: 1
TRANSFORMERS_IS_CI: yes
resource_class: xlarge
parallelism: 1
steps:
@@ -104,14 +137,15 @@ jobs:
keys:
- v0.4-torch-{{ checksum "setup.py" }}
- v0.4-{{ checksum "setup.py" }}
- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev
- run: pip install --upgrade pip
- run: pip install .[sklearn,torch,testing,sentencepiece]
- run: pip install tapas torch-scatter -f https://pytorch-geometric.com/whl/torch-1.7.0+cpu.html
- run: pip install .[sklearn,torch,testing,sentencepiece,speech,vision,timm]
- run: pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.9.0+cpu.html
- save_cache:
key: v0.4-torch-{{ checksum "setup.py" }}
paths:
- '~/.cache/pip'
- run: python -m pytest -n 8 --dist=loadfile -s --make-reports=tests_torch ./tests/ | tee tests_output.txt
- run: python -m pytest -n 3 --dist=loadfile -s --make-reports=tests_torch ./tests/ | tee tests_output.txt
- store_artifacts:
path: ~/transformers/tests_output.txt
- store_artifacts:
@@ -123,6 +157,7 @@ jobs:
- image: circleci/python:3.7
environment:
OMP_NUM_THREADS: 1
TRANSFORMERS_IS_CI: yes
resource_class: xlarge
parallelism: 1
steps:
@@ -149,6 +184,7 @@ jobs:
- image: circleci/python:3.7
environment:
OMP_NUM_THREADS: 1
TRANSFORMERS_IS_CI: yes
resource_class: xlarge
parallelism: 1
steps:
@@ -158,7 +194,7 @@ jobs:
- v0.4-flax-{{ checksum "setup.py" }}
- v0.4-{{ checksum "setup.py" }}
- run: pip install --upgrade pip
- run: sudo pip install .[flax,sklearn,torch,testing,sentencepiece]
- run: sudo pip install .[flax,testing,sentencepiece]
- save_cache:
key: v0.4-flax-{{ checksum "setup.py" }}
paths:
@@ -175,6 +211,8 @@ jobs:
- image: circleci/python:3.7
environment:
OMP_NUM_THREADS: 1
RUN_PIPELINE_TESTS: yes
TRANSFORMERS_IS_CI: yes
resource_class: xlarge
parallelism: 1
steps:
@@ -183,14 +221,15 @@ jobs:
keys:
- v0.4-torch-{{ checksum "setup.py" }}
- v0.4-{{ checksum "setup.py" }}
- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev
- run: pip install --upgrade pip
- run: pip install .[sklearn,torch,testing,sentencepiece]
- run: pip install tapas torch-scatter -f https://pytorch-geometric.com/whl/torch-1.7.0+cpu.html
- run: pip install .[sklearn,torch,testing,sentencepiece,speech,vision]
- run: pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.9.0+cpu.html
- save_cache:
key: v0.4-torch-{{ checksum "setup.py" }}
paths:
- '~/.cache/pip'
- run: RUN_PIPELINE_TESTS=1 python -m pytest -n 8 --dist=loadfile -rA -s --make-reports=tests_pipelines_torch -m is_pipeline_test ./tests/ | tee tests_output.txt
- run: python -m pytest -n 8 --dist=loadfile -rA -s --make-reports=tests_pipelines_torch -m is_pipeline_test ./tests/ | tee tests_output.txt
- store_artifacts:
path: ~/transformers/tests_output.txt
- store_artifacts:
@@ -202,6 +241,8 @@ jobs:
- image: circleci/python:3.7
environment:
OMP_NUM_THREADS: 1
RUN_PIPELINE_TESTS: yes
TRANSFORMERS_IS_CI: yes
resource_class: xlarge
parallelism: 1
steps:
@@ -216,7 +257,7 @@ jobs:
key: v0.4-tf-{{ checksum "setup.py" }}
paths:
- '~/.cache/pip'
- run: RUN_PIPELINE_TESTS=1 python -m pytest -n 8 --dist=loadfile -rA -s --make-reports=tests_pipelines_tf ./tests/ -m is_pipeline_test | tee tests_output.txt
- run: python -m pytest -n 8 --dist=loadfile -rA -s --make-reports=tests_pipelines_tf ./tests/ -m is_pipeline_test | tee tests_output.txt
- store_artifacts:
path: ~/transformers/tests_output.txt
- store_artifacts:
@@ -228,6 +269,7 @@ jobs:
- image: circleci/python:3.7
environment:
RUN_CUSTOM_TOKENIZERS: yes
TRANSFORMERS_IS_CI: yes
steps:
- checkout
- restore_cache:
@@ -235,7 +277,7 @@ jobs:
- v0.4-custom_tokenizers-{{ checksum "setup.py" }}
- v0.4-{{ checksum "setup.py" }}
- run: pip install --upgrade pip
- run: pip install .[ja,testing,sentencepiece]
- run: pip install .[ja,testing,sentencepiece,jieba]
- run: python -m unidic download
- save_cache:
key: v0.4-custom_tokenizers-{{ checksum "setup.py" }}
@@ -253,6 +295,7 @@ jobs:
- image: circleci/python:3.6
environment:
OMP_NUM_THREADS: 1
TRANSFORMERS_IS_CI: yes
resource_class: xlarge
parallelism: 1
steps:
@@ -263,32 +306,44 @@ jobs:
- v0.4-{{ checksum "setup.py" }}
- run: pip install --upgrade pip
- run: pip install .[sklearn,torch,sentencepiece,testing]
- run: pip install -r examples/_tests_requirements.txt
- run: pip install -r examples/pytorch/_tests_requirements.txt
- save_cache:
key: v0.4-torch_examples-{{ checksum "setup.py" }}
paths:
- '~/.cache/pip'
- run: python -m pytest -n 8 --dist=loadfile -s --make-reports=examples_torch ./examples/ | tee examples_output.txt
- run: TRANSFORMERS_IS_CI=1 python -m pytest -n 8 --dist=loadfile -s --make-reports=examples_torch ./examples/pytorch/ | tee examples_output.txt
- store_artifacts:
path: ~/transformers/examples_output.txt
- store_artifacts:
path: ~/transformers/reports
run_tests_git_lfs:
run_tests_hub:
working_directory: ~/transformers
docker:
- image: circleci/python:3.7
environment:
HUGGINGFACE_CO_STAGING: yes
RUN_GIT_LFS_TESTS: yes
TRANSFORMERS_IS_CI: yes
resource_class: xlarge
parallelism: 1
steps:
- checkout
- restore_cache:
keys:
- v0.4-hub-{{ checksum "setup.py" }}
- v0.4-{{ checksum "setup.py" }}
- run: sudo apt-get install git-lfs
- run: |
git config --global user.email "ci@dummy.com"
git config --global user.name "ci"
- run: pip install --upgrade pip
- run: pip install .[testing]
- run: RUN_GIT_LFS_TESTS=1 python -m pytest -sv ./tests/test_hf_api.py -k "HfLargefilesTest"
- run: pip install .[torch,sentencepiece,testing]
- save_cache:
key: v0.4-hub-{{ checksum "setup.py" }}
paths:
- '~/.cache/pip'
- run: python -m pytest -sv ./tests/ -m is_staging_test
build_doc:
working_directory: ~/transformers
@@ -300,13 +355,14 @@ jobs:
keys:
- v0.4-build_doc-{{ checksum "setup.py" }}
- v0.4-{{ checksum "setup.py" }}
- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev
- run: pip install --upgrade pip
- run: pip install ."[all, docs]"
- run: pip install ."[docs]"
- save_cache:
key: v0.4-build_doc-{{ checksum "setup.py" }}
paths:
- '~/.cache/pip'
- run: cd docs && make html SPHINXOPTS="-W"
- run: cd docs && make html SPHINXOPTS="-W -j 4"
- store_artifacts:
path: ./docs/_build
@@ -323,7 +379,9 @@ jobs:
keys:
- v0.4-deploy_doc-{{ checksum "setup.py" }}
- v0.4-{{ checksum "setup.py" }}
- run: pip install ."[all,docs]"
- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev
- run: pip install --upgrade pip
- run: pip install ."[docs]"
- save_cache:
key: v0.4-deploy_doc-{{ checksum "setup.py" }}
paths:
@@ -335,6 +393,8 @@ jobs:
docker:
- image: circleci/python:3.6
resource_class: medium
environment:
TRANSFORMERS_IS_CI: yes
parallelism: 1
steps:
- checkout
@@ -351,12 +411,14 @@ jobs:
- '~/.cache/pip'
- run: black --check examples tests src utils
- run: isort --check-only examples tests src utils
- run: python utils/custom_init_isort.py --check_only
- run: flake8 examples tests src utils
- run: python utils/style_doc.py src/transformers docs/source --max_len 119 --check_only
- run: python utils/check_copies.py
- run: python utils/check_table.py
- run: python utils/check_dummies.py
- run: python utils/check_repo.py
- run: python utils/check_inits.py
check_repository_consistency:
working_directory: ~/transformers
@@ -375,6 +437,7 @@ jobs:
- image: circleci/python:3.6
environment:
OMP_NUM_THREADS: 1
TRANSFORMERS_IS_CI: yes
resource_class: xlarge
parallelism: 1
steps:
@@ -413,12 +476,13 @@ workflows:
- run_examples_torch
- run_tests_custom_tokenizers
- run_tests_torch_and_tf
- run_tests_torch_and_flax
- run_tests_torch
- run_tests_tf
- run_tests_flax
- run_tests_pipelines_torch
- run_tests_pipelines_tf
- run_tests_git_lfs
- run_tests_hub
- build_doc
- deploy_doc: *workflow_filters
# tpu_testing_jobs:

View File

@@ -3,6 +3,7 @@ cd docs
function deploy_doc(){
echo "Creating doc at commit $1 and pushing to folder $2"
git checkout $1
pip install -U ..
if [ ! -z "$2" ]
then
if [ "$2" == "master" ]; then
@@ -45,7 +46,7 @@ deploy_doc "6f5a12a" v2.7.0
deploy_doc "11c3257" v2.8.0
deploy_doc "e7cfc1a" v2.9.0
deploy_doc "7cb203f" v2.9.1
deploy_doc "10d7239" v2.10.0
deploy_doc "10d7239" v2.10.0
deploy_doc "b42586e" v2.11.0
deploy_doc "7fb8bdf" v3.0.2
deploy_doc "4b3ee9c" v3.1.0
@@ -53,6 +54,14 @@ deploy_doc "3ebb1b3" v3.2.0
deploy_doc "0613f05" v3.3.1
deploy_doc "eb0e0ce" v3.4.0
deploy_doc "818878d" v3.5.1
deploy_doc "c781171" v4.0.0
deploy_doc "c781171" v4.0.1
deploy_doc "bfa4ccf" v4.1.1
deploy_doc "7d9a9d0" # v4.2.0 Latest stable release
deploy_doc "7d9a9d0" v4.2.2
deploy_doc "bae0c79" v4.3.3
deploy_doc "c988db5" v4.4.0
deploy_doc "c5d6a28" v4.4.1
deploy_doc "6bc89ed" v4.4.2
deploy_doc "4906a29" v4.5.0
deploy_doc "4bae96e" v4.5.1
deploy_doc "25dee4a" v4.6.0
deploy_doc "7a6c9fa" # v4.7.0 Latest stable release

3
.gitattributes vendored Normal file
View File

@@ -0,0 +1,3 @@
*.py eol=lf
*.rst eol=lf
*.md eol=lf

View File

@@ -34,7 +34,7 @@ Models:
- funnel: @sgugger
- gpt2: @patrickvonplaten, @LysandreJik
- rag: @patrickvonplaten, @lhoestq
- tensorflow: @jplu
- tensorflow: @Rocketknight1
Library:
@@ -42,15 +42,19 @@ Library:
- deepspeed: @stas00
- ray/raytune: @richardliaw, @amogkam
- text generation: @patrickvonplaten
- tokenizers: @n1t0, @LysandreJik
- tokenizers: @LysandreJik
- trainer: @sgugger
- pipelines: @LysandreJik
Documentation: @sgugger
Model hub:
- for issues with a model report at https://discuss.huggingface.co/ and tag the model's creator.
HF projects:
- nlp datasets: [different repo](https://github.com/huggingface/nlp)
- datasets: [different repo](https://github.com/huggingface/datasets)
- rust tokenizers: [different repo](https://github.com/huggingface/tokenizers)
Examples:

View File

@@ -30,7 +30,7 @@ Fixes # (issue)
## Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors which may be interested in your PR.
members/contributors who may be interested in your PR.
<!-- Your PR will be replied to more quickly if you can figure out the right person to tag with @
@@ -46,7 +46,7 @@ Models:
- funnel: @sgugger
- gpt2: @patrickvonplaten, @LysandreJik
- rag: @patrickvonplaten, @lhoestq
- tensorflow: @jplu
- tensorflow: @LysandreJik
Library:
@@ -62,7 +62,7 @@ Documentation: @sgugger
HF projects:
- nlp datasets: [different repo](https://github.com/huggingface/nlp)
- datasets: [different repo](https://github.com/huggingface/datasets)
- rust tokenizers: [different repo](https://github.com/huggingface/tokenizers)
Examples:

View File

@@ -14,8 +14,10 @@ requirements:
host:
- python
- pip
- numpy
- numpy >=1.17
- dataclasses
- importlib_metadata
- huggingface_hub
- packaging
- filelock
- requests
@@ -23,11 +25,14 @@ requirements:
- sacremoses
- regex !=2019.12.17
- protobuf
- tokenizers ==0.9.4
- tokenizers >=0.10.1,<0.11.0
- pyyaml
run:
- python
- numpy
- numpy >=1.17
- dataclasses
- importlib_metadata
- huggingface_hub
- packaging
- filelock
- requests
@@ -35,7 +40,8 @@ requirements:
- sacremoses
- regex !=2019.12.17
- protobuf
- tokenizers ==0.9.4
- tokenizers >=0.10.1,<0.11.0
- pyyaml
test:
imports:

18
.github/stale.yml vendored
View File

@@ -1,18 +0,0 @@
# Number of days of inactivity before an issue becomes stale
daysUntilStale: 60
# Number of days of inactivity before a stale issue is closed
daysUntilClose: 7
# Issues with these labels will never be considered stale
exemptLabels:
- pinned
- security
- Feature request
# Label to use when marking an issue as stale
staleLabel: wontfix
# Comment to post when marking an issue as stale. Set to `false` to disable
markComment: >
This issue has been automatically marked as stale because it has not had
recent activity. It will be closed if no further activity occurs. Thank you
for your contributions.
# Comment to post when closing a stale issue. Set to `false` to disable
closeComment: false

View File

@@ -37,10 +37,10 @@ jobs:
# no longer needed
pip uninstall -y transformers
- name: Torch hub list
run: |
python -c "import torch; print(torch.hub.list('huggingface/transformers:$BRANCH'))"
#- name: Torch hub list
# run: |
# python -c "import torch; print(torch.hub.list('huggingface/transformers:$BRANCH'))"
- name: Torch hub help
run: |
python -c "import torch; print(torch.hub.help('huggingface/transformers:$BRANCH', 'modelForSequenceClassification'))"
#- name: Torch hub help
# run: |
# python -c "import torch; print(torch.hub.help('huggingface/transformers:$BRANCH', 'modelForSequenceClassification'))"

View File

@@ -2,14 +2,15 @@ name: Model templates runner
on:
push:
branches:
- master
pull_request:
paths:
- "src/**"
- "tests/**"
- ".github/**"
- "templates/**"
pull_request_target:
branches:
- master
types: [assigned, opened, synchronize, reopened]
jobs:
run_tests_templates:
@@ -36,6 +37,7 @@ jobs:
- name: Install dependencies
run: |
pip install --upgrade pip
sudo apt -y update && sudo apt install -y libsndfile1-dev
pip install .[dev]
- name: Create model files
run: |
@@ -48,6 +50,7 @@ jobs:
make style
python utils/check_table.py --fix_and_overwrite
python utils/check_dummies.py --fix_and_overwrite
python utils/check_copies.py --fix_and_overwrite
- name: Run all non-slow tests
run: |

View File

@@ -4,6 +4,8 @@ on:
push:
tags:
- v*
branches:
- conda_*
env:
ANACONDA_API_TOKEN: ${{ secrets.ANACONDA_API_TOKEN }}
@@ -24,6 +26,7 @@ jobs:
with:
auto-update-conda: true
auto-activate-base: false
python-version: 3.8
activate-environment: "build-transformers"
channels: huggingface

View File

@@ -5,148 +5,97 @@ on:
branches:
- master
- ci_*
- ci-*
paths:
- "src/**"
- "tests/**"
- ".github/**"
- "templates/**"
# pull_request:
repository_dispatch:
env:
HF_HOME: /mnt/cache
TRANSFORMERS_IS_CI: yes
OMP_NUM_THREADS: 8
MKL_NUM_THREADS: 8
jobs:
run_tests_torch_gpu:
runs-on: [self-hosted, gpu, single-gpu]
runs-on: [self-hosted, docker-gpu, single-gpu]
container:
image: pytorch/pytorch:1.9.0-cuda11.1-cudnn8-runtime
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- uses: actions/checkout@v2
- name: Python version
- name: Launcher docker
uses: actions/checkout@v2
- name: NVIDIA-SMI
run: |
which python
python --version
pip --version
- name: Current dir
run: pwd
- run: nvidia-smi
- name: Loading cache.
uses: actions/cache@v2
id: cache
with:
path: .env
key: v1.1-tests_torch_gpu-${{ hashFiles('setup.py') }}
- name: Create new python env (on self-hosted runners we have to handle isolation ourselves)
run: |
python -m venv .env
source .env/bin/activate
which python
python --version
pip --version
nvidia-smi
- name: Install dependencies
run: |
source .env/bin/activate
apt -y update && apt install -y libsndfile1-dev
pip install --upgrade pip
pip install .[torch,sklearn,testing,onnxruntime,sentencepiece]
pip install git+https://github.com/huggingface/datasets
pip install pandas torch-scatter -f https://pytorch-geometric.com/whl/torch-1.7.0+cu102.html
pip install .[sklearn,testing,onnxruntime,sentencepiece,speech,vision,timm]
- name: Are GPUs recognized by our DL frameworks
run: |
source .env/bin/activate
python -c "import torch; print('Cuda available:', torch.cuda.is_available())"
python -c "import torch; print('Cuda version:', torch.version.cuda)"
python -c "import torch; print('CuDNN version:', torch.backends.cudnn.version())"
python -c "import torch; print('Number of GPUs available:', torch.cuda.device_count())"
# - name: Create model files
# run: |
# source .env/bin/activate
# transformers-cli add-new-model --testing --testing_file=templates/adding_a_new_model/tests/encoder-bert-tokenizer.json --path=templates/adding_a_new_model
# transformers-cli add-new-model --testing --testing_file=templates/adding_a_new_model/tests/pt-encoder-bert-tokenizer.json --path=templates/adding_a_new_model
# transformers-cli add-new-model --testing --testing_file=templates/adding_a_new_model/tests/standalone.json --path=templates/adding_a_new_model
# transformers-cli add-new-model --testing --testing_file=templates/adding_a_new_model/tests/tf-encoder-bert-tokenizer.json --path=templates/adding_a_new_model
- name: Run all non-slow tests on GPU
env:
OMP_NUM_THREADS: 1
CUDA_VISIBLE_DEVICES: 0
run: |
source .env/bin/activate
python -m pytest -n 2 --dist=loadfile -s --make-reports=tests_torch_gpu tests
python -m pytest -n 2 --dist=loadfile --make-reports=tests_torch_gpu tests
- name: Failure short reports
if: ${{ always() }}
run: cat reports/tests_torch_gpu_failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: run_all_tests_torch_gpu_test_reports
path: reports
run_tests_tf_gpu:
runs-on: [self-hosted, gpu, single-gpu]
runs-on: [self-hosted, docker-gpu, single-gpu]
timeout-minutes: 120
container:
image: tensorflow/tensorflow:2.4.1-gpu
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- uses: actions/checkout@v2
- name: Python version
run: |
which python
python --version
pip --version
- name: Current dir
run: pwd
- run: nvidia-smi
- name: Launcher docker
uses: actions/checkout@v2
- name: Loading cache.
uses: actions/cache@v2
id: cache
with:
path: .env
key: v1.1-tests_tf_gpu-${{ hashFiles('setup.py') }}
- name: Create new python env (on self-hosted runners we have to handle isolation ourselves)
- name: NVIDIA-SMI
run: |
python -m venv .env
source .env/bin/activate
which python
python --version
pip --version
nvidia-smi
- name: Install dependencies
run: |
source .env/bin/activate
pip install --upgrade pip
pip install .[tf,sklearn,testing,onnxruntime,sentencepiece]
pip install git+https://github.com/huggingface/datasets
pip install .[sklearn,testing,onnxruntime,sentencepiece]
- name: Are GPUs recognized by our DL frameworks
run: |
source .env/bin/activate
TF_CPP_MIN_LOG_LEVEL=3 python -c "import tensorflow as tf; print('TF GPUs available:', bool(tf.config.list_physical_devices('GPU')))"
TF_CPP_MIN_LOG_LEVEL=3 python -c "import tensorflow as tf; print('Number of TF GPUs available:', len(tf.config.list_physical_devices('GPU')))"
- name: Create model files
run: |
source .env/bin/activate
# transformers-cli add-new-model --testing --testing_file=templates/adding_a_new_model/tests/encoder-bert-tokenizer.json --path=templates/adding_a_new_model
# transformers-cli add-new-model --testing --testing_file=templates/adding_a_new_model/tests/pt-encoder-bert-tokenizer.json --path=templates/adding_a_new_model
# transformers-cli add-new-model --testing --testing_file=templates/adding_a_new_model/tests/standalone.json --path=templates/adding_a_new_model
# transformers-cli add-new-model --testing --testing_file=templates/adding_a_new_model/tests/tf-encoder-bert-tokenizer.json --path=templates/adding_a_new_model
- name: Run all non-slow tests on GPU
env:
OMP_NUM_THREADS: 1
CUDA_VISIBLE_DEVICES: 0
TF_NUM_INTRAOP_THREADS: 8
TF_NUM_INTEROP_THREADS: 1
run: |
source .env/bin/activate
python -m pytest -n 2 --dist=loadfile -s --make-reports=tests_tf_gpu tests
python -m pytest -n 1 --dist=loadfile --make-reports=tests_tf_gpu tests
- name: Failure short reports
if: ${{ always() }}
run: cat reports/tests_tf_gpu_failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
@@ -154,58 +103,42 @@ jobs:
name: run_all_tests_tf_gpu_test_reports
path: reports
run_tests_torch_multi_gpu:
runs-on: [self-hosted, gpu, multi-gpu]
runs-on: [self-hosted, docker-gpu, multi-gpu]
container:
image: pytorch/pytorch:1.9.0-cuda11.1-cudnn8-runtime
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- uses: actions/checkout@v2
- name: Python version
- name: Launcher docker
uses: actions/checkout@v2
- name: NVIDIA-SMI
run: |
which python
python --version
pip --version
nvidia-smi
- name: Current dir
run: pwd
- run: nvidia-smi
- name: Loading cache.
uses: actions/cache@v2
id: cache
with:
path: .env
key: v1.1-tests_torch_multi_gpu-${{ hashFiles('setup.py') }}
- name: Create new python env (on self-hosted runners we have to handle isolation ourselves)
run: |
python -m venv .env
source .env/bin/activate
which python
python --version
pip --version
- name: Install dependencies
run: |
source .env/bin/activate
apt -y update && apt install -y libsndfile1-dev
pip install --upgrade pip
pip install .[torch,sklearn,testing,onnxruntime,sentencepiece]
pip install git+https://github.com/huggingface/datasets
pip install pandas torch-scatter -f https://pytorch-geometric.com/whl/torch-1.7.0+cu102.html
pip install .[sklearn,testing,onnxruntime,sentencepiece,speech,vision,timm]
- name: Are GPUs recognized by our DL frameworks
run: |
source .env/bin/activate
python -c "import torch; print('Cuda available:', torch.cuda.is_available())"
python -c "import torch; print('Cuda version:', torch.version.cuda)"
python -c "import torch; print('CuDNN version:', torch.backends.cudnn.version())"
python -c "import torch; print('Number of GPUs available:', torch.cuda.device_count())"
- name: Run all non-slow tests on GPU
env:
OMP_NUM_THREADS: 1
MKL_SERVICE_FORCE_INTEL: 1
run: |
source .env/bin/activate
python -m pytest -n 2 --dist=loadfile -s --make-reports=tests_torch_multi_gpu tests
python -m pytest -n 2 --dist=loadfile --make-reports=tests_torch_multi_gpu tests
- name: Failure short reports
if: ${{ always() }}
run: cat reports/tests_torch_multi_gpu_failures_short.txt
run: cat reports/tests_torch_multi_gpu_failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
@@ -215,52 +148,35 @@ jobs:
path: reports
run_tests_tf_multi_gpu:
runs-on: [self-hosted, gpu, multi-gpu]
runs-on: [self-hosted, docker-gpu, multi-gpu]
timeout-minutes: 120
container:
image: tensorflow/tensorflow:2.4.1-gpu
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- uses: actions/checkout@v2
- name: Python version
- name: Launcher docker
uses: actions/checkout@v2
- name: NVIDIA-SMI
run: |
which python
python --version
pip --version
nvidia-smi
- name: Current dir
run: pwd
- run: nvidia-smi
- name: Loading cache.
uses: actions/cache@v2
id: cache
with:
path: .env
key: v1.1-tests_tf_multi_gpu-${{ hashFiles('setup.py') }}
- name: Create new python env (on self-hosted runners we have to handle isolation ourselves)
run: |
python -m venv .env
source .env/bin/activate
which python
python --version
pip --version
- name: Install dependencies
run: |
source .env/bin/activate
pip install --upgrade pip
pip install .[tf,sklearn,testing,onnxruntime,sentencepiece]
pip install git+https://github.com/huggingface/datasets
pip install .[sklearn,testing,onnxruntime,sentencepiece]
- name: Are GPUs recognized by our DL frameworks
run: |
source .env/bin/activate
TF_CPP_MIN_LOG_LEVEL=3 python -c "import tensorflow as tf; print('TF GPUs available:', bool(tf.config.list_physical_devices('GPU')))"
TF_CPP_MIN_LOG_LEVEL=3 python -c "import tensorflow as tf; print('Number of TF GPUs available:', len(tf.config.list_physical_devices('GPU')))"
- name: Run all non-slow tests on GPU
env:
OMP_NUM_THREADS: 1
TF_NUM_INTRAOP_THREADS: 8
TF_NUM_INTEROP_THREADS: 1
run: |
source .env/bin/activate
python -m pytest -n 2 --dist=loadfile -s --make-reports=tests_tf_multi_gpu tests
python -m pytest -n 1 --dist=loadfile --make-reports=tests_tf_multi_gpu tests
- name: Failure short reports
if: ${{ always() }}
@@ -272,4 +188,112 @@ jobs:
with:
name: run_all_tests_tf_multi_gpu_test_reports
path: reports
run_tests_torch_cuda_extensions_gpu:
runs-on: [self-hosted, docker-gpu, single-gpu]
container:
image: nvcr.io/nvidia/pytorch:21.03-py3
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: Launcher docker
uses: actions/checkout@v2
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Install dependencies
run: |
apt -y update && apt install -y libaio-dev
pip install --upgrade pip
pip install .[testing,deepspeed]
- name: Are GPUs recognized by our DL frameworks
run: |
python -c "import torch; print('Cuda available:', torch.cuda.is_available())"
python -c "import torch; print('Cuda version:', torch.version.cuda)"
python -c "import torch; print('CuDNN version:', torch.backends.cudnn.version())"
python -c "import torch; print('Number of GPUs available:', torch.cuda.device_count())"
- name: Run all tests on GPU
run: |
python -m pytest -n 1 --dist=loadfile --make-reports=tests_torch_cuda_extensions_gpu tests/deepspeed tests/extended
- name: Failure short reports
if: ${{ always() }}
run: cat reports/tests_torch_cuda_extensions_gpu_failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: run_tests_torch_cuda_extensions_gpu_test_reports
path: reports
run_tests_torch_cuda_extensions_multi_gpu:
runs-on: [self-hosted, docker-gpu, multi-gpu]
container:
image: nvcr.io/nvidia/pytorch:21.03-py3
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: Launcher docker
uses: actions/checkout@v2
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Install dependencies
run: |
apt -y update && apt install -y libaio-dev
pip install --upgrade pip
pip install .[testing,deepspeed,fairscale]
- name: Are GPUs recognized by our DL frameworks
run: |
python -c "import torch; print('Cuda available:', torch.cuda.is_available())"
python -c "import torch; print('Cuda version:', torch.version.cuda)"
python -c "import torch; print('CuDNN version:', torch.backends.cudnn.version())"
python -c "import torch; print('Number of GPUs available:', torch.cuda.device_count())"
- name: Run all tests on GPU
run: |
python -m pytest -n 1 --dist=loadfile --make-reports=tests_torch_cuda_extensions_multi_gpu tests/deepspeed tests/extended
- name: Failure short reports
if: ${{ always() }}
run: cat reports/tests_torch_cuda_extensions_multi_gpu_failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: run_tests_torch_cuda_extensions_multi_gpu_test_reports
path: reports
send_results:
name: Send results to webhook
runs-on: ubuntu-latest
if: always()
needs: [
run_tests_torch_gpu,
run_tests_tf_gpu,
run_tests_torch_multi_gpu,
run_tests_tf_multi_gpu,
run_tests_torch_cuda_extensions_gpu,
run_tests_torch_cuda_extensions_multi_gpu
]
steps:
- uses: actions/checkout@v2
- uses: actions/download-artifact@v2
- name: Send message to Slack
env:
CI_SLACK_BOT_TOKEN: ${{ secrets.CI_SLACK_BOT_TOKEN }}
CI_SLACK_CHANNEL_ID: ${{ secrets.CI_SLACK_CHANNEL_ID }}
run: |
pip install slack_sdk
python utils/notification_service.py push

View File

@@ -1,82 +1,66 @@
# configuration notes:
#
# - `source .env/bin/activate` is currently needed to be run first thing first in each step. Otherwise
# the step uses the system-wide python interpreter.
name: Self-hosted runner (scheduled)
on:
push:
branches:
- multi_ci_*
repository_dispatch:
schedule:
- cron: "0 0 * * *"
env:
HF_HOME: /mnt/cache
TRANSFORMERS_IS_CI: yes
RUN_SLOW: yes
OMP_NUM_THREADS: 16
MKL_NUM_THREADS: 16
jobs:
run_all_tests_torch_gpu:
runs-on: [self-hosted, gpu, single-gpu]
runs-on: [self-hosted, docker-gpu, single-gpu]
container:
image: pytorch/pytorch:1.9.0-cuda11.1-cudnn8-runtime
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- uses: actions/checkout@v2
- name: Launcher docker
uses: actions/checkout@v2
- name: Loading cache.
uses: actions/cache@v2
id: cache
with:
path: .env
key: v 1.1-slow_tests_torch_gpu-${{ hashFiles('setup.py') }}
- name: Python version
- name: NVIDIA-SMI
run: |
which python
python --version
pip --version
- name: Current dir
run: pwd
- run: nvidia-smi
- name: Create new python env (on self-hosted runners we have to handle isolation ourselves)
if: steps.cache.outputs.cache-hit != 'true'
run: |
python -m venv .env
source .env/bin/activate
which python
python --version
pip --version
nvidia-smi
- name: Install dependencies
run: |
source .env/bin/activate
apt -y update && apt install -y libsndfile1-dev
pip install --upgrade pip
pip install .[torch,sklearn,testing,onnxruntime,sentencepiece]
pip install git+https://github.com/huggingface/datasets
pip list
pip install .[integrations,sklearn,testing,onnxruntime,sentencepiece,speech,vision,timm]
- name: Are GPUs recognized by our DL frameworks
run: |
source .env/bin/activate
python -c "import torch; print('Cuda available:', torch.cuda.is_available())"
python -c "import torch; print('Cuda version:', torch.version.cuda)"
python -c "import torch; print('CuDNN version:', torch.backends.cudnn.version())"
python -c "import torch; print('Number of GPUs available:', torch.cuda.device_count())"
- name: Run all tests on GPU
env:
OMP_NUM_THREADS: 1
RUN_SLOW: yes
run: |
source .env/bin/activate
python -m pytest -n 1 --dist=loadfile -s --make-reports=tests_torch_gpu tests
python -m pytest -n 1 --dist=loadfile --make-reports=tests_torch_gpu tests
- name: Failure short reports
if: ${{ always() }}
run: cat reports/tests_torch_gpu_failures_short.txt
- name: Run examples tests on GPU
if: ${{ always() }}
env:
OMP_NUM_THREADS: 1
OMP_NUM_THREADS: 16
MKL_NUM_THREADS: 16
RUN_SLOW: yes
HF_HOME: /mnt/cache
TRANSFORMERS_IS_CI: yes
run: |
source .env/bin/activate
pip install -r examples/_tests_requirements.txt
python -m pytest -n 1 --dist=loadfile -s --make-reports=examples_torch_gpu examples
pip install -r examples/pytorch/_tests_requirements.txt
python -m pytest -n 1 --dist=loadfile --make-reports=examples_torch_gpu examples
- name: Failure short reports
if: ${{ always() }}
@@ -85,13 +69,9 @@ jobs:
- name: Run all pipeline tests on GPU
if: ${{ always() }}
env:
TF_FORCE_GPU_ALLOW_GROWTH: "true"
OMP_NUM_THREADS: 1
RUN_SLOW: yes
RUN_PIPELINE_TESTS: yes
run: |
source .env/bin/activate
python -m pytest -n 1 --dist=loadfile -s -m is_pipeline_test --make-reports=tests_torch_pipeline_gpu tests
python -m pytest -n 1 --dist=loadfile -m is_pipeline_test --make-reports=tests_torch_pipeline_gpu tests
- name: Failure short reports
if: ${{ always() }}
@@ -104,60 +84,36 @@ jobs:
name: run_all_tests_torch_gpu_test_reports
path: reports
run_all_tests_tf_gpu:
runs-on: [self-hosted, gpu, single-gpu]
runs-on: [self-hosted, docker-gpu, single-gpu]
container:
image: tensorflow/tensorflow:2.4.1-gpu
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- uses: actions/checkout@v2
- name: Launcher docker
uses: actions/checkout@v2
- name: Loading cache.
uses: actions/cache@v2
id: cache
with:
path: .env
key: v1.1-slow_tests_tf_gpu-${{ hashFiles('setup.py') }}
- name: Python version
- name: NVIDIA-SMI
run: |
which python
python --version
pip --version
- name: Current dir
run: pwd
- run: nvidia-smi
- name: Create new python env (on self-hosted runners we have to handle isolation ourselves)
if: steps.cache.outputs.cache-hit != 'true'
run: |
python -m venv .env
source .env/bin/activate
which python
python --version
pip --version
nvidia-smi
- name: Install dependencies
run: |
source .env/bin/activate
pip install --upgrade pip
pip install .[tf,sklearn,testing,onnxruntime,sentencepiece]
pip install git+https://github.com/huggingface/datasets
pip list
pip install .[sklearn,testing,onnx,sentencepiece]
- name: Are GPUs recognized by our DL frameworks
run: |
source .env/bin/activate
TF_CPP_MIN_LOG_LEVEL=3 python -c "import tensorflow as tf; print('TF GPUs available:', bool(tf.config.list_physical_devices('GPU')))"
TF_CPP_MIN_LOG_LEVEL=3 python -c "import tensorflow as tf; print('Number of TF GPUs available:', len(tf.config.list_physical_devices('GPU')))"
- name: Run all tests on GPU
env:
OMP_NUM_THREADS: 1
RUN_SLOW: yes
TF_NUM_INTEROP_THREADS: 1
TF_NUM_INTRAOP_THREADS: 16
run: |
source .env/bin/activate
python -m pytest -n 1 --dist=loadfile -s --make-reports=tests_tf_gpu tests
python -m pytest -n 1 --dist=loadfile --make-reports=tests_tf_gpu tests
- name: Failure short reports
if: ${{ always() }}
run: cat reports/tests_tf_gpu_failures_short.txt
@@ -165,17 +121,15 @@ jobs:
- name: Run all pipeline tests on GPU
if: ${{ always() }}
env:
TF_FORCE_GPU_ALLOW_GROWTH: "true"
OMP_NUM_THREADS: 1
RUN_SLOW: yes
RUN_PIPELINE_TESTS: yes
TF_NUM_INTEROP_THREADS: 1
TF_NUM_INTRAOP_THREADS: 16
run: |
source .env/bin/activate
python -m pytest -n 1 --dist=loadfile -s -m is_pipeline_test --make-reports=tests_tf_pipelines_gpu tests
python -m pytest -n 1 --dist=loadfile -m is_pipeline_test --make-reports=tests_tf_pipeline_gpu tests
- name: Failure short reports
if: ${{ always() }}
run: cat reports/tests_tf_pipelines_gpu_failures_short.txt
run: cat reports/tests_tf_pipeline_gpu_failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
@@ -183,86 +137,49 @@ jobs:
with:
name: run_all_tests_tf_gpu_test_reports
path: reports
run_all_tests_torch_multi_gpu:
runs-on: [self-hosted, gpu, multi-gpu]
runs-on: [self-hosted, docker-gpu, multi-gpu]
container:
image: pytorch/pytorch:1.9.0-cuda11.1-cudnn8-runtime
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- uses: actions/checkout@v2
- name: Launcher docker
uses: actions/checkout@v2
- name: Loading cache.
uses: actions/cache@v2
id: cache
with:
path: .env
key: v1.1-slow_tests_torch_multi_gpu-${{ hashFiles('setup.py') }}
- name: Python version
- name: NVIDIA-SMI
run: |
which python
python --version
pip --version
- name: Current dir
run: pwd
- run: nvidia-smi
- name: Create new python env (on self-hosted runners we have to handle isolation ourselves)
if: steps.cache.outputs.cache-hit != 'true'
run: |
python -m venv .env
source .env/bin/activate
which python
python --version
pip --version
nvidia-smi
- name: Install dependencies
run: |
source .env/bin/activate
apt -y update && apt install -y libsndfile1-dev
pip install --upgrade pip
pip install .[torch,sklearn,testing,onnxruntime,sentencepiece]
pip install git+https://github.com/huggingface/datasets
pip list
pip install .[integrations,sklearn,testing,onnxruntime,sentencepiece,speech,vision,timm]
- name: Are GPUs recognized by our DL frameworks
run: |
source .env/bin/activate
python -c "import torch; print('Cuda available:', torch.cuda.is_available())"
python -c "import torch; print('Cuda version:', torch.version.cuda)"
python -c "import torch; print('CuDNN version:', torch.backends.cudnn.version())"
python -c "import torch; print('Number of GPUs available:', torch.cuda.device_count())"
- name: Run all tests on multi-GPU
- name: Run all tests on GPU
env:
OMP_NUM_THREADS: 1
RUN_SLOW: yes
MKL_SERVICE_FORCE_INTEL: 1
run: |
source .env/bin/activate
python -m pytest -n 1 --dist=loadfile -s --make-reports=tests_torch_multi_gpu tests
python -m pytest -n 1 --dist=loadfile --make-reports=tests_torch_multi_gpu tests
- name: Failure short reports
if: ${{ always() }}
run: cat reports/tests_torch_multi_gpu_failures_short.txt
- name: Run examples tests on multi-GPU
env:
OMP_NUM_THREADS: 1
RUN_SLOW: yes
run: |
source .env/bin/activate
python -m pytest -n 1 --dist=loadfile -s --make-reports=tests_torch_examples_multi_gpu examples
- name: Failure short reports
if: ${{ always() }}
run: cat reports/tests_torch_examples_multi_gpu_failures_short.txt
- name: Run all pipeline tests on multi-GPU
- name: Run all pipeline tests on GPU
if: ${{ always() }}
env:
TF_FORCE_GPU_ALLOW_GROWTH: "true"
OMP_NUM_THREADS: 1
RUN_SLOW: yes
RUN_PIPELINE_TESTS: yes
run: |
source .env/bin/activate
python -m pytest -n 1 --dist=loadfile -s -m is_pipeline_test --make-reports=tests_torch_pipeline_multi_gpu tests
python -m pytest -n 1 --dist=loadfile -m is_pipeline_test --make-reports=tests_torch_pipeline_multi_gpu tests
- name: Failure short reports
if: ${{ always() }}
@@ -276,73 +193,48 @@ jobs:
path: reports
run_all_tests_tf_multi_gpu:
runs-on: [self-hosted, gpu, multi-gpu]
runs-on: [self-hosted, docker-gpu, multi-gpu]
container:
image: tensorflow/tensorflow:2.4.1-gpu
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- uses: actions/checkout@v2
- name: Launcher docker
uses: actions/checkout@v2
- name: Loading cache.
uses: actions/cache@v2
id: cache
with:
path: .env
key: v1.1-slow_tests_tf_multi_gpu-${{ hashFiles('setup.py') }}
- name: Python version
- name: NVIDIA-SMI
run: |
which python
python --version
pip --version
- name: Current dir
run: pwd
- run: nvidia-smi
- name: Create new python env (on self-hosted runners we have to handle isolation ourselves)
if: steps.cache.outputs.cache-hit != 'true'
run: |
python -m venv .env
source .env/bin/activate
which python
python --version
pip --version
nvidia-smi
- name: Install dependencies
run: |
source .env/bin/activate
pip install --upgrade pip
pip install .[tf,sklearn,testing,onnxruntime,sentencepiece]
pip install git+https://github.com/huggingface/datasets
pip list
pip install .[sklearn,testing,onnx,sentencepiece]
- name: Are GPUs recognized by our DL frameworks
run: |
source .env/bin/activate
TF_CPP_MIN_LOG_LEVEL=3 python -c "import tensorflow as tf; print('TF GPUs available:', bool(tf.config.list_physical_devices('GPU')))"
TF_CPP_MIN_LOG_LEVEL=3 python -c "import tensorflow as tf; print('Number of TF GPUs available:', len(tf.config.list_physical_devices('GPU')))"
- name: Run all tests on multi-GPU
- name: Run all tests on GPU
env:
OMP_NUM_THREADS: 1
RUN_SLOW: yes
TF_NUM_INTEROP_THREADS: 1
TF_NUM_INTRAOP_THREADS: 16
run: |
source .env/bin/activate
python -m pytest -n 1 --dist=loadfile -s --make-reports=tests_tf_multi_gpu tests
python -m pytest -n 1 --dist=loadfile --make-reports=tests_tf_multi_gpu tests
- name: Failure short reports
if: ${{ always() }}
run: cat reports/tests_tf_multi_gpu_failures_short.txt
- name: Run all pipeline tests on multi-GPU
- name: Run all pipeline tests on GPU
if: ${{ always() }}
env:
TF_FORCE_GPU_ALLOW_GROWTH: "true"
OMP_NUM_THREADS: 1
RUN_SLOW: yes
RUN_PIPELINE_TESTS: yes
TF_NUM_INTEROP_THREADS: 1
TF_NUM_INTRAOP_THREADS: 16
run: |
source .env/bin/activate
python -m pytest -n 1 --dist=loadfile -s -m is_pipeline_test --make-reports=tests_tf_pipeline_multi_gpu tests
python -m pytest -n 1 --dist=loadfile -m is_pipeline_test --make-reports=tests_tf_pipeline_multi_gpu tests
- name: Failure short reports
if: ${{ always() }}
run: cat reports/tests_tf_pipeline_multi_gpu_failures_short.txt
@@ -353,4 +245,112 @@ jobs:
with:
name: run_all_tests_tf_multi_gpu_test_reports
path: reports
run_all_tests_torch_cuda_extensions_gpu:
runs-on: [self-hosted, docker-gpu, single-gpu]
container:
image: nvcr.io/nvidia/pytorch:21.03-py3
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: Launcher docker
uses: actions/checkout@v2
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Install dependencies
run: |
apt -y update && apt install -y libaio-dev
pip install --upgrade pip
pip install .[testing,deepspeed]
- name: Are GPUs recognized by our DL frameworks
run: |
python -c "import torch; print('Cuda available:', torch.cuda.is_available())"
python -c "import torch; print('Cuda version:', torch.version.cuda)"
python -c "import torch; print('CuDNN version:', torch.backends.cudnn.version())"
python -c "import torch; print('Number of GPUs available:', torch.cuda.device_count())"
- name: Run all tests on GPU
run: |
python -m pytest -n 1 --dist=loadfile --make-reports=tests_torch_cuda_extensions_gpu tests/deepspeed tests/extended
- name: Failure short reports
if: ${{ always() }}
run: cat reports/tests_torch_cuda_extensions_gpu_failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: run_tests_torch_cuda_extensions_gpu_test_reports
path: reports
run_all_tests_torch_cuda_extensions_multi_gpu:
runs-on: [self-hosted, docker-gpu, multi-gpu]
container:
image: nvcr.io/nvidia/pytorch:21.03-py3
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: Launcher docker
uses: actions/checkout@v2
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Install dependencies
run: |
apt -y update && apt install -y libaio-dev
pip install --upgrade pip
pip install .[testing,deepspeed,fairscale]
- name: Are GPUs recognized by our DL frameworks
run: |
python -c "import torch; print('Cuda available:', torch.cuda.is_available())"
python -c "import torch; print('Cuda version:', torch.version.cuda)"
python -c "import torch; print('CuDNN version:', torch.backends.cudnn.version())"
python -c "import torch; print('Number of GPUs available:', torch.cuda.device_count())"
- name: Run all tests on GPU
run: |
python -m pytest -n 1 --dist=loadfile --make-reports=tests_torch_cuda_extensions_multi_gpu tests/deepspeed tests/extended
- name: Failure short reports
if: ${{ always() }}
run: cat reports/tests_torch_cuda_extensions_multi_gpu_failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: run_tests_torch_cuda_extensions_multi_gpu_test_reports
path: reports
send_results:
name: Send results to webhook
runs-on: ubuntu-latest
if: always()
needs: [
run_all_tests_torch_gpu,
run_all_tests_tf_gpu,
run_all_tests_torch_multi_gpu,
run_all_tests_tf_multi_gpu,
run_all_tests_torch_cuda_extensions_gpu,
run_all_tests_torch_cuda_extensions_multi_gpu
]
steps:
- uses: actions/checkout@v2
- uses: actions/download-artifact@v2
- name: Send message to Slack
env:
CI_SLACK_BOT_TOKEN: ${{ secrets.CI_SLACK_BOT_TOKEN }}
CI_SLACK_CHANNEL_ID: ${{ secrets.CI_SLACK_CHANNEL_ID }}
run: |
pip install slack_sdk
python utils/notification_service.py scheduled

27
.github/workflows/stale.yml vendored Normal file
View File

@@ -0,0 +1,27 @@
name: Stale Bot
on:
schedule:
- cron: "0 15 * * *"
jobs:
close_stale_issues:
name: Close Stale Issues
if: github.repository == 'huggingface/transformers'
runs-on: ubuntu-latest
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
steps:
- uses: actions/checkout@v2
- name: Setup Python
uses: actions/setup-python@v1
with:
python-version: 3.7
- name: Install requirements
run: |
pip install PyGithub
- name: Close stale issues
run: |
python scripts/stale.py

3
.gitignore vendored
View File

@@ -9,8 +9,7 @@ __pycache__/
*.so
# tests and logs
tests/fixtures/*
!tests/fixtures/sample_text_no_unicode.txt
tests/fixtures/cached_*_text.txt
logs/
lightning_logs/
lang_code_data/

View File

@@ -36,6 +36,13 @@ There are 4 ways you can contribute to transformers:
* Contributing to the examples or to the documentation;
* Submitting issues related to bugs or desired new features.
In particular there is a special [Good First
Issue](https://github.com/huggingface/transformers/contribute) listing. It will give you a list of
open Issues that are open to anybody to work on. Just comment in the issue that you'd like to work
on it. In that same listing you will also find some Issues with `Good Second Issue` label. These are
typically slightly more complicated than the Issues with just `Good First Issue` label. But if you
feel you know what you're doing, go for it.
*All are equally valuable to the community.*
## Submitting a new issue or feature request
@@ -46,7 +53,7 @@ feedback.
### Did you find a bug?
The transformers are robust and reliable thanks to the users who notify us of
The 🤗 Transformers library is robust and reliable thanks to the users who notify us of
the problems they encounter. So thank you for reporting an issue.
First, we would really appreciate it if you could **make sure the bug was not
@@ -285,7 +292,7 @@ $ python -m pytest -n auto --dist=loadfile -s -v ./tests/
and for the examples:
```bash
$ pip install -r examples/requirements.txt # only needed the first time
$ pip install -r examples/xxx/requirements.txt # only needed the first time
$ python -m pytest -n auto --dist=loadfile -s -v ./examples/
```
In fact, that's how `make test` and `make test-examples` are implemented (sans the `pip install` line)!
@@ -343,7 +350,7 @@ You can now use `make` from any terminal (Powershell, cmd.exe, etc) 🎉
### Syncing forked master with upstream (HuggingFace) master
To avoid pinging the upstream repository which adds reference notes to each upstream PR and sends unnessary notifications to the developers involved in these PRs,
To avoid pinging the upstream repository which adds reference notes to each upstream PR and sends unnessary notifications to the developers involved in these PRs,
when syncing the master branch of a forked repository, please, follow these steps:
1. When possible, avoid syncing with the upstream using a branch and PR on the forked repository. Instead merge directly into the forked master.
2. If a PR is absolutely necessary, use the following steps after checking out your branch:

View File

@@ -207,6 +207,8 @@ You are not required to read the following guidelines before opening an issue. H
Do not dispair if you can't figure it out from the begining, just share what you can and perhaps someone else will be able to help you at the forums.
If your setup involves any custom datasets, the best way to help us reproduce the problem is to create a [Google Colab notebook](https://colab.research.google.com/) that demonstrates the issue and once you verify that the issue still exists, include a link to that notebook in the Issue. Just make sure that you don't copy and paste the location bar url of the open notebook - as this is private and we won't be able to open it. Instead, you need to click on `Share` in the right upper corner of the notebook, select `Get Link` and then copy and paste the public link it will give to you.
7. If you forked off some of this project's code or example applications, please, do not ask us to go into your code repository and figure out what you may have done. The code is already very complex and unless there is an easy way to do a diff and it's a small diff, it won't be possible to find someone with time on their hands to make a lengthy investigation. Albeit, you might find someone at the forums who will be generous to do this for you.
8. Before reporting an issue, first, always try to update your environment to the latest official version of this library. We have no resources to go and debug older revisions, which could easily have bugs that have been fixed in the latest released version.

View File

@@ -1,5 +1,7 @@
.PHONY: deps_table_update modified_only_fixup extra_quality_checks quality style fixup fix-copies test test-examples docs
# make sure to test the local checkout in scripts and not the pre-installed one (don't use quotes!)
export PYTHONPATH = src
check_dirs := examples tests src utils
@@ -19,33 +21,44 @@ modified_only_fixup:
deps_table_update:
@python setup.py deps_table_update
# autogenerating code
autogenerate_code: deps_table_update
python utils/class_mapping_update.py
# Check that source code meets quality standards
extra_quality_checks: deps_table_update
extra_quality_checks:
python utils/check_copies.py
python utils/check_table.py
python utils/check_dummies.py
python utils/check_repo.py
python utils/style_doc.py src/transformers docs/source --max_len 119
python utils/check_inits.py
# this target runs checks on all files
quality:
black --check $(check_dirs)
isort --check-only $(check_dirs)
python utils/custom_init_isort.py --check_only
flake8 $(check_dirs)
python utils/style_doc.py src/transformers docs/source --max_len 119 --check_only
${MAKE} extra_quality_checks
# Format source code automatically and check is there are any problems left that need manual fixing
style: deps_table_update
extra_style_checks:
python utils/custom_init_isort.py
python utils/style_doc.py src/transformers docs/source --max_len 119
# this target runs checks on all files and potentially modifies some of them
style:
black $(check_dirs)
isort $(check_dirs)
python utils/style_doc.py src/transformers docs/source --max_len 119
${MAKE} autogenerate_code
${MAKE} extra_style_checks
# Super fast fix and check target that only works on relevant modified files since the branch was made
fixup: modified_only_fixup extra_quality_checks
fixup: modified_only_fixup extra_style_checks autogenerate_code extra_quality_checks
# Make marked copies of snippets of codes conform to the original
@@ -62,9 +75,29 @@ test:
# Run tests for examples
test-examples:
python -m pytest -n auto --dist=loadfile -s -v ./examples/
python -m pytest -n auto --dist=loadfile -s -v ./examples/pytorch/
# Run tests for SageMaker DLC release
test-sagemaker: # install sagemaker dependencies in advance with pip install .[sagemaker]
TEST_SAGEMAKER=True python -m pytest -n auto -s -v ./tests/sagemaker
# Check that docs can build
docs:
cd docs && make html SPHINXOPTS="-W -j 4"
# Release stuff
pre-release:
python utils/release.py
pre-patch:
python utils/release.py --patch
post-release:
python utils/release.py --post_release
post-patch:
python utils/release.py --post_release --patch

View File

@@ -35,49 +35,60 @@ limitations under the License.
<a href="https://github.com/huggingface/transformers/blob/master/CODE_OF_CONDUCT.md">
<img alt="Contributor Covenant" src="https://img.shields.io/badge/Contributor%20Covenant-v2.0%20adopted-ff69b4.svg">
</a>
<a href="https://zenodo.org/badge/latestdoi/155220641"><img src="https://zenodo.org/badge/155220641.svg" alt="DOI"></a>
</p>
<h3 align="center">
<p>State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0
<p>State-of-the-art Natural Language Processing for Jax, PyTorch and TensorFlow</p>
</h3>
🤗 Transformers provides thousands of pretrained models to perform tasks on texts such as classification, information extraction, question answering, summarization, translation, text generation, etc in 100+ languages. Its aim is to make cutting-edge NLP easier to use for everyone.
<h3 align="center">
<a href="https://hf.co/course"><img src="https://raw.githubusercontent.com/huggingface/transformers/master/docs/source/imgs/course_banner.png"></a>
</h3>
🤗 Transformers provides APIs to quickly download and use those pretrained models on a given text, fine-tune them on your own datasets then share them with the community on our [model hub](https://huggingface.co/models). At the same time, each python module defining an architecture can be used as a standalone and modified to enable quick research experiments.
🤗 Transformers provides thousands of pretrained models to perform tasks on texts such as classification, information extraction, question answering, summarization, translation, text generation and more in over 100 languages. Its aim is to make cutting-edge NLP easier to use for everyone.
🤗 Transformers is backed by the two most popular deep learning libraries, [PyTorch](https://pytorch.org/) and [TensorFlow](https://www.tensorflow.org/), with a seamless integration between them, allowing you to train your models with one then load it for inference with the other.
🤗 Transformers provides APIs to quickly download and use those pretrained models on a given text, fine-tune them on your own datasets and then share them with the community on our [model hub](https://huggingface.co/models). At the same time, each python module defining an architecture is fully standalone and can be modified to enable quick research experiments.
🤗 Transformers is backed by the three most popular deep learning libraries — [Jax](https://jax.readthedocs.io/en/latest/), [PyTorch](https://pytorch.org/) and [TensorFlow](https://www.tensorflow.org/) — with a seamless integration between them. It's straightforward to train your models with one before loading them for inference with the other.
## Online demos
You can test most of our models directly on their pages from the [model hub](https://huggingface.co/models). We also offer [private model hosting, versioning, & an inference API](https://huggingface.co/pricing) to use those models.
You can test most of our models directly on their pages from the [model hub](https://huggingface.co/models). We also offer [private model hosting, versioning, & an inference API](https://huggingface.co/pricing) for public and private models.
Here are a few examples:
- [Masked word completion with BERT](https://huggingface.co/bert-base-uncased?text=Paris+is+the+%5BMASK%5D+of+France)
- [Name Entity Recognition with Electra](https://huggingface.co/dbmdz/electra-large-discriminator-finetuned-conll03-english?text=My+name+is+Sarah+and+I+live+in+London+city)
- [Text generation with GPT-2](https://huggingface.co/gpt2?text=A+long+time+ago%2C+)
- [Natural Langugage Inference with RoBERTa](https://huggingface.co/roberta-large-mnli?text=The+dog+was+lost.+Nobody+lost+any+animal)
- [Natural Language Inference with RoBERTa](https://huggingface.co/roberta-large-mnli?text=The+dog+was+lost.+Nobody+lost+any+animal)
- [Summarization with BART](https://huggingface.co/facebook/bart-large-cnn?text=The+tower+is+324+metres+%281%2C063+ft%29+tall%2C+about+the+same+height+as+an+81-storey+building%2C+and+the+tallest+structure+in+Paris.+Its+base+is+square%2C+measuring+125+metres+%28410+ft%29+on+each+side.+During+its+construction%2C+the+Eiffel+Tower+surpassed+the+Washington+Monument+to+become+the+tallest+man-made+structure+in+the+world%2C+a+title+it+held+for+41+years+until+the+Chrysler+Building+in+New+York+City+was+finished+in+1930.+It+was+the+first+structure+to+reach+a+height+of+300+metres.+Due+to+the+addition+of+a+broadcasting+aerial+at+the+top+of+the+tower+in+1957%2C+it+is+now+taller+than+the+Chrysler+Building+by+5.2+metres+%2817+ft%29.+Excluding+transmitters%2C+the+Eiffel+Tower+is+the+second+tallest+free-standing+structure+in+France+after+the+Millau+Viaduct)
- [Question answering with DistilBERT](https://huggingface.co/distilbert-base-uncased-distilled-squad?text=Which+name+is+also+used+to+describe+the+Amazon+rainforest+in+English%3F&context=The+Amazon+rainforest+%28Portuguese%3A+Floresta+Amaz%C3%B4nica+or+Amaz%C3%B4nia%3B+Spanish%3A+Selva+Amaz%C3%B3nica%2C+Amazon%C3%ADa+or+usually+Amazonia%3B+French%3A+For%C3%AAt+amazonienne%3B+Dutch%3A+Amazoneregenwoud%29%2C+also+known+in+English+as+Amazonia+or+the+Amazon+Jungle%2C+is+a+moist+broadleaf+forest+that+covers+most+of+the+Amazon+basin+of+South+America.+This+basin+encompasses+7%2C000%2C000+square+kilometres+%282%2C700%2C000+sq+mi%29%2C+of+which+5%2C500%2C000+square+kilometres+%282%2C100%2C000+sq+mi%29+are+covered+by+the+rainforest.+This+region+includes+territory+belonging+to+nine+nations.+The+majority+of+the+forest+is+contained+within+Brazil%2C+with+60%25+of+the+rainforest%2C+followed+by+Peru+with+13%25%2C+Colombia+with+10%25%2C+and+with+minor+amounts+in+Venezuela%2C+Ecuador%2C+Bolivia%2C+Guyana%2C+Suriname+and+French+Guiana.+States+or+departments+in+four+nations+contain+%22Amazonas%22+in+their+names.+The+Amazon+represents+over+half+of+the+planet%27s+remaining+rainforests%2C+and+comprises+the+largest+and+most+biodiverse+tract+of+tropical+rainforest+in+the+world%2C+with+an+estimated+390+billion+individual+trees+divided+into+16%2C000+species)
- [Translation with T5](https://huggingface.co/t5-base?text=My+name+is+Wolfgang+and+I+live+in+Berlin)
**[Write With Transformer](https://transformer.huggingface.co)**, built by the Hugging Face team, is the official demo of this repos text generation capabilities.
## If you are looking for custom support from the Hugging Face team
<a target="_blank" href="https://huggingface.co/support">
<img alt="HuggingFace Expert Acceleration Program" src="https://huggingface.co/front/thumbnails/support.png" style="max-width: 600px; border: 1px solid #eee; border-radius: 4px; box-shadow: 0 1px 2px 0 rgba(0, 0, 0, 0.05);">
</a><br>
## Quick tour
To immediately use a model on a given text, we provide the `pipeline` API. Pipelines group together a pretrained model with the preprocessing that was used during that model training. Here is how to quickly use a pipeline to classify positive versus negative texts
To immediately use a model on a given text, we provide the `pipeline` API. Pipelines group together a pretrained model with the preprocessing that was used during that model's training. Here is how to quickly use a pipeline to classify positive versus negative texts:
```python
>>> from transformers import pipeline
# Allocate a pipeline for sentiment-analysis
>>> classifier = pipeline('sentiment-analysis')
>>> classifier('We are very happy to include pipeline into the transformers repository.')
[{'label': 'POSITIVE', 'score': 0.9978193640708923}]
>>> classifier('We are very happy to introduce pipeline to the transformers repository.')
[{'label': 'POSITIVE', 'score': 0.9996980428695679}]
```
The second line of code downloads and caches the pretrained model used by the pipeline, the third line evaluates it on the given text. Here the answer is "positive" with a confidence of 99.8%.
The second line of code downloads and caches the pretrained model used by the pipeline, while the third evaluates it on the given text. Here the answer is "positive" with a confidence of 99.97%.
This is another example of pipeline used for that can extract question answers from some context:
Many NLP tasks have a pre-trained `pipeline` ready to go. For example, we can easily extract question answers given context:
``` python
>>> from transformers import pipeline
@@ -86,15 +97,15 @@ This is another example of pipeline used for that can extract question answers f
>>> question_answerer = pipeline('question-answering')
>>> question_answerer({
... 'question': 'What is the name of the repository ?',
... 'context': 'Pipeline have been included in the huggingface/transformers repository'
... 'context': 'Pipeline has been included in the huggingface/transformers repository'
... })
{'score': 0.5135612454720828, 'start': 35, 'end': 59, 'answer': 'huggingface/transformers'}
{'score': 0.30970096588134766, 'start': 34, 'end': 58, 'answer': 'huggingface/transformers'}
```
On top of the answer, the pretrained model used here returned its confidence score, along with the start position and its end position in the tokenized sentence. You can learn more about the tasks supported by the `pipeline` API in [this tutorial](https://huggingface.co/transformers/task_summary.html).
In addition to the answer, the pretrained model used here returned its confidence score, along with the start position and end position of the answer in the tokenized sentence. You can learn more about the tasks supported by the `pipeline` API in [this tutorial](https://huggingface.co/transformers/task_summary.html).
To download and use any of the pretrained models on your given task, you just need to use those three lines of codes (PyTorch version):
To download and use any of the pretrained models on your given task, all it takes is three lines of code. Here is the PyTorch version:
```python
>>> from transformers import AutoTokenizer, AutoModel
@@ -104,7 +115,7 @@ To download and use any of the pretrained models on your given task, you just ne
>>> inputs = tokenizer("Hello world!", return_tensors="pt")
>>> outputs = model(**inputs)
```
or for TensorFlow:
And here is the equivalent code for TensorFlow:
```python
>>> from transformers import AutoTokenizer, TFAutoModel
@@ -115,9 +126,9 @@ or for TensorFlow:
>>> outputs = model(**inputs)
```
The tokenizer is responsible for all the preprocessing the pretrained model expects, and can be called directly on one (or list) of texts (as we can see on the fourth line of both code examples). It will output a dictionary you can directly pass to your model (which is done on the fifth line).
The tokenizer is responsible for all the preprocessing the pretrained model expects, and can be called directly on a single string (as in the above examples) or a list. It will output a dictionary that you can use in downstream code or simply directly pass to your model using the ** argument unpacking operator.
The model itself is a regular [Pytorch `nn.Module`](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) or a [TensorFlow `tf.keras.Model`](https://www.tensorflow.org/api_docs/python/tf/keras/Model) (depending on your backend) which you can use normally. For instance, [this tutorial](https://huggingface.co/transformers/training.html) explains how to integrate such a model in classic PyTorch or TensorFlow training loop, or how to use our `Trainer` API to quickly fine-tune the on a new dataset.
The model itself is a regular [Pytorch `nn.Module`](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) or a [TensorFlow `tf.keras.Model`](https://www.tensorflow.org/api_docs/python/tf/keras/Model) (depending on your backend) which you can use normally. [This tutorial](https://huggingface.co/transformers/training.html) explains how to integrate such a model into a classic PyTorch or TensorFlow training loop, or how to use our `Trainer` API to quickly fine-tune on a new dataset.
## Why should I use transformers?
@@ -135,16 +146,16 @@ The model itself is a regular [Pytorch `nn.Module`](https://pytorch.org/docs/sta
1. Choose the right framework for every part of a model's lifetime:
- Train state-of-the-art models in 3 lines of code.
- Move a single model between TF2.0/PyTorch frameworks at will.
- Seamlessly pick the right framework for training, evaluation, production.
- Seamlessly pick the right framework for training, evaluation and production.
1. Easily customize a model or an example to your needs:
- Examples for each architecture to reproduce the results by the official authors of said architecture.
- Expose the models internal as consistently as possible.
- We provide examples for each architecture to reproduce the results published by its original authors.
- Model internals are exposed as consistently as possible.
- Model files can be used independently of the library for quick experiments.
## Why shouldn't I use transformers?
- This library is not a modular toolbox of building blocks for neural nets. The code in the model files is not refactored with additional abstractions on purpose, so that researchers can quickly iterate on each of the models without diving in additional abstractions/files.
- This library is not a modular toolbox of building blocks for neural nets. The code in the model files is not refactored with additional abstractions on purpose, so that researchers can quickly iterate on each of the models without diving into additional abstractions/files.
- The training API is not intended to work on any model but is optimized to work with the models provided by the library. For generic machine learning loops, you should use another library.
- While we strive to present as many use cases as possible, the scripts in our [examples folder](https://github.com/huggingface/transformers/tree/master/examples) are just that: examples. It is expected that they won't work out-of-the box on your specific problem and that you will be required to change a few lines of code to adapt them to your needs.
@@ -152,16 +163,16 @@ The model itself is a regular [Pytorch `nn.Module`](https://pytorch.org/docs/sta
### With pip
This repository is tested on Python 3.6+, PyTorch 1.0.0+ (PyTorch 1.3.1+ for [examples](https://github.com/huggingface/transformers/tree/master/examples)) and TensorFlow 2.0.
This repository is tested on Python 3.6+, Flax 0.3.2+, PyTorch 1.3.1+ and TensorFlow 2.3+.
You should install 🤗 Transformers in a [virtual environment](https://docs.python.org/3/library/venv.html). If you're unfamiliar with Python virtual environments, check out the [user guide](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/).
First, create a virtual environment with the version of Python you're going to use and activate it.
Then, you will need to install at least one of TensorFlow 2.0, PyTorch or Flax.
Please refer to [TensorFlow installation page](https://www.tensorflow.org/install/pip#tensorflow-2.0-rc-is-available), [PyTorch installation page](https://pytorch.org/get-started/locally/#start-locally) regarding the specific install command for your platform and/or [Flax installation page](https://github.com/google/flax#quick-install).
Then, you will need to install at least one of Flax, PyTorch or TensorFlow.
Please refer to [TensorFlow installation page](https://www.tensorflow.org/install/), [PyTorch installation page](https://pytorch.org/get-started/locally/#start-locally) and/or [Flax installation page](https://github.com/google/flax#quick-install) regarding the specific install command for your platform.
When TensorFlow 2.0 and/or PyTorch has been installed, 🤗 Transformers can be installed using pip as follows:
When one of those backends has been installed, 🤗 Transformers can be installed using pip as follows:
```bash
pip install transformers
@@ -179,9 +190,9 @@ Since Transformers version v4.0.0, we now have a conda channel: `huggingface`.
conda install -c huggingface transformers
```
Follow the installation pages of TensorFlow, PyTorch or Flax to see how to install them with conda.
Follow the installation pages of Flax, PyTorch or TensorFlow to see how to install them with conda.
## Models architectures
## Model architectures
**[All the model checkpoints](https://huggingface.co/models)** provided by 🤗 Transformers are seamlessly integrated from the huggingface.co [model hub](https://huggingface.co) where they are uploaded directly by [users](https://huggingface.co/users) and [organizations](https://huggingface.co/organizations).
@@ -194,13 +205,21 @@ Current number of checkpoints: ![](https://img.shields.io/endpoint?url=https://h
1. **[BARThez](https://huggingface.co/transformers/model_doc/barthez.html)** (from École polytechnique) released with the paper [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis.
1. **[BERT](https://huggingface.co/transformers/model_doc/bert.html)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
1. **[BERT For Sequence Generation](https://huggingface.co/transformers/model_doc/bertgeneration.html)** (from Google) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
1. **[BigBird-RoBERTa](https://huggingface.co/transformers/model_doc/bigbird.html)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
1. **[BigBird-Pegasus](https://huggingface.co/transformers/model_doc/bigbird_pegasus.html)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
1. **[Blenderbot](https://huggingface.co/transformers/model_doc/blenderbot.html)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BlenderbotSmall](https://huggingface.co/transformers/model_doc/blenderbot_small.html)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BORT](https://huggingface.co/transformers/model_doc/bort.html)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) by Adrian de Wynter and Daniel J. Perry.
1. **[ByT5](https://huggingface.co/transformers/model_doc/byt5.html)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
1. **[CamemBERT](https://huggingface.co/transformers/model_doc/camembert.html)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
1. **[CLIP](https://huggingface.co/transformers/model_doc/clip.html)** from (OpenAI) released with the paper [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
1. **[ConvBERT](https://huggingface.co/transformers/model_doc/convbert.html)** (from YituTech) released with the paper [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496) by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
1. **[CPM](https://huggingface.co/transformers/model_doc/cpm.html)** (from Tsinghua University) released with the paper [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413) by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.
1. **[CTRL](https://huggingface.co/transformers/model_doc/ctrl.html)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
1. **[DeBERTa](https://huggingface.co/transformers/model_doc/deberta.html)** (from Microsoft Research) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[DeBERTa](https://huggingface.co/transformers/model_doc/deberta.html)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[DeBERTa-v2](https://huggingface.co/transformers/model_doc/deberta_v2.html)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[DeiT](https://huggingface.co/transformers/model_doc/deit.html)** (from Facebook) released with the paper [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
1. **[DETR](https://huggingface.co/transformers/model_doc/detr.html)** (from Facebook) released with the paper [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
1. **[DialoGPT](https://huggingface.co/transformers/model_doc/dialogpt.html)** (from Microsoft Research) released with the paper [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
1. **[DistilBERT](https://huggingface.co/transformers/model_doc/distilbert.html)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into [DistilGPT2](https://github.com/huggingface/transformers/tree/master/examples/distillation), RoBERTa into [DistilRoBERTa](https://github.com/huggingface/transformers/tree/master/examples/distillation), Multilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/master/examples/distillation) and a German version of DistilBERT.
1. **[DPR](https://huggingface.co/transformers/model_doc/dpr.html)** (from Facebook) released with the paper [Dense Passage Retrieval
@@ -211,32 +230,45 @@ Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
1. **[Funnel Transformer](https://huggingface.co/transformers/model_doc/funnel.html)** (from CMU/Google Brain) released with the paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
1. **[GPT](https://huggingface.co/transformers/model_doc/gpt.html)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
1. **[GPT-2](https://huggingface.co/transformers/model_doc/gpt2.html)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
1. **[GPT Neo](https://huggingface.co/transformers/model_doc/gpt_neo.html)** (from EleutherAI) released in the repository [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
1. **[Hubert](https://huggingface.co/transformers/model_doc/hubert.html)** (from Facebook) released with the paper [HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units](https://arxiv.org/abs/2106.07447) by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.
1. **[I-BERT](https://huggingface.co/transformers/model_doc/ibert.html)** (from Berkeley) released with the paper [I-BERT: Integer-only BERT Quantization](https://arxiv.org/abs/2101.01321) by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer
1. **[LayoutLM](https://huggingface.co/transformers/model_doc/layoutlm.html)** (from Microsoft Research Asia) released with the paper [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
1. **[LED](https://huggingface.co/transformers/model_doc/led.html)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
1. **[Longformer](https://huggingface.co/transformers/model_doc/longformer.html)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
1. **[LUKE](https://huggingface.co/transformers/model_doc/luke.html)** (from Studio Ousia) released with the paper [LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention](https://arxiv.org/abs/2010.01057) by Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto.
1. **[LXMERT](https://huggingface.co/transformers/model_doc/lxmert.html)** (from UNC Chapel Hill) released with the paper [LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering](https://arxiv.org/abs/1908.07490) by Hao Tan and Mohit Bansal.
1. **[M2M100](https://huggingface.co/transformers/model_doc/m2m_100.html)** (from Facebook) released with the paper [Beyond English-Centric Multilingual Machine Translation](https://arxiv.org/abs/2010.11125) by by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
1. **[MarianMT](https://huggingface.co/transformers/model_doc/marian.html)** Machine translation models trained using [OPUS](http://opus.nlpl.eu/) data by Jörg Tiedemann. The [Marian Framework](https://marian-nmt.github.io/) is being developed by the Microsoft Translator Team.
1. **[MBart](https://huggingface.co/transformers/model_doc/mbart.html)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
1. **[MBart-50](https://huggingface.co/transformers/model_doc/mbart.html)** (from Facebook) released with the paper [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https://arxiv.org/abs/2008.00401) by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
1. **[Megatron-BERT](https://huggingface.co/transformers/model_doc/megatron_bert.html)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
1. **[Megatron-GPT2](https://huggingface.co/transformers/model_doc/megatron_gpt2.html)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
1. **[MPNet](https://huggingface.co/transformers/model_doc/mpnet.html)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
1. **[MT5](https://huggingface.co/transformers/model_doc/mt5.html)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
1. **[Pegasus](https://huggingface.co/transformers/model_doc/pegasus.html)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777)> by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
1. **[ProphetNet](https://huggingface.co/transformers/model_doc/prophetnet.html)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
1. **[Reformer](https://huggingface.co/transformers/model_doc/reformer.html)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
1. **[RoBERTa](https://huggingface.co/transformers/model_doc/roberta.html)** (from Facebook), released together with the paper a [Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
1. **[RoFormer](https://huggingface.co/transformers/model_doc/roformer.html)** (from ZhuiyiTechnology), released together with the paper a [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/pdf/2104.09864v1.pdf) by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.
1. **[SpeechToTextTransformer](https://huggingface.co/transformers/model_doc/speech_to_text.html)** (from Facebook), released together with the paper [fairseq S2T: Fast Speech-to-Text Modeling with fairseq](https://arxiv.org/abs/2010.05171) by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino.
1. **[SqueezeBert](https://huggingface.co/transformers/model_doc/squeezebert.html)** released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
1. **[T5](https://huggingface.co/transformers/model_doc/t5.html)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
1. **[TAPAS](https://huggingface.co/transformers/model_doc/tapas.html)** (from Google AI) released with the paper [TAPAS: Weakly Supervised Table Parsing via Pre-training](https://arxiv.org/abs/2004.02349) by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
1. **[Transformer-XL](https://huggingface.co/transformers/model_doc/transformerxl.html)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
1. **[Vision Transformer (ViT)](https://huggingface.co/transformers/model_doc/vit.html)** (from Google AI) released with the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
1. **[VisualBERT](https://huggingface.co/transformers/model_doc/visual_bert.html)** (from UCLA NLP) released with the paper [VisualBERT: A Simple and Performant Baseline for Vision and Language](https://arxiv.org/pdf/1908.03557) by Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang.
1. **[Wav2Vec2](https://huggingface.co/transformers/model_doc/wav2vec2.html)** (from Facebook AI) released with the paper [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https://arxiv.org/abs/2006.11477) by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli.
1. **[XLM](https://huggingface.co/transformers/model_doc/xlm.html)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau.
1. **[XLM-ProphetNet](https://huggingface.co/transformers/model_doc/xlmprophetnet.html)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
1. **[XLM-RoBERTa](https://huggingface.co/transformers/model_doc/xlmroberta.html)** (from Facebook AI), released together with the paper [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
1. **[XLNet](https://huggingface.co/transformers/model_doc/xlnet.html)** (from Google/CMU) released with the paper [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
1. **[XLSR-Wav2Vec2](https://huggingface.co/transformers/model_doc/xlsr_wav2vec2.html)** (from Facebook AI) released with the paper [Unsupervised Cross-Lingual Representation Learning For Speech Recognition](https://arxiv.org/abs/2006.13979) by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
1. Want to contribute a new model? We have added a **detailed guide and templates** to guide you in the process of adding a new model. You can find them in the [`templates`](./templates) folder of the repository. Be sure to check the [contributing guidelines](./CONTRIBUTING.md) and contact the maintainers or open an issue to collect feedbacks before starting your PR.
To check if each model has an implementation in PyTorch/TensorFlow/Flax or has an associated tokenizer backed by the 🤗 Tokenizers library, refer to [this table](https://huggingface.co/transformers/index.html#bigtable)
To check if each model has an implementation in Flax, PyTorch or TensorFlow, or has an associated tokenizer backed by the 🤗 Tokenizers library, refer to [this table](https://huggingface.co/transformers/index.html#supported-frameworks).
These implementations have been tested on several datasets (see the example scripts) and should match the performances of the original implementations. You can find more details on the performances in the Examples section of the [documentation](https://huggingface.co/transformers/examples.html).
These implementations have been tested on several datasets (see the example scripts) and should match the performance of the original implementations. You can find more details on performance in the Examples section of the [documentation](https://huggingface.co/transformers/examples.html).
## Learn more

View File

@@ -53,7 +53,7 @@ RUN git clone https://github.com/huggingface/transformers.git && \
git checkout CI && \
cd .. && \
pip install ./transformers && \
pip install -r ./transformers/examples/requirements.txt && \
pip install -r ./transformers/examples/pytorch/_test_requirements.txt && \
pip install pytest
RUN python -c "import torch_xla; print(torch_xla.__version__)"

View File

@@ -27,7 +27,7 @@ local bertBaseCased = base.BaseTest {
},
command: utils.scriptCommand(
|||
python -m pytest -s transformers/examples/test_xla_examples.py -v
python -m pytest -s transformers/examples/pytorch/test_xla_examples.py -v
test_exit_code=$?
echo "\nFinished running commands.\n"
test $test_exit_code -eq 0

View File

@@ -26,7 +26,7 @@ pip install -e ".[docs]"
---
**NOTE**
You only need to generate the documentation to inspect it locally (if you're planning changes and want to
You only need to generate the documentation to inspect it locally (if you're planning changes and want to
check how they look like before committing for instance). You don't have to commit the built documentation.
---
@@ -65,7 +65,7 @@ make html
```
A folder called ``_build/html`` should have been created. You can now open the file ``_build/html/index.html`` in your
browser.
browser.
---
**NOTE**
@@ -95,15 +95,15 @@ following these steps:
expand them).
- Click on "details" next to the `ci/circleci: build_doc` check.
- In the new window, click on the "Artifacts" tab.
- Locate the file "docs/_build/html/index.html" (or any specific page you want to check) and click on it to get a
- Locate the file "docs/_build/html/index.html" (or any specific page you want to check) and click on it to get a
preview.
## Writing Documentation - Specification
The `huggingface/transformers` documentation follows the
[Google documentation](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html) style. It is
mostly written in ReStructuredText
([Sphinx simple documentation](https://www.sphinx-doc.org/en/master/usage/restructuredtext/index.html),
mostly written in ReStructuredText
([Sphinx simple documentation](https://www.sphinx-doc.org/en/master/usage/restructuredtext/index.html),
[Sourceforge complete documentation](https://docutils.sourceforge.io/docs/ref/rst/restructuredtext.html)).
@@ -121,8 +121,8 @@ four.
### Adding a new model
When adding a new model:
- Create a file `xxx.rst` under `./source/model_doc` (don't hesitate to copy an existing file as template).
- Create a file `xxx.rst` under `./source/model_doc` (don't hesitate to copy an existing file as template).
- Link that file in `./source/index.rst` on the `model_doc` toc-tree.
- Write a short overview of the model:
- Overview with paper & authors
@@ -130,8 +130,8 @@ When adding a new model:
- Tips and tricks and how to use it best
- Add the classes that should be linked in the model. This generally includes the configuration, the tokenizer, and
every model of that class (the base model, alongside models with additional heads), both in PyTorch and TensorFlow.
The order is generally:
- Configuration,
The order is generally:
- Configuration,
- Tokenizer
- PyTorch base model
- PyTorch head models
@@ -179,7 +179,7 @@ Links should be done as so (note the double underscore at the end): \`text for t
#### Defining arguments in a method
Arguments should be defined with the `Args:` prefix, followed by a line return and an indentation.
Arguments should be defined with the `Args:` prefix, followed by a line return and an indentation.
The argument should be followed by its type, with its shape if it is a tensor, and a line return.
Another indentation is necessary before writing the description of the argument.
@@ -216,9 +216,9 @@ then its documentation should look like this:
Note that we always omit the "defaults to :obj:\`None\`" when None is the default for any argument. Also note that even
if the first line describing your argument type and its default gets long, you can't break it on several lines. You can
however write as many lines as you want in the indented description (see the example above with `input_ids`).
however write as many lines as you want in the indented description (see the example above with `input_ids`).
#### Writing a multi-line code block
#### Writing a multi-line code block
Multi-line code blocks can be useful for displaying examples. They are done like so:
@@ -237,7 +237,7 @@ the results stay consistent with the library.
#### Writing a return block
Arguments should be defined with the `Args:` prefix, followed by a line return and an indentation.
Arguments should be defined with the `Args:` prefix, followed by a line return and an indentation.
The first line should be the type of the return, followed by a line return. No need to indent further for the elements
building the return.
@@ -258,3 +258,43 @@ Here's an example for a single value return:
Returns:
:obj:`List[int]`: A list of integers in the range [0, 1] --- 1 for a special token, 0 for a sequence token.
```
#### Adding a new section
In ReST section headers are designated as such with the help of a line of underlying characters, e.g.,:
```
Section 1
^^^^^^^^^^^^^^^^^^
Sub-section 1
~~~~~~~~~~~~~~~~~~
```
ReST allows the use of any characters to designate different section levels, as long as they are used consistently within the same document. For details see [sections doc](https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#sections). Because there is no standard different documents often end up using different characters for the same levels which makes it very difficult to know which character to use when creating a new section.
Specifically, if when running `make docs` you get an error like:
```
docs/source/main_classes/trainer.rst:127:Title level inconsistent:
```
you picked an inconsistent character for some of the levels.
But how do you know which characters you must use for an already existing level or when adding a new level?
You can use this helper script:
```
perl -ne '/^(.)\1{100,}/ && do { $h{$1}=++$c if !$h{$1} }; END { %h = reverse %h ; print "$_ $h{$_}\n" for sort keys %h}' docs/source/main_classes/trainer.rst
1 -
2 ~
3 ^
4 =
5 "
```
This tells you which characters have already been assigned for each level.
So using this particular example's output -- if your current section's header uses `=` as its underline character, you now know you're at level 4, and if you want to add a sub-section header you know you want `"` as it'd level 5.
If you needed to add yet another sub-level, then pick a character that is not used already. That is you must pick a character that is not in the output of that script.
Here is the full list of characters that can be used in this context: `= - ` : ' " ~ ^ _ * + # < >`

View File

@@ -1,10 +1,15 @@
// These two things need to be updated at each release for the version selector.
// Last stable version
const stableVersion = "v4.2.0"
const stableVersion = "v4.7.0"
// Dictionary doc folder to label. The last stable version should have an empty key.
const versionMapping = {
"master": "master",
"": "v4.2.0/v4.2.1 (stable)",
"": "v4.7.0 (stable)",
"v4.6.0": "v4.6.0",
"v4.5.1": "v4.5.0/v4.5.1",
"v4.4.2": "v4.4.0/v4.4.1/v4.4.2",
"v4.3.3": "v4.3.0/v4.3.1/v4.3.2/v4.3.3",
"v4.2.2": "v4.2.0/v4.2.1/v4.2.2",
"v4.1.1": "v4.1.0/v4.1.1",
"v4.0.1": "v4.0.0/v4.0.1",
"v3.5.1": "v3.5.0/v3.5.1",
@@ -60,7 +65,7 @@ function addIcon() {
function addCustomFooter() {
const customFooter = document.createElement("div");
const questionOrIssue = document.createElement("div");
questionOrIssue.innerHTML = "Stuck? Read our <a href='https://medium.com/huggingface'>Blog posts</a> or <a href='https://github.com/huggingface/transformers'>Create an issue</a>";
questionOrIssue.innerHTML = "Stuck? Read our <a href='https://huggingface.co/blog'>Blog posts</a> or <a href='https://github.com/huggingface/transformers'>Create an issue</a>";
customFooter.appendChild(questionOrIssue);
customFooter.classList.add("footer");
@@ -127,11 +132,11 @@ function addVersionControl() {
const parts = location.toString().split('/');
let versionIndex = parts.length - 2;
// Index page may not have a last part with filename.html so we need to go up
if (parts[parts.length - 1] != "" && ! parts[parts.length - 1].match(/\.html$|^search.html?/)) {
if (parts[parts.length - 1] != "" && ! parts[parts.length - 1].match(/\.html/)) {
versionIndex = parts.length - 1;
}
// Main classes and models are nested so we need to go deeper
else if (parts[versionIndex] == "main_classes" || parts[versionIndex] == "model_doc") {
else if (parts[versionIndex] == "main_classes" || parts[versionIndex] == "model_doc" || parts[versionIndex] == "internal") {
versionIndex = versionIndex - 1;
}
const version = parts[versionIndex];

View File

@@ -388,7 +388,7 @@ Next, you can finally start adding new code to 🤗 Transformers. Go into the cl
::
cd transformers
cd transformers
In the special case that you are adding a model whose architecture exactly matches the model architecture of an
existing model you only have to add a conversion script as described in `this section <#write-a-conversion-script>`__.
@@ -417,27 +417,27 @@ You should do the following:
::
git checkout -b add_brand_new_bert
git checkout -b add_brand_new_bert
2. Commit the automatically generated code:
::
git add .
git commit
git add .
git commit
3. Fetch and rebase to current master
::
git fetch upstream
git rebase upstream/master
git fetch upstream
git rebase upstream/master
4. Push the changes to your account using:
::
git push -u origin a-descriptive-name-for-my-changes
git push -u origin a-descriptive-name-for-my-changes
5. Once you are satisfied, go to the webpage of your fork on GitHub. Click on “Pull request”. Make sure to add the
GitHub handle of some members of the Hugging Face team as reviewers, so that the Hugging Face team gets notified for
@@ -451,8 +451,8 @@ time to time by doing:
::
git fetch upstream
git merge upstream/master
git fetch upstream
git merge upstream/master
In general, all questions you might have regarding the model or your implementation should be asked in your PR and
discussed/solved in the PR. This way, the Hugging Face team will always be notified when you are committing new code or
@@ -518,7 +518,7 @@ PyTorch, called ``SimpleModel`` as follows:
.. code:: python
import torch.nn as nn
from torch import nn
class SimpleModel(nn.Module):
def __init__(self):

View File

@@ -65,10 +65,10 @@ respectively.
.. code-block:: bash
## PYTORCH CODE
python examples/benchmarking/run_benchmark.py --help
python examples/pytorch/benchmarking/run_benchmark.py --help
## TENSORFLOW CODE
python examples/benchmarking/run_benchmark_tf.py --help
python examples/tensorflow/benchmarking/run_benchmark_tf.py --help
An instantiated benchmark object can then simply be run by calling ``benchmark.run()``.
@@ -358,4 +358,6 @@ available `here
<https://docs.google.com/spreadsheets/d/1sryqufw2D0XlUH4sq3e9Wnxu5EAQkaohzrJbd5HdQ_w/edit?usp=sharing>`__.
With the new `benchmark` tools, it is easier than ever to share your benchmark results with the community
:prefix_link:`here <examples/benchmarking/README.md>`.
- :prefix_link:`PyTorch Benchmarking Results<examples/pytorch/benchmarking/README.md>`.
- :prefix_link:`TensorFlow Benchmarking Results<examples/tensorflow/benchmarking/README.md>`.

View File

@@ -18,8 +18,8 @@ This page regroups resources around 🤗 Transformers developed by the community
| [Fine-tune DialoGPT on New Datasets and Languages](https://github.com/ncoop57/i-am-a-nerd/blob/master/_notebooks/2020-05-12-chatbot-part-1.ipynb) | How to fine-tune the DialoGPT model on a new dataset for open-dialog conversational chatbots | [Nathan Cooper](https://github.com/ncoop57) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ncoop57/i-am-a-nerd/blob/master/_notebooks/2020-05-12-chatbot-part-1.ipynb) |
| [Long Sequence Modeling with Reformer](https://github.com/patrickvonplaten/notebooks/blob/master/PyTorch_Reformer.ipynb) | How to train on sequences as long as 500,000 tokens with Reformer | [Patrick von Platen](https://github.com/patrickvonplaten) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/PyTorch_Reformer.ipynb) |
| [Fine-tune BART for Summarization](https://github.com/ohmeow/ohmeow_website/blob/master/_notebooks/2020-05-23-text-generation-with-blurr.ipynb) | How to fine-tune BART for summarization with fastai using blurr | [Wayde Gilliam](https://ohmeow.com/) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ohmeow/ohmeow_website/blob/master/_notebooks/2020-05-23-text-generation-with-blurr.ipynb) |
| [Fine-tune a pre-trained Transformer on anyone's tweets](https://colab.research.google.com/github/borisdayma/huggingtweets/blob/master/huggingtweets-demo.ipynb) | How to generate tweets in the style of your favorite Twitter account by fine-tune a GPT-2 model | [Boris Dayma](https://github.com/borisdayma) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/borisdayma/huggingtweets/blob/master/huggingtweets-demo.ipynb) |
| [A Step by Step Guide to Tracking Hugging Face Model Performance](https://colab.research.google.com/drive/1NEiqNPhiouu2pPwDAVeFoN4-vTYMz9F8) | A quick tutorial for training NLP models with HuggingFace and & visualizing their performance with Weights & Biases | [Jack Morris](https://github.com/jxmorris12) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1NEiqNPhiouu2pPwDAVeFoN4-vTYMz9F8) |
| [Fine-tune a pre-trained Transformer on anyone's tweets](https://colab.research.google.com/github/borisdayma/huggingtweets/blob/master/huggingtweets-demo.ipynb) | How to generate tweets in the style of your favorite Twitter account by fine-tuning a GPT-2 model | [Boris Dayma](https://github.com/borisdayma) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/borisdayma/huggingtweets/blob/master/huggingtweets-demo.ipynb) |
| [Optimize 🤗 Hugging Face models with Weights & Biases](https://colab.research.google.com/github/wandb/examples/blob/master/colabs/huggingface/Optimize_Hugging_Face_models_with_Weights_%26_Biases.ipynb) | A complete tutorial showcasing W&B integration with Hugging Face | [Boris Dayma](https://github.com/borisdayma) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/wandb/examples/blob/master/colabs/huggingface/Optimize_Hugging_Face_models_with_Weights_%26_Biases.ipynb) |
| [Pretrain Longformer](https://github.com/allenai/longformer/blob/master/scripts/convert_model_to_long.ipynb) | How to build a "long" version of existing pretrained models | [Iz Beltagy](https://beltagy.net) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/allenai/longformer/blob/master/scripts/convert_model_to_long.ipynb) |
| [Fine-tune Longformer for QA](https://github.com/patil-suraj/Notebooks/blob/master/longformer_qa_training.ipynb) | How to fine-tune longformer model for QA task | [Suraj Patil](https://github.com/patil-suraj) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/patil-suraj/Notebooks/blob/master/longformer_qa_training.ipynb) |
| [Evaluate Model with 🤗nlp](https://github.com/patrickvonplaten/notebooks/blob/master/How_to_evaluate_Longformer_on_TriviaQA_using_NLP.ipynb) | How to evaluate longformer on TriviaQA with `nlp` | [Patrick von Platen](https://github.com/patrickvonplaten) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1m7eTGlPmLRgoPkkA7rkhQdZ9ydpmsdLE?usp=sharing) |
@@ -30,6 +30,7 @@ This page regroups resources around 🤗 Transformers developed by the community
|[Speed up Fine-Tuning in Transformers with Dynamic Padding / Bucketing](https://github.com/ELS-RD/transformers-notebook/blob/master/Divide_Hugging_Face_Transformers_training_time_by_2_or_more.ipynb)|How to speed up fine-tuning by a factor of 2 using dynamic padding / bucketing|[Michael Benesty](https://github.com/pommedeterresautee) |[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1CBfRU1zbfu7-ijiOqAAQUA-RJaxfcJoO?usp=sharing)|
|[Pretrain Reformer for Masked Language Modeling](https://github.com/patrickvonplaten/notebooks/blob/master/Reformer_For_Masked_LM.ipynb)| How to train a Reformer model with bi-directional self-attention layers | [Patrick von Platen](https://github.com/patrickvonplaten) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1tzzh0i8PgDQGV3SMFUGxM7_gGae3K-uW?usp=sharing)|
|[Expand and Fine Tune Sci-BERT](https://github.com/lordtt13/word-embeddings/blob/master/COVID-19%20Research%20Data/COVID-SciBERT.ipynb)| How to increase vocabulary of a pretrained SciBERT model from AllenAI on the CORD dataset and pipeline it. | [Tanmay Thakur](https://github.com/lordtt13) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1rqAR40goxbAfez1xvF3hBJphSCsvXmh8)|
|[Fine Tune BlenderBotSmall for Summarization using the Trainer API](https://github.com/lordtt13/transformers-experiments/blob/master/Custom%20Tasks/fine-tune-blenderbot_small-for-summarization.ipynb)| How to fine tune BlenderBotSmall for summarization on a custom dataset, using the Trainer API. | [Tanmay Thakur](https://github.com/lordtt13) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/19Wmupuls7mykSGyRN_Qo6lPQhgp56ymq?usp=sharing)|
|[Fine-tune Electra and interpret with Integrated Gradients](https://github.com/elsanns/xai-nlp-notebooks/blob/master/electra_fine_tune_interpret_captum_ig.ipynb) | How to fine-tune Electra for sentiment analysis and interpret predictions with Captum Integrated Gradients | [Eliza Szczechla](https://elsanns.github.io) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/elsanns/xai-nlp-notebooks/blob/master/electra_fine_tune_interpret_captum_ig.ipynb)|
|[fine-tune a non-English GPT-2 Model with Trainer class](https://github.com/philschmid/fine-tune-GPT-2/blob/master/Fine_tune_a_non_English_GPT_2_Model_with_Huggingface.ipynb) | How to fine-tune a non-English GPT-2 Model with Trainer class | [Philipp Schmid](https://www.philschmid.de) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/philschmid/fine-tune-GPT-2/blob/master/Fine_tune_a_non_English_GPT_2_Model_with_Huggingface.ipynb)|
|[Fine-tune a DistilBERT Model for Multi Label Classification task](https://github.com/DhavalTaunk08/Transformers_scripts/blob/master/Transformers_multilabel_distilbert.ipynb) | How to fine-tune a DistilBERT Model for Multi Label Classification task | [Dhaval Taunk](https://github.com/DhavalTaunk08) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DhavalTaunk08/Transformers_scripts/blob/master/Transformers_multilabel_distilbert.ipynb)|
@@ -47,3 +48,16 @@ This page regroups resources around 🤗 Transformers developed by the community
|[Fine-Tune LED on up to 8K tokens](https://github.com/patrickvonplaten/notebooks/blob/master/Fine_tune_Longformer_Encoder_Decoder_(LED)_for_Summarization_on_pubmed.ipynb) | How to fine-tune LED on pubmed for long-range summarization | [Patrick von Platen](https://github.com/patrickvonplaten) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/Fine_tune_Longformer_Encoder_Decoder_(LED)_for_Summarization_on_pubmed.ipynb)|
|[Evaluate LED on Arxiv](https://github.com/patrickvonplaten/notebooks/blob/master/LED_on_Arxiv.ipynb) | How to effectively evaluate LED on long-range summarization | [Patrick von Platen](https://github.com/patrickvonplaten) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/LED_on_Arxiv.ipynb)|
|[Fine-tune LayoutLM on RVL-CDIP (a document image classification dataset)](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/LayoutLM/Fine_tuning_LayoutLMForSequenceClassification_on_RVL_CDIP.ipynb) | How to fine-tune *LayoutLMForSequenceClassification* on the RVL-CDIP dataset for scanned document classification | [Niels Rogge](https://github.com/nielsrogge) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/LayoutLM/Fine_tuning_LayoutLMForSequenceClassification_on_RVL_CDIP.ipynb)|
|[Wav2Vec2 CTC decoding with GPT2 adjustment](https://github.com/voidful/huggingface_notebook/blob/main/xlsr_gpt.ipynb) | How to decode CTC sequence with language model adjustment | [Eric Lam](https://github.com/voidful) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1e_z5jQHYbO2YKEaUgzb1ww1WwiAyydAj?usp=sharing)|
|[Fine-tune BART for summarization in two languages with Trainer class](https://github.com/elsanns/xai-nlp-notebooks/blob/master/fine_tune_bart_summarization_two_langs.ipynb) | How to fine-tune BART for summarization in two languages with Trainer class | [Eliza Szczechla](https://github.com/elsanns) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/elsanns/xai-nlp-notebooks/blob/master/fine_tune_bart_summarization_two_langs.ipynb)|
|[Evaluate Big Bird on Trivia QA](https://github.com/patrickvonplaten/notebooks/blob/master/Evaluating_Big_Bird_on_TriviaQA.ipynb) | How to evaluate BigBird on long document question answering on Trivia QA | [Patrick von Platen](https://github.com/patrickvonplaten) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/Evaluating_Big_Bird_on_TriviaQA.ipynb)|
| [Create video captions using Wav2Vec2](https://github.com/Muennighoff/ytclipcc/blob/main/wav2vec_youtube_captions.ipynb) | How to create YouTube captions from any video by transcribing the audio with Wav2Vec | [Niklas Muennighoff](https://github.com/Muennighoff) |[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Muennighoff/ytclipcc/blob/main/wav2vec_youtube_captions.ipynb) |
| [Fine-tune the Vision Transformer on CIFAR-10 using PyTorch Lightning](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/VisionTransformer/Fine_tuning_the_Vision_Transformer_on_CIFAR_10_with_PyTorch_Lightning.ipynb) | How to fine-tune the Vision Transformer (ViT) on CIFAR-10 using HuggingFace Transformers, Datasets and PyTorch Lightning | [Niels Rogge](https://github.com/nielsrogge) |[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/VisionTransformer/Fine_tuning_the_Vision_Transformer_on_CIFAR_10_with_PyTorch_Lightning.ipynb) |
| [Fine-tune the Vision Transformer on CIFAR-10 using the 🤗 Trainer](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/VisionTransformer/Fine_tuning_the_Vision_Transformer_on_CIFAR_10_with_the_%F0%9F%A4%97_Trainer.ipynb) | How to fine-tune the Vision Transformer (ViT) on CIFAR-10 using HuggingFace Transformers, Datasets and the 🤗 Trainer | [Niels Rogge](https://github.com/nielsrogge) |[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/VisionTransformer/Fine_tuning_the_Vision_Transformer_on_CIFAR_10_with_the_%F0%9F%A4%97_Trainer.ipynb) |
| [Evaluate LUKE on Open Entity, an entity typing dataset](https://github.com/studio-ousia/luke/blob/master/notebooks/huggingface_open_entity.ipynb) | How to evaluate *LukeForEntityClassification* on the Open Entity dataset | [Ikuya Yamada](https://github.com/ikuyamada) |[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/studio-ousia/luke/blob/master/notebooks/huggingface_open_entity.ipynb) |
| [Evaluate LUKE on TACRED, a relation extraction dataset](https://github.com/studio-ousia/luke/blob/master/notebooks/huggingface_tacred.ipynb) | How to evaluate *LukeForEntityPairClassification* on the TACRED dataset | [Ikuya Yamada](https://github.com/ikuyamada) |[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/studio-ousia/luke/blob/master/notebooks/huggingface_tacred.ipynb) |
| [Evaluate LUKE on CoNLL-2003, an important NER benchmark](https://github.com/studio-ousia/luke/blob/master/notebooks/huggingface_conll_2003.ipynb) | How to evaluate *LukeForEntitySpanClassification* on the CoNLL-2003 dataset | [Ikuya Yamada](https://github.com/ikuyamada) |[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/studio-ousia/luke/blob/master/notebooks/huggingface_conll_2003.ipynb) |
| [Evaluate BigBird-Pegasus on PubMed dataset](https://github.com/vasudevgupta7/bigbird/blob/main/notebooks/bigbird_pegasus_evaluation.ipynb) | How to evaluate *BigBirdPegasusForConditionalGeneration* on PubMed dataset | [Vasudev Gupta](https://github.com/vasudevgupta7) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vasudevgupta7/bigbird/blob/main/notebooks/bigbird_pegasus_evaluation.ipynb) |
| [Speech Emotion Classification with Wav2Vec2](https://github/m3hrdadfi/soxan/blob/main/notebooks/Emotion_recognition_in_Greek_speech_using_Wav2Vec2.ipynb) | How to leverage a pretrained Wav2Vec2 model for Emotion Classification on the MEGA dataset | [Mehrdad Farahani](https://github.com/m3hrdadfi) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/m3hrdadfi/soxan/blob/main/notebooks/Emotion_recognition_in_Greek_speech_using_Wav2Vec2.ipynb) |
| [Detect objects in an image with DETR](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/DETR/DETR_minimal_example_(with_DetrFeatureExtractor).ipynb) | How to use a trained *DetrForObjectDetection* model to detect objects in an image and visualize attention | [Niels Rogge](https://github.com/NielsRogge) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/DETR/DETR_minimal_example_(with_DetrFeatureExtractor).ipynb) |
| [Fine-tune DETR on a custom object detection dataset](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/DETR/Fine_tuning_DetrForObjectDetection_on_custom_dataset_(balloon).ipynb) | How to fine-tune *DetrForObjectDetection* on a custom object detection dataset | [Niels Rogge](https://github.com/NielsRogge) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/DETR/Fine_tuning_DetrForObjectDetection_on_custom_dataset_(balloon).ipynb) |

View File

@@ -14,21 +14,25 @@
#
import os
import sys
sys.path.insert(0, os.path.abspath('../../src'))
sys.path.insert(0, os.path.abspath("../../src"))
# -- Project information -----------------------------------------------------
project = u'transformers'
copyright = u'2020, The Hugging Face Team, Licenced under the Apache License, Version 2.0'
author = u'huggingface'
project = "transformers"
copyright = "2020, The Hugging Face Team, Licenced under the Apache License, Version 2.0"
author = "huggingface"
# The short X.Y version
version = u''
version = ""
# The full version, including alpha/beta/rc tags
release = u'4.2.0'
release = u'4.7.0'
# Prefix link to point to master, comment this during version release and uncomment below line
extlinks = {'prefix_link': ('https://github.com/huggingface/transformers/blob/master/%s', '')}
extlinks = {"prefix_link": ("https://github.com/huggingface/transformers/blob/master/%s", "")}
# Prefix link to always point to corresponding version, uncomment this during version release
# extlinks = {'prefix_link': ('https://github.com/huggingface/transformers/blob/v'+ release + '/%s', '')}
@@ -42,27 +46,28 @@ extlinks = {'prefix_link': ('https://github.com/huggingface/transformers/blob/ma
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
'sphinx.ext.autodoc',
'sphinx.ext.extlinks',
'sphinx.ext.coverage',
'sphinx.ext.napoleon',
'recommonmark',
'sphinx.ext.viewcode',
'sphinx_markdown_tables',
'sphinx_copybutton'
"sphinx.ext.autodoc",
"sphinx.ext.extlinks",
"sphinx.ext.coverage",
"sphinx.ext.napoleon",
"recommonmark",
"sphinx.ext.viewcode",
"sphinx_markdown_tables",
"sphinxext.opengraph",
"sphinx_copybutton",
]
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
templates_path = ["_templates"]
# The suffix(es) of source filenames.
# You can specify multiple suffix as a list of string:
#
source_suffix = ['.rst', '.md']
source_suffix = [".rst", ".md"]
# source_suffix = '.rst'
# The master toctree document.
master_doc = 'index'
master_doc = "index"
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
@@ -74,7 +79,7 @@ language = None
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = [u'_build', 'Thumbs.db', '.DS_Store']
exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = None
@@ -88,20 +93,30 @@ copybutton_prompt_is_regexp = True
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = 'sphinx_rtd_theme'
html_theme = "sphinx_rtd_theme"
# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
#
html_theme_options = {
'analytics_id': 'UA-83738774-2'
}
html_theme_options = {"analytics_id": "UA-83738774-2", "navigation_with_keys": True}
# Configuration for OpenGraph and Twitter Card Tags.
# These are responsible for creating nice shareable social images https://ahrefs.com/blog/open-graph-meta-tags/
# https://ogp.me/#type_website
ogp_image = "https://huggingface.co/front/thumbnails/transformers.png"
ogp_description = "State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0. Transformers provides thousands of pretrained models to perform tasks on texts such as classification, information extraction, question answering, summarization, translation, text generation, etc in 100+ languages. Its aim is to make cutting-edge NLP easier to use for everyone"
ogp_description_length = 160
ogp_custom_meta_tags = [
f'<meta name="twitter:image" content="{ogp_image}">',
f'<meta name="twitter:description" content="{ogp_description}">',
]
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
html_static_path = ["_static"]
# Custom sidebar templates, must be a dictionary that maps document names
# to template names.
@@ -113,17 +128,17 @@ html_static_path = ['_static']
#
# html_sidebars = {}
# This must be the name of an image file (path relative to the configuration
# directory) that is the favicon of the docs. Modern browsers use this as
# the icon for tabs, windows and bookmarks. It should be a Windows-style
# This must be the name of an image file (path relative to the configuration
# directory) that is the favicon of the docs. Modern browsers use this as
# the icon for tabs, windows and bookmarks. It should be a Windows-style
# icon file (.ico).
html_favicon = 'favicon.ico'
html_favicon = "favicon.ico"
# -- Options for HTMLHelp output ---------------------------------------------
# Output file base name for HTML help builder.
htmlhelp_basename = 'transformersdoc'
htmlhelp_basename = "transformersdoc"
# -- Options for LaTeX output ------------------------------------------------
@@ -132,15 +147,12 @@ latex_elements = {
# The paper size ('letterpaper' or 'a4paper').
#
# 'papersize': 'letterpaper',
# The font size ('10pt', '11pt' or '12pt').
#
# 'pointsize': '10pt',
# Additional stuff for the LaTeX preamble.
#
# 'preamble': '',
# Latex figure (float) alignment
#
# 'figure_align': 'htbp',
@@ -150,8 +162,7 @@ latex_elements = {
# (source start file, target name, title,
# author, documentclass [howto, manual, or own class]).
latex_documents = [
(master_doc, 'transformers.tex', u'transformers Documentation',
u'huggingface', 'manual'),
(master_doc, "transformers.tex", "transformers Documentation", "huggingface", "manual"),
]
@@ -159,10 +170,7 @@ latex_documents = [
# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [
(master_doc, 'transformers', u'transformers Documentation',
[author], 1)
]
man_pages = [(master_doc, "transformers", "transformers Documentation", [author], 1)]
# -- Options for Texinfo output ----------------------------------------------
@@ -171,9 +179,15 @@ man_pages = [
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
(master_doc, 'transformers', u'transformers Documentation',
author, 'transformers', 'One line description of project.',
'Miscellaneous'),
(
master_doc,
"transformers",
"transformers Documentation",
author,
"transformers",
"One line description of project.",
"Miscellaneous",
),
]
@@ -192,11 +206,13 @@ epub_title = project
# epub_uid = ''
# A list of files that should not be packed into the epub file.
epub_exclude_files = ['search.html']
epub_exclude_files = ["search.html"]
def setup(app):
app.add_css_file('css/huggingface.css')
app.add_css_file('css/code-snippets.css')
app.add_js_file('js/custom.js')
app.add_css_file("css/huggingface.css")
app.add_css_file("css/code-snippets.css")
app.add_js_file("js/custom.js")
# -- Extension configuration -------------------------------------------------

View File

@@ -33,8 +33,8 @@ You can convert any TensorFlow checkpoint for BERT (in particular `the pre-train
This CLI takes as input a TensorFlow checkpoint (three files starting with ``bert_model.ckpt``\ ) and the associated
configuration file (\ ``bert_config.json``\ ), and creates a PyTorch model for this configuration, loads the weights
from the TensorFlow checkpoint in the PyTorch model and saves the resulting model in a standard PyTorch save file that
can be imported using ``from_pretrained()`` (see example in :doc:`quicktour` , `run_glue.py
<https://github.com/huggingface/transformers/blob/master/examples/text-classification/run_glue.py>`_\ ).
can be imported using ``from_pretrained()`` (see example in :doc:`quicktour` , :prefix_link:`run_glue.py
<examples/pytorch/text-classification/run_glue.py>` \ ).
You only need to run this conversion script **once** to get a PyTorch model. You can then disregard the TensorFlow
checkpoint (the three files starting with ``bert_model.ckpt``\ ) but be sure to keep the configuration file (\
@@ -47,12 +47,12 @@ Here is an example of the conversion process for a pre-trained ``BERT-Base Uncas
.. code-block:: shell
export BERT_BASE_DIR=/path/to/bert/uncased_L-12_H-768_A-12
export BERT_BASE_DIR=/path/to/bert/uncased_L-12_H-768_A-12
transformers-cli convert --model_type bert \
--tf_checkpoint $BERT_BASE_DIR/bert_model.ckpt \
--config $BERT_BASE_DIR/bert_config.json \
--pytorch_dump_output $BERT_BASE_DIR/pytorch_model.bin
transformers-cli convert --model_type bert \
--tf_checkpoint $BERT_BASE_DIR/bert_model.ckpt \
--config $BERT_BASE_DIR/bert_config.json \
--pytorch_dump_output $BERT_BASE_DIR/pytorch_model.bin
You can download Google's pre-trained models for the conversion `here
<https://github.com/google-research/bert#pre-trained-models>`__.
@@ -72,12 +72,12 @@ Here is an example of the conversion process for the pre-trained ``ALBERT Base``
.. code-block:: shell
export ALBERT_BASE_DIR=/path/to/albert/albert_base
export ALBERT_BASE_DIR=/path/to/albert/albert_base
transformers-cli convert --model_type albert \
--tf_checkpoint $ALBERT_BASE_DIR/model.ckpt-best \
--config $ALBERT_BASE_DIR/albert_config.json \
--pytorch_dump_output $ALBERT_BASE_DIR/pytorch_model.bin
transformers-cli convert --model_type albert \
--tf_checkpoint $ALBERT_BASE_DIR/model.ckpt-best \
--config $ALBERT_BASE_DIR/albert_config.json \
--pytorch_dump_output $ALBERT_BASE_DIR/pytorch_model.bin
You can download Google's pre-trained models for the conversion `here
<https://github.com/google-research/albert#pre-trained-models>`__.
@@ -91,13 +91,13 @@ save as the same format than OpenAI pretrained model (see `here <https://github.
.. code-block:: shell
export OPENAI_GPT_CHECKPOINT_FOLDER_PATH=/path/to/openai/pretrained/numpy/weights
export OPENAI_GPT_CHECKPOINT_FOLDER_PATH=/path/to/openai/pretrained/numpy/weights
transformers-cli convert --model_type gpt \
--tf_checkpoint $OPENAI_GPT_CHECKPOINT_FOLDER_PATH \
--pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
[--config OPENAI_GPT_CONFIG] \
[--finetuning_task_name OPENAI_GPT_FINETUNED_TASK] \
transformers-cli convert --model_type gpt \
--tf_checkpoint $OPENAI_GPT_CHECKPOINT_FOLDER_PATH \
--pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
[--config OPENAI_GPT_CONFIG] \
[--finetuning_task_name OPENAI_GPT_FINETUNED_TASK] \
OpenAI GPT-2
@@ -108,13 +108,13 @@ Here is an example of the conversion process for a pre-trained OpenAI GPT-2 mode
.. code-block:: shell
export OPENAI_GPT2_CHECKPOINT_PATH=/path/to/gpt2/pretrained/weights
export OPENAI_GPT2_CHECKPOINT_PATH=/path/to/gpt2/pretrained/weights
transformers-cli convert --model_type gpt2 \
--tf_checkpoint $OPENAI_GPT2_CHECKPOINT_PATH \
--pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
[--config OPENAI_GPT2_CONFIG] \
[--finetuning_task_name OPENAI_GPT2_FINETUNED_TASK]
transformers-cli convert --model_type gpt2 \
--tf_checkpoint $OPENAI_GPT2_CHECKPOINT_PATH \
--pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
[--config OPENAI_GPT2_CONFIG] \
[--finetuning_task_name OPENAI_GPT2_FINETUNED_TASK]
Transformer-XL
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -124,13 +124,13 @@ Here is an example of the conversion process for a pre-trained Transformer-XL mo
.. code-block:: shell
export TRANSFO_XL_CHECKPOINT_FOLDER_PATH=/path/to/transfo/xl/checkpoint
export TRANSFO_XL_CHECKPOINT_FOLDER_PATH=/path/to/transfo/xl/checkpoint
transformers-cli convert --model_type transfo_xl \
--tf_checkpoint $TRANSFO_XL_CHECKPOINT_FOLDER_PATH \
--pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
[--config TRANSFO_XL_CONFIG] \
[--finetuning_task_name TRANSFO_XL_FINETUNED_TASK]
transformers-cli convert --model_type transfo_xl \
--tf_checkpoint $TRANSFO_XL_CHECKPOINT_FOLDER_PATH \
--pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
[--config TRANSFO_XL_CONFIG] \
[--finetuning_task_name TRANSFO_XL_FINETUNED_TASK]
XLNet
@@ -140,14 +140,14 @@ Here is an example of the conversion process for a pre-trained XLNet model:
.. code-block:: shell
export TRANSFO_XL_CHECKPOINT_PATH=/path/to/xlnet/checkpoint
export TRANSFO_XL_CONFIG_PATH=/path/to/xlnet/config
export TRANSFO_XL_CHECKPOINT_PATH=/path/to/xlnet/checkpoint
export TRANSFO_XL_CONFIG_PATH=/path/to/xlnet/config
transformers-cli convert --model_type xlnet \
--tf_checkpoint $TRANSFO_XL_CHECKPOINT_PATH \
--config $TRANSFO_XL_CONFIG_PATH \
--pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
[--finetuning_task_name XLNET_FINETUNED_TASK] \
transformers-cli convert --model_type xlnet \
--tf_checkpoint $TRANSFO_XL_CHECKPOINT_PATH \
--config $TRANSFO_XL_CONFIG_PATH \
--pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
[--finetuning_task_name XLNET_FINETUNED_TASK] \
XLM
@@ -157,13 +157,13 @@ Here is an example of the conversion process for a pre-trained XLM model:
.. code-block:: shell
export XLM_CHECKPOINT_PATH=/path/to/xlm/checkpoint
export XLM_CHECKPOINT_PATH=/path/to/xlm/checkpoint
transformers-cli convert --model_type xlm \
--tf_checkpoint $XLM_CHECKPOINT_PATH \
--pytorch_dump_output $PYTORCH_DUMP_OUTPUT
[--config XML_CONFIG] \
[--finetuning_task_name XML_FINETUNED_TASK]
transformers-cli convert --model_type xlm \
--tf_checkpoint $XLM_CHECKPOINT_PATH \
--pytorch_dump_output $PYTORCH_DUMP_OUTPUT
[--config XML_CONFIG] \
[--finetuning_task_name XML_FINETUNED_TASK]
T5
@@ -173,9 +173,9 @@ Here is an example of the conversion process for a pre-trained T5 model:
.. code-block:: shell
export T5=/path/to/t5/uncased_L-12_H-768_A-12
export T5=/path/to/t5/uncased_L-12_H-768_A-12
transformers-cli convert --model_type t5 \
--tf_checkpoint $T5/t5_model.ckpt \
--config $T5/t5_config.json \
--pytorch_dump_output $T5/pytorch_model.bin
transformers-cli convert --model_type t5 \
--tf_checkpoint $T5/t5_model.ckpt \
--config $T5/t5_config.json \
--pytorch_dump_output $T5/pytorch_model.bin

View File

@@ -15,10 +15,10 @@ Fine-tuning with custom datasets
.. note::
The datasets used in this tutorial are available and can be more easily accessed using the `🤗 NLP library
<https://github.com/huggingface/nlp>`_. We do not use this library to access the datasets here since this tutorial
meant to illustrate how to work with your own data. A brief of introduction can be found at the end of the tutorial
in the section ":ref:`nlplib`".
The datasets used in this tutorial are available and can be more easily accessed using the `🤗 Datasets library
<https://github.com/huggingface/datasets>`_. We do not use this library to access the datasets here since this
tutorial meant to illustrate how to work with your own data. A brief of introduction can be found at the end of the
tutorial in the section ":ref:`datasetslib`".
This tutorial will take you through several examples of using 🤗 Transformers models with your own datasets. The guide
shows one of many valid workflows for using these models and is meant to be illustrative rather than definitive. We
@@ -41,7 +41,7 @@ Sequence Classification with IMDb Reviews
.. note::
This dataset can be explored in the Hugging Face model hub (`IMDb <https://huggingface.co/datasets/imdb>`_), and
can be alternatively downloaded with the 🤗 NLP library with ``load_dataset("imdb")``.
can be alternatively downloaded with the 🤗 Datasets library with ``load_dataset("imdb")``.
In this example, we'll show how to download, tokenize, and train a model on the IMDb reviews dataset. This task takes
the text of a review and requires the model to predict whether the sentiment of the review is positive or negative.
@@ -260,7 +260,7 @@ Token Classification with W-NUT Emerging Entities
.. note::
This dataset can be explored in the Hugging Face model hub (`WNUT-17 <https://huggingface.co/datasets/wnut_17>`_),
and can be alternatively downloaded with the 🤗 NLP library with ``load_dataset("wnut_17")``.
and can be alternatively downloaded with the 🤗 Datasets library with ``load_dataset("wnut_17")``.
Next we will look at token classification. Rather than classifying an entire sequence, this task classifies token by
token. We'll demonstrate how to do this with `Named Entity Recognition
@@ -459,7 +459,7 @@ Question Answering with SQuAD 2.0
.. note::
This dataset can be explored in the Hugging Face model hub (`SQuAD V2
<https://huggingface.co/datasets/squad_v2>`_), and can be alternatively downloaded with the 🤗 NLP library with
<https://huggingface.co/datasets/squad_v2>`_), and can be alternatively downloaded with the 🤗 Datasets library with
``load_dataset("squad_v2")``.
Question answering comes in many forms. In this example, we'll look at the particular type of extractive QA that
@@ -558,15 +558,14 @@ we can use the built in :func:`~transformers.BatchEncoding.char_to_token` method
end_positions = []
for i in range(len(answers)):
start_positions.append(encodings.char_to_token(i, answers[i]['answer_start']))
end_positions.append(encodings.char_to_token(i, answers[i]['answer_end']))
end_positions.append(encodings.char_to_token(i, answers[i]['answer_end'] - 1))
# if start position is None, the answer passage has been truncated
if start_positions[-1] is None:
start_positions[-1] = tokenizer.model_max_length
# if end position is None, the 'char_to_token' function points to the space before the correct token - > add + 1
if end_positions[-1] is None:
end_positions[-1] = encodings.char_to_token(i, answers[i]['answer_end'] + 1)
end_positions[-1] = tokenizer.model_max_length
encodings.update({'start_positions': start_positions, 'end_positions': end_positions})
add_token_positions(train_encodings, train_answers)
@@ -678,22 +677,23 @@ Additional Resources
- :doc:`Preprocessing <preprocessing>`. Docs page on data preprocessing.
- :doc:`Training <training>`. Docs page on training and fine-tuning.
.. _nlplib:
.. _datasetslib:
Using the 🤗 NLP Datasets & Metrics library
Using the 🤗 Datasets & Metrics library
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This tutorial demonstrates how to read in datasets from various raw text formats and prepare them for training with 🤗
Transformers so that you can do the same thing with your own custom datasets. However, we recommend users use the `🤗
NLP library <https://github.com/huggingface/nlp>`_ for working with the 150+ datasets included in the `hub
Datasets library <https://github.com/huggingface/datasets>`_ for working with the 150+ datasets included in the `hub
<https://huggingface.co/datasets>`_, including the three datasets used in this tutorial. As a very brief overview, we
will show how to use the NLP library to download and prepare the IMDb dataset from the first example, :ref:`seq_imdb`.
will show how to use the Datasets library to download and prepare the IMDb dataset from the first example,
:ref:`seq_imdb`.
Start by downloading the dataset:
.. code-block:: python
from nlp import load_dataset
from datasets import load_dataset
train = load_dataset("imdb", split="train")
Each dataset has multiple columns corresponding to different features. Let's see what our columns are.
@@ -725,5 +725,5 @@ dataset elements.
>>> {key: val.shape for key, val in train[0].items()})
{'labels': TensorShape([]), 'input_ids': TensorShape([512]), 'attention_mask': TensorShape([512])}
We now have a fully-prepared dataset. Check out `the 🤗 NLP docs <https://huggingface.co/nlp/processing.html>`_ for a
more thorough introduction.
We now have a fully-prepared dataset. Check out `the 🤗 Datasets docs
<https://huggingface.co/docs/datasets/processing.html>`_ for a more thorough introduction.

295
docs/source/debugging.rst Normal file
View File

@@ -0,0 +1,295 @@
..
Copyright 2021 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
Debugging
=======================================================================================================================
Underflow and Overflow Detection
-----------------------------------------------------------------------------------------------------------------------
.. note::
This feature is currently available for PyTorch-only.
.. note::
This feature can be used with any ``nn.Module``-based model
If you start getting ``loss=NaN`` or the model inhibits some other abnormal behavior due to ``inf`` or ``nan`` in
activations or weights one needs to discover where the first underflow or overflow happens and what led to it. Luckily
you can accomplish that easily by activating a special module that will do the detection automatically.
If you're using :class:`~transformers.Trainer`, you just need to add:
.. code-block:: bash
--debug underflow_overflow
to the normal command line arguments, or pass ``debug="underflow_overflow"`` when creating the
:class:`~transformers.TrainingArguments` object.
If you're using your own training loop or another Trainer you can accomplish the same with:
.. code-block:: python
from .debug_utils import DebugUnderflowOverflow
debug_overflow = DebugUnderflowOverflow(model)
:class:`~transformers.debug_utils.DebugUnderflowOverflow` inserts hooks into the model that immediately after each
forward call will test input and output variables and also the corresponding module's weights. As soon as ``inf`` or
``nan`` is detected in at least one element of the activations or weights, the program will assert and print a report
like this (this was caught with ``google/mt5-small`` under fp16 mixed precision):
.. code-block::
Detected inf/nan during batch_number=0
Last 21 forward frames:
abs min abs max metadata
encoder.block.1.layer.1.DenseReluDense.dropout Dropout
0.00e+00 2.57e+02 input[0]
0.00e+00 2.85e+02 output
[...]
encoder.block.2.layer.0 T5LayerSelfAttention
6.78e-04 3.15e+03 input[0]
2.65e-04 3.42e+03 output[0]
None output[1]
2.25e-01 1.00e+04 output[2]
encoder.block.2.layer.1.layer_norm T5LayerNorm
8.69e-02 4.18e-01 weight
2.65e-04 3.42e+03 input[0]
1.79e-06 4.65e+00 output
encoder.block.2.layer.1.DenseReluDense.wi_0 Linear
2.17e-07 4.50e+00 weight
1.79e-06 4.65e+00 input[0]
2.68e-06 3.70e+01 output
encoder.block.2.layer.1.DenseReluDense.wi_1 Linear
8.08e-07 2.66e+01 weight
1.79e-06 4.65e+00 input[0]
1.27e-04 2.37e+02 output
encoder.block.2.layer.1.DenseReluDense.dropout Dropout
0.00e+00 8.76e+03 input[0]
0.00e+00 9.74e+03 output
encoder.block.2.layer.1.DenseReluDense.wo Linear
1.01e-06 6.44e+00 weight
0.00e+00 9.74e+03 input[0]
3.18e-04 6.27e+04 output
encoder.block.2.layer.1.DenseReluDense T5DenseGatedGeluDense
1.79e-06 4.65e+00 input[0]
3.18e-04 6.27e+04 output
encoder.block.2.layer.1.dropout Dropout
3.18e-04 6.27e+04 input[0]
0.00e+00 inf output
The example output has been trimmed in the middle for brevity.
The second column shows the value of the absolute largest element, so if you have a closer look at the last few frames,
the inputs and outputs were in the range of ``1e4``. So when this training was done under fp16 mixed precision the very
last step overflowed (since under ``fp16`` the largest number before ``inf`` is ``64e3``). To avoid overflows under
``fp16`` the activations must remain way below ``1e4``, because ``1e4 * 1e4 = 1e8`` so any matrix multiplication with
large activations is going to lead to a numerical overflow condition.
At the very start of the trace you can discover at which batch number the problem occurred (here ``Detected inf/nan
during batch_number=0`` means the problem occurred on the first batch).
Each reported frame starts by declaring the fully qualified entry for the corresponding module this frame is reporting
for. If we look just at this frame:
.. code-block::
encoder.block.2.layer.1.layer_norm T5LayerNorm
8.69e-02 4.18e-01 weight
2.65e-04 3.42e+03 input[0]
1.79e-06 4.65e+00 output
Here, ``encoder.block.2.layer.1.layer_norm`` indicates that it was a layer norm for the first layer, of the second
block of the encoder. And the specific calls of the ``forward`` is ``T5LayerNorm``.
Let's look at the last few frames of that report:
.. code-block::
Detected inf/nan during batch_number=0
Last 21 forward frames:
abs min abs max metadata
[...]
encoder.block.2.layer.1.DenseReluDense.wi_0 Linear
2.17e-07 4.50e+00 weight
1.79e-06 4.65e+00 input[0]
2.68e-06 3.70e+01 output
encoder.block.2.layer.1.DenseReluDense.wi_1 Linear
8.08e-07 2.66e+01 weight
1.79e-06 4.65e+00 input[0]
1.27e-04 2.37e+02 output
encoder.block.2.layer.1.DenseReluDense.wo Linear
1.01e-06 6.44e+00 weight
0.00e+00 9.74e+03 input[0]
3.18e-04 6.27e+04 output
encoder.block.2.layer.1.DenseReluDense T5DenseGatedGeluDense
1.79e-06 4.65e+00 input[0]
3.18e-04 6.27e+04 output
encoder.block.2.layer.1.dropout Dropout
3.18e-04 6.27e+04 input[0]
0.00e+00 inf output
The last frame reports for ``Dropout.forward`` function with the first entry for the only input and the second for the
only output. You can see that it was called from an attribute ``dropout`` inside ``DenseReluDense`` class. We can see
that it happened during the first layer, of the 2nd block, during the very first batch. Finally, the absolute largest
input elements was ``6.27e+04`` and same for the output was ``inf``.
You can see here, that ``T5DenseGatedGeluDense.forward`` resulted in output activations, whose absolute max value was
around 62.7K, which is very close to fp16's top limit of 64K. In the next frame we have ``Dropout`` which renormalizes
the weights, after it zeroed some of the elements, which pushes the absolute max value to more than 64K, and we get an
overlow (``inf``).
As you can see it's the previous frames that we need to look into when the numbers start going into very large for fp16
numbers.
Let's match the report to the code from ``models/t5/modeling_t5.py``:
.. code-block:: python
class T5DenseGatedGeluDense(nn.Module):
def __init__(self, config):
super().__init__()
self.wi_0 = nn.Linear(config.d_model, config.d_ff, bias=False)
self.wi_1 = nn.Linear(config.d_model, config.d_ff, bias=False)
self.wo = nn.Linear(config.d_ff, config.d_model, bias=False)
self.dropout = nn.Dropout(config.dropout_rate)
self.gelu_act = ACT2FN["gelu_new"]
def forward(self, hidden_states):
hidden_gelu = self.gelu_act(self.wi_0(hidden_states))
hidden_linear = self.wi_1(hidden_states)
hidden_states = hidden_gelu * hidden_linear
hidden_states = self.dropout(hidden_states)
hidden_states = self.wo(hidden_states)
return hidden_states
Now it's easy to see the ``dropout`` call, and all the previous calls as well.
Since the detection is happening in a forward hook, these reports are printed immediately after each ``forward``
returns.
Going back to the full report, to act on it and to fix the problem, we need to go a few frames up where the numbers
started to go up and most likely switch to the ``fp32`` mode here, so that the numbers don't overflow when multiplied
or summed up. Of course, there might be other solutions. For example, we could turn off ``amp`` temporarily if it's
enabled, after moving the original ``forward`` into a helper wrapper, like so:
.. code-block:: python
def _forward(self, hidden_states):
hidden_gelu = self.gelu_act(self.wi_0(hidden_states))
hidden_linear = self.wi_1(hidden_states)
hidden_states = hidden_gelu * hidden_linear
hidden_states = self.dropout(hidden_states)
hidden_states = self.wo(hidden_states)
return hidden_states
import torch
def forward(self, hidden_states):
if torch.is_autocast_enabled():
with torch.cuda.amp.autocast(enabled=False):
return self._forward(hidden_states)
else:
return self._forward(hidden_states)
Since the automatic detector only reports on inputs and outputs of full frames, once you know where to look, you may
want to analyse the intermediary stages of any specific ``forward`` function as well. In such a case you can use the
``detect_overflow`` helper function to inject the detector where you want it, for example:
.. code-block:: python
from debug_utils import detect_overflow
class T5LayerFF(nn.Module):
[...]
def forward(self, hidden_states):
forwarded_states = self.layer_norm(hidden_states)
detect_overflow(forwarded_states, "after layer_norm")
forwarded_states = self.DenseReluDense(forwarded_states)
detect_overflow(forwarded_states, "after DenseReluDense")
return hidden_states + self.dropout(forwarded_states)
You can see that we added 2 of these and now we track if ``inf`` or ``nan`` for ``forwarded_states`` was detected
somewhere in between.
Actually, the detector already reports these because each of the calls in the example above is a `nn.Module``, but
let's say if you had some local direct calculations this is how you'd do that.
Additionally, if you're instantiating the debugger in your own code, you can adjust the number of frames printed from
its default, e.g.:
.. code-block:: python
from .debug_utils import DebugUnderflowOverflow
debug_overflow = DebugUnderflowOverflow(model, max_frames_to_save=100)
Specific batch absolute mix and max value tracing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The same debugging class can be used for per-batch tracing with the underflow/overflow detection feature turned off.
Let's say you want to watch the absolute min and max values for all the ingredients of each ``forward`` call of a given
batch, and only do that for batches 1 and 3. Then you instantiate this class as:
.. code-block:: python
debug_overflow = DebugUnderflowOverflow(model, trace_batch_nums=[1,3])
And now full batches 1 and 3 will be traced using the same format as the underflow/overflow detector does.
Batches are 0-indexed.
This is helpful if you know that the program starts misbehaving after a certain batch number, so you can fast-forward
right to that area. Here is a sample truncated output for such configuration:
.. code-block::
*** Starting batch number=1 ***
abs min abs max metadata
shared Embedding
1.01e-06 7.92e+02 weight
0.00e+00 2.47e+04 input[0]
5.36e-05 7.92e+02 output
[...]
decoder.dropout Dropout
1.60e-07 2.27e+01 input[0]
0.00e+00 2.52e+01 output
decoder T5Stack
not a tensor output
lm_head Linear
1.01e-06 7.92e+02 weight
0.00e+00 1.11e+00 input[0]
6.06e-02 8.39e+01 output
T5ForConditionalGeneration
not a tensor output
*** Starting batch number=3 ***
abs min abs max metadata
shared Embedding
1.01e-06 7.92e+02 weight
0.00e+00 2.78e+04 input[0]
5.36e-05 7.92e+02 output
[...]
Here you will get a huge number of frames dumped - as many as there were forward calls in your model, so it may or may
not what you want, but sometimes it can be easier to use for debugging purposes than a normal debugger. For example, if
a problem starts happening at batch number 150. So you can dump traces for batches 149 and 150 and compare where
numbers started to diverge.
You can also specify the batch number after which to stop the training, with:
.. code-block:: python
debug_overflow = DebugUnderflowOverflow(model, trace_batch_nums=[1,3], abort_after_batch_num=3)

View File

@@ -0,0 +1,62 @@
Using tokenizers from 🤗 Tokenizers
=======================================================================================================================
The :class:`~transformers.PreTrainedTokenizerFast` depends on the `tokenizers
<https://huggingface.co/docs/tokenizers>`__ library. The tokenizers obtained from the 🤗 Tokenizers library can be
loaded very simply into 🤗 Transformers.
Before getting in the specifics, let's first start by creating a dummy tokenizer in a few lines:
.. code-block::
>>> from tokenizers import Tokenizer
>>> from tokenizers.models import BPE
>>> from tokenizers.trainers import BpeTrainer
>>> from tokenizers.pre_tokenizers import Whitespace
>>> tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
>>> trainer = BpeTrainer(special_tokens=["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"])
>>> tokenizer.pre_tokenizer = Whitespace()
>>> files = [...]
>>> tokenizer.train(files, trainer)
We now have a tokenizer trained on the files we defined. We can either continue using it in that runtime, or save it to
a JSON file for future re-use.
Loading directly from the tokenizer object
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Let's see how to leverage this tokenizer object in the 🤗 Transformers library. The
:class:`~transformers.PreTrainedTokenizerFast` class allows for easy instantiation, by accepting the instantiated
`tokenizer` object as an argument:
.. code-block::
>>> from transformers import PreTrainedTokenizerFast
>>> fast_tokenizer = PreTrainedTokenizerFast(tokenizer_object=tokenizer)
This object can now be used with all the methods shared by the 🤗 Transformers tokenizers! Head to :doc:`the tokenizer
page <main_classes/tokenizer>` for more information.
Loading from a JSON file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In order to load a tokenizer from a JSON file, let's first start by saving our tokenizer:
.. code-block::
>>> tokenizer.save("tokenizer.json")
The path to which we saved this file can be passed to the :class:`~transformers.PreTrainedTokenizerFast` initialization
method using the :obj:`tokenizer_file` parameter:
.. code-block::
>>> from transformers import PreTrainedTokenizerFast
>>> fast_tokenizer = PreTrainedTokenizerFast(tokenizer_file="tokenizer.json")
This object can now be used with all the methods shared by the 🤗 Transformers tokenizers! Head to :doc:`the tokenizer
page <main_classes/tokenizer>` for more information.

View File

@@ -21,6 +21,7 @@ General terms
- CLM: causal language modeling, a pretraining task where the model reads the texts in order and has to predict the
next word. It's usually done by reading the whole sentence but using a mask inside the model to hide the future
tokens at a certain timestep.
- deep learning: machine learning algorithms which uses neural networks with several layers.
- MLM: masked language modeling, a pretraining task where the model sees a corrupted version of the texts, usually done
by masking some tokens randomly, and has to predict the original text.
- multimodal: a task that combines texts with another kind of inputs (for instance images).
@@ -33,10 +34,12 @@ General terms
involve a self-supervised objective, which can be reading the text and trying to predict the next word (see CLM) or
masking some words and trying to predict them (see MLM).
- RNN: recurrent neural network, a type of model that uses a loop over a layer to process texts.
- self-attention: each element of the input finds out which other elements of the input they should attend to.
- seq2seq or sequence-to-sequence: models that generate a new sequence from an input, like translation models, or
summarization models (such as :doc:`Bart </model_doc/bart>` or :doc:`T5 </model_doc/t5>`).
- token: a part of a sentence, usually a word, but can also be a subword (non-common words are often split in subwords)
or a punctuation symbol.
- transformer: self-attention based deep learning model architecture.
Model inputs
-----------------------------------------------------------------------------------------------------------------------
@@ -52,6 +55,12 @@ Input IDs
The input ids are often the only required parameters to be passed to the model as input. *They are token indices,
numerical representations of tokens building the sequences that will be used as input by the model*.
.. raw:: html
<iframe width="560" height="315" src="https://www.youtube.com/embed/VFp38yj8h3A" title="YouTube video player"
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
picture-in-picture" allowfullscreen></iframe>
Each tokenizer works differently but the underlying mechanism remains the same. Here's an example using the BERT
tokenizer, which is a `WordPiece <https://arxiv.org/pdf/1609.08144.pdf>`__ tokenizer:
@@ -117,8 +126,15 @@ because this is the way a :class:`~transformers.BertModel` is going to expect it
Attention mask
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The attention mask is an optional argument used when batching sequences together. This argument indicates to the model
which tokens should be attended to, and which should not.
The attention mask is an optional argument used when batching sequences together.
.. raw:: html
<iframe width="560" height="315" src="https://www.youtube.com/embed/M6adb1j2jPI" title="YouTube video player"
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
picture-in-picture" allowfullscreen></iframe>
This argument indicates to the model which tokens should be attended to, and which should not.
For example, consider these two sequences:
@@ -172,14 +188,21 @@ in the dictionary returned by the tokenizer under the key "attention_mask":
Token Type IDs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Some models' purpose is to do sequence classification or question answering. These require two different sequences to
be joined in a single "input_ids" entry, which usually is performed with the help of special tokens, such as the
classifier (``[CLS]``) and separator (``[SEP]``) tokens. For example, the BERT model builds its two sequence input as
such:
Some models' purpose is to do classification on pairs of sentences or question answering.
.. raw:: html
<iframe width="560" height="315" src="https://www.youtube.com/embed/0u3ioSwev3s" title="YouTube video player"
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
picture-in-picture" allowfullscreen></iframe>
These require two different sequences to be joined in a single "input_ids" entry, which usually is performed with the
help of special tokens, such as the classifier (``[CLS]``) and separator (``[SEP]``) tokens. For example, the BERT
model builds its two sequence input as such:
.. code-block::
>>> # [CLS] SEQUENCE_A [SEP] SEQUENCE_B [SEP]
>>> # [CLS] SEQUENCE_A [SEP] SEQUENCE_B [SEP]
We can use our tokenizer to automatically generate such a sentence by passing the two sequences to ``tokenizer`` as two
arguments (and not a list, like before) like this:

Binary file not shown.

After

Width:  |  Height:  |  Size: 78 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 145 KiB

After

Width:  |  Height:  |  Size: 691 KiB

View File

@@ -1,14 +1,25 @@
Transformers
=======================================================================================================================
State-of-the-art Natural Language Processing for Pytorch and TensorFlow 2.0.
State-of-the-art Natural Language Processing for Jax, Pytorch and TensorFlow
🤗 Transformers (formerly known as `pytorch-transformers` and `pytorch-pretrained-bert`) provides general-purpose
architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet...) for Natural Language Understanding (NLU) and Natural
Language Generation (NLG) with over 32+ pretrained models in 100+ languages and deep interoperability between
TensorFlow 2.0 and PyTorch.
Language Generation (NLG) with over 32+ pretrained models in 100+ languages and deep interoperability between Jax,
PyTorch and TensorFlow.
This is the documentation of our repository `transformers <https://github.com/huggingface/transformers>`_.
This is the documentation of our repository `transformers <https://github.com/huggingface/transformers>`__. You can
also follow our `online course <https://huggingface.co/course>`__ that teaches how to use this library, as well as the
other libraries developed by Hugging Face and the Hub.
If you are looking for custom support from the Hugging Face team
-----------------------------------------------------------------------------------------------------------------------
.. raw:: html
<a target="_blank" href="https://huggingface.co/support">
<img alt="HuggingFace Expert Acceleration Program" src="https://huggingface.co/front/thumbnails/support.png" style="max-width: 600px; border: 1px solid #eee; border-radius: 4px; box-shadow: 0 1px 2px 0 rgba(0, 0, 0, 0.05);">
</a><br>
Features
-----------------------------------------------------------------------------------------------------------------------
@@ -22,7 +33,7 @@ State-of-the-art NLP for everyone:
- Hands-on practitioners
- AI/ML/NLP teachers and educators
..
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
@@ -43,11 +54,11 @@ Lower compute costs, smaller carbon footprint:
Choose the right framework for every part of a model's lifetime:
- Train state-of-the-art models in 3 lines of code
- Deep interoperability between TensorFlow 2.0 and PyTorch models
- Move a single model between TF2.0/PyTorch frameworks at will
- Deep interoperability between Jax, Pytorch and TensorFlow models
- Move a single model between Jax/PyTorch/TensorFlow frameworks at will
- Seamlessly pick the right framework for training, evaluation, production
Experimental support for Flax with a few models right now, expected to grow in the coming months.
The support for Jax is still experimental (with a few models right now), expect to see it grow in the coming months!
`All the model checkpoints <https://huggingface.co/models>`__ are seamlessly integrated from the huggingface.co `model
hub <https://huggingface.co>`__ where they are uploaded directly by `users <https://huggingface.co/users>`__ and
@@ -74,8 +85,11 @@ The documentation is organized in five parts:
- **MODELS** for the classes and functions related to each model implemented in the library.
- **INTERNAL HELPERS** for the classes and functions we use internally.
The library currently contains PyTorch, Tensorflow and Flax implementations, pretrained model weights, usage scripts
and conversion utilities for the following models:
The library currently contains Jax, PyTorch and Tensorflow implementations, pretrained model weights, usage scripts and
conversion utilities for the following models.
Supported models
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
..
This list is updated automatically from the README with `make fix-copies`. Do not update manually!
@@ -97,121 +111,188 @@ and conversion utilities for the following models:
5. :doc:`BERT For Sequence Generation <model_doc/bertgeneration>` (from Google) released with the paper `Leveraging
Pre-trained Checkpoints for Sequence Generation Tasks <https://arxiv.org/abs/1907.12461>`__ by Sascha Rothe, Shashi
Narayan, Aliaksei Severyn.
6. :doc:`Blenderbot <model_doc/blenderbot>` (from Facebook) released with the paper `Recipes for building an
6. :doc:`BigBird-RoBERTa <model_doc/bigbird>` (from Google Research) released with the paper `Big Bird: Transformers
for Longer Sequences <https://arxiv.org/abs/2007.14062>`__ by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua
Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
7. :doc:`BigBird-Pegasus <model_doc/bigbird_pegasus>` (from Google Research) released with the paper `Big Bird:
Transformers for Longer Sequences <https://arxiv.org/abs/2007.14062>`__ by Manzil Zaheer, Guru Guruganesh, Avinava
Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
8. :doc:`Blenderbot <model_doc/blenderbot>` (from Facebook) released with the paper `Recipes for building an
open-domain chatbot <https://arxiv.org/abs/2004.13637>`__ by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary
Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
7. :doc:`BlenderbotSmall <model_doc/blenderbot_small>` (from Facebook) released with the paper `Recipes for building an
9. :doc:`BlenderbotSmall <model_doc/blenderbot_small>` (from Facebook) released with the paper `Recipes for building an
open-domain chatbot <https://arxiv.org/abs/2004.13637>`__ by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary
Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
8. :doc:`BORT <model_doc/bort>` (from Alexa) released with the paper `Optimal Subarchitecture Extraction For BERT
<https://arxiv.org/abs/2010.10499>`__ by Adrian de Wynter and Daniel J. Perry.
9. :doc:`CamemBERT <model_doc/camembert>` (from Inria/Facebook/Sorbonne) released with the paper `CamemBERT: a Tasty
French Language Model <https://arxiv.org/abs/1911.03894>`__ by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz
Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
10. :doc:`ConvBERT <model_doc/convbert>` (from YituTech) released with the paper `ConvBERT: Improving BERT with
10. :doc:`BORT <model_doc/bort>` (from Alexa) released with the paper `Optimal Subarchitecture Extraction For BERT
<https://arxiv.org/abs/2010.10499>`__ by Adrian de Wynter and Daniel J. Perry.
11. :doc:`ByT5 <model_doc/byt5>` (from Google Research) released with the paper `ByT5: Towards a token-free future with
pre-trained byte-to-byte models <https://arxiv.org/abs/2105.13626>`__ by Linting Xue, Aditya Barua, Noah Constant,
Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
12. :doc:`CamemBERT <model_doc/camembert>` (from Inria/Facebook/Sorbonne) released with the paper `CamemBERT: a Tasty
French Language Model <https://arxiv.org/abs/1911.03894>`__ by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz
Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
13. :doc:`CLIP <model_doc/clip>` from (OpenAI) released with the paper `Learning Transferable Visual Models From
Natural Language Supervision <https://arxiv.org/abs/2103.00020>`__ by Alec Radford, Jong Wook Kim, Chris Hallacy,
Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen
Krueger, Ilya Sutskever.
14. :doc:`ConvBERT <model_doc/convbert>` (from YituTech) released with the paper `ConvBERT: Improving BERT with
Span-based Dynamic Convolution <https://arxiv.org/abs/2008.02496>`__ by Zihang Jiang, Weihao Yu, Daquan Zhou,
Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
11. :doc:`CTRL <model_doc/ctrl>` (from Salesforce) released with the paper `CTRL: A Conditional Transformer Language
15. :doc:`CPM <model_doc/cpm>` (from Tsinghua University) released with the paper `CPM: A Large-scale Generative
Chinese Pre-trained Language Model <https://arxiv.org/abs/2012.00413>`__ by Zhengyan Zhang, Xu Han, Hao Zhou, Pei
Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng,
Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang,
Juanzi Li, Xiaoyan Zhu, Maosong Sun.
16. :doc:`CTRL <model_doc/ctrl>` (from Salesforce) released with the paper `CTRL: A Conditional Transformer Language
Model for Controllable Generation <https://arxiv.org/abs/1909.05858>`__ by Nitish Shirish Keskar*, Bryan McCann*,
Lav R. Varshney, Caiming Xiong and Richard Socher.
12. :doc:`DeBERTa <model_doc/deberta>` (from Microsoft Research) released with the paper `DeBERTa: Decoding-enhanced
BERT with Disentangled Attention <https://arxiv.org/abs/2006.03654>`__ by Pengcheng He, Xiaodong Liu, Jianfeng Gao,
17. :doc:`DeBERTa <model_doc/deberta>` (from Microsoft) released with the paper `DeBERTa: Decoding-enhanced BERT with
Disentangled Attention <https://arxiv.org/abs/2006.03654>`__ by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu
Chen.
18. :doc:`DeBERTa-v2 <model_doc/deberta_v2>` (from Microsoft) released with the paper `DeBERTa: Decoding-enhanced BERT
with Disentangled Attention <https://arxiv.org/abs/2006.03654>`__ by Pengcheng He, Xiaodong Liu, Jianfeng Gao,
Weizhu Chen.
13. :doc:`DialoGPT <model_doc/dialogpt>` (from Microsoft Research) released with the paper `DialoGPT: Large-Scale
19. :doc:`DeiT <model_doc/deit>` (from Facebook) released with the paper `Training data-efficient image transformers &
distillation through attention <https://arxiv.org/abs/2012.12877>`__ by Hugo Touvron, Matthieu Cord, Matthijs
Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
20. :doc:`DETR <model_doc/detr>` (from Facebook) released with the paper `End-to-End Object Detection with Transformers
<https://arxiv.org/abs/2005.12872>`__ by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier,
Alexander Kirillov, Sergey Zagoruyko.
21. :doc:`DialoGPT <model_doc/dialogpt>` (from Microsoft Research) released with the paper `DialoGPT: Large-Scale
Generative Pre-training for Conversational Response Generation <https://arxiv.org/abs/1911.00536>`__ by Yizhe
Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
14. :doc:`DistilBERT <model_doc/distilbert>` (from HuggingFace), released together with the paper `DistilBERT, a
22. :doc:`DistilBERT <model_doc/distilbert>` (from HuggingFace), released together with the paper `DistilBERT, a
distilled version of BERT: smaller, faster, cheaper and lighter <https://arxiv.org/abs/1910.01108>`__ by Victor
Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into `DistilGPT2
<https://github.com/huggingface/transformers/tree/master/examples/distillation>`__, RoBERTa into `DistilRoBERTa
<https://github.com/huggingface/transformers/tree/master/examples/distillation>`__, Multilingual BERT into
`DistilmBERT <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__ and a German
version of DistilBERT.
15. :doc:`DPR <model_doc/dpr>` (from Facebook) released with the paper `Dense Passage Retrieval for Open-Domain
23. :doc:`DPR <model_doc/dpr>` (from Facebook) released with the paper `Dense Passage Retrieval for Open-Domain
Question Answering <https://arxiv.org/abs/2004.04906>`__ by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick
Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
16. :doc:`ELECTRA <model_doc/electra>` (from Google Research/Stanford University) released with the paper `ELECTRA:
24. :doc:`ELECTRA <model_doc/electra>` (from Google Research/Stanford University) released with the paper `ELECTRA:
Pre-training text encoders as discriminators rather than generators <https://arxiv.org/abs/2003.10555>`__ by Kevin
Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
17. :doc:`FlauBERT <model_doc/flaubert>` (from CNRS) released with the paper `FlauBERT: Unsupervised Language Model
25. :doc:`FlauBERT <model_doc/flaubert>` (from CNRS) released with the paper `FlauBERT: Unsupervised Language Model
Pre-training for French <https://arxiv.org/abs/1912.05372>`__ by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne,
Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
18. :doc:`Funnel Transformer <model_doc/funnel>` (from CMU/Google Brain) released with the paper `Funnel-Transformer:
26. :doc:`Funnel Transformer <model_doc/funnel>` (from CMU/Google Brain) released with the paper `Funnel-Transformer:
Filtering out Sequential Redundancy for Efficient Language Processing <https://arxiv.org/abs/2006.03236>`__ by
Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
19. :doc:`GPT <model_doc/gpt>` (from OpenAI) released with the paper `Improving Language Understanding by Generative
27. :doc:`GPT <model_doc/gpt>` (from OpenAI) released with the paper `Improving Language Understanding by Generative
Pre-Training <https://blog.openai.com/language-unsupervised/>`__ by Alec Radford, Karthik Narasimhan, Tim Salimans
and Ilya Sutskever.
20. :doc:`GPT-2 <model_doc/gpt2>` (from OpenAI) released with the paper `Language Models are Unsupervised Multitask
28. :doc:`GPT-2 <model_doc/gpt2>` (from OpenAI) released with the paper `Language Models are Unsupervised Multitask
Learners <https://blog.openai.com/better-language-models/>`__ by Alec Radford*, Jeffrey Wu*, Rewon Child, David
Luan, Dario Amodei** and Ilya Sutskever**.
21. :doc:`LayoutLM <model_doc/layoutlm>` (from Microsoft Research Asia) released with the paper `LayoutLM: Pre-training
29. :doc:`GPT Neo <model_doc/gpt_neo>` (from EleutherAI) released in the repository `EleutherAI/gpt-neo
<https://github.com/EleutherAI/gpt-neo>`__ by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
30. :doc:`Hubert <model_doc/hubert>` (from Facebook) released with the paper `HuBERT: Self-Supervised Speech
Representation Learning by Masked Prediction of Hidden Units <https://arxiv.org/abs/2106.07447>`__ by Wei-Ning Hsu,
Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.
31. :doc:`I-BERT <model_doc/ibert>` (from Berkeley) released with the paper `I-BERT: Integer-only BERT Quantization
<https://arxiv.org/abs/2101.01321>`__ by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer
32. :doc:`LayoutLM <model_doc/layoutlm>` (from Microsoft Research Asia) released with the paper `LayoutLM: Pre-training
of Text and Layout for Document Image Understanding <https://arxiv.org/abs/1912.13318>`__ by Yiheng Xu, Minghao Li,
Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
22. :doc:`LED <model_doc/led>` (from AllenAI) released with the paper `Longformer: The Long-Document Transformer
33. :doc:`LED <model_doc/led>` (from AllenAI) released with the paper `Longformer: The Long-Document Transformer
<https://arxiv.org/abs/2004.05150>`__ by Iz Beltagy, Matthew E. Peters, Arman Cohan.
23. :doc:`Longformer <model_doc/longformer>` (from AllenAI) released with the paper `Longformer: The Long-Document
34. :doc:`Longformer <model_doc/longformer>` (from AllenAI) released with the paper `Longformer: The Long-Document
Transformer <https://arxiv.org/abs/2004.05150>`__ by Iz Beltagy, Matthew E. Peters, Arman Cohan.
24. :doc:`LXMERT <model_doc/lxmert>` (from UNC Chapel Hill) released with the paper `LXMERT: Learning Cross-Modality
35. :doc:`LUKE <model_doc/luke>` (from Studio Ousia) released with the paper `LUKE: Deep Contextualized Entity
Representations with Entity-aware Self-attention <https://arxiv.org/abs/2010.01057>`__ by Ikuya Yamada, Akari Asai,
Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto.
36. :doc:`LXMERT <model_doc/lxmert>` (from UNC Chapel Hill) released with the paper `LXMERT: Learning Cross-Modality
Encoder Representations from Transformers for Open-Domain Question Answering <https://arxiv.org/abs/1908.07490>`__
by Hao Tan and Mohit Bansal.
25. :doc:`MarianMT <model_doc/marian>` Machine translation models trained using `OPUS <http://opus.nlpl.eu/>`__ data by
37. :doc:`M2M100 <model_doc/m2m_100>` (from Facebook) released with the paper `Beyond English-Centric Multilingual
Machine Translation <https://arxiv.org/abs/2010.11125>`__ by by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi
Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman
Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
38. :doc:`MarianMT <model_doc/marian>` Machine translation models trained using `OPUS <http://opus.nlpl.eu/>`__ data by
Jörg Tiedemann. The `Marian Framework <https://marian-nmt.github.io/>`__ is being developed by the Microsoft
Translator Team.
26. :doc:`MBart <model_doc/mbart>` (from Facebook) released with the paper `Multilingual Denoising Pre-training for
39. :doc:`MBart <model_doc/mbart>` (from Facebook) released with the paper `Multilingual Denoising Pre-training for
Neural Machine Translation <https://arxiv.org/abs/2001.08210>`__ by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li,
Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
27. :doc:`MPNet <model_doc/mpnet>` (from Microsoft Research) released with the paper `MPNet: Masked and Permuted
40. :doc:`MBart-50 <model_doc/mbart>` (from Facebook) released with the paper `Multilingual Translation with Extensible
Multilingual Pretraining and Finetuning <https://arxiv.org/abs/2008.00401>`__ by Yuqing Tang, Chau Tran, Xian Li,
Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
41. :doc:`Megatron-BERT <model_doc/megatron_bert>` (from NVIDIA) released with the paper `Megatron-LM: Training
Multi-Billion Parameter Language Models Using Model Parallelism <https://arxiv.org/abs/1909.08053>`__ by Mohammad
Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
42. :doc:`Megatron-GPT2 <model_doc/megatron_gpt2>` (from NVIDIA) released with the paper `Megatron-LM: Training
Multi-Billion Parameter Language Models Using Model Parallelism <https://arxiv.org/abs/1909.08053>`__ by Mohammad
Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
43. :doc:`MPNet <model_doc/mpnet>` (from Microsoft Research) released with the paper `MPNet: Masked and Permuted
Pre-training for Language Understanding <https://arxiv.org/abs/2004.09297>`__ by Kaitao Song, Xu Tan, Tao Qin,
Jianfeng Lu, Tie-Yan Liu.
28. :doc:`MT5 <model_doc/mt5>` (from Google AI) released with the paper `mT5: A massively multilingual pre-trained
44. :doc:`MT5 <model_doc/mt5>` (from Google AI) released with the paper `mT5: A massively multilingual pre-trained
text-to-text transformer <https://arxiv.org/abs/2010.11934>`__ by Linting Xue, Noah Constant, Adam Roberts, Mihir
Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
29. :doc:`Pegasus <model_doc/pegasus>` (from Google) released with the paper `PEGASUS: Pre-training with Extracted
45. :doc:`Pegasus <model_doc/pegasus>` (from Google) released with the paper `PEGASUS: Pre-training with Extracted
Gap-sentences for Abstractive Summarization <https://arxiv.org/abs/1912.08777>`__> by Jingqing Zhang, Yao Zhao,
Mohammad Saleh and Peter J. Liu.
30. :doc:`ProphetNet <model_doc/prophetnet>` (from Microsoft Research) released with the paper `ProphetNet: Predicting
46. :doc:`ProphetNet <model_doc/prophetnet>` (from Microsoft Research) released with the paper `ProphetNet: Predicting
Future N-gram for Sequence-to-Sequence Pre-training <https://arxiv.org/abs/2001.04063>`__ by Yu Yan, Weizhen Qi,
Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
31. :doc:`Reformer <model_doc/reformer>` (from Google Research) released with the paper `Reformer: The Efficient
47. :doc:`Reformer <model_doc/reformer>` (from Google Research) released with the paper `Reformer: The Efficient
Transformer <https://arxiv.org/abs/2001.04451>`__ by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
32. :doc:`RoBERTa <model_doc/roberta>` (from Facebook), released together with the paper a `Robustly Optimized BERT
48. :doc:`RoBERTa <model_doc/roberta>` (from Facebook), released together with the paper a `Robustly Optimized BERT
Pretraining Approach <https://arxiv.org/abs/1907.11692>`__ by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar
Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
33. :doc:`SqueezeBert <model_doc/squeezebert>` released with the paper `SqueezeBERT: What can computer vision teach NLP
49. :doc:`RoFormer <model_doc/roformer>` (from ZhuiyiTechnology), released together with the paper a `RoFormer:
Enhanced Transformer with Rotary Position Embedding <https://arxiv.org/pdf/2104.09864v1.pdf>`__ by Jianlin Su and
Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.
50. :doc:`SpeechToTextTransformer <model_doc/speech_to_text>` (from Facebook), released together with the paper
`fairseq S2T: Fast Speech-to-Text Modeling with fairseq <https://arxiv.org/abs/2010.05171>`__ by Changhan Wang, Yun
Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino.
51. :doc:`SqueezeBert <model_doc/squeezebert>` released with the paper `SqueezeBERT: What can computer vision teach NLP
about efficient neural networks? <https://arxiv.org/abs/2006.11316>`__ by Forrest N. Iandola, Albert E. Shaw, Ravi
Krishna, and Kurt W. Keutzer.
34. :doc:`T5 <model_doc/t5>` (from Google AI) released with the paper `Exploring the Limits of Transfer Learning with a
52. :doc:`T5 <model_doc/t5>` (from Google AI) released with the paper `Exploring the Limits of Transfer Learning with a
Unified Text-to-Text Transformer <https://arxiv.org/abs/1910.10683>`__ by Colin Raffel and Noam Shazeer and Adam
Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
35. :doc:`TAPAS <model_doc/tapas>` (from Google AI) released with the paper `TAPAS: Weakly Supervised Table Parsing via
53. :doc:`TAPAS <model_doc/tapas>` (from Google AI) released with the paper `TAPAS: Weakly Supervised Table Parsing via
Pre-training <https://arxiv.org/abs/2004.02349>`__ by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller,
Francesco Piccinno and Julian Martin Eisenschlos.
36. :doc:`Transformer-XL <model_doc/transformerxl>` (from Google/CMU) released with the paper `Transformer-XL:
54. :doc:`Transformer-XL <model_doc/transformerxl>` (from Google/CMU) released with the paper `Transformer-XL:
Attentive Language Models Beyond a Fixed-Length Context <https://arxiv.org/abs/1901.02860>`__ by Zihang Dai*,
Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
37. :doc:`Wav2Vec2 <model_doc/wav2vec2>` (from Facebook AI) released with the paper `wav2vec 2.0: A Framework for
55. :doc:`Vision Transformer (ViT) <model_doc/vit>` (from Google AI) released with the paper `An Image is Worth 16x16
Words: Transformers for Image Recognition at Scale <https://arxiv.org/abs/2010.11929>`__ by Alexey Dosovitskiy,
Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias
Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
56. :doc:`VisualBERT <model_doc/visual_bert>` (from UCLA NLP) released with the paper `VisualBERT: A Simple and
Performant Baseline for Vision and Language <https://arxiv.org/pdf/1908.03557>`__ by Liunian Harold Li, Mark
Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang.
57. :doc:`Wav2Vec2 <model_doc/wav2vec2>` (from Facebook AI) released with the paper `wav2vec 2.0: A Framework for
Self-Supervised Learning of Speech Representations <https://arxiv.org/abs/2006.11477>`__ by Alexei Baevski, Henry
Zhou, Abdelrahman Mohamed, Michael Auli.
38. :doc:`XLM <model_doc/xlm>` (from Facebook) released together with the paper `Cross-lingual Language Model
58. :doc:`XLM <model_doc/xlm>` (from Facebook) released together with the paper `Cross-lingual Language Model
Pretraining <https://arxiv.org/abs/1901.07291>`__ by Guillaume Lample and Alexis Conneau.
39. :doc:`XLM-ProphetNet <model_doc/xlmprophetnet>` (from Microsoft Research) released with the paper `ProphetNet:
59. :doc:`XLM-ProphetNet <model_doc/xlmprophetnet>` (from Microsoft Research) released with the paper `ProphetNet:
Predicting Future N-gram for Sequence-to-Sequence Pre-training <https://arxiv.org/abs/2001.04063>`__ by Yu Yan,
Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
40. :doc:`XLM-RoBERTa <model_doc/xlmroberta>` (from Facebook AI), released together with the paper `Unsupervised
60. :doc:`XLM-RoBERTa <model_doc/xlmroberta>` (from Facebook AI), released together with the paper `Unsupervised
Cross-lingual Representation Learning at Scale <https://arxiv.org/abs/1911.02116>`__ by Alexis Conneau*, Kartikay
Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke
Zettlemoyer and Veselin Stoyanov.
41. :doc:`XLNet <model_doc/xlnet>` (from Google/CMU) released with the paper `XLNet: Generalized Autoregressive
61. :doc:`XLNet <model_doc/xlnet>` (from Google/CMU) released with the paper `XLNet: Generalized Autoregressive
Pretraining for Language Understanding <https://arxiv.org/abs/1906.08237>`__ by Zhilin Yang*, Zihang Dai*, Yiming
Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
62. :doc:`XLSR-Wav2Vec2 <model_doc/xlsr_wav2vec2>` (from Facebook AI) released with the paper `Unsupervised
Cross-Lingual Representation Learning For Speech Recognition <https://arxiv.org/abs/2006.13979>`__ by Alexis
Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
.. _bigtable:
Supported frameworks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The table below represents the current support in the library for each of those models, whether they have a Python
tokenizer (called "slow"). A "fast" tokenizer backed by the 🤗 Tokenizers library, whether they have support in PyTorch,
TensorFlow and/or Flax.
tokenizer (called "slow"). A "fast" tokenizer backed by the 🤗 Tokenizers library, whether they have support in Jax (via
Flax), PyTorch, and/or TensorFlow.
..
This table is updated automatically from the auto modules with `make fix-copies`. Do not update manually!
@@ -223,29 +304,41 @@ TensorFlow and/or Flax.
+=============================+================+================+=================+====================+==============+
| ALBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| BART | ✅ | ✅ | ✅ | ✅ | |
| BART | ✅ | ✅ | ✅ | ✅ | |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| BERT | ✅ | ✅ | ✅ | ✅ | ✅ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Bert Generation | ✅ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| BigBird | ✅ | ✅ | ✅ | ❌ | ✅ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| BigBirdPegasus | ❌ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Blenderbot | ✅ | ❌ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| BlenderbotSmall | ✅ | ❌ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| CLIP | ✅ | ✅ | ✅ | ❌ | ✅ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| CTRL | ✅ | ❌ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| CamemBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| ConvBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| DETR | ❌ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| DPR | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| DeBERTa | ✅ | | ✅ | ❌ | ❌ |
| DeBERTa | ✅ | | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| DeBERTa-v2 | ✅ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| DeiT | ❌ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| DistilBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| ELECTRA | ✅ | ✅ | ✅ | ✅ | |
| ELECTRA | ✅ | ✅ | ✅ | ✅ | |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Encoder decoder | ❌ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
@@ -255,29 +348,41 @@ TensorFlow and/or Flax.
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Funnel Transformer | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| GPT Neo | ❌ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Hubert | ❌ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| I-BERT | ❌ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| LED | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| LUKE | ✅ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| LXMERT | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| LayoutLM | ✅ | ✅ | ✅ | | ❌ |
| LayoutLM | ✅ | ✅ | ✅ | | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Longformer | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| M2M100 | ✅ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| MPNet | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Marian | ✅ | ❌ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| MegatronBert | ❌ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| MobileBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| OpenAI GPT | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| OpenAI GPT-2 | ✅ | ✅ | ✅ | ✅ | |
| OpenAI GPT-2 | ✅ | ✅ | ✅ | ✅ | |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Pegasus | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| ProphetNet | ✅ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| RAG | ✅ | ❌ | ✅ | | ❌ |
| RAG | ✅ | ❌ | ✅ | | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Reformer | ✅ | ✅ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
@@ -285,15 +390,23 @@ TensorFlow and/or Flax.
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| RoBERTa | ✅ | ✅ | ✅ | ✅ | ✅ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| RoFormer | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Speech2Text | ✅ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| SqueezeBERT | ✅ | ✅ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| T5 | ✅ | ✅ | ✅ | ✅ | |
| T5 | ✅ | ✅ | ✅ | ✅ | |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| TAPAS | ✅ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Transformer-XL | ✅ | ❌ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Wav2Vec2 | | ❌ | ✅ | ❌ | |
| ViT | | ❌ | ✅ | ❌ | |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| VisualBert | ❌ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Wav2Vec2 | ✅ | ❌ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| XLM | ✅ | ❌ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
@@ -335,14 +448,19 @@ TensorFlow and/or Flax.
pretrained_models
examples
troubleshooting
custom_datasets
notebooks
sagemaker
community
converting_tensorflow_models
migration
contributing
add_new_model
fast_tokenizers
performance
testing
debugging
serialization
.. toctree::
@@ -359,6 +477,7 @@ TensorFlow and/or Flax.
main_classes/callback
main_classes/configuration
main_classes/data_collator
main_classes/logging
main_classes/model
main_classes/optimizer_schedules
@@ -367,6 +486,8 @@ TensorFlow and/or Flax.
main_classes/processors
main_classes/tokenizer
main_classes/trainer
main_classes/deepspeed
main_classes/feature_extractor
.. toctree::
:maxdepth: 2
@@ -379,13 +500,22 @@ TensorFlow and/or Flax.
model_doc/bert
model_doc/bertweet
model_doc/bertgeneration
model_doc/bert_japanese
model_doc/bigbird
model_doc/bigbird_pegasus
model_doc/blenderbot
model_doc/blenderbot_small
model_doc/bort
model_doc/byt5
model_doc/camembert
model_doc/clip
model_doc/convbert
model_doc/cpm
model_doc/ctrl
model_doc/deberta
model_doc/deberta_v2
model_doc/deit
model_doc/detr
model_doc/dialogpt
model_doc/distilbert
model_doc/dpr
@@ -395,17 +525,24 @@ TensorFlow and/or Flax.
model_doc/fsmt
model_doc/funnel
model_doc/herbert
model_doc/ibert
model_doc/layoutlm
model_doc/led
model_doc/longformer
model_doc/luke
model_doc/lxmert
model_doc/marian
model_doc/m2m_100
model_doc/mbart
model_doc/megatron_bert
model_doc/megatron_gpt2
model_doc/mobilebert
model_doc/mpnet
model_doc/mt5
model_doc/gpt
model_doc/gpt2
model_doc/gpt_neo
model_doc/hubert
model_doc/pegasus
model_doc/phobert
model_doc/prophetnet
@@ -413,15 +550,20 @@ TensorFlow and/or Flax.
model_doc/reformer
model_doc/retribert
model_doc/roberta
model_doc/roformer
model_doc/speech_to_text
model_doc/squeezebert
model_doc/t5
model_doc/tapas
model_doc/transformerxl
model_doc/vit
model_doc/visual_bert
model_doc/wav2vec2
model_doc/xlm
model_doc/xlmprophetnet
model_doc/xlmroberta
model_doc/xlnet
model_doc/xlsr_wav2vec2
.. toctree::
:maxdepth: 2
@@ -432,3 +574,4 @@ TensorFlow and/or Flax.
internal/tokenization_utils
internal/trainer_utils
internal/generation_utils
internal/file_utils

View File

@@ -107,7 +107,7 @@ This command performs a magical link between the folder you cloned the repositor
```
now this editable install will reside where you clone the folder to, e.g. `~/transformers/` and python will search it too.
Do note that you have to keep that `transformers` folder around and not delete it to continue using the `transfomers` library.
Do note that you have to keep that `transformers` folder around and not delete it to continue using the `transformers` library.
Now, let's get to the real benefit of this installation approach. Say, you saw some new feature has been just committed into `master`. If you have already performed all the steps above, to update your transformers to include all the latest commits, all you need to do is to `cd` into that cloned repository folder and update the clone to the latest version:
@@ -149,11 +149,42 @@ So if you don't have any specific environment variable set, the cache directory
(``PYTORCH_TRANSFORMERS_CACHE`` or ``PYTORCH_PRETRAINED_BERT_CACHE``), those will be used if there is no shell
environment variable for ``TRANSFORMERS_CACHE``.
### Note on model downloads (Continuous Integration or large-scale deployments)
### Offline mode
If you expect to be downloading large volumes of models (more than 1,000) from our hosted bucket (for instance through
your CI setup, or a large-scale production deployment), please cache the model files on your end. It will be way
faster, and cheaper. Feel free to contact us privately if you need any help.
It's possible to run 🤗 Transformers in a firewalled or a no-network environment.
Setting environment variable `TRANSFORMERS_OFFLINE=1` will tell 🤗 Transformers to use local files only and will not try to look things up.
Most likely you may want to couple this with `HF_DATASETS_OFFLINE=1` that performs the same for 🤗 Datasets if you're using the latter.
Here is an example of how this can be used on a filesystem that is shared between a normally networked and a firewalled to the external world instances.
On the instance with the normal network run your program which will download and cache models (and optionally datasets if you use 🤗 Datasets). For example:
```
python examples/pytorch/translation/run_translation.py --model_name_or_path t5-small --dataset_name wmt16 --dataset_config ro-en ...
```
and then with the same filesystem you can now run the same program on a firewalled instance:
```
HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 \
python examples/pytorch/translation/run_translation.py --model_name_or_path t5-small --dataset_name wmt16 --dataset_config ro-en ...
```
and it should succeed without any hanging waiting to timeout.
#### Fetching models and tokenizers to use offline
When running a script the first time like mentioned above, the downloaded files will be cached for future reuse.
However, it is also possible to download files and point to their local path instead.
Downloading files can be done through the Web Interface by clicking on the "Download" button, but it can also be handled
programmatically using the `huggingface_hub` library that is a dependency to `transformers`:
- Using `snapshot_download` to download an entire repository
- Using `hf_hub_download` to download a specific file
See the reference for these methods in the huggingface_hub
[documentation](https://github.com/huggingface/huggingface_hub/tree/main/src/huggingface_hub).
## Do you want to run a Transformer model on a mobile device?

View File

@@ -0,0 +1,54 @@
..
Copyright 2021 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
General Utilities
-----------------------------------------------------------------------------------------------------------------------
This page lists all of Transformers general utility functions that are found in the file ``file_utils.py``.
Most of those are only useful if you are studying the general code in the library.
Enums and namedtuples
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.file_utils.ExplicitEnum
.. autoclass:: transformers.file_utils.PaddingStrategy
.. autoclass:: transformers.file_utils.TensorType
Special Decorators
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: transformers.file_utils.add_start_docstrings
.. autofunction:: transformers.file_utils.add_start_docstrings_to_model_forward
.. autofunction:: transformers.file_utils.add_end_docstrings
.. autofunction:: transformers.file_utils.add_code_sample_docstrings
.. autofunction:: transformers.file_utils.replace_return_docstrings
Special Properties
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.file_utils.cached_property
Other Utilities
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.file_utils._BaseLazyModule

View File

@@ -13,19 +13,21 @@
Utilities for Generation
-----------------------------------------------------------------------------------------------------------------------
This page lists all the utility functions used by :meth:`~transformers.PreTrainedModel.generate`,
:meth:`~transformers.PreTrainedModel.greedy_search`, :meth:`~transformers.PreTrainedModel.sample`,
:meth:`~transformers.PreTrainedModel.beam_search`, :meth:`~transformers.PreTrainedModel.beam_sample`, and
:meth:`~transformers.PreTrainedModel.group_beam_search`.
This page lists all the utility functions used by :meth:`~transformers.generation_utils.GenerationMixin.generate`,
:meth:`~transformers.generation_utils.GenerationMixin.greedy_search`,
:meth:`~transformers.generation_utils.GenerationMixin.sample`,
:meth:`~transformers.generation_utils.GenerationMixin.beam_search`,
:meth:`~transformers.generation_utils.GenerationMixin.beam_sample`, and
:meth:`~transformers.generation_utils.GenerationMixin.group_beam_search`.
Most of those are only useful if you are studying the code of the generate methods in the library.
Generate Outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The output of :meth:`~transformers.PreTrainedModel.generate` is an instance of a subclass of
The output of :meth:`~transformers.generation_utils.GenerationMixin.generate` is an instance of a subclass of
:class:`~transformers.file_utils.ModelOutput`. This output is a data structure containing all the information returned
by :meth:`~transformers.PreTrainedModel.generate`, but that can also be used as tuple or dictionary.
by :meth:`~transformers.generation_utils.GenerationMixin.generate`, but that can also be used as tuple or dictionary.
Here's an example:
@@ -78,6 +80,9 @@ GreedySearchOutput
.. autoclass:: transformers.generation_utils.GreedySearchEncoderDecoderOutput
:members:
.. autoclass:: transformers.generation_flax_utils.FlaxGreedySearchOutput
:members:
SampleOutput
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -88,6 +93,9 @@ SampleOutput
.. autoclass:: transformers.generation_utils.SampleEncoderDecoderOutput
:members:
.. autoclass:: transformers.generation_flax_utils.FlaxSampleOutput
:members:
BeamSearchOutput
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -151,6 +159,60 @@ generation.
.. autoclass:: transformers.HammingDiversityLogitsProcessor
:members: __call__
.. autoclass:: transformers.ForcedBOSTokenLogitsProcessor
:members: __call__
.. autoclass:: transformers.ForcedEOSTokenLogitsProcessor
:members: __call__
.. autoclass:: transformers.InfNanRemoveLogitsProcessor
:members: __call__
.. autoclass:: transformers.FlaxLogitsProcessor
:members: __call__
.. autoclass:: transformers.FlaxLogitsProcessorList
:members: __call__
.. autoclass:: transformers.FlaxLogitsWarper
:members: __call__
.. autoclass:: transformers.FlaxTemperatureLogitsWarper
:members: __call__
.. autoclass:: transformers.FlaxTopPLogitsWarper
:members: __call__
.. autoclass:: transformers.FlaxTopKLogitsWarper
:members: __call__
.. autoclass:: transformers.FlaxForcedBOSTokenLogitsProcessor
:members: __call__
.. autoclass:: transformers.FlaxForcedEOSTokenLogitsProcessor
:members: __call__
.. autoclass:: transformers.FlaxMinLengthLogitsProcessor
:members: __call__
StoppingCriteria
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A :class:`~transformers.StoppingCriteria` can be used to change when to stop generation (other than EOS token).
.. autoclass:: transformers.StoppingCriteria
:members: __call__
.. autoclass:: transformers.StoppingCriteriaList
:members: __call__
.. autoclass:: transformers.MaxLengthCriteria
:members: __call__
.. autoclass:: transformers.MaxTimeCriteria
:members: __call__
BeamSearch
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

View File

@@ -47,6 +47,4 @@ Data format
Utilities
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: transformers.pipelines.get_framework
.. autoclass:: transformers.pipelines.PipelineException

View File

@@ -38,12 +38,6 @@ SpecialTokensMixin
Enums and namedtuples
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.tokenization_utils_base.ExplicitEnum
.. autoclass:: transformers.tokenization_utils_base.PaddingStrategy
.. autoclass:: transformers.tokenization_utils_base.TensorType
.. autoclass:: transformers.tokenization_utils_base.TruncationStrategy
.. autoclass:: transformers.tokenization_utils_base.CharSpan

View File

@@ -1,4 +1,4 @@
..
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
@@ -22,7 +22,7 @@ Utilities
.. autoclass:: transformers.EvalPrediction
.. autoclass:: transformers.EvaluationStrategy
.. autoclass:: transformers.IntervalStrategy
.. autofunction:: transformers.set_seed
@@ -46,3 +46,9 @@ Distributed Evaluation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.HfArgumentParser
Debug Utilities
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.debug_utils.DebugUnderflowOverflow

View File

@@ -74,6 +74,32 @@ TrainerCallback
.. autoclass:: transformers.TrainerCallback
:members:
Here is an example of how to register a custom callback with the PyTorch :class:`~transformers.Trainer`:
.. code-block:: python
class MyCallback(TrainerCallback):
"A callback that prints a message at the beginning of training"
def on_train_begin(self, args, state, control, **kwargs):
print("Starting training")
trainer = Trainer(
model,
args,
train_dataset=train_dataset,
eval_dataset=eval_dataset,
callbacks=[MyCallback] # We can either pass the callback class this way or an instance of it (MyCallback())
)
Another way to register a callback is to call ``trainer.add_callback()`` as follows:
.. code-block:: python
trainer = Trainer(...)
trainer.add_callback(MyCallback)
# Alternatively, we can pass an instance of the callback class
trainer.add_callback(MyCallback())
TrainerState
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

View File

@@ -0,0 +1,71 @@
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
Data Collator
-----------------------------------------------------------------------------------------------------------------------
Data collators are objects that will form a batch by using a list of dataset elements as input. These elements are of
the same type as the elements of :obj:`train_dataset` or :obj:`eval_dataset`.
To be able to build batches, data collators may apply some processing (like padding). Some of them (like
:class:`~transformers.DataCollatorForLanguageModeling`) also apply some random data augmentation (like random masking)
oin the formed batch.
Examples of use can be found in the :doc:`example scripts <../examples>` or :doc:`example notebooks <../notebooks>`.
Default data collator
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: transformers.data.data_collator.default_data_collator
DataCollatorWithPadding
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.data.data_collator.DataCollatorWithPadding
:members:
DataCollatorForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.data.data_collator.DataCollatorForTokenClassification
:members:
DataCollatorForSeq2Seq
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.data.data_collator.DataCollatorForSeq2Seq
:members:
DataCollatorForLanguageModeling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.data.data_collator.DataCollatorForLanguageModeling
:members: mask_tokens
DataCollatorForWholeWordMask
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.data.data_collator.DataCollatorForWholeWordMask
:members: mask_tokens
DataCollatorForPermutationLanguageModeling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.data.data_collator.DataCollatorForPermutationLanguageModeling
:members: mask_tokens

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,48 @@
..
Copyright 2021 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
Feature Extractor
-----------------------------------------------------------------------------------------------------------------------
A feature extractor is in charge of preparing input features for a multi-modal model. This includes feature extraction
from sequences, *e.g.*, pre-processing audio files to Log-Mel Spectrogram features, feature extraction from images
*e.g.* cropping image image files, but also padding, normalization, and conversion to Numpy, PyTorch, and TensorFlow
tensors.
FeatureExtractionMixin
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.feature_extraction_utils.FeatureExtractionMixin
:members: from_pretrained, save_pretrained
SequenceFeatureExtractor
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.SequenceFeatureExtractor
:members: pad
BatchFeature
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BatchFeature
:members:
ImageFeatureExtractionMixin
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.image_utils.ImageFeatureExtractionMixin
:members:

View File

@@ -65,6 +65,10 @@ Other functions
.. autofunction:: transformers.logging.get_logger
.. autofunction:: transformers.logging.enable_default_handler
.. autofunction:: transformers.logging.disable_default_handler
.. autofunction:: transformers.logging.enable_explicit_format
.. autofunction:: transformers.logging.reset_format

View File

@@ -26,8 +26,9 @@ are common among all the models to:
The other methods that are common to each model are defined in :class:`~transformers.modeling_utils.ModuleUtilsMixin`
(for the PyTorch models) and :class:`~transformers.modeling_tf_utils.TFModuleUtilsMixin` (for the TensorFlow models) or
for text generation, :class:`~transformers.generation_utils.GenerationMixin` (for the PyTorch models) and
:class:`~transformers.generation_tf_utils.TFGenerationMixin` (for the TensorFlow models)
for text generation, :class:`~transformers.generation_utils.GenerationMixin` (for the PyTorch models),
:class:`~transformers.generation_tf_utils.TFGenerationMixin` (for the TensorFlow models) and
:class:`~transformers.generation_flax_utils.FlaxGenerationMixin` (for the Flax/JAX models).
PreTrainedModel
@@ -73,3 +74,13 @@ Generation
.. autoclass:: transformers.generation_tf_utils.TFGenerationMixin
:members:
.. autoclass:: transformers.generation_flax_utils.FlaxGenerationMixin
:members:
Pushing to the Hub
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.file_utils.PushToHubMixin
:members:

View File

@@ -13,8 +13,8 @@
Model outputs
-----------------------------------------------------------------------------------------------------------------------
PyTorch models have outputs that are instances of subclasses of :class:`~transformers.file_utils.ModelOutput`. Those
are data structures containing all the information returned by the model, but that can also be used as tuples or
All models have outputs that are instances of subclasses of :class:`~transformers.file_utils.ModelOutput`. Those are
data structures containing all the information returned by the model, but that can also be used as tuples or
dictionaries.
Let's see of this looks on an example:
@@ -60,7 +60,7 @@ ModelOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.file_utils.ModelOutput
:members:
:members: to_tuple
BaseModelOutput

View File

@@ -23,9 +23,11 @@ There are two categories of pipeline abstractions to be aware about:
- The :func:`~transformers.pipeline` which is the most powerful object encapsulating all other pipelines.
- The other task-specific pipelines:
- :class:`~transformers.AutomaticSpeechRecognitionPipeline`
- :class:`~transformers.ConversationalPipeline`
- :class:`~transformers.FeatureExtractionPipeline`
- :class:`~transformers.FillMaskPipeline`
- :class:`~transformers.ImageClassificationPipeline`
- :class:`~transformers.QuestionAnsweringPipeline`
- :class:`~transformers.SummarizationPipeline`
- :class:`~transformers.TextClassificationPipeline`
@@ -48,6 +50,13 @@ pipeline but requires an additional argument which is the `task`.
The task specific pipelines
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
AutomaticSpeechRecognitionPipeline
=======================================================================================================================
.. autoclass:: transformers.AutomaticSpeechRecognitionPipeline
:special-members: __call__
:members:
ConversationalPipeline
=======================================================================================================================
@@ -71,6 +80,13 @@ FillMaskPipeline
:special-members: __call__
:members:
ImageClassificationPipeline
=======================================================================================================================
.. autoclass:: transformers.ImageClassificationPipeline
:special-members: __call__
:members:
NerPipeline
=======================================================================================================================

View File

@@ -68,8 +68,8 @@ Additionally, the following method can be used to load values from a data file a
Example usage
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
An example using these processors is given in the `run_glue.py
<https://github.com/huggingface/pytorch-transformers/blob/master/examples/text-classification/run_glue.py>`__ script.
An example using these processors is given in the :prefix_link:`run_glue.py
<examples/legacy/text-classification/run_glue.py>` script.
XNLI
@@ -89,8 +89,8 @@ This library hosts the processor to load the XNLI data:
Please note that since the gold labels are available on the test set, evaluation is performed on the test set.
An example using these processors is given in the `run_xnli.py
<https://github.com/huggingface/pytorch-transformers/blob/master/examples/text-classification/run_xnli.py>`__ script.
An example using these processors is given in the :prefix_link:`run_xnli.py
<examples/legacy/text-classification/run_xnli.py>` script.
SQuAD
@@ -169,4 +169,4 @@ Using `tensorflow_datasets` is as easy as using a data file:
Another example using these processors is given in the :prefix_link:`run_squad.py
<examples/question-answering/run_squad.py>` script.
<examples/legacy/question-answering/run_squad.py>` script.

View File

@@ -54,19 +54,24 @@ PreTrainedTokenizer
.. autoclass:: transformers.PreTrainedTokenizer
:special-members: __call__
:members:
.. automethod:: encode
:members: batch_decode, convert_ids_to_tokens, convert_tokens_to_ids, convert_tokens_to_string, decode, encode,
get_added_vocab, get_special_tokens_mask, num_special_tokens_to_add, prepare_for_tokenization, tokenize,
vocab_size
PreTrainedTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The :class:`~transformers.PreTrainedTokenizerFast` depend on the `tokenizers
<https://huggingface.co/docs/tokenizers>`__ library. The tokenizers obtained from the 🤗 tokenizers library can be
loaded very simply into 🤗 transformers. Take a look at the :doc:`Using tokenizers from 🤗 tokenizers
<../fast_tokenizers>` page to understand how this is done.
.. autoclass:: transformers.PreTrainedTokenizerFast
:special-members: __call__
:members:
.. automethod:: encode
:members: batch_decode, convert_ids_to_tokens, convert_tokens_to_ids, convert_tokens_to_string, decode, encode,
get_added_vocab, get_special_tokens_mask, num_special_tokens_to_add,
set_truncation_and_padding,tokenize, vocab_size
BatchEncoding

View File

@@ -21,17 +21,20 @@ Before instantiating your :class:`~transformers.Trainer`/:class:`~transformers.T
customization during training.
The API supports distributed training on multiple GPUs/TPUs, mixed precision through `NVIDIA Apex
<https://github.com/NVIDIA/apex>`__ for PyTorch and :obj:`tf.keras.mixed_precision` for TensorFlow.
<https://github.com/NVIDIA/apex>`__ and Native AMP for PyTorch and :obj:`tf.keras.mixed_precision` for TensorFlow.
Both :class:`~transformers.Trainer` and :class:`~transformers.TFTrainer` contain the basic training loop supporting the
previous features. To inject custom behavior you can subclass them and override the following methods:
Both :class:`~transformers.Trainer` and :class:`~transformers.TFTrainer` contain the basic training loop which supports
the above features. To inject custom behavior you can subclass them and override the following methods:
- **get_train_dataloader**/**get_train_tfdataset** -- Creates the training DataLoader (PyTorch) or TF Dataset.
- **get_eval_dataloader**/**get_eval_tfdataset** -- Creates the evaluation DataLoader (PyTorch) or TF Dataset.
- **get_test_dataloader**/**get_test_tfdataset** -- Creates the test DataLoader (PyTorch) or TF Dataset.
- **log** -- Logs information on the various objects watching training.
- **create_optimizer_and_scheduler** -- Setups the optimizer and learning rate scheduler if they were not passed at
init.
- **create_optimizer_and_scheduler** -- Sets up the optimizer and learning rate scheduler if they were not passed at
init. Note, that you can also subclass or override the ``create_optimizer`` and ``create_scheduler`` methods
separately.
- **create_optimizer** -- Sets up the optimizer if it wasn't passed at init.
- **create_scheduler** -- Sets up the learning rate scheduler if it wasn't passed at init.
- **compute_loss** - Computes the loss on a batch of training inputs.
- **training_step** -- Performs a training step.
- **prediction_step** -- Performs an evaluation/test step.
@@ -39,17 +42,35 @@ previous features. To inject custom behavior you can subclass them and override
- **evaluate** -- Runs an evaluation loop and returns metrics.
- **predict** -- Returns predictions (with metrics if labels are available) on a test set.
Here is an example of how to customize :class:`~transformers.Trainer` using a custom loss function:
.. warning::
The :class:`~transformers.Trainer` class is optimized for 🤗 Transformers models and can have surprising behaviors
when you use it on other models. When using it on your own model, make sure:
- your model always return tuples or subclasses of :class:`~transformers.file_utils.ModelOutput`.
- your model can compute the loss if a :obj:`labels` argument is provided and that loss is returned as the first
element of the tuple (if your model returns tuples)
- your model can accept multiple label arguments (use the :obj:`label_names` in your
:class:`~transformers.TrainingArguments` to indicate their name to the :class:`~transformers.Trainer`) but none
of them should be named :obj:`"label"`.
Here is an example of how to customize :class:`~transformers.Trainer` using a custom loss function for multi-label
classification:
.. code-block:: python
from torch import nn
from transformers import Trainer
class MyTrainer(Trainer):
def compute_loss(self, model, inputs):
class MultilabelTrainer(Trainer):
def compute_loss(self, model, inputs, return_outputs=False):
labels = inputs.pop("labels")
outputs = model(**inputs)
logits = outputs[0]
return my_custom_loss(logits, labels)
logits = outputs.logits
loss_fct = nn.BCEWithLogitsLoss()
loss = loss_fct(logits.view(-1, self.model.config.num_labels),
labels.float().view(-1, self.model.config.num_labels))
return (loss, outputs) if return_outputs else loss
Another way to customize the training loop behavior for the PyTorch :class:`~transformers.Trainer` is to use
:doc:`callbacks <callback>` that can inspect the training loop state (for progress reporting, logging on TensorBoard or
@@ -98,6 +119,88 @@ TFTrainingArguments
:members:
Logging
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
By default :class:`~transformers.Trainer` will use ``logging.INFO`` for the main process and ``logging.WARNING`` for
the replicas if any.
These defaults can be overridden to use any of the 5 ``logging`` levels with :class:`~transformers.TrainingArguments`'s
arguments:
- ``log_level`` - for the main process
- ``log_level_replica`` - for the replicas
Further, if :class:`~transformers.TrainingArguments`'s ``log_on_each_node`` is set to ``False`` only the main node will
use the log level settings for its main process, all other nodes will use the log level settings for replicas.
Note that :class:`~transformers.Trainer` is going to set ``transformers``'s log level separately for each node in its
:meth:`~transformers.Trainer.__init__`. So you may want to set this sooner (see the next example) if you tap into other
``transformers`` functionality before creating the :class:`~transformers.Trainer` object.
Here is an example of how this can be used in an application:
.. code-block:: python
[...]
logger = logging.getLogger(__name__)
# Setup logging
logging.basicConfig(
format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
datefmt="%m/%d/%Y %H:%M:%S",
handlers=[logging.StreamHandler(sys.stdout)],
)
# set the main code and the modules it uses to the same log-level according to the node
log_level = training_args.get_process_log_level()
logger.setLevel(log_level)
datasets.utils.logging.set_verbosity(log_level)
transformers.utils.logging.set_verbosity(log_level)
trainer = Trainer(...)
And then if you only want to see warnings on the main node and all other nodes to not print any most likely duplicated
warnings you could run it as:
.. code-block:: bash
my_app.py ... --log_level warning --log_level_replica error
In the multi-node environment if you also don't want the logs to repeat for each node's main process, you will want to
change the above to:
.. code-block:: bash
my_app.py ... --log_level warning --log_level_replica error --log_on_each_node 0
and then only the main process of the first node will log at the "warning" level, and all other processes on the main
node and all processes on other nodes will log at the "error" level.
If you need your application to be as quiet as possible you could do:
.. code-block:: bash
my_app.py ... --log_level error --log_level_replica error --log_on_each_node 0
(add ``--log_on_each_node 0`` if on multi-node environment)
Randomness
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When resuming from a checkpoint generated by :class:`~transformers.Trainer` all efforts are made to restore the
`python`, `numpy` and `pytorch` RNG states to the same states as they were at the moment of saving that checkpoint,
which should make the "stop and resume" style of training as close as possible to non-stop training.
However, due to various default non-deterministic pytorch settings this might not fully work. If you want full
determinism please refer to `Controlling sources of randomness
<https://pytorch.org/docs/stable/notes/randomness.html>`__. As explained in the document, that some of those settings
that make things determinstic (.e.g., ``torch.backends.cudnn.deterministic``) may slow things down, therefore this
can't be done by default, but you can enable those yourself if needed.
Trainer Integrations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -113,7 +216,9 @@ Toward Training Trillion Parameter Models, by Samyam Rajbhandari, Jeff Rasley, O
This provided support is new and experimental as of this writing.
Installation Notes
.. _zero-install-notes:
CUDA Extension Installation Notes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
As of this writing, both FairScale and Deepspeed require compilation of CUDA C++ code, before they can be used.
@@ -135,7 +240,8 @@ please, read the following notes first.
In these notes we give examples for what to do when ``pytorch`` has been built with CUDA ``10.2``. If your situation is
different remember to adjust the version number to the one you are after.
**Possible problem #1:**
Possible problem #1
=======================================================================================================================
While, Pytorch comes with its own CUDA toolkit, to build these two projects you must have an identical version of CUDA
installed system-wide.
@@ -155,7 +261,8 @@ If you don't have CUDA installed system-wide, install it first. You will find th
search engine. For example, if you're on Ubuntu you may want to search for: `ubuntu cuda 10.2 install
<https://www.google.com/search?q=ubuntu+cuda+10.2+install>`__.
**Possible problem #2:**
Possible problem #2
=======================================================================================================================
Another possible common problem is that you may have more than one CUDA toolkit installed system-wide. For example you
may have:
@@ -201,7 +308,8 @@ exist. ``lib64`` sub-directory is where the various CUDA ``.so`` objects, like `
that your system will have it named differently, but if it is adjust it to reflect your reality.
**Possible problem #3:**
Possible problem #3
=======================================================================================================================
Some older CUDA versions may refuse to build with newer compilers. For example, you my have ``gcc-9`` but it wants
``gcc-7``.
@@ -226,13 +334,6 @@ should find ``gcc-7`` (and ``g++7``) and then the build will succeed.
As always make sure to edit the paths in the example to match your situation.
**If still unsuccessful:**
If after addressing these you still encounter build issues, please, proceed with the GitHub Issue of `FairScale
<https://github.com/facebookresearch/fairscale/issues>`__ and `Deepspeed
<https://github.com/microsoft/DeepSpeed/issues>`__, depending on the project you have the problem with.
FairScale
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -241,482 +342,268 @@ provides support for the following features from `the ZeRO paper <https://arxiv.
1. Optimizer State Sharding
2. Gradient Sharding
3. Model Parameters Sharding (new and very experimental)
4. CPU offload (new and very experimental)
You will need at least two GPUs to use this feature.
To deploy this feature:
1. Install the library via pypi:
**Installation**:
.. code-block:: bash
pip install fairscale
or find more details on `the FairScale's GitHub page
<https://github.com/facebookresearch/fairscale/#installation>`__.
2. Add ``--sharded_ddp`` to the command line arguments, and make sure you have added the distributed launcher ``-m
torch.distributed.launch --nproc_per_node=NUMBER_OF_GPUS_YOU_HAVE`` if you haven't been using it already.
For example here is how you could use it for ``finetune_trainer.py`` with 2 GPUs:
Install the library via pypi:
.. code-block:: bash
cd examples/seq2seq
python -m torch.distributed.launch --nproc_per_node=2 ./finetune_trainer.py \
--model_name_or_path sshleifer/distill-mbart-en-ro-12-4 --data_dir wmt_en_ro \
pip install fairscale
or via ``transformers``' ``extras``:
.. code-block:: bash
pip install transformers[fairscale]
(will become available starting from ``transformers==4.6.0``)
or find more details on `the FairScale's GitHub page <https://github.com/facebookresearch/fairscale/#installation>`__.
If you're still struggling with the build, first make sure to read :ref:`zero-install-notes`.
If it's still not resolved the build issue, here are a few more ideas.
``fairscale`` seems to have an issue with the recently introduced by pip build isolation feature. If you have a problem
with it, you may want to try one of:
.. code-block:: bash
pip install fairscale --no-build-isolation .
or:
.. code-block:: bash
git clone https://github.com/facebookresearch/fairscale/
cd fairscale
rm -r dist build
python setup.py bdist_wheel
pip uninstall -y fairscale
pip install dist/fairscale-*.whl
``fairscale`` also has issues with building against pytorch-nightly, so if you use it you may have to try one of:
.. code-block:: bash
pip uninstall -y fairscale; pip install fairscale --pre \
-f https://download.pytorch.org/whl/nightly/cu110/torch_nightly.html \
--no-cache --no-build-isolation
or:
.. code-block:: bash
pip install -v --disable-pip-version-check . \
-f https://download.pytorch.org/whl/nightly/cu110/torch_nightly.html --pre
Of course, adjust the urls to match the cuda version you use.
If after trying everything suggested you still encounter build issues, please, proceed with the GitHub Issue of
`FairScale <https://github.com/facebookresearch/fairscale/issues>`__.
**Usage**:
To use the first version of Sharded data-parallelism, add ``--sharded_ddp simple`` to the command line arguments, and
make sure you have added the distributed launcher ``-m torch.distributed.launch
--nproc_per_node=NUMBER_OF_GPUS_YOU_HAVE`` if you haven't been using it already.
For example here is how you could use it for ``run_translation.py`` with 2 GPUs:
.. code-block:: bash
python -m torch.distributed.launch --nproc_per_node=2 examples/pytorch/translation/run_translation.py \
--model_name_or_path t5-small --per_device_train_batch_size 1 \
--output_dir output_dir --overwrite_output_dir \
--do_train --n_train 500 --num_train_epochs 1 \
--per_device_train_batch_size 1 --freeze_embeds \
--src_lang en_XX --tgt_lang ro_RO --task translation \
--fp16 --sharded_ddp
--do_train --max_train_samples 500 --num_train_epochs 1 \
--dataset_name wmt16 --dataset_config "ro-en" \
--source_lang en --target_lang ro \
--fp16 --sharded_ddp simple
Notes:
- This feature requires distributed training (so multiple GPUs).
- It is not implemented for TPUs.
- It works with ``--fp16`` too, to make things even faster.
- One of the main benefits of enabling ``--sharded_ddp`` is that it uses a lot less GPU memory, so you should be able
to use significantly larger batch sizes using the same hardware (e.g. 3x and even bigger) which should lead to
- One of the main benefits of enabling ``--sharded_ddp simple`` is that it uses a lot less GPU memory, so you should be
able to use significantly larger batch sizes using the same hardware (e.g. 3x and even bigger) which should lead to
significantly shorter training time.
3. To use the second version of Sharded data-parallelism, add ``--sharded_ddp zero_dp_2`` or ``--sharded_ddp
zero_dp_3`` to the command line arguments, and make sure you have added the distributed launcher ``-m
torch.distributed.launch --nproc_per_node=NUMBER_OF_GPUS_YOU_HAVE`` if you haven't been using it already.
For example here is how you could use it for ``run_translation.py`` with 2 GPUs:
.. code-block:: bash
python -m torch.distributed.launch --nproc_per_node=2 examples/pytorch/translation/run_translation.py \
--model_name_or_path t5-small --per_device_train_batch_size 1 \
--output_dir output_dir --overwrite_output_dir \
--do_train --max_train_samples 500 --num_train_epochs 1 \
--dataset_name wmt16 --dataset_config "ro-en" \
--source_lang en --target_lang ro \
--fp16 --sharded_ddp zero_dp_2
:obj:`zero_dp_2` is an optimized version of the simple wrapper, while :obj:`zero_dp_3` fully shards model weights,
gradients and optimizer states.
Both are compatible with adding :obj:`cpu_offload` to enable ZeRO-offload (activate it like this: :obj:`--sharded_ddp
"zero_dp_2 cpu_offload"`).
Notes:
- This feature requires distributed training (so multiple GPUs).
- It is not implemented for TPUs.
- It works with ``--fp16`` too, to make things even faster.
- The ``cpu_offload`` additional option requires ``--fp16``.
- This is an area of active development, so make sure you have a source install of fairscale to use this feature as
some bugs you encounter may have been fixed there already.
Known caveats:
- This feature is incompatible with :obj:`--predict_with_generate` in the `run_translation.py` script.
- Using :obj:`--sharded_ddp zero_dp_3` requires wrapping each layer of the model in the special container
:obj:`FullyShardedDataParallelism` of fairscale. It should be used with the option :obj:`auto_wrap` if you are not
doing this yourself: :obj:`--sharded_ddp "zero_dp_3 auto_wrap"`.
DeepSpeed
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
`DeepSpeed <https://github.com/microsoft/DeepSpeed>`__ implements everything described in the `ZeRO paper
<https://arxiv.org/abs/1910.02054>`__, except ZeRO's stage 3. "Parameter Partitioning (Pos+g+p)". Currently it provides
full support for:
1. Optimizer State Partitioning (ZeRO stage 1)
2. Add Gradient Partitioning (ZeRO stage 2)
Moved to :ref:`deepspeed-trainer-integration`.
Installation
=======================================================================================================================
Install the library via pypi:
Moved to :ref:`deepspeed-installation`.
.. code-block:: bash
pip install deepspeed
or find more details on `the DeepSpeed's GitHub page <https://github.com/microsoft/deepspeed#installation>`__.
Deployment with multiple GPUs
=======================================================================================================================
To deploy this feature with multiple GPUs adjust the :class:`~transformers.Trainer` command line arguments as
following:
1. replace ``python -m torch.distributed.launch`` with ``deepspeed``.
2. add a new argument ``--deepspeed ds_config.json``, where ``ds_config.json`` is the DeepSpeed configuration file as
documented `here <https://www.deepspeed.ai/docs/config-json/>`__. The file naming is up to you.
Therefore, if your original command line looked as following:
.. code-block:: bash
python -m torch.distributed.launch --nproc_per_node=2 your_program.py <normal cl args>
Now it should be:
.. code-block:: bash
deepspeed --num_gpus=2 your_program.py <normal cl args> --deepspeed ds_config.json
Unlike, ``torch.distributed.launch`` where you have to specify how many GPUs to use with ``--nproc_per_node``, with the
``deepspeed`` launcher you don't have to use the corresponding ``--num_gpus`` if you want all of your GPUs used. The
full details on how to configure various nodes and GPUs can be found `here
<https://www.deepspeed.ai/getting-started/#resource-configuration-multi-node>`__.
Here is an example of running ``finetune_trainer.py`` under DeepSpeed deploying all available GPUs:
.. code-block:: bash
cd examples/seq2seq
deepspeed ./finetune_trainer.py --deepspeed ds_config.json \
--model_name_or_path sshleifer/distill-mbart-en-ro-12-4 --data_dir wmt_en_ro \
--output_dir output_dir --overwrite_output_dir \
--do_train --n_train 500 --num_train_epochs 1 \
--per_device_train_batch_size 1 --freeze_embeds \
--src_lang en_XX --tgt_lang ro_RO --task translation
Note that in the DeepSpeed documentation you are likely to see ``--deepspeed --deepspeed_config ds_config.json`` - i.e.
two DeepSpeed-related arguments, but for the sake of simplicity, and since there are already so many arguments to deal
with, we combined the two into a single argument.
For some practical usage examples, please, see this `post
<https://github.com/huggingface/transformers/issues/8771#issuecomment-759248400>`__.
Moved to :ref:`deepspeed-multi-gpu`.
Deployment with one GPU
=======================================================================================================================
To deploy DeepSpeed with one GPU adjust the :class:`~transformers.Trainer` command line arguments as following:
Moved to :ref:`deepspeed-one-gpu`.
.. code-block:: bash
cd examples/seq2seq
deepspeed --num_gpus=1 ./finetune_trainer.py --deepspeed ds_config.json \
--model_name_or_path sshleifer/distill-mbart-en-ro-12-4 --data_dir wmt_en_ro \
--output_dir output_dir --overwrite_output_dir \
--do_train --n_train 500 --num_train_epochs 1 \
--per_device_train_batch_size 1 --freeze_embeds \
--src_lang en_XX --tgt_lang ro_RO --task translation
Deployment in Notebooks
=======================================================================================================================
This is almost the same as with multiple-GPUs, but here we tell DeepSpeed explicitly to use just one GPU. By default,
DeepSpeed deploys all GPUs it can see. If you have only 1 GPU to start with, then you don't need this argument. The
following `documentation <https://www.deepspeed.ai/getting-started/#resource-configuration-multi-node>`__ discusses the
launcher options.
Moved to :ref:`deepspeed-notebook`.
Why would you want to use DeepSpeed with just one GPU?
1. It has a ZeRO-offload feature which can delegate some computations and memory to the host's CPU and RAM, and thus
leave more GPU resources for model's needs - e.g. larger batch size, or enabling a fitting of a very big model which
normally won't fit.
2. It provides a smart GPU memory management system, that minimizes memory fragmentation, which again allows you to fit
bigger models and data batches.
While we are going to discuss the configuration in details next, the key to getting a huge improvement on a single GPU
with DeepSpeed is to have at least the following configuration in the configuration file:
.. code-block:: json
{
"zero_optimization": {
"stage": 2,
"allgather_partitions": true,
"allgather_bucket_size": 2e8,
"reduce_scatter": true,
"reduce_bucket_size": 2e8,
"overlap_comm": true,
"contiguous_gradients": true,
"cpu_offload": true
},
}
which enables ``cpu_offload`` and some other important features. You may experiment with the buffer sizes, you will
find more details in the discussion below.
For a practical usage example of this type of deployment, please, see this `post
<https://github.com/huggingface/transformers/issues/8771#issuecomment-759176685>`__.
Configuration
=======================================================================================================================
For the complete guide to the DeepSpeed configuration options that can be used in its configuration file please refer
to the `following documentation <https://www.deepspeed.ai/docs/config-json/>`__.
Moved to :ref:`deepspeed-config`.
While you always have to supply the DeepSpeed configuration file, you can configure the DeepSpeed integration in
several ways:
1. Supply most of the configuration inside the file, and just use a few required command line arguments. This is the
recommended way as it puts most of the configuration params in one place.
2. Supply just the ZeRO configuration params inside the file, and configure the rest using the normal
:class:`~transformers.Trainer` command line arguments.
3. Any variation of the first two ways.
Passing Configuration
=======================================================================================================================
To get an idea of what DeepSpeed configuration file looks like, here is one that activates ZeRO stage 2 features,
enables FP16, uses AdamW optimizer and WarmupLR scheduler:
Moved to :ref:`deepspeed-config-passing`.
.. code-block:: json
{
"fp16": {
"enabled": true,
"loss_scale": 0,
"loss_scale_window": 1000,
"hysteresis": 2,
"min_loss_scale": 1
},
"zero_optimization": {
"stage": 2,
"allgather_partitions": true,
"allgather_bucket_size": 5e8,
"overlap_comm": true,
"reduce_scatter": true,
"reduce_bucket_size": 5e8,
"contiguous_gradients": true,
"cpu_offload": true
},
"optimizer": {
"type": "AdamW",
"params": {
"lr": 3e-5,
"betas": [ 0.8, 0.999 ],
"eps": 1e-8,
"weight_decay": 3e-7
}
},
"zero_allow_untested_optimizer": true,
"scheduler": {
"type": "WarmupLR",
"params": {
"warmup_min_lr": 0,
"warmup_max_lr": 3e-5,
"warmup_num_steps": 500
}
}
}
If you already have a command line that you have been using with :class:`transformers.Trainer` args, you can continue
using those and the :class:`~transformers.Trainer` will automatically convert them into the corresponding DeepSpeed
configuration at run time. For example, you could use the following configuration file:
.. code-block:: json
{
"zero_optimization": {
"stage": 2,
"allgather_partitions": true,
"allgather_bucket_size": 5e8,
"overlap_comm": true,
"reduce_scatter": true,
"reduce_bucket_size": 5e8,
"contiguous_gradients": true,
"cpu_offload": true
}
}
and the following command line arguments:
.. code-block:: bash
--learning_rate 3e-5 --warmup_steps 500 --adam_beta1 0.8 --adam_beta2 0.999 --adam_epsilon 1e-8 \
--weight_decay 3e-7 --lr_scheduler_type constant_with_warmup --fp16 --fp16_backend amp
to achieve the same configuration as provided by the longer json file in the first example.
When you execute the program, DeepSpeed will log the configuration it received from the :class:`~transformers.Trainer`
to the console, so you can see exactly what the final configuration was passed to it.
Shared Configuration
=======================================================================================================================
Some configuration information is required by both the :class:`~transformers.Trainer` and DeepSpeed to function
correctly, therefore, to prevent conflicting definitions, which could lead to hard to detect errors, we chose to
configure those via the :class:`~transformers.Trainer` command line arguments.
Therefore, the following DeepSpeed configuration params shouldn't be used with the :class:`~transformers.Trainer`:
* ``train_batch_size``
* ``train_micro_batch_size_per_gpu``
* ``gradient_accumulation_steps``
as these will be automatically derived from the run time environment and the following 2 command line arguments:
.. code-block:: bash
--per_device_train_batch_size 8 --gradient_accumulation_steps 2
which are always required to be supplied.
Of course, you will need to adjust the values in this example to your situation.
Moved to :ref:`deepspeed-config-shared`.
ZeRO
=======================================================================================================================
The ``zero_optimization`` section of the configuration file is the most important part (`docs
<https://www.deepspeed.ai/docs/config-json/#zero-optimizations-for-fp16-training>`__), since that is where you define
which ZeRO stages you want to enable and how to configure them.
Moved to :ref:`deepspeed-zero`.
.. code-block:: json
ZeRO-2 Config
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
{
"zero_optimization": {
"stage": 2,
"allgather_partitions": true,
"allgather_bucket_size": 5e8,
"overlap_comm": true,
"reduce_scatter": true,
"reduce_bucket_size": 5e8,
"contiguous_gradients": true,
"cpu_offload": true
}
}
Moved to :ref:`deepspeed-zero2-config`.
Notes:
ZeRO-3 Config
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
- enabling ``cpu_offload`` should reduce GPU RAM usage (it requires ``"stage": 2``)
- ``"overlap_comm": true`` trades off increased GPU RAM usage to lower all-reduce latency. ``overlap_comm`` uses 4.5x
the ``allgather_bucket_size`` and ``reduce_bucket_size`` values. So if they are set to 5e8, this requires a 9GB
footprint (``5e8 x 2Bytes x 2 x 4.5``). Therefore, if you have a GPU with 8GB or less RAM, to avoid getting
OOM-errors you will need to reduce those parameters to about ``2e8``, which would require 3.6GB.
Moved to :ref:`deepspeed-zero3-config`.
This section has to be configured exclusively via DeepSpeed configuration - the :class:`~transformers.Trainer` provides
no equivalent command line arguments.
NVMe Support
=======================================================================================================================
Moved to :ref:`deepspeed-nvme`.
ZeRO-2 vs ZeRO-3 Performance
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Moved to :ref:`deepspeed-zero2-zero3-performance`.
ZeRO-2 Example
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Moved to :ref:`deepspeed-zero2-example`.
ZeRO-3 Example
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Moved to :ref:`deepspeed-zero3-example`.
Optimizer and Scheduler
=======================================================================================================================
Optimizer
=======================================================================================================================
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
DeepSpeed's main optimizers are Adam, OneBitAdam, and Lamb. These have been thoroughly tested with ZeRO and are thus
recommended to be used. It, however, can import other optimizers from ``torch``. The full documentation is `here
<https://www.deepspeed.ai/docs/config-json/#optimizer-parameters>`__.
If you don't configure the ``optimizer`` entry in the configuration file, the :class:`~transformers.Trainer` will
automatically set it to ``AdamW`` and will use the supplied values or the defaults for the following command line
arguments: ``--learning_rate``, ``--adam_beta1``, ``--adam_beta2``, ``--adam_epsilon`` and ``--weight_decay``.
Here is an example of the pre-configured ``optimizer`` entry for AdamW:
.. code-block:: json
{
"zero_allow_untested_optimizer": true,
"optimizer": {
"type": "AdamW",
"params": {
"lr": 0.001,
"betas": [0.8, 0.999],
"eps": 1e-8,
"weight_decay": 3e-7
}
}
}
Since AdamW isn't on the list of tested with DeepSpeed/ZeRO optimizers, we have to add
``zero_allow_untested_optimizer`` flag.
If you want to use one of the officially supported optimizers, configure them explicitly in the configuration file, and
make sure to adjust the values. e.g. if use Adam you will want ``weight_decay`` around ``0.01``.
Moved to :ref:`deepspeed-optimizer`.
Scheduler
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Moved to :ref:`deepspeed-scheduler`.
fp32 Precision
=======================================================================================================================
DeepSpeed supports LRRangeTest, OneCycle, WarmupLR and WarmupDecayLR LR schedulers. The full documentation is `here
<https://www.deepspeed.ai/docs/config-json/#scheduler-parameters>`__.
If you don't configure the ``scheduler`` entry in the configuration file, the :class:`~transformers.Trainer` will use
the value of ``--lr_scheduler_type`` to configure it. Currently the :class:`~transformers.Trainer` supports only 2 LR
schedulers that are also supported by DeepSpeed:
* ``WarmupLR`` via ``--lr_scheduler_type constant_with_warmup``
* ``WarmupDecayLR`` via ``--lr_scheduler_type linear``. This is also the default value for ``--lr_scheduler_type``,
therefore, if you don't configure the scheduler this is scheduler that will get configured by default.
In either case, the values of ``--learning_rate`` and ``--warmup_steps`` will be used for the configuration.
In other words, if you don't use the configuration file to set the ``scheduler`` entry, provide either:
.. code-block:: bash
--lr_scheduler_type constant_with_warmup --learning_rate 3e-5 --warmup_steps 500
or
.. code-block:: bash
--lr_scheduler_type linear --learning_rate 3e-5 --warmup_steps 500
with the desired values. If you don't pass these arguments, reasonable default values will be used instead.
In the case of WarmupDecayLR ``total_num_steps`` gets set either via the ``--max_steps`` command line argument, or if
it is not provided, derived automatically at run time based on the environment and the size of the dataset and other
command line arguments.
Here is an example of the pre-configured ``scheduler`` entry for WarmupLR (``constant_with_warmup`` in the
:class:`~transformers.Trainer` API):
.. code-block:: json
{
"scheduler": {
"type": "WarmupLR",
"params": {
"warmup_min_lr": 0,
"warmup_max_lr": 0.001,
"warmup_num_steps": 1000
}
}
}
Moved to :ref:`deepspeed-fp32`.
Automatic Mixed Precision
=======================================================================================================================
You can work with FP16 in one of the following ways:
Moved to :ref:`deepspeed-amp`.
1. Pytorch native amp, as documented `here <https://www.deepspeed.ai/docs/config-json/#fp16-training-options>`__.
2. NVIDIA's apex, as documented `here
<https://www.deepspeed.ai/docs/config-json/#automatic-mixed-precision-amp-training-options>`__.
Batch Size
=======================================================================================================================
If you want to use an equivalent of the Pytorch native amp, you can either configure the ``fp16`` entry in the
configuration file, or use the following command line arguments: ``--fp16 --fp16_backend amp``.
Moved to :ref:`deepspeed-bs`.
Here is an example of the ``fp16`` configuration:
.. code-block:: json
{
"fp16": {
"enabled": true,
"loss_scale": 0,
"loss_scale_window": 1000,
"hysteresis": 2,
"min_loss_scale": 1
},
}
If you want to use NVIDIA's apex instead, you can can either configure the ``amp`` entry in the configuration file, or
use the following command line arguments: ``--fp16 --fp16_backend apex --fp16_opt_level 01``.
Here is an example of the ``amp`` configuration:
.. code-block:: json
{
"amp": {
"enabled": true,
"opt_level": "O1"
}
}
Gradient Accumulation
=======================================================================================================================
Moved to :ref:`deepspeed-grad-acc`.
Gradient Clipping
=======================================================================================================================
If you don't configure the ``gradient_clipping`` entry in the configuration file, the :class:`~transformers.Trainer`
will use the value of the ``--max_grad_norm`` command line argument to set it.
Here is an example of the ``gradient_clipping`` configuration:
.. code-block:: json
{
"gradient_clipping": 1.0,
}
Moved to :ref:`deepspeed-grad-clip`.
Notes
Getting The Model Weights Out
=======================================================================================================================
* DeepSpeed works with the PyTorch :class:`~transformers.Trainer` but not TF :class:`~transformers.TFTrainer`.
* While DeepSpeed has a pip installable PyPI package, it is highly recommended that it gets installed from `source
<https://github.com/microsoft/deepspeed#installation>`__ to best match your hardware and also if you need to enable
certain features, like 1-bit Adam, which aren't available in the pypi distribution.
* You don't have to use the :class:`~transformers.Trainer` to use DeepSpeed with HuggingFace ``transformers`` - you can
use any model with your own trainer, and you will have to adapt the latter according to `the DeepSpeed integration
instructions <https://www.deepspeed.ai/getting-started/#writing-deepspeed-models>`__.
Main DeepSpeed Resources
=======================================================================================================================
- `Project's github <https://github.com/microsoft/deepspeed>`__
- `Usage docs <https://www.deepspeed.ai/getting-started/>`__
- `API docs <https://deepspeed.readthedocs.io/en/latest/index.html>`__
- `Blog posts <https://www.microsoft.com/en-us/research/search/?q=deepspeed>`__
Finally, please, remember that, HuggingFace :class:`~transformers.Trainer` only integrates DeepSpeed, therefore if you
have any problems or questions with regards to DeepSpeed usage, please, file an issue with `DeepSpeed GitHub
<https://github.com/microsoft/DeepSpeed/issues>`__.
Moved to :ref:`deepspeed-weight-extraction`.

View File

@@ -23,7 +23,7 @@ expected changes:
#### 1. AutoTokenizers and pipelines now use fast (rust) tokenizers by default.
The python and rust tokenizers have roughly the same API, but the rust tokenizers have a more complete feature set.
The python and rust tokenizers have roughly the same API, but the rust tokenizers have a more complete feature set.
This introduces two breaking changes:
- The handling of overflowing tokens between the python and rust tokenizers is different.
@@ -85,7 +85,7 @@ This is a breaking change as importing intermediary layers using a model's modul
##### How to obtain the same behavior as v3.x in v4.x
In order to obtain the same behavior as version `v3.x`, you should update the path used to access the layers.
In order to obtain the same behavior as version `v3.x`, you should update the path used to access the layers.
In version `v3.x`:
```bash
@@ -169,8 +169,8 @@ Regarding the `TFTrainer` class:
- The `TFTrainer` method `_setup_wandb` is deprecated in favor of `setup_wandb`.
- The `TFTrainer` method `_run_model` is deprecated in favor of `run_model`.
Regarding the `TrainerArgument` class:
- The `TrainerArgument` argument `evaluate_during_training` is deprecated in favor of `evaluation_strategy`.
Regarding the `TrainingArguments` class:
- The `TrainingArguments` argument `evaluate_during_training` is deprecated in favor of `evaluation_strategy`.
Regarding the Transfo-XL model:
- The Transfo-XL configuration attribute `tie_weight` becomes `tie_words_embeddings`.

View File

@@ -43,7 +43,8 @@ Tips:
similar to a BERT-like architecture with the same number of hidden layers as it has to iterate through the same
number of (repeating) layers.
The original code can be found `here <https://github.com/google-research/ALBERT>`__.
This model was contributed by `lysandre <https://huggingface.co/lysandre>`__. The original code can be found `here
<https://github.com/google-research/ALBERT>`__.
AlbertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

View File

@@ -44,6 +44,13 @@ AutoTokenizer
:members:
AutoFeatureExtractor
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AutoFeatureExtractor
:members:
AutoModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -121,6 +128,13 @@ AutoModelForTableQuestionAnswering
:members:
AutoModelForImageClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AutoModelForImageClassification
:members:
TFAutoModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -189,3 +203,73 @@ FlaxAutoModel
.. autoclass:: transformers.FlaxAutoModel
:members:
FlaxAutoModelForCausalLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaxAutoModelForCausalLM
:members:
FlaxAutoModelForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaxAutoModelForPreTraining
:members:
FlaxAutoModelForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaxAutoModelForMaskedLM
:members:
FlaxAutoModelForSeq2SeqLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaxAutoModelForSeq2SeqLM
:members:
FlaxAutoModelForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaxAutoModelForSequenceClassification
:members:
FlaxAutoModelForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaxAutoModelForQuestionAnswering
:members:
FlaxAutoModelForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaxAutoModelForTokenClassification
:members:
FlaxAutoModelForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaxAutoModelForMultipleChoice
:members:
FlaxAutoModelForNextSentencePrediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaxAutoModelForNextSentencePrediction
:members:
FlaxAutoModelForImageClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaxAutoModelForImageClassification
:members:

View File

@@ -35,14 +35,15 @@ According to the abstract,
state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks, with gains
of up to 6 ROUGE.
The Authors' code can be found `here <https://github.com/pytorch/fairseq/tree/master/examples/bart>`__.
This model was contributed by `sshleifer <https://huggingface.co/sshleifer>`__. The Authors' code can be found `here
<https://github.com/pytorch/fairseq/tree/master/examples/bart>`__.
Examples
_______________________________________________________________________________________________________________________
- Examples and scripts for fine-tuning BART and other models for sequence to sequence tasks can be found in
:prefix_link:`examples/seq2seq/ <examples/seq2seq/README.md>`.
:prefix_link:`examples/pytorch/summarization/ <examples/pytorch/summarization/README.md>`.
- An example of how to train :class:`~transformers.BartForConditionalGeneration` with a Hugging Face :obj:`datasets`
object can be found in this `forum discussion
<https://discuss.huggingface.co/t/train-bart-for-conditional-generation-e-g-summarization/1904>`__.
@@ -60,7 +61,7 @@ Implementation Notes
- Model predictions are intended to be identical to the original implementation when
:obj:`force_bos_token_to_be_generated=True`. This only works, however, if the string you pass to
:func:`fairseq.encode` starts with a space.
- :meth:`~transformers.BartForConditionalGeneration.generate` should be used for conditional generation tasks like
- :meth:`~transformers.generation_utils.GenerationMixin.generate` should be used for conditional generation tasks like
summarization, see the example in that docstrings.
- Models that load the `facebook/bart-large-cnn` weights will not have a :obj:`mask_token_id`, or be able to perform
mask-filling tasks.
@@ -130,6 +131,7 @@ BartForQuestionAnswering
.. autoclass:: transformers.BartForQuestionAnswering
:members: forward
BartForCausalLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -137,7 +139,6 @@ BartForCausalLM
:members: forward
TFBartModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -150,3 +151,32 @@ TFBartForConditionalGeneration
.. autoclass:: transformers.TFBartForConditionalGeneration
:members: call
FlaxBartModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaxBartModel
:members: __call__, encode, decode
FlaxBartForConditionalGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaxBartForConditionalGeneration
:members: __call__, encode, decode
FlaxBartForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaxBartForSequenceClassification
:members: __call__, encode, decode
FlaxBartForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaxBartForQuestionAnswering
:members: __call__, encode, decode

View File

@@ -16,7 +16,7 @@ BARThez
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The BARThez model was proposed in `BARThez: a Skilled Pretrained French Sequence-to-Sequence Model`
The BARThez model was proposed in `BARThez: a Skilled Pretrained French Sequence-to-Sequence Model
<https://arxiv.org/abs/2010.12321>`__ by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis on 23 Oct,
2020.
@@ -35,14 +35,15 @@ summarization dataset, OrangeSum, that we release with this paper. We also conti
pretrained multilingual BART on BARThez's corpus, and we show that the resulting model, which we call mBARTHez,
provides a significant boost over vanilla BARThez, and is on par with or outperforms CamemBERT and FlauBERT.*
The Authors' code can be found `here <https://github.com/moussaKam/BARThez>`__.
This model was contributed by `moussakam <https://huggingface.co/moussakam>`__. The Authors' code can be found `here
<https://github.com/moussaKam/BARThez>`__.
Examples
_______________________________________________________________________________________________________________________
- BARThez can be fine-tuned on sequence-to-sequence tasks in a similar way as BART, check:
:prefix_link:`examples/seq2seq/ <examples/seq2seq/README.md>`.
:prefix_link:`examples/pytorch/summarization/ <examples/pytorch/summarization/README.md>`.
BarthezTokenizer

View File

@@ -42,7 +42,8 @@ Tips:
- BERT was trained with the masked language modeling (MLM) and next sentence prediction (NSP) objectives. It is
efficient at predicting masked tokens and at NLU in general, but is not optimal for text generation.
The original code can be found `here <https://github.com/google-research/bert>`__.
This model was contributed by `thomwolf <https://huggingface.co/thomwolf>`__. The original code can be found `here
<https://github.com/google-research/bert>`__.
BertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -90,7 +91,7 @@ BertForPreTraining
:members: forward
BertModelLMHeadModel
BertLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertLMHeadModel
@@ -209,8 +210,50 @@ FlaxBertModel
:members: __call__
FlaxBertForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaxBertForPreTraining
:members: __call__
FlaxBertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaxBertForMaskedLM
:members: __call__
FlaxBertForNextSentencePrediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaxBertForNextSentencePrediction
:members: __call__
FlaxBertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaxBertForSequenceClassification
:members: __call__
FlaxBertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaxBertForMultipleChoice
:members: __call__
FlaxBertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaxBertForTokenClassification
:members: __call__
FlaxBertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaxBertForQuestionAnswering
:members: __call__

View File

@@ -0,0 +1,80 @@
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
BertJapanese
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The BERT models trained on Japanese text.
There are models with two different tokenization methods:
- Tokenize with MeCab and WordPiece. This requires some extra dependencies, `fugashi
<https://github.com/polm/fugashi>`__ which is a wrapper around `MeCab <https://taku910.github.io/mecab/>`__.
- Tokenize into characters.
To use `MecabTokenizer`, you should ``pip install transformers["ja"]`` (or ``pip install -e .["ja"]`` if you install
from source) to install dependencies.
See `details on cl-tohoku repository <https://github.com/cl-tohoku/bert-japanese>`__.
Example of using a model with MeCab and WordPiece tokenization:
.. code-block::
>>> import torch
>>> from transformers import AutoModel, AutoTokenizer
>>> bertjapanese = AutoModel.from_pretrained("cl-tohoku/bert-base-japanese")
>>> tokenizer = AutoTokenizer.from_pretrained("cl-tohoku/bert-base-japanese")
>>> ## Input Japanese Text
>>> line = "吾輩は猫である。"
>>> inputs = tokenizer(line, return_tensors="pt")
>>> print(tokenizer.decode(inputs['input_ids'][0]))
[CLS] 吾輩 は 猫 で ある 。 [SEP]
>>> outputs = bertjapanese(**inputs)
Example of using a model with Character tokenization:
.. code-block::
>>> bertjapanese = AutoModel.from_pretrained("cl-tohoku/bert-base-japanese-char")
>>> tokenizer = AutoTokenizer.from_pretrained("cl-tohoku/bert-base-japanese-char")
>>> ## Input Japanese Text
>>> line = "吾輩は猫である。"
>>> inputs = tokenizer(line, return_tensors="pt")
>>> print(tokenizer.decode(inputs['input_ids'][0]))
[CLS] 吾 輩 は 猫 で あ る 。 [SEP]
>>> outputs = bertjapanese(**inputs)
Tips:
- This implementation is the same as BERT, except for tokenization method. Refer to the :doc:`documentation of BERT
<bert>` for more usage examples.
This model was contributed by `cl-tohoku <https://huggingface.co/cl-tohoku>`__.
BertJapaneseTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertJapaneseTokenizer
:members:

View File

@@ -38,22 +38,22 @@ Usage:
.. code-block::
# leverage checkpoints for Bert2Bert model...
# use BERT's cls token as BOS token and sep token as EOS token
encoder = BertGenerationEncoder.from_pretrained("bert-large-uncased", bos_token_id=101, eos_token_id=102)
# add cross attention layers and use BERT's cls token as BOS token and sep token as EOS token
decoder = BertGenerationDecoder.from_pretrained("bert-large-uncased", add_cross_attention=True, is_decoder=True, bos_token_id=101, eos_token_id=102)
bert2bert = EncoderDecoderModel(encoder=encoder, decoder=decoder)
>>> # leverage checkpoints for Bert2Bert model...
>>> # use BERT's cls token as BOS token and sep token as EOS token
>>> encoder = BertGenerationEncoder.from_pretrained("bert-large-uncased", bos_token_id=101, eos_token_id=102)
>>> # add cross attention layers and use BERT's cls token as BOS token and sep token as EOS token
>>> decoder = BertGenerationDecoder.from_pretrained("bert-large-uncased", add_cross_attention=True, is_decoder=True, bos_token_id=101, eos_token_id=102)
>>> bert2bert = EncoderDecoderModel(encoder=encoder, decoder=decoder)
# create tokenizer...
tokenizer = BertTokenizer.from_pretrained("bert-large-uncased")
>>> # create tokenizer...
>>> tokenizer = BertTokenizer.from_pretrained("bert-large-uncased")
input_ids = tokenizer('This is a long article to summarize', add_special_tokens=False, return_tensors="pt").input_ids
labels = tokenizer('This is a short summary', return_tensors="pt").input_ids
>>> input_ids = tokenizer('This is a long article to summarize', add_special_tokens=False, return_tensors="pt").input_ids
>>> labels = tokenizer('This is a short summary', return_tensors="pt").input_ids
# train...
loss = bert2bert(input_ids=input_ids, decoder_input_ids=labels, labels=labels).loss
loss.backward()
>>> # train...
>>> loss = bert2bert(input_ids=input_ids, decoder_input_ids=labels, labels=labels).loss
>>> loss.backward()
- Pretrained :class:`~transformers.EncoderDecoderModel` are also directly available in the model hub, e.g.,
@@ -61,15 +61,15 @@ Usage:
.. code-block::
# instantiate sentence fusion model
sentence_fuser = EncoderDecoderModel.from_pretrained("google/roberta2roberta_L-24_discofuse")
tokenizer = AutoTokenizer.from_pretrained("google/roberta2roberta_L-24_discofuse")
>>> # instantiate sentence fusion model
>>> sentence_fuser = EncoderDecoderModel.from_pretrained("google/roberta2roberta_L-24_discofuse")
>>> tokenizer = AutoTokenizer.from_pretrained("google/roberta2roberta_L-24_discofuse")
input_ids = tokenizer('This is the first sentence. This is the second sentence.', add_special_tokens=False, return_tensors="pt").input_ids
>>> input_ids = tokenizer('This is the first sentence. This is the second sentence.', add_special_tokens=False, return_tensors="pt").input_ids
outputs = sentence_fuser.generate(input_ids)
>>> outputs = sentence_fuser.generate(input_ids)
print(tokenizer.decode(outputs[0]))
>>> print(tokenizer.decode(outputs[0]))
Tips:
@@ -79,7 +79,8 @@ Tips:
- For summarization, sentence splitting, sentence fusion and translation, no special tokens are required for the input.
Therefore, no EOS token should be added to the end of the input.
The original code can be found `here <https://tfhub.dev/s?module-type=text-generation&subtype=module,placeholder>`__.
This model was contributed by `patrickvonplaten <https://huggingface.co/patrickvonplaten>`__. The original code can be
found `here <https://tfhub.dev/s?module-type=text-generation&subtype=module,placeholder>`__.
BertGenerationConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

View File

@@ -31,31 +31,31 @@ Example of use:
.. code-block::
import torch
from transformers import AutoModel, AutoTokenizer
>>> import torch
>>> from transformers import AutoModel, AutoTokenizer
bertweet = AutoModel.from_pretrained("vinai/bertweet-base")
>>> bertweet = AutoModel.from_pretrained("vinai/bertweet-base")
# For transformers v4.x+:
tokenizer = AutoTokenizer.from_pretrained("vinai/bertweet-base", use_fast=False)
>>> # For transformers v4.x+:
>>> tokenizer = AutoTokenizer.from_pretrained("vinai/bertweet-base", use_fast=False)
# For transformers v3.x:
# tokenizer = AutoTokenizer.from_pretrained("vinai/bertweet-base")
>>> # For transformers v3.x:
>>> # tokenizer = AutoTokenizer.from_pretrained("vinai/bertweet-base")
# INPUT TWEET IS ALREADY NORMALIZED!
line = "SC has first two presumptive cases of coronavirus , DHEC confirms HTTPURL via @USER :cry:"
>>> # INPUT TWEET IS ALREADY NORMALIZED!
>>> line = "SC has first two presumptive cases of coronavirus , DHEC confirms HTTPURL via @USER :cry:"
input_ids = torch.tensor([tokenizer.encode(line)])
>>> input_ids = torch.tensor([tokenizer.encode(line)])
with torch.no_grad():
features = bertweet(input_ids) # Models outputs are now tuples
>>> with torch.no_grad():
... features = bertweet(input_ids) # Models outputs are now tuples
## With TensorFlow 2.0+:
# from transformers import TFAutoModel
# bertweet = TFAutoModel.from_pretrained("vinai/bertweet-base")
>>> # With TensorFlow 2.0+:
>>> # from transformers import TFAutoModel
>>> # bertweet = TFAutoModel.from_pretrained("vinai/bertweet-base")
The original code can be found `here <https://github.com/VinAIResearch/BERTweet>`__.
This model was contributed by `dqnguyen <https://huggingface.co/dqnguyen>`__. The original code can be found `here
<https://github.com/VinAIResearch/BERTweet>`__.
BertweetTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

View File

@@ -0,0 +1,185 @@
..
Copyright 2021 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
BigBird
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The BigBird model was proposed in `Big Bird: Transformers for Longer Sequences <https://arxiv.org/abs/2007.14062>`__ by
Zaheer, Manzil and Guruganesh, Guru and Dubey, Kumar Avinava and Ainslie, Joshua and Alberti, Chris and Ontanon,
Santiago and Pham, Philip and Ravula, Anirudh and Wang, Qifan and Yang, Li and others. BigBird, is a sparse-attention
based transformer which extends Transformer based models, such as BERT to much longer sequences. In addition to sparse
attention, BigBird also applies global attention as well as random attention to the input sequence. Theoretically, it
has been shown that applying sparse, global, and random attention approximates full attention, while being
computationally much more efficient for longer sequences. As a consequence of the capability to handle longer context,
BigBird has shown improved performance on various long document NLP tasks, such as question answering and
summarization, compared to BERT or RoBERTa.
The abstract from the paper is the following:
*Transformers-based models, such as BERT, have been one of the most successful deep learning models for NLP.
Unfortunately, one of their core limitations is the quadratic dependency (mainly in terms of memory) on the sequence
length due to their full attention mechanism. To remedy this, we propose, BigBird, a sparse attention mechanism that
reduces this quadratic dependency to linear. We show that BigBird is a universal approximator of sequence functions and
is Turing complete, thereby preserving these properties of the quadratic, full attention model. Along the way, our
theoretical analysis reveals some of the benefits of having O(1) global tokens (such as CLS), that attend to the entire
sequence as part of the sparse attention mechanism. The proposed sparse attention can handle sequences of length up to
8x of what was previously possible using similar hardware. As a consequence of the capability to handle longer context,
BigBird drastically improves performance on various NLP tasks such as question answering and summarization. We also
propose novel applications to genomics data.*
Tips:
- For an in-detail explanation on how BigBird's attention works, see `this blog post
<https://huggingface.co/blog/big-bird>`__.
- BigBird comes with 2 implementations: **original_full** & **block_sparse**. For the sequence length < 1024, using
**original_full** is advised as there is no benefit in using **block_sparse** attention.
- The code currently uses window size of 3 blocks and 2 global blocks.
- Sequence length must be divisible by block size.
- Current implementation supports only **ITC**.
- Current implementation doesn't support **num_random_blocks = 0**
This model was contributed by `vasudevgupta <https://huggingface.co/vasudevgupta>`__. The original code can be found
`here <https://github.com/google-research/bigbird>`__.
BigBirdConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BigBirdConfig
:members:
BigBirdTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BigBirdTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary
BigBirdTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BigBirdTokenizerFast
:members:
BigBird specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.models.big_bird.modeling_big_bird.BigBirdForPreTrainingOutput
:members:
BigBirdModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BigBirdModel
:members: forward
BigBirdForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BigBirdForPreTraining
:members: forward
BigBirdForCausalLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BigBirdForCausalLM
:members: forward
BigBirdForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BigBirdForMaskedLM
:members: forward
BigBirdForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BigBirdForSequenceClassification
:members: forward
BigBirdForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BigBirdForMultipleChoice
:members: forward
BigBirdForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BigBirdForTokenClassification
:members: forward
BigBirdForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BigBirdForQuestionAnswering
:members: forward
FlaxBigBirdModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaxBigBirdModel
:members: __call__
FlaxBigBirdForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaxBigBirdForPreTraining
:members: __call__
FlaxBigBirdForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaxBigBirdForMaskedLM
:members: __call__
FlaxBigBirdForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaxBigBirdForSequenceClassification
:members: __call__
FlaxBigBirdForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaxBigBirdForMultipleChoice
:members: __call__
FlaxBigBirdForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaxBigBirdForTokenClassification
:members: __call__
FlaxBigBirdForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaxBigBirdForQuestionAnswering
:members: __call__

View File

@@ -0,0 +1,98 @@
..
Copyright 2021 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
BigBirdPegasus
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The BigBird model was proposed in `Big Bird: Transformers for Longer Sequences <https://arxiv.org/abs/2007.14062>`__ by
Zaheer, Manzil and Guruganesh, Guru and Dubey, Kumar Avinava and Ainslie, Joshua and Alberti, Chris and Ontanon,
Santiago and Pham, Philip and Ravula, Anirudh and Wang, Qifan and Yang, Li and others. BigBird, is a sparse-attention
based transformer which extends Transformer based models, such as BERT to much longer sequences. In addition to sparse
attention, BigBird also applies global attention as well as random attention to the input sequence. Theoretically, it
has been shown that applying sparse, global, and random attention approximates full attention, while being
computationally much more efficient for longer sequences. As a consequence of the capability to handle longer context,
BigBird has shown improved performance on various long document NLP tasks, such as question answering and
summarization, compared to BERT or RoBERTa.
The abstract from the paper is the following:
*Transformers-based models, such as BERT, have been one of the most successful deep learning models for NLP.
Unfortunately, one of their core limitations is the quadratic dependency (mainly in terms of memory) on the sequence
length due to their full attention mechanism. To remedy this, we propose, BigBird, a sparse attention mechanism that
reduces this quadratic dependency to linear. We show that BigBird is a universal approximator of sequence functions and
is Turing complete, thereby preserving these properties of the quadratic, full attention model. Along the way, our
theoretical analysis reveals some of the benefits of having O(1) global tokens (such as CLS), that attend to the entire
sequence as part of the sparse attention mechanism. The proposed sparse attention can handle sequences of length up to
8x of what was previously possible using similar hardware. As a consequence of the capability to handle longer context,
BigBird drastically improves performance on various NLP tasks such as question answering and summarization. We also
propose novel applications to genomics data.*
Tips:
- For an in-detail explanation on how BigBird's attention works, see `this blog post
<https://huggingface.co/blog/big-bird>`__.
- BigBird comes with 2 implementations: **original_full** & **block_sparse**. For the sequence length < 1024, using
**original_full** is advised as there is no benefit in using **block_sparse** attention.
- The code currently uses window size of 3 blocks and 2 global blocks.
- Sequence length must be divisible by block size.
- Current implementation supports only **ITC**.
- Current implementation doesn't support **num_random_blocks = 0**.
- BigBirdPegasus uses the `PegasusTokenizer
<https://github.com/huggingface/transformers/blob/master/src/transformers/models/pegasus/tokenization_pegasus.py>`__.
The original code can be found `here <https://github.com/google-research/bigbird>`__.
BigBirdPegasusConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BigBirdPegasusConfig
:members:
BigBirdPegasusModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BigBirdPegasusModel
:members: forward
BigBirdPegasusForConditionalGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BigBirdPegasusForConditionalGeneration
:members: forward
BigBirdPegasusForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BigBirdPegasusForSequenceClassification
:members: forward
BigBirdPegasusForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BigBirdPegasusForQuestionAnswering
:members: forward
BigBirdPegasusForCausalLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BigBirdPegasusForCausalLM
:members: forward

View File

@@ -36,7 +36,8 @@ and code publicly available. Human evaluations show our best models are superior
dialogue in terms of engagingness and humanness measurements. We then discuss the limitations of this work by analyzing
failure cases of our models.*
The authors' code can be found `here <https://github.com/facebookresearch/ParlAI>`__ .
This model was contributed by `sshleifer <https://huggingface.co/sshleifer>`__. The authors' code can be found `here
<https://github.com/facebookresearch/ParlAI>`__ .
Implementation Notes

View File

@@ -39,7 +39,8 @@ and code publicly available. Human evaluations show our best models are superior
dialogue in terms of engagingness and humanness measurements. We then discuss the limitations of this work by analyzing
failure cases of our models.*
The authors' code can be found `here <https://github.com/facebookresearch/ParlAI>`__ .
This model was contributed by `patrickvonplaten <https://huggingface.co/patrickvonplaten>`__. The authors' code can be
found `here <https://github.com/facebookresearch/ParlAI>`__ .
BlenderbotSmallConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

View File

@@ -43,4 +43,5 @@ Tips:
that is sadly not open-sourced yet. It would be very useful for the community, if someone tries to implement the
algorithm to make BORT fine-tuning work.
The original code can be found `here <https://github.com/alexa/bort/>`__.
This model was contributed by `stefan-it <https://huggingface.co/stefan-it>`__. The original code can be found `here
<https://github.com/alexa/bort/>`__.

View File

@@ -0,0 +1,83 @@
..
Copyright 2021 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
ByT5
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The ByT5 model was presented in `ByT5: Towards a token-free future with pre-trained byte-to-byte models
<https://arxiv.org/abs/2105.13626>`_ by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir
Kale, Adam Roberts, Colin Raffel.
The abstract from the paper is the following:
*Most widely-used pre-trained language models operate on sequences of tokens corresponding to word or subword units.
Encoding text as a sequence of tokens requires a tokenizer, which is typically created as an independent artifact from
the model. Token-free models that instead operate directly on raw text (bytes or characters) have many benefits: they
can process text in any language out of the box, they are more robust to noise, and they minimize technical debt by
removing complex and error-prone text preprocessing pipelines. Since byte or character sequences are longer than token
sequences, past work on token-free models has often introduced new model architectures designed to amortize the cost of
operating directly on raw text. In this paper, we show that a standard Transformer architecture can be used with
minimal modifications to process byte sequences. We carefully characterize the trade-offs in terms of parameter count,
training FLOPs, and inference speed, and show that byte-level models are competitive with their token-level
counterparts. We also demonstrate that byte-level models are significantly more robust to noise and perform better on
tasks that are sensitive to spelling and pronunciation. As part of our contribution, we release a new set of
pre-trained byte-level Transformer models based on the T5 architecture, as well as all code and data used in our
experiments.*
This model was contributed by `patrickvonplaten <https://huggingface.co/patrickvonplaten>`__. The original code can be
found `here <https://github.com/google-research/byt5>`__.
ByT5's architecture is based on the T5 model, so one can refer to :doc:`T5's documentation page <t5>`.
Example
_______________________________________________________________________________________________________________________
ByT5 works on raw UTF-8 bytes, so it can be used without a tokenizer:
.. code-block::
from transformers import T5ForConditionalGeneration
import torch
model = T5ForConditionalGeneration.from_pretrained('google/byt5-small')
input_ids = torch.tensor([list("Life is like a box of chocolates.".encode("utf-8"))]) + 3 # add 3 for special tokens
labels = torch.tensor([list("La vie est comme une boîte de chocolat.".encode("utf-8"))]) + 3 # add 3 for special tokens
loss = model(input_ids, labels=labels).loss # forward pass
For batched inference and training it is however recommended to make use of the tokenizer:
.. code-block::
from transformers import T5ForConditionalGeneration, AutoTokenizer
model = T5ForConditionalGeneration.from_pretrained('google/byt5-small')
tokenizer = AutoTokenizer.from_pretrained('google/byt5-small')
model_inputs = tokenizer(["Life is like a box of chocolates.", "Today is Monday."], padding="longest", return_tensors="pt")
labels = tokenizer(["La vie est comme une boîte de chocolat.", "Aujourd'hui c'est lundi."], padding="longest", return_tensors="pt").input_ids
loss = model(**model_inputs, labels=labels).loss # forward pass
ByT5Tokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ByT5Tokenizer
See :class:`~transformers.ByT5Tokenizer` for all details.

View File

@@ -37,7 +37,8 @@ Tips:
- This implementation is the same as RoBERTa. Refer to the :doc:`documentation of RoBERTa <roberta>` for usage examples
as well as the information relative to the inputs and outputs.
The original code can be found `here <https://camembert-model.fr/>`__.
This model was contributed by `camembert <https://huggingface.co/camembert>`__. The original code can be found `here
<https://camembert-model.fr/>`__.
CamembertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

View File

@@ -0,0 +1,175 @@
..
Copyright 2021 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
CLIP
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The CLIP model was proposed in `Learning Transferable Visual Models From Natural Language Supervision
<https://arxiv.org/abs/2103.00020>`__ by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh,
Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever. CLIP
(Contrastive Language-Image Pre-Training) is a neural network trained on a variety of (image, text) pairs. It can be
instructed in natural language to predict the most relevant text snippet, given an image, without directly optimizing
for the task, similarly to the zero-shot capabilities of GPT-2 and 3.
The abstract from the paper is the following:
*State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This
restricted form of supervision limits their generality and usability since additional labeled data is needed to specify
any other visual concept. Learning directly from raw text about images is a promising alternative which leverages a
much broader source of supervision. We demonstrate that the simple pre-training task of predicting which caption goes
with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400
million (image, text) pairs collected from the internet. After pre-training, natural language is used to reference
learned visual concepts (or describe new ones) enabling zero-shot transfer of the model to downstream tasks. We study
the performance of this approach by benchmarking on over 30 different existing computer vision datasets, spanning tasks
such as OCR, action recognition in videos, geo-localization, and many types of fine-grained object classification. The
model transfers non-trivially to most tasks and is often competitive with a fully supervised baseline without the need
for any dataset specific training. For instance, we match the accuracy of the original ResNet-50 on ImageNet zero-shot
without needing to use any of the 1.28 million training examples it was trained on. We release our code and pre-trained
model weights at this https URL.*
Usage
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
CLIP is a multi-modal vision and language model. It can be used for image-text similarity and for zero-shot image
classification. CLIP uses a ViT like transformer to get visual features and a causal language model to get the text
features. Both the text and visual features are then projected to a latent space with identical dimension. The dot
product between the projected image and text features is then used as a similar score.
To feed images to the Transformer encoder, each image is split into a sequence of fixed-size non-overlapping patches,
which are then linearly embedded. A [CLS] token is added to serve as representation of an entire image. The authors
also add absolute position embeddings, and feed the resulting sequence of vectors to a standard Transformer encoder.
The :class:`~transformers.CLIPFeatureExtractor` can be used to resize (or rescale) and normalize images for the model.
The :class:`~transformers.CLIPTokenizer` is used to encode the text. The :class:`~transformers.CLIPProcessor` wraps
:class:`~transformers.CLIPFeatureExtractor` and :class:`~transformers.CLIPTokenizer` into a single instance to both
encode the text and prepare the images. The following example shows how to get the image-text similarity scores using
:class:`~transformers.CLIPProcessor` and :class:`~transformers.CLIPModel`.
.. code-block::
>>> import torch
>>> from PIL import Image
>>> import requests
>>> from transformers import CLIPProcessor, CLIPModel
>>> model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
>>> processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
>>> image = Image.open(requests.get(url, stream=True).raw)
>>> inputs = processor(text=["a photo of a cat", "a photo of a dog"], images=image, return_tensors="pt", padding=True)
>>> outputs = model(**inputs)
>>> logits_per_image = outputs.logits_per_image # this is the image-text similarity score
>>> probs = logits_per_image.softmax(dim=1) # we can take the softmax to get the label probabilities
This model was contributed by `valhalla <https://huggingface.co/valhalla>`__. The original code can be found `here
<https://github.com/openai/CLIP>`__.
CLIPConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CLIPConfig
:members: from_text_vision_configs
CLIPTextConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CLIPTextConfig
:members:
CLIPVisionConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CLIPVisionConfig
:members:
CLIPTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CLIPTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary
CLIPTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CLIPTokenizerFast
:members:
CLIPFeatureExtractor
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CLIPFeatureExtractor
:members:
CLIPProcessor
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CLIPProcessor
:members:
CLIPModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CLIPModel
:members: forward, get_text_features, get_image_features
CLIPTextModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CLIPTextModel
:members: forward
CLIPVisionModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CLIPVisionModel
:members: forward
FlaxCLIPModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaxCLIPModel
:members: __call__, get_text_features, get_image_features
FlaxCLIPTextModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaxCLIPTextModel
:members: __call__
FlaxCLIPVisionModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaxCLIPVisionModel
:members: __call__

View File

@@ -34,8 +34,10 @@ ConvBERT significantly outperforms BERT and its variants in various downstream t
fewer model parameters. Remarkably, ConvBERTbase model achieves 86.4 GLUE score, 0.7 higher than ELECTRAbase, while
using less than 1/4 training cost. Code and pre-trained models will be released.*
ConvBERT training tips are similar to those of BERT. The original implementation can be found here:
https://github.com/yitu-opensource/ConvBert
ConvBERT training tips are similar to those of BERT.
This model was contributed by `abhishek <https://huggingface.co/abhishek>`__. The original implementation can be found
here: https://github.com/yitu-opensource/ConvBert
ConvBertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -56,8 +58,7 @@ ConvBertTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ConvBertTokenizerFast
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary
:members:
ConvBertModel

View File

@@ -0,0 +1,45 @@
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
CPM
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The CPM model was proposed in `CPM: A Large-scale Generative Chinese Pre-trained Language Model
<https://arxiv.org/abs/2012.00413>`__ by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin,
Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen,
Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.
The abstract from the paper is the following:
*Pre-trained Language Models (PLMs) have proven to be beneficial for various downstream NLP tasks. Recently, GPT-3,
with 175 billion parameters and 570GB training data, drew a lot of attention due to the capacity of few-shot (even
zero-shot) learning. However, applying GPT-3 to address Chinese NLP tasks is still challenging, as the training corpus
of GPT-3 is primarily English, and the parameters are not publicly available. In this technical report, we release the
Chinese Pre-trained Language Model (CPM) with generative pre-training on large-scale Chinese training data. To the best
of our knowledge, CPM, with 2.6 billion parameters and 100GB Chinese training data, is the largest Chinese pre-trained
language model, which could facilitate several downstream Chinese NLP tasks, such as conversation, essay generation,
cloze test, and language understanding. Extensive experiments demonstrate that CPM achieves strong performance on many
NLP tasks in the settings of few-shot (even zero-shot) learning.*
This model was contributed by `canwenxu <https://huggingface.co/canwenxu>`__. The original implementation can be found
here: https://github.com/TsinghuaAI/CPM-Generate
Note: We only have a tokenizer here, since the model architecture is the same as GPT-2.
CpmTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CpmTokenizer
:members:

View File

@@ -46,7 +46,8 @@ Tips:
`reusing the past in generative models <../quickstart.html#using-the-past>`__ for more information on the usage of
this argument.
The original code can be found `here <https://github.com/salesforce/ctrl>`__.
This model was contributed by `keskarnitishr <https://huggingface.co/keskarnitishr>`__. The original code can be found
`here <https://github.com/salesforce/ctrl>`__.
CTRLConfig

View File

@@ -38,7 +38,8 @@ the training data performs consistently better on a wide range of NLP tasks, ach
pre-trained models will be made publicly available at https://github.com/microsoft/DeBERTa.*
The original code can be found `here <https://github.com/microsoft/DeBERTa>`__.
This model was contributed by `DeBERTa <https://huggingface.co/DeBERTa>`__. The original code can be found `here
<https://github.com/microsoft/DeBERTa>`__.
DebertaConfig
@@ -55,12 +56,18 @@ DebertaTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary
DebertaTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DebertaTokenizerFast
:members: build_inputs_with_special_tokens, create_token_type_ids_from_sequences
DebertaModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DebertaModel
:members:
:members: forward
DebertaPreTrainedModel
@@ -74,25 +81,25 @@ DebertaForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DebertaForMaskedLM
:members:
:members: forward
DebertaForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DebertaForSequenceClassification
:members:
:members: forward
DebertaForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DebertaForTokenClassification
:members:
:members: forward
DebertaForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DebertaForQuestionAnswering
:members:
:members: forward

View File

@@ -0,0 +1,119 @@
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
DeBERTa-v2
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The DeBERTa model was proposed in `DeBERTa: Decoding-enhanced BERT with Disentangled Attention
<https://arxiv.org/abs/2006.03654>`__ by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen It is based on Google's
BERT model released in 2018 and Facebook's RoBERTa model released in 2019.
It builds on RoBERTa with disentangled attention and enhanced mask decoder training with half of the data used in
RoBERTa.
The abstract from the paper is the following:
*Recent progress in pre-trained neural language models has significantly improved the performance of many natural
language processing (NLP) tasks. In this paper we propose a new model architecture DeBERTa (Decoding-enhanced BERT with
disentangled attention) that improves the BERT and RoBERTa models using two novel techniques. The first is the
disentangled attention mechanism, where each word is represented using two vectors that encode its content and
position, respectively, and the attention weights among words are computed using disentangled matrices on their
contents and relative positions. Second, an enhanced mask decoder is used to replace the output softmax layer to
predict the masked tokens for model pretraining. We show that these two techniques significantly improve the efficiency
of model pretraining and performance of downstream tasks. Compared to RoBERTa-Large, a DeBERTa model trained on half of
the training data performs consistently better on a wide range of NLP tasks, achieving improvements on MNLI by +0.9%
(90.2% vs. 91.1%), on SQuAD v2.0 by +2.3% (88.4% vs. 90.7%) and RACE by +3.6% (83.2% vs. 86.8%). The DeBERTa code and
pre-trained models will be made publicly available at https://github.com/microsoft/DeBERTa.*
The following information is visible directly on the [original implementation
repository](https://github.com/microsoft/DeBERTa). DeBERTa v2 is the second version of the DeBERTa model. It includes
the 1.5B model used for the SuperGLUE single-model submission and achieving 89.9, versus human baseline 89.8. You can
find more details about this submission in the authors'
[blog](https://www.microsoft.com/en-us/research/blog/microsoft-deberta-surpasses-human-performance-on-the-superglue-benchmark/)
New in v2:
- **Vocabulary** In v2 the tokenizer is changed to use a new vocabulary of size 128K built from the training data.
Instead of a GPT2-based tokenizer, the tokenizer is now
[sentencepiece-based](https://github.com/google/sentencepiece) tokenizer.
- **nGiE(nGram Induced Input Encoding)** The DeBERTa-v2 model uses an additional convolution layer aside with the first
transformer layer to better learn the local dependency of input tokens.
- **Sharing position projection matrix with content projection matrix in attention layer** Based on previous
experiments, this can save parameters without affecting the performance.
- **Apply bucket to encode relative postions** The DeBERTa-v2 model uses log bucket to encode relative positions
similar to T5.
- **900M model & 1.5B model** Two additional model sizes are available: 900M and 1.5B, which significantly improves the
performance of downstream tasks.
This model was contributed by `DeBERTa <https://huggingface.co/DeBERTa>`__. The original code can be found `here
<https://github.com/microsoft/DeBERTa>`__.
DebertaV2Config
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DebertaV2Config
:members:
DebertaV2Tokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DebertaV2Tokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary
DebertaV2Model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DebertaV2Model
:members: forward
DebertaV2PreTrainedModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DebertaV2PreTrainedModel
:members: forward
DebertaV2ForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DebertaV2ForMaskedLM
:members: forward
DebertaV2ForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DebertaV2ForSequenceClassification
:members: forward
DebertaV2ForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DebertaV2ForTokenClassification
:members: forward
DebertaV2ForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DebertaV2ForQuestionAnswering
:members: forward

View File

@@ -0,0 +1,111 @@
..
Copyright 2021 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
DeiT
-----------------------------------------------------------------------------------------------------------------------
.. note::
This is a recently introduced model so the API hasn't been tested extensively. There may be some bugs or slight
breaking changes to fix it in the future. If you see something strange, file a `Github Issue
<https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`__.
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The DeiT model was proposed in `Training data-efficient image transformers & distillation through attention
<https://arxiv.org/abs/2012.12877>`__ by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre
Sablayrolles, Hervé Jégou. The `Vision Transformer (ViT) <https://huggingface.co/transformers/model_doc/vit.html>`__
introduced in `Dosovitskiy et al., 2020 <https://arxiv.org/abs/2010.11929>`__ has shown that one can match or even
outperform existing convolutional neural networks using a Transformer encoder (BERT-like). However, the ViT models
introduced in that paper required training on expensive infrastructure for multiple weeks, using external data. DeiT
(data-efficient image transformers) are more efficiently trained transformers for image classification, requiring far
less data and far less computing resources compared to the original ViT models.
The abstract from the paper is the following:
*Recently, neural networks purely based on attention were shown to address image understanding tasks such as image
classification. However, these visual transformers are pre-trained with hundreds of millions of images using an
expensive infrastructure, thereby limiting their adoption. In this work, we produce a competitive convolution-free
transformer by training on Imagenet only. We train them on a single computer in less than 3 days. Our reference vision
transformer (86M parameters) achieves top-1 accuracy of 83.1% (single-crop evaluation) on ImageNet with no external
data. More importantly, we introduce a teacher-student strategy specific to transformers. It relies on a distillation
token ensuring that the student learns from the teacher through attention. We show the interest of this token-based
distillation, especially when using a convnet as a teacher. This leads us to report results competitive with convnets
for both Imagenet (where we obtain up to 85.2% accuracy) and when transferring to other tasks. We share our code and
models.*
Tips:
- Compared to ViT, DeiT models use a so-called distillation token to effectively learn from a teacher (which, in the
DeiT paper, is a ResNet like-model). The distillation token is learned through backpropagation, by interacting with
the class ([CLS]) and patch tokens through the self-attention layers.
- There are 2 ways to fine-tune distilled models, either (1) in a classic way, by only placing a prediction head on top
of the final hidden state of the class token and not using the distillation signal, or (2) by placing both a
prediction head on top of the class token and on top of the distillation token. In that case, the [CLS] prediction
head is trained using regular cross-entropy between the prediction of the head and the ground-truth label, while the
distillation prediction head is trained using hard distillation (cross-entropy between the prediction of the
distillation head and the label predicted by the teacher). At inference time, one takes the average prediction
between both heads as final prediction. (2) is also called "fine-tuning with distillation", because one relies on a
teacher that has already been fine-tuned on the downstream dataset. In terms of models, (1) corresponds to
:class:`~transformers.DeiTForImageClassification` and (2) corresponds to
:class:`~transformers.DeiTForImageClassificationWithTeacher`.
- Note that the authors also did try soft distillation for (2) (in which case the distillation prediction head is
trained using KL divergence to match the softmax output of the teacher), but hard distillation gave the best results.
- All released checkpoints were pre-trained and fine-tuned on ImageNet-1k only. No external data was used. This is in
contrast with the original ViT model, which used external data like the JFT-300M dataset/Imagenet-21k for
pre-training.
- The authors of DeiT also released more efficiently trained ViT models, which you can directly plug into
:class:`~transformers.ViTModel` or :class:`~transformers.ViTForImageClassification`. Techniques like data
augmentation, optimization, and regularization were used in order to simulate training on a much larger dataset
(while only using ImageNet-1k for pre-training). There are 4 variants available (in 3 different sizes):
`facebook/deit-tiny-patch16-224`, `facebook/deit-small-patch16-224`, `facebook/deit-base-patch16-224` and
`facebook/deit-base-patch16-384`. Note that one should use :class:`~transformers.DeiTFeatureExtractor` in order to
prepare images for the model.
This model was contributed by `nielsr <https://huggingface.co/nielsr>`__.
DeiTConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DeiTConfig
:members:
DeiTFeatureExtractor
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DeiTFeatureExtractor
:members: __call__
DeiTModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DeiTModel
:members: forward
DeiTForImageClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DeiTForImageClassification
:members: forward
DeiTForImageClassificationWithTeacher
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DeiTForImageClassificationWithTeacher
:members: forward

View File

@@ -0,0 +1,207 @@
..
Copyright 2021 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
DETR
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The DETR model was proposed in `End-to-End Object Detection with Transformers <https://arxiv.org/abs/2005.12872>`__ by
Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov and Sergey Zagoruyko. DETR
consists of a convolutional backbone followed by an encoder-decoder Transformer which can be trained end-to-end for
object detection. It greatly simplifies a lot of the complexity of models like Faster-R-CNN and Mask-R-CNN, which use
things like region proposals, non-maximum suppression procedure and anchor generation. Moreover, DETR can also be
naturally extended to perform panoptic segmentation, by simply adding a mask head on top of the decoder outputs.
The abstract from the paper is the following:
*We present a new method that views object detection as a direct set prediction problem. Our approach streamlines the
detection pipeline, effectively removing the need for many hand-designed components like a non-maximum suppression
procedure or anchor generation that explicitly encode our prior knowledge about the task. The main ingredients of the
new framework, called DEtection TRansformer or DETR, are a set-based global loss that forces unique predictions via
bipartite matching, and a transformer encoder-decoder architecture. Given a fixed small set of learned object queries,
DETR reasons about the relations of the objects and the global image context to directly output the final set of
predictions in parallel. The new model is conceptually simple and does not require a specialized library, unlike many
other modern detectors. DETR demonstrates accuracy and run-time performance on par with the well-established and
highly-optimized Faster RCNN baseline on the challenging COCO object detection dataset. Moreover, DETR can be easily
generalized to produce panoptic segmentation in a unified manner. We show that it significantly outperforms competitive
baselines.*
This model was contributed by `nielsr <https://huggingface.co/nielsr>`__. The original code can be found `here
<https://github.com/facebookresearch/detr>`__.
The quickest way to get started with DETR is by checking the `example notebooks
<https://github.com/NielsRogge/Transformers-Tutorials/tree/master/DETR>`__ (which showcase both inference and
fine-tuning on custom data).
Here's a TLDR explaining how :class:`~transformers.DetrForObjectDetection` works:
First, an image is sent through a pre-trained convolutional backbone (in the paper, the authors use
ResNet-50/ResNet-101). Let's assume we also add a batch dimension. This means that the input to the backbone is a
tensor of shape :obj:`(batch_size, 3, height, width)`, assuming the image has 3 color channels (RGB). The CNN backbone
outputs a new lower-resolution feature map, typically of shape :obj:`(batch_size, 2048, height/32, width/32)`. This is
then projected to match the hidden dimension of the Transformer of DETR, which is :obj:`256` by default, using a
:obj:`nn.Conv2D` layer. So now, we have a tensor of shape :obj:`(batch_size, 256, height/32, width/32).` Next, the
feature map is flattened and transposed to obtain a tensor of shape :obj:`(batch_size, seq_len, d_model)` =
:obj:`(batch_size, width/32*height/32, 256)`. So a difference with NLP models is that the sequence length is actually
longer than usual, but with a smaller :obj:`d_model` (which in NLP is typically 768 or higher).
Next, this is sent through the encoder, outputting :obj:`encoder_hidden_states` of the same shape (you can consider
these as image features). Next, so-called **object queries** are sent through the decoder. This is a tensor of shape
:obj:`(batch_size, num_queries, d_model)`, with :obj:`num_queries` typically set to 100 and initialized with zeros.
These input embeddings are learnt positional encodings that the authors refer to as object queries, and similarly to
the encoder, they are added to the input of each attention layer. Each object query will look for a particular object
in the image. The decoder updates these embeddings through multiple self-attention and encoder-decoder attention layers
to output :obj:`decoder_hidden_states` of the same shape: :obj:`(batch_size, num_queries, d_model)`. Next, two heads
are added on top for object detection: a linear layer for classifying each object query into one of the objects or "no
object", and a MLP to predict bounding boxes for each query.
The model is trained using a **bipartite matching loss**: so what we actually do is compare the predicted classes +
bounding boxes of each of the N = 100 object queries to the ground truth annotations, padded up to the same length N
(so if an image only contains 4 objects, 96 annotations will just have a "no object" as class and "no bounding box" as
bounding box). The `Hungarian matching algorithm <https://en.wikipedia.org/wiki/Hungarian_algorithm>`__ is used to find
an optimal one-to-one mapping of each of the N queries to each of the N annotations. Next, standard cross-entropy (for
the classes) and a linear combination of the L1 and `generalized IoU loss <https://giou.stanford.edu/>`__ (for the
bounding boxes) are used to optimize the parameters of the model.
DETR can be naturally extended to perform panoptic segmentation (which unifies semantic segmentation and instance
segmentation). :class:`~transformers.DetrForSegmentation` adds a segmentation mask head on top of
:class:`~transformers.DetrForObjectDetection`. The mask head can be trained either jointly, or in a two steps process,
where one first trains a :class:`~transformers.DetrForObjectDetection` model to detect bounding boxes around both
"things" (instances) and "stuff" (background things like trees, roads, sky), then freeze all the weights and train only
the mask head for 25 epochs. Experimentally, these two approaches give similar results. Note that predicting boxes is
required for the training to be possible, since the Hungarian matching is computed using distances between boxes.
Tips:
- DETR uses so-called **object queries** to detect objects in an image. The number of queries determines the maximum
number of objects that can be detected in a single image, and is set to 100 by default (see parameter
:obj:`num_queries` of :class:`~transformers.DetrConfig`). Note that it's good to have some slack (in COCO, the
authors used 100, while the maximum number of objects in a COCO image is ~70).
- The decoder of DETR updates the query embeddings in parallel. This is different from language models like GPT-2,
which use autoregressive decoding instead of parallel. Hence, no causal attention mask is used.
- DETR adds position embeddings to the hidden states at each self-attention and cross-attention layer before projecting
to queries and keys. For the position embeddings of the image, one can choose between fixed sinusoidal or learned
absolute position embeddings. By default, the parameter :obj:`position_embedding_type` of
:class:`~transformers.DetrConfig` is set to :obj:`"sine"`.
- During training, the authors of DETR did find it helpful to use auxiliary losses in the decoder, especially to help
the model output the correct number of objects of each class. If you set the parameter :obj:`auxiliary_loss` of
:class:`~transformers.DetrConfig` to :obj:`True`, then prediction feedforward neural networks and Hungarian losses
are added after each decoder layer (with the FFNs sharing parameters).
- If you want to train the model in a distributed environment across multiple nodes, then one should update the
`num_boxes` variable in the `DetrLoss` class of `modeling_detr.py`. When training on multiple nodes, this should be
set to the average number of target boxes across all nodes, as can be seen in the original implementation `here
<https://github.com/facebookresearch/detr/blob/a54b77800eb8e64e3ad0d8237789fcbf2f8350c5/models/detr.py#L227-L232>`__.
- :class:`~transformers.DetrForObjectDetection` and :class:`~transformers.DetrForSegmentation` can be initialized with
any convolutional backbone available in the `timm library <https://github.com/rwightman/pytorch-image-models>`__.
Initializing with a MobileNet backbone for example can be done by setting the :obj:`backbone` attribute of
:class:`~transformers.DetrConfig` to :obj:`"tf_mobilenetv3_small_075"`, and then initializing the model with that
config.
- DETR resizes the input images such that the shortest side is at least a certain amount of pixels while the longest is
at most 1333 pixels. At training time, scale augmentation is used such that the shortest side is randomly set to at
least 480 and at most 800 pixels. At inference time, the shortest side is set to 800. One can use
:class:`~transformers.DetrFeatureExtractor` to prepare images (and optional annotations in COCO format) for the
model. Due to this resizing, images in a batch can have different sizes. DETR solves this by padding images up to the
largest size in a batch, and by creating a pixel mask that indicates which pixels are real/which are padding.
Alternatively, one can also define a custom :obj:`collate_fn` in order to batch images together, using
:meth:`~transformers.DetrFeatureExtractor.pad_and_create_pixel_mask`.
- The size of the images will determine the amount of memory being used, and will thus determine the :obj:`batch_size`.
It is advised to use a batch size of 2 per GPU. See `this Github thread
<https://github.com/facebookresearch/detr/issues/150>`__ for more info.
As a summary, consider the following table:
+---------------------------------------------+---------------------------------------------------------+----------------------------------------------------------------------+------------------------------------------------------------------------+
| **Task** | **Object detection** | **Instance segmentation** | **Panoptic segmentation** |
+---------------------------------------------+---------------------------------------------------------+----------------------------------------------------------------------+------------------------------------------------------------------------+
| **Description** | Predicting bounding boxes and class labels around | Predicting masks around objects (i.e. instances) in an image | Predicting masks around both objects (i.e. instances) as well as |
| | objects in an image | | "stuff" (i.e. background things like trees and roads) in an image |
+---------------------------------------------+---------------------------------------------------------+----------------------------------------------------------------------+------------------------------------------------------------------------+
| **Model** | :class:`~transformers.DetrForObjectDetection` | :class:`~transformers.DetrForSegmentation` | :class:`~transformers.DetrForSegmentation` |
+---------------------------------------------+---------------------------------------------------------+----------------------------------------------------------------------+------------------------------------------------------------------------+
| **Example dataset** | COCO detection | COCO detection, | COCO panoptic |
| | | COCO panoptic | |
+---------------------------------------------+---------------------------------------------------------+----------------------------------------------------------------------+------------------------------------------------------------------------+
| **Format of annotations to provide to** | {image_id: int, | {image_id: int, | {file_name: str, |
| :class:`~transformers.DetrFeatureExtractor` | annotations: List[Dict]}, each Dict being a COCO | annotations: [List[Dict]] } (in case of COCO detection) | image_id: int, |
| | object annotation | | segments_info: List[Dict] } |
| | | or | |
| | | | and masks_path (path to directory containing PNG files of the masks) |
| | | {file_name: str, | |
| | | image_id: int, | |
| | | segments_info: List[Dict]} (in case of COCO panoptic) | |
+---------------------------------------------+---------------------------------------------------------+----------------------------------------------------------------------+------------------------------------------------------------------------+
| **Postprocessing** (i.e. converting the | :meth:`~transformers.DetrFeatureExtractor.post_process` | :meth:`~transformers.DetrFeatureExtractor.post_process_segmentation` | :meth:`~transformers.DetrFeatureExtractor.post_process_segmentation`, |
| output of the model to COCO API) | | | :meth:`~transformers.DetrFeatureExtractor.post_process_panoptic` |
+---------------------------------------------+---------------------------------------------------------+----------------------------------------------------------------------+------------------------------------------------------------------------+
| **evaluators** | :obj:`CocoEvaluator` with iou_types = “bbox” | :obj:`CocoEvaluator` with iou_types = “bbox”, “segm” | :obj:`CocoEvaluator` with iou_tupes = “bbox, “segm” |
| | | | |
| | | | :obj:`PanopticEvaluator` |
+---------------------------------------------+---------------------------------------------------------+----------------------------------------------------------------------+------------------------------------------------------------------------+
In short, one should prepare the data either in COCO detection or COCO panoptic format, then use
:class:`~transformers.DetrFeatureExtractor` to create :obj:`pixel_values`, :obj:`pixel_mask` and optional
:obj:`labels`, which can then be used to train (or fine-tune) a model. For evaluation, one should first convert the
outputs of the model using one of the postprocessing methods of :class:`~transformers.DetrFeatureExtractor`. These can
be be provided to either :obj:`CocoEvaluator` or :obj:`PanopticEvaluator`, which allow you to calculate metrics like
mean Average Precision (mAP) and Panoptic Quality (PQ). The latter objects are implemented in the `original repository
<https://github.com/facebookresearch/detr>`__. See the `example notebooks
<https://github.com/NielsRogge/Transformers-Tutorials/tree/master/DETR>`__ for more info regarding evaluation.
DETR specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.models.detr.modeling_detr.DetrModelOutput
:members:
.. autoclass:: transformers.models.detr.modeling_detr.DetrObjectDetectionOutput
:members:
.. autoclass:: transformers.models.detr.modeling_detr.DetrSegmentationOutput
:members:
DetrConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DetrConfig
:members:
DetrFeatureExtractor
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DetrFeatureExtractor
:members: __call__, pad_and_create_pixel_mask, post_process, post_process_segmentation, post_process_panoptic
DetrModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DetrModel
:members: forward
DetrForObjectDetection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DetrForObjectDetection
:members: forward
DetrForSegmentation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DetrForSegmentation
:members: forward

View File

@@ -44,8 +44,8 @@ Tips:
- DistilBERT doesn't have options to select the input positions (:obj:`position_ids` input). This could be added if
necessary though, just let us know if you need this option.
The original code can be found `here
<https://github.com/huggingface/transformers/tree/master/examples/distillation>`__.
This model was contributed by `victorsanh <https://huggingface.co/victorsanh>`__. The original code can be found
:prefix_link:`here <examples/research-projects/distillation>`.
DistilBertConfig

View File

@@ -30,7 +30,8 @@ our dense retriever outperforms a strong Lucene-BM25 system largely by 9%-19% ab
retrieval accuracy, and helps our end-to-end QA system establish new state-of-the-art on multiple open-domain QA
benchmarks.*
The original code can be found `here <https://github.com/facebookresearch/DPR>`__.
This model was contributed by `lhoestq <https://huggingface.co/lhoestq>`__. The original code can be found `here
<https://github.com/facebookresearch/DPR>`__.
DPRConfig

View File

@@ -54,7 +54,8 @@ Tips:
:class:`~transformers.ElectraForPreTraining` model (the classification head will be randomly initialized as it
doesn't exist in the generator).
The original code can be found `here <https://github.com/google-research/electra>`__.
This model was contributed by `lysandre <https://huggingface.co/lysandre>`__. The original code can be found `here
<https://github.com/google-research/electra>`__.
ElectraConfig
@@ -184,3 +185,52 @@ TFElectraForQuestionAnswering
.. autoclass:: transformers.TFElectraForQuestionAnswering
:members: call
FlaxElectraModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaxElectraModel
:members: __call__
FlaxElectraForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaxElectraForPreTraining
:members: __call__
FlaxElectraForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaxElectraForMaskedLM
:members: __call__
FlaxElectraForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaxElectraForSequenceClassification
:members: __call__
FlaxElectraForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaxElectraForMultipleChoice
:members: __call__
FlaxElectraForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaxElectraForTokenClassification
:members: __call__
FlaxElectraForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaxElectraForQuestionAnswering
:members: __call__

View File

@@ -35,7 +35,8 @@ time they outperform other pretraining approaches. Different versions of FlauBER
protocol for the downstream tasks, called FLUE (French Language Understanding Evaluation), are shared to the research
community for further reproducible experiments in French NLP.*
The original code can be found `here <https://github.com/getalp/Flaubert>`__.
This model was contributed by `formiel <https://huggingface.co/formiel>`__. The original code can be found `here
<https://github.com/getalp/Flaubert>`__.
FlaubertConfig

View File

@@ -34,7 +34,8 @@ data, then decode using noisy channel model reranking. Our submissions are ranke
human evaluation campaign. On En->De, our system significantly outperforms other systems as well as human translations.
This system improves upon our WMT'18 submission by 4.5 BLEU points.*
The original code can be found here <https://github.com/pytorch/fairseq/tree/master/examples/wmt19>__.
This model was contributed by `stas <https://huggingface.co/stas>`__. The original code can be found here
<https://github.com/pytorch/fairseq/tree/master/examples/wmt19>__.
Implementation Notes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -56,7 +57,7 @@ FSMTTokenizer
.. autoclass:: transformers.FSMTTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, prepare_seq2seq_batch, save_vocabulary
create_token_type_ids_from_sequences, save_vocabulary
FSMTModel

View File

@@ -49,7 +49,8 @@ Tips:
:class:`~transformers.FunnelBaseModel`, :class:`~transformers.FunnelForSequenceClassification` and
:class:`~transformers.FunnelForMultipleChoice`.
The original code can be found `here <https://github.com/laiguokun/Funnel-Transformer>`__.
This model was contributed by `sgugger <https://huggingface.co/sgugger>`__. The original code can be found `here
<https://github.com/laiguokun/Funnel-Transformer>`__.
FunnelConfig

View File

@@ -45,12 +45,13 @@ Tips:
`Write With Transformer <https://transformer.huggingface.co/doc/gpt>`__ is a webapp created and hosted by Hugging Face
showcasing the generative capabilities of several models. GPT is one of them.
The original code can be found `here <https://github.com/openai/finetune-transformer-lm>`__.
This model was contributed by `thomwolf <https://huggingface.co/thomwolf>`__. The original code can be found `here
<https://github.com/openai/finetune-transformer-lm>`__.
Note:
If you want to reproduce the original tokenization process of the `OpenAI GPT` paper, you will need to install ``ftfy``
and ``SpaCy``::
and ``SpaCy``:
.. code-block:: bash

View File

@@ -45,7 +45,8 @@ Tips:
Hugging Face showcasing the generative capabilities of several models. GPT-2 is one of them and is available in five
different sizes: small, medium, large, xl and a distilled version of the small checkpoint: `distilgpt-2`.
The original code can be found `here <https://openai.com/blog/better-language-models/>`__.
This model was contributed by `thomwolf <https://huggingface.co/thomwolf>`__. The original code can be found `here
<https://openai.com/blog/better-language-models/>`__.
GPT2Config
@@ -138,3 +139,17 @@ TFSequenceClassifierOutputWithPast
.. autoclass:: transformers.modeling_tf_outputs.TFSequenceClassifierOutputWithPast
:members:
FlaxGPT2Model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaxGPT2Model
:members: __call__
FlaxGPT2LMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaxGPT2LMHeadModel
:members: __call__

View File

@@ -0,0 +1,73 @@
..
Copyright 2021 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
GPT Neo
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The GPTNeo model was released in the `EleutherAI/gpt-neo <https://github.com/EleutherAI/gpt-neo>`__ repository by Sid
Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy. It is a GPT2 like causal language model trained on the
`Pile <https://pile.eleuther.ai/>`__ dataset.
The architecture is similar to GPT2 except that GPT Neo uses local attention in every other layer with a window size of
256 tokens.
This model was contributed by `valhalla <https://huggingface.co/valhalla>`__.
Generation
_______________________________________________________________________________________________________________________
The :obj:`generate()` method can be used to generate text using GPT Neo model.
.. code-block::
>>> from transformers import GPTNeoForCausalLM, GPT2Tokenizer
>>> model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")
>>> tokenizer = GPT2Tokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")
>>> prompt = "In a shocking finding, scientists discovered a herd of unicorns living in a remote, " \
... "previously unexplored valley, in the Andes Mountains. Even more surprising to the " \
... "researchers was the fact that the unicorns spoke perfect English."
>>> input_ids = tokenizer(prompt, return_tensors="pt").input_ids
>>> gen_tokens = model.generate(input_ids, do_sample=True, temperature=0.9, max_length=100,)
>>> gen_text = tokenizer.batch_decode(gen_tokens)[0]
GPTNeoConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.GPTNeoConfig
:members:
GPTNeoModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.GPTNeoModel
:members: forward
GPTNeoForCausalLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.GPTNeoForCausalLM
:members: forward
GPTNeoForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.GPTNeoForSequenceClassification
:members: forward

View File

@@ -40,23 +40,25 @@ Examples of use:
.. code-block::
from transformers import HerbertTokenizer, RobertaModel
>>> from transformers import HerbertTokenizer, RobertaModel
tokenizer = HerbertTokenizer.from_pretrained("allegro/herbert-klej-cased-tokenizer-v1")
model = RobertaModel.from_pretrained("allegro/herbert-klej-cased-v1")
>>> tokenizer = HerbertTokenizer.from_pretrained("allegro/herbert-klej-cased-tokenizer-v1")
>>> model = RobertaModel.from_pretrained("allegro/herbert-klej-cased-v1")
encoded_input = tokenizer.encode("Kto ma lepszą sztukę, ma lepszy rząd to jasne.", return_tensors='pt')
outputs = model(encoded_input)
>>> encoded_input = tokenizer.encode("Kto ma lepszą sztukę, ma lepszy rząd to jasne.", return_tensors='pt')
>>> outputs = model(encoded_input)
# HerBERT can also be loaded using AutoTokenizer and AutoModel:
import torch
from transformers import AutoModel, AutoTokenizer
>>> # HerBERT can also be loaded using AutoTokenizer and AutoModel:
>>> import torch
>>> from transformers import AutoModel, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("allegro/herbert-klej-cased-tokenizer-v1")
model = AutoModel.from_pretrained("allegro/herbert-klej-cased-v1")
>>> tokenizer = AutoTokenizer.from_pretrained("allegro/herbert-klej-cased-tokenizer-v1")
>>> model = AutoModel.from_pretrained("allegro/herbert-klej-cased-v1")
The original code can be found `here <https://github.com/allegro/HerBERT>`__.
This model was contributed by `rmroczkowski <https://huggingface.co/rmroczkowski>`__. The original code can be found
`here <https://github.com/allegro/HerBERT>`__.
HerbertTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

View File

@@ -0,0 +1,65 @@
..
Copyright 2021 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
Hubert
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Hubert was proposed in `HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
<https://arxiv.org/abs/2106.07447>`__ by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan
Salakhutdinov, Abdelrahman Mohamed.
The abstract from the paper is the following:
*Self-supervised approaches for speech representation learning are challenged by three unique problems: (1) there are
multiple sound units in each input utterance, (2) there is no lexicon of input sound units during the pre-training
phase, and (3) sound units have variable lengths with no explicit segmentation. To deal with these three problems, we
propose the Hidden-Unit BERT (HuBERT) approach for self-supervised speech representation learning, which utilizes an
offline clustering step to provide aligned target labels for a BERT-like prediction loss. A key ingredient of our
approach is applying the prediction loss over the masked regions only, which forces the model to learn a combined
acoustic and language model over the continuous inputs. HuBERT relies primarily on the consistency of the unsupervised
clustering step rather than the intrinsic quality of the assigned cluster labels. Starting with a simple k-means
teacher of 100 clusters, and using two iterations of clustering, the HuBERT model either matches or improves upon the
state-of-the-art wav2vec 2.0 performance on the Librispeech (960h) and Libri-light (60,000h) benchmarks with 10min, 1h,
10h, 100h, and 960h fine-tuning subsets. Using a 1B parameter model, HuBERT shows up to 19% and 13% relative WER
reduction on the more challenging dev-other and test-other evaluation subsets.*
Tips:
- Hubert is a speech model that accepts a float array corresponding to the raw waveform of the speech signal.
- Hubert model was fine-tuned using connectionist temporal classification (CTC) so the model output has to be decoded
using :class:`~transformers.Wav2Vec2CTCTokenizer`.
This model was contributed by `patrickvonplaten <https://huggingface.co/patrickvonplaten>`__.
HubertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.HubertConfig
:members:
HubertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.HubertModel
:members: forward
HubertForCTC
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.HubertForCTC
:members: forward

View File

@@ -0,0 +1,89 @@
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
I-BERT
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The I-BERT model was proposed in `I-BERT: Integer-only BERT Quantization <https://arxiv.org/abs/2101.01321>`__ by
Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney and Kurt Keutzer. It's a quantized version of RoBERTa running
inference up to four times faster.
The abstract from the paper is the following:
*Transformer based models, like BERT and RoBERTa, have achieved state-of-the-art results in many Natural Language
Processing tasks. However, their memory footprint, inference latency, and power consumption are prohibitive for
efficient inference at the edge, and even at the data center. While quantization can be a viable solution for this,
previous work on quantizing Transformer based models use floating-point arithmetic during inference, which cannot
efficiently utilize integer-only logical units such as the recent Turing Tensor Cores, or traditional integer-only ARM
processors. In this work, we propose I-BERT, a novel quantization scheme for Transformer based models that quantizes
the entire inference with integer-only arithmetic. Based on lightweight integer-only approximation methods for
nonlinear operations, e.g., GELU, Softmax, and Layer Normalization, I-BERT performs an end-to-end integer-only BERT
inference without any floating point calculation. We evaluate our approach on GLUE downstream tasks using
RoBERTa-Base/Large. We show that for both cases, I-BERT achieves similar (and slightly higher) accuracy as compared to
the full-precision baseline. Furthermore, our preliminary implementation of I-BERT shows a speedup of 2.4 - 4.0x for
INT8 inference on a T4 GPU system as compared to FP32 inference. The framework has been developed in PyTorch and has
been open-sourced.*
This model was contributed by `kssteven <https://huggingface.co/kssteven>`__. The original code can be found `here
<https://github.com/kssteven418/I-BERT>`__.
IBertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.IBertConfig
:members:
IBertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.IBertModel
:members: forward
IBertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.IBertForMaskedLM
:members: forward
IBertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.IBertForSequenceClassification
:members: forward
IBertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.IBertForMultipleChoice
:members: forward
IBertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.IBertForTokenClassification
:members: forward
IBertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.IBertForQuestionAnswering
:members: forward

View File

@@ -56,31 +56,32 @@ Tips:
.. code-block::
def normalize_bbox(bbox, width, height):
return [
int(1000 * (bbox[0] / width)),
int(1000 * (bbox[1] / height)),
int(1000 * (bbox[2] / width)),
int(1000 * (bbox[3] / height)),
]
def normalize_bbox(bbox, width, height):
return [
int(1000 * (bbox[0] / width)),
int(1000 * (bbox[1] / height)),
int(1000 * (bbox[2] / width)),
int(1000 * (bbox[3] / height)),
]
Here, :obj:`width` and :obj:`height` correspond to the width and height of the original document in which the token
occurs. Those can be obtained using the Python Image Library (PIL) library for example, as follows:
.. code-block::
from PIL import Image
from PIL import Image
image = Image.open("name_of_your_document - can be a png file, pdf, etc.")
image = Image.open("name_of_your_document - can be a png file, pdf, etc.")
width, height = image.size
width, height = image.size
- For a demo which shows how to fine-tune :class:`LayoutLMForTokenClassification` on the `FUNSD dataset
<https://guillaumejaume.github.io/FUNSD/>`__ (a collection of annotated forms), see `this notebook
<https://github.com/NielsRogge/Transformers-Tutorials/blob/master/LayoutLM/Fine_tuning_LayoutLMForTokenClassification_on_FUNSD.ipynb>`__.
It includes an inference part, which shows how to use Google's Tesseract on a new document.
The original code can be found `here <https://github.com/microsoft/unilm/tree/master/layoutlm>`_.
This model was contributed by `liminghao1630 <https://huggingface.co/liminghao1630>`__. The original code can be found
`here <https://github.com/microsoft/unilm/tree/master/layoutlm>`_.
LayoutLMConfig
@@ -130,3 +131,31 @@ LayoutLMForTokenClassification
.. autoclass:: transformers.LayoutLMForTokenClassification
:members:
TFLayoutLMModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFLayoutLMModel
:members:
TFLayoutLMForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFLayoutLMForMaskedLM
:members:
TFLayoutLMForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFLayoutLMForSequenceClassification
:members:
TFLayoutLMForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFLayoutLMForTokenClassification
:members:

View File

@@ -53,6 +53,8 @@ Tips:
- A notebook showing how to fine-tune LED, can be accessed `here
<https://colab.research.google.com/drive/12LjJazBl7Gam0XBPy_y0CTOJZeZ34c2v?usp=sharing>`__.
This model was contributed by `patrickvonplaten <https://huggingface.co/patrickvonplaten>`__.
LEDConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -73,8 +75,7 @@ LEDTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LEDTokenizerFast
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary
:members:
LED specific outputs

View File

@@ -40,7 +40,8 @@ Tips:
token belongs to which segment. Just separate your segments with the separation token :obj:`tokenizer.sep_token` (or
:obj:`</s>`).
The Authors' code can be found `here <https://github.com/allenai/longformer>`__.
This model was contributed by `beltagy <https://huggingface.co/beltagy>`__. The Authors' code can be found `here
<https://github.com/allenai/longformer>`__.
Longformer Self Attention
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

View File

@@ -0,0 +1,159 @@
..
Copyright 2021 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
LUKE
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The LUKE model was proposed in `LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention
<https://arxiv.org/abs/2010.01057>`_ by Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda and Yuji Matsumoto.
It is based on RoBERTa and adds entity embeddings as well as an entity-aware self-attention mechanism, which helps
improve performance on various downstream tasks involving reasoning about entities such as named entity recognition,
extractive and cloze-style question answering, entity typing, and relation classification.
The abstract from the paper is the following:
*Entity representations are useful in natural language tasks involving entities. In this paper, we propose new
pretrained contextualized representations of words and entities based on the bidirectional transformer. The proposed
model treats words and entities in a given text as independent tokens, and outputs contextualized representations of
them. Our model is trained using a new pretraining task based on the masked language model of BERT. The task involves
predicting randomly masked words and entities in a large entity-annotated corpus retrieved from Wikipedia. We also
propose an entity-aware self-attention mechanism that is an extension of the self-attention mechanism of the
transformer, and considers the types of tokens (words or entities) when computing attention scores. The proposed model
achieves impressive empirical performance on a wide range of entity-related tasks. In particular, it obtains
state-of-the-art results on five well-known datasets: Open Entity (entity typing), TACRED (relation classification),
CoNLL-2003 (named entity recognition), ReCoRD (cloze-style question answering), and SQuAD 1.1 (extractive question
answering).*
Tips:
- This implementation is the same as :class:`~transformers.RobertaModel` with the addition of entity embeddings as well
as an entity-aware self-attention mechanism, which improves performance on tasks involving reasoning about entities.
- LUKE treats entities as input tokens; therefore, it takes :obj:`entity_ids`, :obj:`entity_attention_mask`,
:obj:`entity_token_type_ids` and :obj:`entity_position_ids` as extra input. You can obtain those using
:class:`~transformers.LukeTokenizer`.
- :class:`~transformers.LukeTokenizer` takes :obj:`entities` and :obj:`entity_spans` (character-based start and end
positions of the entities in the input text) as extra input. :obj:`entities` typically consist of [MASK] entities or
Wikipedia entities. The brief description when inputting these entities are as follows:
- *Inputting [MASK] entities to compute entity representations*: The [MASK] entity is used to mask entities to be
predicted during pretraining. When LUKE receives the [MASK] entity, it tries to predict the original entity by
gathering the information about the entity from the input text. Therefore, the [MASK] entity can be used to address
downstream tasks requiring the information of entities in text such as entity typing, relation classification, and
named entity recognition.
- *Inputting Wikipedia entities to compute knowledge-enhanced token representations*: LUKE learns rich information
(or knowledge) about Wikipedia entities during pretraining and stores the information in its entity embedding. By
using Wikipedia entities as input tokens, LUKE outputs token representations enriched by the information stored in
the embeddings of these entities. This is particularly effective for tasks requiring real-world knowledge, such as
question answering.
- There are three head models for the former use case:
- :class:`~transformers.LukeForEntityClassification`, for tasks to classify a single entity in an input text such as
entity typing, e.g. the `Open Entity dataset <https://www.cs.utexas.edu/~eunsol/html_pages/open_entity.html>`__.
This model places a linear head on top of the output entity representation.
- :class:`~transformers.LukeForEntityPairClassification`, for tasks to classify the relationship between two entities
such as relation classification, e.g. the `TACRED dataset <https://nlp.stanford.edu/projects/tacred/>`__. This
model places a linear head on top of the concatenated output representation of the pair of given entities.
- :class:`~transformers.LukeForEntitySpanClassification`, for tasks to classify the sequence of entity spans, such as
named entity recognition (NER). This model places a linear head on top of the output entity representations. You
can address NER using this model by inputting all possible entity spans in the text to the model.
:class:`~transformers.LukeTokenizer` has a ``task`` argument, which enables you to easily create an input to these
head models by specifying ``task="entity_classification"``, ``task="entity_pair_classification"``, or
``task="entity_span_classification"``. Please refer to the example code of each head models.
There are also 3 notebooks available, which showcase how you can reproduce the results as reported in the paper with
the HuggingFace implementation of LUKE. They can be found `here
<https://github.com/studio-ousia/luke/tree/master/notebooks>`__.
Example:
.. code-block::
>>> from transformers import LukeTokenizer, LukeModel, LukeForEntityPairClassification
>>> model = LukeModel.from_pretrained("studio-ousia/luke-base")
>>> tokenizer = LukeTokenizer.from_pretrained("studio-ousia/luke-base")
# Example 1: Computing the contextualized entity representation corresponding to the entity mention "Beyoncé"
>>> text = "Beyoncé lives in Los Angeles."
>>> entity_spans = [(0, 7)] # character-based entity span corresponding to "Beyoncé"
>>> inputs = tokenizer(text, entity_spans=entity_spans, add_prefix_space=True, return_tensors="pt")
>>> outputs = model(**inputs)
>>> word_last_hidden_state = outputs.last_hidden_state
>>> entity_last_hidden_state = outputs.entity_last_hidden_state
# Example 2: Inputting Wikipedia entities to obtain enriched contextualized representations
>>> entities = ["Beyoncé", "Los Angeles"] # Wikipedia entity titles corresponding to the entity mentions "Beyoncé" and "Los Angeles"
>>> entity_spans = [(0, 7), (17, 28)] # character-based entity spans corresponding to "Beyoncé" and "Los Angeles"
>>> inputs = tokenizer(text, entities=entities, entity_spans=entity_spans, add_prefix_space=True, return_tensors="pt")
>>> outputs = model(**inputs)
>>> word_last_hidden_state = outputs.last_hidden_state
>>> entity_last_hidden_state = outputs.entity_last_hidden_state
# Example 3: Classifying the relationship between two entities using LukeForEntityPairClassification head model
>>> model = LukeForEntityPairClassification.from_pretrained("studio-ousia/luke-large-finetuned-tacred")
>>> tokenizer = LukeTokenizer.from_pretrained("studio-ousia/luke-large-finetuned-tacred")
>>> entity_spans = [(0, 7), (17, 28)] # character-based entity spans corresponding to "Beyoncé" and "Los Angeles"
>>> inputs = tokenizer(text, entity_spans=entity_spans, return_tensors="pt")
>>> outputs = model(**inputs)
>>> logits = outputs.logits
>>> predicted_class_idx = int(logits[0].argmax())
>>> print("Predicted class:", model.config.id2label[predicted_class_idx])
This model was contributed by `ikuyamada <https://huggingface.co/ikuyamada>`__ and `nielsr
<https://huggingface.co/nielsr>`__. The original code can be found `here <https://github.com/studio-ousia/luke>`__.
LukeConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LukeConfig
:members:
LukeTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LukeTokenizer
:members: __call__, save_vocabulary
LukeModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LukeModel
:members: forward
LukeForEntityClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LukeForEntityClassification
:members: forward
LukeForEntityPairClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LukeForEntityPairClassification
:members: forward
LukeForEntitySpanClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LukeForEntitySpanClassification
:members: forward

View File

@@ -52,7 +52,8 @@ Tips:
contains self-attention for each respective modality and cross-attention, only the cross attention is returned and
both self attention outputs are disregarded.
The original code can be found `here <https://github.com/airsplay/lxmert>`__.
This model was contributed by `eltoto1219 <https://huggingface.co/eltoto1219>`__. The original code can be found `here
<https://github.com/airsplay/lxmert>`__.
LxmertConfig

View File

@@ -0,0 +1,130 @@
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
M2M100
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The M2M100 model was proposed in `Beyond English-Centric Multilingual Machine Translation
<https://arxiv.org/abs/2010.11125>`__ by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky,
Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy
Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
The abstract from the paper is the following:
*Existing work in translation demonstrated the potential of massively multilingual machine translation by training a
single model able to translate between any pair of languages. However, much of this work is English-Centric by training
only on data which was translated from or to English. While this is supported by large sources of training data, it
does not reflect translation needs worldwide. In this work, we create a true Many-to-Many multilingual translation
model that can translate directly between any pair of 100 languages. We build and open source a training dataset that
covers thousands of language directions with supervised data, created through large-scale mining. Then, we explore how
to effectively increase model capacity through a combination of dense scaling and language-specific sparse parameters
to create high quality models. Our focus on non-English-Centric models brings gains of more than 10 BLEU when directly
translating between non-English directions while performing competitively to the best single systems of WMT. We
open-source our scripts so that others may reproduce the data, evaluation, and final M2M-100 model.*
This model was contributed by `valhalla <https://huggingface.co/valhalla>`__.
Training and Generation
_______________________________________________________________________________________________________________________
M2M100 is a multilingual encoder-decoder (seq-to-seq) model primarily intended for translation tasks. As the model is
multilingual it expects the sequences in a certain format: A special language id token is used as prefix in both the
source and target text. The source text format is :obj:`[lang_code] X [eos]`, where :obj:`lang_code` is source language
id for source text and target language id for target text, with :obj:`X` being the source or target text.
The :class:`~transformers.M2M100Tokenizer` depends on :obj:`sentencepiece` so be sure to install it before running the
examples. To install :obj:`sentencepiece` run ``pip install sentencepiece``.
- Supervised Training
.. code-block::
from transformers import M2M100Config, M2M100ForConditionalGeneration, M2M100Tokenizer
model = M2M100ForConditionalGeneration.from_pretrained('facebook/m2m100_418M')
tokenizer = M2M100Tokenizer.from_pretrained('facebook/m2m100_418M', src_lang="en", tgt_lang="fr")
src_text = "Life is like a box of chocolates."
tgt_lang = "La vie est comme une boîte de chocolat."
model_inputs = tokenizer(src_text, return_tensors="pt")
with tokenizer.as_target_tokenizer():
labels = tokenizer(tgt_text, return_tensors="pt").input_ids
loss = model(**model_inputs, labels=labels) # forward pass
- Generation
M2M100 uses the :obj:`eos_token_id` as the :obj:`decoder_start_token_id` for generation with the target language id
being forced as the first generated token. To force the target language id as the first generated token, pass the
`forced_bos_token_id` parameter to the `generate` method. The following example shows how to translate between
Hindi to French and Chinese to English using the `facebook/m2m100_418M` checkpoint.
.. code-block::
>>> from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer
>>> hi_text = "जीवन एक चॉकलेट बॉक्स की तरह है।"
>>> chinese_text = "生活就像一盒巧克力。"
>>> model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")
>>> tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")
>>> # translate Hindi to French
>>> tokenizer.src_lang = "hi"
>>> encoded_hi = tokenizer(hi_text, return_tensors="pt")
>>> generated_tokens = model.generate(**encoded_hi, forced_bos_token_id=tokenizer.get_lang_id("fr"))
>>> tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
"La vie est comme une boîte de chocolat."
>>> # translate Chinese to English
>>> tokenizer.src_lang = "zh"
>>> encoded_zh = tokenizer(chinese_text, return_tensors="pt")
>>> generated_tokens = model.generate(**encoded_zh, forced_bos_token_id=tokenizer.get_lang_id("en"))
>>> tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
"Life is like a box of chocolate."
M2M100Config
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.M2M100Config
:members:
M2M100Tokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.M2M100Tokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary
M2M100Model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.M2M100Model
:members: forward
M2M100ForConditionalGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.M2M100ForConditionalGeneration
:members: forward

View File

@@ -37,6 +37,7 @@ Implementation Notes
- the model starts generating with :obj:`pad_token_id` (which has 0 as a token_embedding) as the prefix (Bart uses
:obj:`<s/>`),
- Code to bulk convert models can be found in ``convert_marian_to_pytorch.py``.
- This model was contributed by `sshleifer <https://huggingface.co/sshleifer>`__.
Naming
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -76,27 +77,29 @@ require 3 character language codes:
.. code-block:: python
from transformers import MarianMTModel, MarianTokenizer
src_text = [
'>>fra<< this is a sentence in english that we want to translate to french',
'>>por<< This should go to portuguese',
'>>esp<< And this to Spanish'
]
>>> from transformers import MarianMTModel, MarianTokenizer
>>> src_text = [
... '>>fra<< this is a sentence in english that we want to translate to french',
... '>>por<< This should go to portuguese',
... '>>esp<< And this to Spanish'
>>> ]
model_name = 'Helsinki-NLP/opus-mt-en-roa'
tokenizer = MarianTokenizer.from_pretrained(model_name)
print(tokenizer.supported_language_codes)
model = MarianMTModel.from_pretrained(model_name)
translated = model.generate(**tokenizer.prepare_seq2seq_batch(src_text, return_tensors="pt"))
tgt_text = [tokenizer.decode(t, skip_special_tokens=True) for t in translated]
# ["c'est une phrase en anglais que nous voulons traduire en français",
# 'Isto deve ir para o português.',
# 'Y esto al español']
>>> model_name = 'Helsinki-NLP/opus-mt-en-roa'
>>> tokenizer = MarianTokenizer.from_pretrained(model_name)
>>> print(tokenizer.supported_language_codes)
['>>zlm_Latn<<', '>>mfe<<', '>>hat<<', '>>pap<<', '>>ast<<', '>>cat<<', '>>ind<<', '>>glg<<', '>>wln<<', '>>spa<<', '>>fra<<', '>>ron<<', '>>por<<', '>>ita<<', '>>oci<<', '>>arg<<', '>>min<<']
>>> model = MarianMTModel.from_pretrained(model_name)
>>> translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=True))
>>> [tokenizer.decode(t, skip_special_tokens=True) for t in translated]
["c'est une phrase en anglais que nous voulons traduire en français",
'Isto deve ir para o português.',
'Y esto al español']
Code to see available pretrained models:
Here is the code to see all available pretrained models on the hub:
.. code-block:: python
@@ -147,21 +150,22 @@ Example of translating english to many romance languages, using old-style 2 char
.. code-block::python
from transformers import MarianMTModel, MarianTokenizer
src_text = [
'>>fr<< this is a sentence in english that we want to translate to french',
'>>pt<< This should go to portuguese',
'>>es<< And this to Spanish'
]
>>> from transformers import MarianMTModel, MarianTokenizer
>>> src_text = [
... '>>fr<< this is a sentence in english that we want to translate to french',
... '>>pt<< This should go to portuguese',
... '>>es<< And this to Spanish'
>>> ]
model_name = 'Helsinki-NLP/opus-mt-en-ROMANCE'
tokenizer = MarianTokenizer.from_pretrained(model_name)
print(tokenizer.supported_language_codes)
>>> model_name = 'Helsinki-NLP/opus-mt-en-ROMANCE'
>>> tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)
translated = model.generate(**tokenizer.prepare_seq2seq_batch(src_text, return_tensors="pt"))
tgt_text = [tokenizer.decode(t, skip_special_tokens=True) for t in translated]
# ["c'est une phrase en anglais que nous voulons traduire en français", 'Isto deve ir para o português.', 'Y esto al español']
>>> model = MarianMTModel.from_pretrained(model_name)
>>> translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=True))
>>> tgt_text = [tokenizer.decode(t, skip_special_tokens=True) for t in translated]
["c'est une phrase en anglais que nous voulons traduire en français",
'Isto deve ir para o português.',
'Y esto al español']
@@ -176,7 +180,7 @@ MarianTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MarianTokenizer
:members: prepare_seq2seq_batch
:members: as_target_tokenizer
MarianModel

View File

@@ -10,14 +10,14 @@
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
MBart
MBart and MBart-50
-----------------------------------------------------------------------------------------------------------------------
**DISCLAIMER:** If you see something strange, file a `Github Issue
<https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`__ and assign
@patrickvonplaten
Overview
Overview of MBart
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The MBart model was presented in `Multilingual Denoising Pre-training for Neural Machine Translation
@@ -29,35 +29,37 @@ corpora in many languages using the BART objective. mBART is one of the first me
sequence-to-sequence model by denoising full texts in multiple languages, while previous approaches have focused only
on the encoder, decoder, or reconstructing parts of the text.
The Authors' code can be found `here <https://github.com/pytorch/fairseq/tree/master/examples/mbart>`__
This model was contributed by `valhalla <https://huggingface.co/valhalla>`__. The Authors' code can be found `here
<https://github.com/pytorch/fairseq/tree/master/examples/mbart>`__
Examples
Training of MBart
_______________________________________________________________________________________________________________________
- Examples and scripts for fine-tuning mBART and other models for sequence to sequence tasks can be found in
:prefix_link:`examples/seq2seq/ <examples/seq2seq/README.md>`.
- Given the large embeddings table, mBART consumes a large amount of GPU RAM, especially for fine-tuning.
:class:`MarianMTModel` is usually a better choice for bilingual machine translation.
MBart is a multilingual encoder-decoder (sequence-to-sequence) model primarily intended for translation task. As the
model is multilingual it expects the sequences in a different format. A special language id token is added in both the
source and target text. The source text format is :obj:`X [eos, src_lang_code]` where :obj:`X` is the source text. The
target text format is :obj:`[tgt_lang_code] X [eos]`. :obj:`bos` is never used.
Training
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
MBart is a multilingual encoder-decoder (seq-to-seq) model primarily intended for translation task. As the model is
multilingual it expects the sequences in a different format. A special language id token is added in both the source
and target text. The source text format is :obj:`X [eos, src_lang_code]` where :obj:`X` is the source text. The target
text format is :obj:`[tgt_lang_code] X [eos]`. :obj:`bos` is never used.
The :meth:`~transformers.MBartTokenizer.prepare_seq2seq_batch` handles this automatically and should be used to encode
the sequences for sequence-to-sequence fine-tuning.
The regular :meth:`~transformers.MBartTokenizer.__call__` will encode source text format, and it should be wrapped
inside the context manager :meth:`~transformers.MBartTokenizer.as_target_tokenizer` to encode target text format.
- Supervised training
.. code-block::
example_english_phrase = "UN Chief Says There Is No Military Solution in Syria"
expected_translation_romanian = "Şeful ONU declară că nu există o soluţie militară în Siria"
batch = tokenizer.prepare_seq2seq_batch(example_english_phrase, src_lang="en_XX", tgt_lang="ro_RO", tgt_texts=expected_translation_romanian, return_tensors="pt")
model(input_ids=batch['input_ids'], labels=batch['labels']) # forward pass
>>> from transformers import MBartForConditionalGeneration, MBartTokenizer
>>> tokenizer = MBartTokenizer.from_pretrained("facebook/mbart-large-en-ro")
>>> example_english_phrase = "UN Chief Says There Is No Military Solution in Syria"
>>> expected_translation_romanian = "Şeful ONU declară că nu există o soluţie militară în Siria"
>>> inputs = tokenizer(example_english_phrase, return_tensors="pt", src_lang="en_XX", tgt_lang="ro_RO")
>>> with tokenizer.as_target_tokenizer():
... labels = tokenizer(expected_translation_romanian, return_tensors="pt")
>>> model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-en-ro")
>>> # forward pass
>>> model(**inputs, labels=batch['labels'])
- Generation
@@ -66,14 +68,95 @@ the sequences for sequence-to-sequence fine-tuning.
.. code-block::
from transformers import MBartForConditionalGeneration, MBartTokenizer
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-en-ro")
tokenizer = MBartTokenizer.from_pretrained("facebook/mbart-large-en-ro")
article = "UN Chief Says There Is No Military Solution in Syria"
batch = tokenizer.prepare_seq2seq_batch(src_texts=[article], src_lang="en_XX", return_tensors="pt")
translated_tokens = model.generate(**batch, decoder_start_token_id=tokenizer.lang_code_to_id["ro_RO"])
translation = tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)[0]
assert translation == "Şeful ONU declară că nu există o soluţie militară în Siria"
>>> from transformers import MBartForConditionalGeneration, MBartTokenizer
>>> tokenizer = MBartTokenizer.from_pretrained("facebook/mbart-large-en-ro", src_lang="en_XX")
>>> article = "UN Chief Says There Is No Military Solution in Syria"
>>> inputs = tokenizer(article, return_tensors="pt")
>>> translated_tokens = model.generate(**inputs, decoder_start_token_id=tokenizer.lang_code_to_id["ro_RO"])
>>> tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)[0]
"Şeful ONU declară că nu există o soluţie militară în Siria"
Overview of MBart-50
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
MBart-50 was introduced in the `Multilingual Translation with Extensible Multilingual Pretraining and Finetuning
<https://arxiv.org/abs/2008.00401>` paper by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav
Chaudhary, Jiatao Gu, Angela Fan. MBart-50 is created using the original `mbart-large-cc25` checkpoint by extendeding
its embedding layers with randomly initialized vectors for an extra set of 25 language tokens and then pretrained on 50
languages.
According to the abstract
*Multilingual translation models can be created through multilingual finetuning. Instead of finetuning on one
direction, a pretrained model is finetuned on many directions at the same time. It demonstrates that pretrained models
can be extended to incorporate additional languages without loss of performance. Multilingual finetuning improves on
average 1 BLEU over the strongest baselines (being either multilingual from scratch or bilingual finetuning) while
improving 9.3 BLEU on average over bilingual baselines from scratch.*
Training of MBart-50
_______________________________________________________________________________________________________________________
The text format for MBart-50 is slightly different from mBART. For MBart-50 the language id token is used as a prefix
for both source and target text i.e the text format is :obj:`[lang_code] X [eos]`, where :obj:`lang_code` is source
language id for source text and target language id for target text, with :obj:`X` being the source or target text
respectively.
MBart-50 has its own tokenizer :class:`~transformers.MBart50Tokenizer`.
- Supervised training
.. code-block::
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50")
tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50", src_lang="en_XX", tgt_lang="ro_RO")
src_text = " UN Chief Says There Is No Military Solution in Syria"
tgt_text = "Şeful ONU declară că nu există o soluţie militară în Siria"
model_inputs = tokenizer(src_text, return_tensors="pt")
with tokenizer.as_target_tokenizer():
labels = tokenizer(tgt_text, return_tensors="pt").input_ids
model(**model_inputs, labels=labels) # forward pass
- Generation
To generate using the mBART-50 multilingual translation models, :obj:`eos_token_id` is used as the
:obj:`decoder_start_token_id` and the target language id is forced as the first generated token. To force the
target language id as the first generated token, pass the `forced_bos_token_id` parameter to the `generate` method.
The following example shows how to translate between Hindi to French and Arabic to English using the
`facebook/mbart-50-large-many-to-many` checkpoint.
.. code-block::
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast
article_hi = "संयुक्त राष्ट्र के प्रमुख का कहना है कि सीरिया में कोई सैन्य समाधान नहीं है"
article_ar = "الأمين العام للأمم المتحدة يقول إنه لا يوجد حل عسكري في سوريا."
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")
tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")
# translate Hindi to French
tokenizer.src_lang = "hi_IN"
encoded_hi = tokenizer(article_hi, return_tensors="pt")
generated_tokens = model.generate(**encoded_hi, forced_bos_token_id=tokenizer.lang_code_to_id["fr_XX"])
tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
# => "Le chef de l 'ONU affirme qu 'il n 'y a pas de solution militaire en Syria."
# translate Arabic to English
tokenizer.src_lang = "ar_AR"
encoded_ar = tokenizer(article_ar, return_tensors="pt")
generated_tokens = model.generate(**encoded_ar, forced_bos_token_id=tokenizer.lang_code_to_id["en_XX"])
tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
# => "The Secretary-General of the United Nations says there is no military solution in Syria."
MBartConfig
@@ -87,7 +170,7 @@ MBartTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MBartTokenizer
:members: build_inputs_with_special_tokens, prepare_seq2seq_batch
:members: as_target_tokenizer, build_inputs_with_special_tokens
MBartTokenizerFast
@@ -97,6 +180,20 @@ MBartTokenizerFast
:members:
MBart50Tokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MBart50Tokenizer
:members:
MBart50TokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MBart50TokenizerFast
:members:
MBartModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

View File

@@ -0,0 +1,154 @@
..
Copyright 2021 NVIDIA Corporation and The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
MegatronBERT
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The MegatronBERT model was proposed in `Megatron-LM: Training Multi-Billion Parameter Language Models Using Model
Parallelism <https://arxiv.org/abs/1909.08053>`__ by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley,
Jared Casper and Bryan Catanzaro.
The abstract from the paper is the following:
*Recent work in language modeling demonstrates that training large transformer models advances the state of the art in
Natural Language Processing applications. However, very large models can be quite difficult to train due to memory
constraints. In this work, we present our techniques for training very large transformer models and implement a simple,
efficient intra-layer model parallel approach that enables training transformer models with billions of parameters. Our
approach does not require a new compiler or library changes, is orthogonal and complimentary to pipeline model
parallelism, and can be fully implemented with the insertion of a few communication operations in native PyTorch. We
illustrate this approach by converging transformer based models up to 8.3 billion parameters using 512 GPUs. We sustain
15.1 PetaFLOPs across the entire application with 76% scaling efficiency when compared to a strong single GPU baseline
that sustains 39 TeraFLOPs, which is 30% of peak FLOPs. To demonstrate that large language models can further advance
the state of the art (SOTA), we train an 8.3 billion parameter transformer language model similar to GPT-2 and a 3.9
billion parameter model similar to BERT. We show that careful attention to the placement of layer normalization in
BERT-like models is critical to achieving increased performance as the model size grows. Using the GPT-2 model we
achieve SOTA results on the WikiText103 (10.8 compared to SOTA perplexity of 15.8) and LAMBADA (66.5% compared to SOTA
accuracy of 63.2%) datasets. Our BERT model achieves SOTA results on the RACE dataset (90.9% compared to SOTA accuracy
of 89.4%).*
Tips:
We have provided pretrained `BERT-345M <https://ngc.nvidia.com/catalog/models/nvidia:megatron_bert_345m>`__ checkpoints
for use to evaluate or finetuning downstream tasks.
To access these checkpoints, first `sign up <https://ngc.nvidia.com/signup>`__ for and setup the NVIDIA GPU Cloud (NGC)
Registry CLI. Further documentation for downloading models can be found in the `NGC documentation
<https://docs.nvidia.com/dgx/ngc-registry-cli-user-guide/index.html#topic_6_4_1>`__.
Alternatively, you can directly download the checkpoints using:
BERT-345M-uncased::
.. code-block:: bash
wget --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/megatron_bert_345m/versions/v0.1_uncased/zip
-O megatron_bert_345m_v0_1_uncased.zip
BERT-345M-cased::
.. code-block:: bash
wget --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/megatron_bert_345m/versions/v0.1_cased/zip -O
megatron_bert_345m_v0_1_cased.zip
Once you have obtained the checkpoints from NVIDIA GPU Cloud (NGC), you have to convert them to a format that will
easily be loaded by Hugging Face Transformers and our port of the BERT code.
The following commands allow you to do the conversion. We assume that the folder ``models/megatron_bert`` contains
``megatron_bert_345m_v0_1_{cased, uncased}.zip`` and that the commands are run from inside that folder::
.. code-block:: bash
python3 $PATH_TO_TRANSFORMERS/models/megatron_bert/convert_megatron_bert_checkpoint.py megatron_bert_345m_v0_1_uncased.zip
.. code-block:: bash
python3 $PATH_TO_TRANSFORMERS/models/megatron_bert/convert_megatron_bert_checkpoint.py megatron_bert_345m_v0_1_cased.zip
This model was contributed by `jdemouth <https://huggingface.co/jdemouth>`__. The original code can be found `here
<https://github.com/NVIDIA/Megatron-LM>`__. That repository contains a multi-GPU and multi-node implementation of the
Megatron Language models. In particular, it contains a hybrid model parallel approach using "tensor parallel" and
"pipeline parallel" techniques.
MegatronBertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MegatronBertConfig
:members:
MegatronBertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MegatronBertModel
:members: forward
MegatronBertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MegatronBertForMaskedLM
:members: forward
MegatronBertForCausalLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MegatronBertForCausalLM
:members: forward
MegatronBertForNextSentencePrediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MegatronBertForNextSentencePrediction
:members: forward
MegatronBertForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MegatronBertForPreTraining
:members: forward
MegatronBertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MegatronBertForSequenceClassification
:members: forward
MegatronBertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MegatronBertForMultipleChoice
:members: forward
MegatronBertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MegatronBertForTokenClassification
:members: forward
MegatronBertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MegatronBertForQuestionAnswering
:members: forward

View File

@@ -0,0 +1,71 @@
..
Copyright 2021 NVIDIA Corporation and The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
MegatronGPT2
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The MegatronGPT2 model was proposed in `Megatron-LM: Training Multi-Billion Parameter Language Models Using Model
Parallelism <https://arxiv.org/abs/1909.08053>`__ by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley,
Jared Casper and Bryan Catanzaro.
The abstract from the paper is the following:
*Recent work in language modeling demonstrates that training large transformer models advances the state of the art in
Natural Language Processing applications. However, very large models can be quite difficult to train due to memory
constraints. In this work, we present our techniques for training very large transformer models and implement a simple,
efficient intra-layer model parallel approach that enables training transformer models with billions of parameters. Our
approach does not require a new compiler or library changes, is orthogonal and complimentary to pipeline model
parallelism, and can be fully implemented with the insertion of a few communication operations in native PyTorch. We
illustrate this approach by converging transformer based models up to 8.3 billion parameters using 512 GPUs. We sustain
15.1 PetaFLOPs across the entire application with 76% scaling efficiency when compared to a strong single GPU baseline
that sustains 39 TeraFLOPs, which is 30% of peak FLOPs. To demonstrate that large language models can further advance
the state of the art (SOTA), we train an 8.3 billion parameter transformer language model similar to GPT-2 and a 3.9
billion parameter model similar to BERT. We show that careful attention to the placement of layer normalization in
BERT-like models is critical to achieving increased performance as the model size grows. Using the GPT-2 model we
achieve SOTA results on the WikiText103 (10.8 compared to SOTA perplexity of 15.8) and LAMBADA (66.5% compared to SOTA
accuracy of 63.2%) datasets. Our BERT model achieves SOTA results on the RACE dataset (90.9% compared to SOTA accuracy
of 89.4%).*
Tips:
We have provided pretrained `GPT2-345M <https://ngc.nvidia.com/catalog/models/nvidia:megatron_lm_345m>`__ checkpoints
for use to evaluate or finetuning downstream tasks.
To access these checkpoints, first `sign up <https://ngc.nvidia.com/signup>`__ for and setup the NVIDIA GPU Cloud (NGC)
Registry CLI. Further documentation for downloading models can be found in the `NGC documentation
<https://docs.nvidia.com/dgx/ngc-registry-cli-user-guide/index.html#topic_6_4_1>`__.
Alternatively, you can directly download the checkpoints using::
.. code-block:: bash
wget --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/megatron_lm_345m/versions/v0.0/zip -O
megatron_gpt2_345m_v0_0.zip
Once you have obtained the checkpoint from NVIDIA GPU Cloud (NGC), you have to convert it to a format that will easily
be loaded by Hugging Face Transformers GPT2 implementation.
The following command allows you to do the conversion. We assume that the folder ``models/megatron_gpt2`` contains
``megatron_gpt2_345m_v0_0.zip`` and that the command is run from that folder::
.. code-block:: bash
python3 $PATH_TO_TRANSFORMERS/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py megatron_gpt2_345m_v0_0.zip
This model was contributed by `jdemouth <https://huggingface.co/jdemouth>`__. The original code can be found `here
<https://github.com/NVIDIA/Megatron-LM>`__. That repository contains a multi-GPU and multi-node implementation of the
Megatron Language models. In particular, it contains a hybrid model parallel approach using "tensor parallel" and
"pipeline parallel" techniques.

View File

@@ -44,7 +44,8 @@ Tips:
efficient at predicting masked tokens and at NLU in general, but is not optimal for text generation. Models trained
with a causal language modeling (CLM) objective are better in that regard.
The original code can be found `here <https://github.com/google-research/mobilebert>`__.
This model was contributed by `vshampor <https://huggingface.co/vshampor>`__. The original code can be found `here
<https://github.com/google-research/mobilebert>`__.
MobileBertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

View File

@@ -28,7 +28,8 @@ multilingual variant of T5 that was pre-trained on a new Common Crawl-based data
the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual
benchmarks. All of the code and model checkpoints*
The original code can be found `here <https://github.com/google-research/multilingual-t5>`__.
This model was contributed by `patrickvonplaten <https://huggingface.co/patrickvonplaten>`__. The original code can be
found `here <https://github.com/google-research/multilingual-t5>`__.
MT5Config
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

View File

@@ -31,7 +31,8 @@ According to the abstract,
extractive summary.
- Pegasus achieves SOTA summarization performance on all 12 downstream tasks, as measured by ROUGE and human eval.
The Authors' code can be found `here <https://github.com/google-research/pegasus>`__.
This model was contributed by `sshleifer <https://huggingface.co/sshleifer>`__. The Authors' code can be found `here
<https://github.com/google-research/pegasus>`__.
Checkpoints
@@ -51,8 +52,9 @@ All the `checkpoints <https://huggingface.co/models?search=pegasus>`__ are fine-
Examples
_______________________________________________________________________________________________________________________
- :prefix_link:`Script <examples/seq2seq/finetune_pegasus_xsum.sh>` to fine-tune pegasus on the XSUM dataset. Data
download instructions at :prefix_link:`examples/seq2seq/ <examples/seq2seq/README.md>`.
- :prefix_link:`Script <examples/research_projects/seq2seq-distillation/finetune_pegasus_xsum.sh>` to fine-tune pegasus
on the XSUM dataset. Data download instructions at :prefix_link:`examples/pytorch/summarization/
<examples/pytorch/summarization/README.md>`.
- FP16 is not supported (help/ideas on this appreciated!).
- The adafactor optimizer is recommended for pegasus fine-tuning.
@@ -78,20 +80,20 @@ Usage Example
.. code-block:: python
from transformers import PegasusForConditionalGeneration, PegasusTokenizer
import torch
src_text = [
""" PG&E stated it scheduled the blackouts in response to forecasts for high winds amid dry conditions. The aim is to reduce the risk of wildfires. Nearly 800 thousand customers were scheduled to be affected by the shutoffs which were expected to last through at least midday tomorrow."""
]
>>> from transformers import PegasusForConditionalGeneration, PegasusTokenizer
>>> import torch
>>> src_text = [
... """ PG&E stated it scheduled the blackouts in response to forecasts for high winds amid dry conditions. The aim is to reduce the risk of wildfires. Nearly 800 thousand customers were scheduled to be affected by the shutoffs which were expected to last through at least midday tomorrow."""
>>> ]
model_name = 'google/pegasus-xsum'
torch_device = 'cuda' if torch.cuda.is_available() else 'cpu'
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name).to(torch_device)
batch = tokenizer.prepare_seq2seq_batch(src_text, truncation=True, padding='longest', return_tensors="pt").to(torch_device)
translated = model.generate(**batch)
tgt_text = tokenizer.batch_decode(translated, skip_special_tokens=True)
assert tgt_text[0] == "California's largest electricity provider has turned off power to hundreds of thousands of customers."
>>> model_name = 'google/pegasus-xsum'
>>> device = 'cuda' if torch.cuda.is_available() else 'cpu'
>>> tokenizer = PegasusTokenizer.from_pretrained(model_name)
>>> model = PegasusForConditionalGeneration.from_pretrained(model_name).to(device)
>>> batch = tokenizer(src_text, truncation=True, padding='longest', return_tensors="pt").to(device)
>>> translated = model.generate(**batch)
>>> tgt_text = tokenizer.batch_decode(translated, skip_special_tokens=True)
>>> assert tgt_text[0] == "California's largest electricity provider has turned off power to hundreds of thousands of customers."
@@ -107,7 +109,7 @@ PegasusTokenizer
warning: ``add_tokens`` does not work at the moment.
.. autoclass:: transformers.PegasusTokenizer
:members: __call__, prepare_seq2seq_batch
:members:
PegasusTokenizerFast

Some files were not shown because too many files have changed in this diff Show More