Sylvain Gugger
5f80c15ef5
Fix memory regression in Seq2Seq example (#9713)
* Fix memory regression in Seq2Seq example
* Fix test and properly deal with -100
* Easier condition with device safety
* Patch for MBartTokenizerFast
2021-01-21 12:05:46 -05:00
Julien Plu
a7dabfb3d1
Fix TF s2s models (#9478)
* Fix Seq2Seq models for serving
* Apply style
* Fix Longformer
* Fix mBart/Pegasus/Blenderbot
* Apply style
* Add a main intermediate layer
* Apply style
* Remove import
* Apply tf.function to Longformer
* Fix utils check_copy
* Update S2S template
* Fix BART + Blenderbot
* Fix BlenderbotSmall
* Fix BlenderbotSmall
* Fix BlenderbotSmall
* Fix MBart
* Fix Marian
* Fix Pegasus + template
* Apply style
* Fix common attributes test
* Forgot to fix the LED test
* Apply Patrick's comment on LED Decoder
2021-01-21 17:03:29 +01:00
Nicolas Patry
23e5a36ee6
Changing model default for TableQuestionAnsweringPipeline. (#9729)
* Changing model default for TableQuestionAnsweringPipeline.
- Discussion: https://discuss.huggingface.co/t/table-question-answering-is-not-an-available-task-under-pipeline/3284/6
* Updating slow tests that were out of sync.
2021-01-21 14:31:51 +01:00
Julien Plu
3f290e6c84
Fix mixed precision in TF models (#9163)
* Fix Gelu precision
* Fix gelu_fast
* Naming
* Fix usage and apply style
* add TF gelu approximate version
* add TF gelu approximate version
* add TF gelu approximate version
* Apply style
* Fix albert
* Remove the usage of the Activation layer
2021-01-21 07:00:11 -05:00
Suraj Patil
248fa1ae72
fix T5 head mask in model_parallel (#9726)
* fix head mask in model_parallel
* pass correct head mask
2021-01-21 12:16:14 +01:00
Patrick von Platen
ca422e3d7d
finish (#9721)
2021-01-21 05:17:13 -05:00
guillaume-be
fb36c273a2
Allow text generation for ProphetNetForCausalLM (#9707)
* Moved ProphetNetForCausalLM's parent initialization after config update
* Added unit tests for generation for ProphetNetForCausalLM
2021-01-21 11:13:38 +01:00
Muennighoff
6a346f0358
fix typo (#9708)
* fix typo
Co-authored-by: Suraj Patil <surajp815@gmail.com>
2021-01-21 13:51:01 +05:30
Stas Bekman
4a20b7c450
[trainer] no --deepspeed and --sharded_ddp together (#9712)
* no --deepspeed and --sharded_ddp together
* Update src/transformers/trainer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* style
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-01-20 16:50:21 -08:00
Sylvain Gugger
3cd91e8162
Fix WANDB_DISABLED test (#9703)
* Fix WANDB_DISABLED test
* Remove duplicate import
* Make a test that actually works...
* Fix style
2021-01-20 12:30:24 -05:00
Sylvain Gugger
2a703773aa
Fix style
2021-01-20 12:17:40 -05:00
Stas Bekman
cd5565bed3
fix the backward for deepspeed (#9705)
2021-01-20 09:07:07 -08:00
Gunjan Chhablani
538245b0c2
Fix Trainer and Args to mention AdamW, not Adam. (#9685)
* Fix Trainer and Args to mention AdamW, not Adam.
* Update the docs for Training Arguments.
* Change arguments adamw_* to adam_*
* Fixed links to AdamW in TrainerArguments docs
* Fix line length in Training Args docs.
2021-01-20 11:59:31 -05:00
NielsRogge
d1370d29b1
Add DeBERTa head models (#9691)
* Add DebertaForMaskedLM, DebertaForTokenClassification, DebertaForQuestionAnswering
* Add docs and fix quality
* Fix Deberta not having pooler
2021-01-20 10:18:50 -05:00
Sylvain Gugger
a7b62fece5
Fix Funnel Transformer conversion script (#9683)
2021-01-20 09:50:20 -05:00
acul3
8940c7662d
Add t5 convert to transformers-cli (#9654)
* Update run_mlm.py
* add t5 model to transformers-cli convert
* update run_mlm.py same as master
* update converting model docs
* update converting model docs
* Update convert.py
* Trigger notification
* update import sorted
* fix typo t5
2021-01-20 09:34:27 -05:00
Julien Plu
7251a4736d
Fix template (#9697)
2021-01-20 09:04:53 -05:00
Julien Plu
14042d560f
New TF embeddings (cleaner and faster) (#9418)
* Create new embeddings + add to BERT
* Add Albert
* Add DistilBert
* Add Albert + Electra + Funnel
* Add Longformer + Lxmert
* Add last models
* Apply style
* Update the template
* Remove unused imports
* Rename attribute
* Import embeddings in their own model file
* Replace word_embeddings per weight
* fix naming
* Fix Albert
* Fix Albert
* Fix Longformer
* Fix Lxmert Mobilebert and MPNet
* Fix copy
* Fix template
* Update the get weights function
* Update src/transformers/modeling_tf_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/electra/modeling_tf_electra.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* address Sylvain's comments
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-01-20 12:08:12 +01:00
Julien Plu
12f0d7e8e0
Fix label datatype in TF Trainer (#9616)
* Fix label datatype
* Apply style
2021-01-20 12:08:00 +01:00
LSinev
a98173cc45
make RepetitionPenaltyLogitsProcessor faster (#9600)
2021-01-20 10:23:01 +01:00
Sylvain Gugger
7e662e6a3b
Fix model templates and use less than 119 chars (#9684)
* Fix model templates and use less than 119 chars
* Missing new line
2021-01-19 17:11:22 -05:00
Daniel Stancl
2ebbbf558c
Add separated decoder_head_mask for T5 Models (#9634)
* Add decoder_head_mask for PyTorch T5 model
* Add decoder_head_mask args into T5Model and T5ForConditionalGeneration
* Slightly change the order of input args to be in accordance
with the convention from BART-based models introduced in PR #9569.
* Make style for modeling_t5.py
* Add decoder_head_mask for TF T5 models
* Separate head_mask and decoder_head_mask args in TF T5 models
* Slightly change the order of input args to follow convention
of BART-based models updated in PR #9569
* Update test_forward_signature tests/test_modeling_tf_common.py
w.r.t. the changed order of input args
* Add FutureWarnings for T5 and TFT5 models
* Add FutureWarnings for T5 and TFT5 models warning a user that
input argument `head_mask` was split into two arguments -
`head_mask` and `decoder_head_mask`
* Add default behaviour - `decoder_head_mask` is set to copy
`head_mask`
* Fix T5 modeling and FutureWarning
* Make proper usage of head_mask and decoder_head_mask
in cross_attention
* Fix conditions for raising FutureWarning
* Reformat FutureWarning in T5 modeling
* Refactor the warning message
2021-01-19 22:50:25 +01:00
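The backward-compatible default described in the bullets above (warn with a `FutureWarning`, then let `decoder_head_mask` copy `head_mask`) can be sketched as follows. This is a minimal, framework-free illustration, not the code from the PR: the helper name `resolve_decoder_head_mask`, the nested-list mask type, the warning text and the equal-depth guard (`num_layers == num_decoder_layers`) are all assumptions made for the example.

```python
import warnings
from typing import List, Optional, Tuple

# Assumed representation: one row of per-head multipliers (1.0 keep, 0.0 mask) per layer.
HeadMask = List[List[float]]

# Illustrative message only, not the library's exact wording.
HEAD_MASK_WARNING = (
    "`head_mask` was split into `head_mask` and `decoder_head_mask`; "
    "`decoder_head_mask` currently defaults to a copy of `head_mask`."
)


def resolve_decoder_head_mask(
    head_mask: Optional[HeadMask],
    decoder_head_mask: Optional[HeadMask],
    num_layers: int,
    num_decoder_layers: int,
) -> Tuple[Optional[HeadMask], Optional[HeadMask]]:
    """Hypothetical helper: fill in decoder_head_mask from head_mask for old-style calls."""
    # Fall back only when the caller used the old single argument and the encoder
    # and decoder have the same depth, so the copied mask has a valid shape.
    if head_mask is not None and decoder_head_mask is None and num_layers == num_decoder_layers:
        warnings.warn(HEAD_MASK_WARNING, FutureWarning)
        decoder_head_mask = head_mask
    return head_mask, decoder_head_mask


if __name__ == "__main__":
    mask = [[1.0, 0.0], [1.0, 1.0]]
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        _, dec = resolve_decoder_head_mask(mask, None, num_layers=2, num_decoder_layers=2)
    print(dec == mask, caught[0].category.__name__)  # prints: True FutureWarning
```

Raising a `FutureWarning` instead of silently copying keeps old callers working while telling them the single-argument form is on its way out.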
Sylvain Gugger
e4c06ed664
New run_seq2seq script (#9605)
* New run_seq2seq script
* Add tests
* Mark as slow
* Update examples/seq2seq/run_seq2seq.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/data/data_collator.py
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Update src/transformers/data/data_collator.py
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Address review comments
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
2021-01-19 15:22:17 -05:00
Julien Plu
fa876aee2a
Fix TF Flaubert and XLM (#9661)
* Fix Flaubert and XLM
* Fix Flaubert and XLM
* Apply style
2021-01-19 18:02:57 +01:00
max yue
11ec74905a
Update integrations.py (#9652)
File "/share/apps/anaconda3/envs/my_env/lib/python3.7/site-packages/transformers/integrations.py", line 419, in __init__
self._SummaryWriter = SummaryWriter
UnboundLocalError: local variable 'SummaryWriter' referenced before assignment
2021-01-19 11:39:49 -05:00
Yusuke Mori
b020a736c3
Update past_key_values in GPT-2 (#9596)
* Update past_key_values in gpt2 (#9391)
* Update generation_utils, and rename some items
* Update modeling_gpt2 to avoid an error in gradient_checkpointing
* Remove 'reorder_cache' from util and add variations to XLNet, TransfoXL, GPT-2
* Change the location of '_reorder_cache' in modeling files
* Add '_reorder_cache' in modeling_ctrl
* Fix a bug of my last commit in CTRL
* Add '_reorder_cache' to GPT2DoubleHeadsModel
* Manage 'use_cache' in config of test_modeling_gpt2
* Clean up the doc string
* Update src/transformers/models/gpt2/modeling_gpt2.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fix the doc string (GPT-2, CTRL)
* improve gradient_checkpointing_behavior
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-01-19 16:00:15 +01:00
Sylvain Gugger
d302d88b47
Fix GPT conversion script (#9676)
2021-01-19 09:55:37 -05:00
Sylvain Gugger
053efc5d2d
Fix imports in conversion scripts (#9674)
2021-01-19 09:40:15 -05:00
Patrick von Platen
2390c16fd2
add mbart to automodel for masked lm (#9673)
2021-01-19 15:19:11 +01:00
Sergey Mkrtchyan
917dbb15e0
Fix DPRReaderTokenizer's attention_mask (#9663)
* Fix the attention_mask in DPRReaderTokenizer
* Add an integration test for DPRReader inference
* Run make style
2021-01-19 05:43:11 -05:00
Daniel Stancl
357fb1c5d8
Add head_mask/decoder_head_mask for BART (#9569)
* Add head_mask/decoder_head_mask for BART
This branch implements head_mask and decoder_head_mask
for BART-based models. Full list below:
- BART
- MBart
- Blenderbot
- BlenderbotSmall
- Marian
- Pegasus
Everything is accompanied with updated testing.
* Fix test_headmasking for BART models
* Fix test_headmasking for BART-like models
which have only 2 layers in each module.
The condition
```
self.assertNotEqual(attentions[1][..., 0, :, :].flatten().sum().item(), 0.0)
```
is, therefore, invalid for encoder-decoder models considering
the `head_mask`
```
head_mask = torch.ones(
self.model_tester.num_hidden_layers,
self.model_tester.num_attention_heads,
device=torch_device,
)
head_mask[0, 0] = 0
head_mask[-1, :-1] = 0
```
specified in the `test_headmasking` test/function.
* Adjust test_modeling_common.py to reflect T5 input args
* Update tests/test_modeling_common.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* make style
* make fix-copies
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-01-18 13:35:22 +01:00
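The reasoning in the `test_headmasking` bullet can be checked with a few lines of plain Python. The sketch below rebuilds the quoted mask with nested lists instead of `torch.ones` (an assumption made so it runs without PyTorch) and asks whether head 0 of layer index 1 stays unmasked. With only 2 layers, index 1 is also the last layer, so `head_mask[-1, :-1] = 0` zeroes that head and the `assertNotEqual` check cannot hold. The function names are hypothetical.

```python
from typing import List


def build_test_head_mask(num_hidden_layers: int, num_attention_heads: int) -> List[List[float]]:
    """Plain-list version of the head_mask built in the quoted test_headmasking snippet."""
    head_mask = [[1.0] * num_attention_heads for _ in range(num_hidden_layers)]
    head_mask[0][0] = 0.0  # head_mask[0, 0] = 0
    for head in range(num_attention_heads - 1):  # head_mask[-1, :-1] = 0
        head_mask[-1][head] = 0.0
    return head_mask


def layer1_head0_unmasked(num_hidden_layers: int, num_attention_heads: int) -> bool:
    """True when head 0 of layer 1 is kept, so attentions[1][..., 0, :, :] can sum to non-zero."""
    return build_test_head_mask(num_hidden_layers, num_attention_heads)[1][0] != 0.0


if __name__ == "__main__":
    for layers in (2, 3, 5):
        print(layers, layer1_head0_unmasked(layers, num_attention_heads=4))
    # prints:
    # 2 False
    # 3 True
    # 5 True
```

With 3 or more layers, index 1 is a middle layer that the mask never touches, which is why the original assertion only holds for deeper test models.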
Devrim
65eb5d9ac5
Fix: torch.utils.checkpoint import error. (#9626)
2021-01-18 04:33:39 -05:00
Stas Bekman
c60e0e1ee4
deepspeed + grad accum (#9622)
2021-01-15 10:12:26 -08:00
Lysandre Debut
6d3b688b04
Ignore lm_head decoder bias warning (#9615)
* Ignore lm_head decoder bias warning
* Revert "Ignore lm_head decoder bias warning"
This reverts commit f25177a9da6ca898e351f46c8b1515971de5c670.
* predictions -> lm_head
2021-01-15 09:40:21 -05:00
Julien Plu
8eba1f8ca8
Remove unused token_type_ids in MPNet (#9564)
* Add warning
* Remove unused import
* Fix missing call
* Fix missing call
* Completely remove token_type_ids
* Apply style
* Remove unused import
* Update src/transformers/models/mpnet/modeling_tf_mpnet.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-01-15 08:06:29 -05:00
Patrick von Platen
90ca8d36e9
[TF Led] Fix wrong decoder attention mask behavior (#9601)
* fix tf led
* remove loop file
2021-01-15 06:40:27 -05:00
Kiyoung Kim
85788bae5c
Revert "Gradient accumulation for TFTrainer (#9585)"
This reverts commit 3f40070c88.
2021-01-15 10:47:01 +01:00
Sylvain Gugger
329fe2746a
Upstream (and rename) sortish sampler (#9574)
* Upstream (and rename) sortish sampler
* Use proper sampler
* Update src/transformers/trainer_pt_utils.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-01-14 10:38:14 -05:00
Kiyoung Kim
3f40070c88
Gradient accumulation for TFTrainer (#9585)
* gradient accumulation for tftrainer
* label naming
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* label naming
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-01-14 10:16:39 -05:00
Lysandre Debut
280db79ac1
BatchEncoding.to with device with tests (#9584)
2021-01-14 07:57:58 -05:00
Julien Plu
a26536f0c8
Make logs tf compliant (#9565)
2021-01-14 04:56:53 -05:00
Julien Plu
14d677ca4a
Compliancy with tf-nightly (#9570)
* Compliancy with tf-nightly
* Add more version + restore min version check
2021-01-14 04:35:35 -05:00
Sylvain Gugger
5e1bea4f16
Fix Trainer with a parallel model (#9578)
* Fix Trainer with a parallel model
* More clean up
2021-01-14 03:23:41 -05:00
Lysandre
e63cad7936
v4.3.0.dev0
2021-01-13 16:16:54 +01:00
Lysandre
7d9a9d0c72
Release: v4.2.0
2021-01-13 16:01:51 +01:00
Sylvain Gugger
04dc65e5c6
Fix data parallelism in Trainer (#9566)
* Fix data parallelism in Trainer
* Update src/transformers/training_args.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-01-13 09:54:41 -05:00
LSinev
0c9f01a8e5
Speed up TopKLogitsWarper and TopPLogitsWarper (pytorch) (#9557)
* make TopKLogitsWarper faster
* make TopPLogitsWarper faster
2021-01-13 07:47:47 -05:00
Lysandre Debut
245cdb469d
Fix barthez tokenizer (#9562)
2021-01-13 06:24:10 -05:00
Suraj Patil
69ed36063a
fix BlenderbotSmallTokenizer (#9538)
* add model_input_names
* fix test
2021-01-13 10:53:43 +05:30
Stas Bekman
2df34f4aba
[trainer] deepspeed integration (#9211)
* deepspeed integration
* style
* add test
* ds wants to do its own backward
* fp16 assert
* Update src/transformers/training_args.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* style
* for clarity extract what args are being passed to deepspeed
* introduce the concept of self.wrapped_model
* s/self.wrapped_model/self.model_wrapped/
* complete transition to self.wrapped_model / self.model
* fix
* doc
* give ds its own init
* add custom overrides, handle bs correctly
* fix test
* clean up model_init logic, fix small bug
* complete fix
* collapse --deepspeed_config into --deepspeed
* style
* start adding doc notes
* style
* implement hf2ds optimizer and scheduler configuration remapping
* oops
* call get_num_training_steps absolutely when needed
* workaround broken auto-formatter
* deepspeed_config arg is no longer needed - fixed in deepspeed master
* use hf's fp16 args in config
* clean
* start on the docs
* rebase cleanup
* finish up --fp16
* clarify the supported stages
* big refactor thanks to discovering deepspeed.init_distributed
* cleanup
* revert fp16 part
* add checkpoint-support
* more init ds into integrations
* extend docs
* cleanup
* unfix docs
* clean up old code
* imports
* move docs
* fix logic
* make it clear which file it's referring to
* document nodes/gpus
* style
* wrong format
* style
* deepspeed handles gradient clipping
* easier to read
* major doc rewrite
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* docs
* switch to AdamW optimizer
* style
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* clarify doc
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-01-12 19:05:18 -08:00