Commit Graph

6671 Commits

Author SHA1 Message Date
Lysandre
093b88f4e9 Update scatter to use torch 1.8.0 2021-03-05 07:31:51 -05:00
Patrick von Platen
c503a1c15e [ProphetNet] Bart-like Refactor (#10501)
* first step to refactor

* make all fast tests pass

* make all slow tests pass

* save intermediate

* correct cache

* finish PR

* make fp16 work
2021-03-04 23:27:12 +03:00
Sylvain Gugger
6290169eb3 Rework TPU checkpointing in Trainer (#10504)
* Rework TPU checkpointing in Trainer

* Wraps the barrier in a dist test

* Address review comments

* Remove line
2021-03-04 11:46:11 -05:00
Philipp Schmid
805c5200dc Removes overwrites for output_dir (#10521)
* removed overwrites

* remove default value for output_dir

* adjusted typing
2021-03-04 17:12:37 +01:00
Sylvain Gugger
a5bd40b75c Not always consider a local model a checkpoint in run_glue (#10517) 2021-03-04 11:11:39 -05:00
Sylvain Gugger
745ea78dcc Revert "Not always consider a local model a checkpoint in run_glue"
This reverts commit f3660613bc.
2021-03-04 09:45:18 -05:00
Sylvain Gugger
f3660613bc Not always consider a local model a checkpoint in run_glue 2021-03-04 09:44:02 -05:00
Sylvain Gugger
948b730f97 Remove unsupported methods from ModelOutput doc (#10505) 2021-03-03 14:55:18 -05:00
Sylvain Gugger
b70f441b72 Smp grad accum (#10488)
* Fix gradient accumulation for SM Model Parallelism

* Style and divide loss by grad accum steps
2021-03-03 12:13:29 -05:00
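A rough illustration of the "divide loss by grad accum steps" item in the entry above. This is a toy sketch, not the Trainer code: the `accumulate` helper is hypothetical, and plain floats stand in for gradient tensors. It only shows why each micro-batch contribution is scaled by the number of accumulation steps.

```python
def accumulate(micro_batch_grads, accumulation_steps):
    """Sum micro-batch gradients, scaling each by 1 / accumulation_steps.

    Scaling each contribution mirrors dividing the loss by the number of
    accumulation steps before the backward pass (hypothetical toy setup).
    """
    total = 0.0
    for grad in micro_batch_grads:
        total += grad / accumulation_steps
    return total


grads = [2.0, 4.0, 6.0, 8.0]
# With scaling, the accumulated value is the mean over the micro-batches.
scaled = accumulate(grads, accumulation_steps=len(grads))
# Without scaling (divide by 1), it grows with the number of micro-batches.
unscaled = accumulate(grads, accumulation_steps=1)
print(scaled, unscaled)
```

Without the division, the effective gradient would be larger by a factor equal to the number of accumulation steps, which changes the effective learning rate.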
felixgwu
d064fb5647 Fix the bug in constructing the all_hidden_states of DeBERTa v2 (#10466)
* fix all_hidden_states

* use output_states instead of next_kv
2021-03-03 12:05:21 -05:00
Stas Bekman
188574ac50 remap MODEL_FOR_QUESTION_ANSWERING_MAPPING classes to names auto-generated file (#10487)
* remap classes to strings

* missing new util

* style

* doc

* move the autogenerated file

* Trigger CI
2021-03-03 08:54:00 -08:00
Sylvain Gugger
801ff969ce Refactor checkpoint name in BERT and MobileBERT (#10424)
* Refactor checkpoint name in BERT and MobileBERT

* Add option to check copies

* Add QuestionAnswering

* Add last models

* Make black happy
2021-03-03 11:21:17 -05:00
Jeff Yang
39f70a4058 feat(docs): navigate with left/right arrow keys (#10481)
* feat(docs): navigate with left/right arrow keys

* fix: add missing comma
2021-03-03 11:17:12 -05:00
Patrick von Platen
2d2ed2cc18 [T5] Fix speed degradation bug t5 (#10496)
* fix speed degradation bug t5

* fix for all models

* fix code quality
2021-03-03 12:42:41 +03:00
WybeKoper
5dc303e281 Fixed minor spelling mistakes (#10489)
Co-authored-by: WybeKoper <WybeKoper@users.noreply.github.com>
2021-03-03 14:17:25 +05:30
Mehrad Moradshahi
1750e62900 Generate can return cross-attention weights too (#10493) 2021-03-03 13:57:02 +05:30
Martin Schmitt
b013842244 Changed num_beams to num_beams // num_beam_groups when initialising PrefixConstrainedLogitsProcessor in _get_logits_processor to fix compatibility issue when constrained decoding is used together with grouped beam search (#10475) 2021-03-02 10:41:54 +03:00
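The compatibility issue named in the entry above can be sketched in plain Python. This assumes the processor recovers the batch index by chunking score rows by its beam count, and that each group step in grouped beam search only sees `num_beams // num_beam_groups` rows per batch item. The `batch_ids_for_rows` helper is hypothetical and not part of the library.

```python
def batch_ids_for_rows(num_rows, beams_per_item):
    """Map each score row to a batch index by chunking rows per batch item."""
    if num_rows % beams_per_item != 0:
        raise ValueError("rows do not divide evenly into batch items")
    return [row // beams_per_item for row in range(num_rows)]


batch_size, num_beams, num_beam_groups = 2, 4, 2
group_size = num_beams // num_beam_groups   # beams handled in one group step
rows_per_step = batch_size * group_size     # rows the processor sees per step

# Chunking by num_beams assigns every row to batch item 0.
wrong = batch_ids_for_rows(rows_per_step, num_beams)
# Chunking by the per-group beam count recovers both batch items.
right = batch_ids_for_rows(rows_per_step, group_size)
print(wrong, right)
```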
Lysandre Debut
0c2325198f Add I-BERT to README (#10462) 2021-03-01 12:12:31 -05:00
Lysandre Debut
9248e27037 Remove Anthony from the bug reports in Transformers 2021-03-01 10:23:40 -05:00
Suraj Patil
a106bde5a7 [Wav2Vec2FeatureExtractor] small fixes (#10455)
* small fixes

* don't check for None
2021-03-01 20:19:52 +05:30
Patrick von Platen
11655fafdd remove feature extraction config (#10457) 2021-03-01 12:30:12 +03:00
Patrick von Platen
0234de8418 Add Fine-Tuning for Wav2Vec2 (#10145)
* add encode labels function to tokenizer

* start adding finetuning

* init dropout

* upload

* correct convert script

* apply changes

* fix second typo

* make first dummy training run

* adapt convert script

* push config for comparison

* remove conf

* finish training

* adapt data collator

* add research folder

* update according to fairseq feedback

* some minor corrections

* refactor masking indices a bit

* some minor changes

* clean tokenizer

* finish clean-up

* remove previous logic

* update run script

* correct training

* finish changes

* finish model

* correct bug

* fix training a bit more

* add some tests

* finish gradient checkpointing

* finish example

* correct gradient checkpointing

* improve tokenization method

* revert changes in tokenizer

* revert general change

* adapt fine-tuning

* update

* save intermediate test

* Update README.md

* finish finetuning

* delete conversion script

* Update src/transformers/models/wav2vec2/configuration_wav2vec2.py

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* finish wav2vec2 script

* finish wav2vec2 fine-tuning

* finalize test

* correct test

* adapt tests

* finish

* remove test file

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-03-01 12:13:17 +03:00
Patrick von Platen
3c733f3208 Update ibert.rst (#10445) 2021-02-28 19:03:49 +03:00
Darigov Research
aeba4f95bb Adds terms to Glossary (#10443)
* feat: Adds three definitions to glossary from @cronoik

Needed a definition for transformer which in turn needed 2 more definitions

To do with issue https://github.com/huggingface/transformers/issues/9078

* fix: Adjusts definition of neural network to make it easier to read
2021-02-28 08:27:54 -05:00
Tanmay Garg
256482ac92 Introduce save_strategy training argument (#10286)
* Introduce save_strategy training argument

* deprecate EvaluationStrategy

* collapse EvaluationStrategy and LoggingStrategy into a single
  IntervalStrategy enum

* modify tests to use modified enum
2021-02-27 19:34:22 -05:00
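The entry above introduces `save_strategy` and collapses the evaluation and logging strategies into one `IntervalStrategy` enum. A minimal sketch of that idea using only the standard library follows. The member values (`no`, `steps`, `epoch`) are assumed here, and the `should_save` helper is hypothetical, written only to show how a single enum can drive a save decision.

```python
from enum import Enum


class IntervalStrategy(Enum):
    """One enum shared by evaluation, logging, and saving (assumed values)."""
    NO = "no"
    STEPS = "steps"
    EPOCH = "epoch"


def should_save(strategy, step, save_steps, end_of_epoch):
    """Hypothetical helper: decide whether to write a checkpoint now."""
    strategy = IntervalStrategy(strategy)  # accepts the enum or its string value
    if strategy is IntervalStrategy.STEPS:
        return step > 0 and step % save_steps == 0
    if strategy is IntervalStrategy.EPOCH:
        return end_of_epoch
    return False


print(should_save("steps", step=500, save_steps=500, end_of_epoch=False))
```

Sharing one enum means the same string values can be validated in one place for all three arguments.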
Bhadresh Savani
aca6288ff4 updated logging and saving metrics (#10436)
* updated logging and saving metrics

* space removal
2021-02-27 09:53:44 -08:00
Stas Bekman
f52a15897b [run_seq2seq.py] restore functionality: saving to test_generations.txt (#10428)
This PR restores the original functionality that for some reason was modified.

Fixes: https://github.com/huggingface/transformers/issues/10381

@sgugger
2021-02-27 08:21:50 -08:00
Lysandre Debut
311b7048c5 Fix conda-build (#10431) 2021-02-26 20:20:30 -05:00
Stas Bekman
ee04b69822 [examples] better model example (#10427)
* refactors

* typo
2021-02-26 17:01:01 -08:00
Amog Kamsetty
a85eb616f7 Ray Tune Integration Bug Fixes (#10406)
* fixes

* update resources

* formatting

* remove import

* add log statement

* use fstring

* add period

* Update src/transformers/integrations.py
2021-02-26 19:06:08 -05:00
Kai Fricke
98569d4ba2 Add Ray Tune hyperparameter search integration test (#10414) 2021-02-26 10:18:33 -05:00
Patrick von Platen
d03695f3a2 [LED] Correct Docs (#10419)
* correct docs

* correct tf model docs as well
2021-02-26 17:53:28 +03:00
Mansi Mane
7fc686efb1 Sagemaker Model Parallel tensorboard writing fix (#10403)
* Added tb fix

* Removed local rank condition

* Updated reference to args
2021-02-26 08:04:55 -05:00
Julien Chaumond
83d2d55c94 [ci, flax] non-existing models are unlikely to pass tests (#10409)
😂
2021-02-26 12:35:36 +03:00
Sylvain Gugger
17b6e0d474 Fix run_glue evaluation when model has a label correspondence (#10401) 2021-02-25 15:30:38 -05:00
Sylvain Gugger
26f8b2cb10 Make Barthez tokenizer tests a bit faster (#10399)
* Make Barthez tokenizer tests a bit faster

* Quality
2021-02-25 11:42:25 -05:00
Andrea Bacciu
b040e6efc1 Fix None in add_token_positions - issue #10210 (#10374)
* Fix None in add_token_positions - issue #10210

Fix None in add_token_positions related to the issue #10210

* add_token_positions fix None values in end_positions vector

add_token_positions fix None in end_positions vector as proposed by @joeddav
2021-02-25 09:18:33 -07:00
Sylvain Gugger
9d14be5c20 Add support for ZeRO-2/3 and ZeRO-offload in fairscale (#10354)
* Add support for ZeRO-2/3 and ZeRO-offload in fairscale

* Quality

* Rework from review comments

* Add doc

* Apply suggestions from code review

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Address review comments

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2021-02-25 11:07:53 -05:00
Lysandre Debut
88cc26dcd1 Ignore unexpected weights from PT conversion (#10397) 2021-02-25 10:42:27 -05:00
Sehoon Kim
63645b3b11 I-BERT model support (#10153)
* IBertConfig, IBertTokenizer added

* IBert Model names modified

* tokenizer bugfix

* embedding -> QuantEmbedding

* quant utils added

* quant_mode added to configuration

* QuantAct added, Embedding layer + QuantAct addition

* QuantAct added

* unused path removed, QKV quantized

* self attention layer all quantized, except softmax

* temporary commit

* all linear layers quantized

* quant_utils bugfix

* bugfix: requantization missing

* IntGELU added

* IntSoftmax added

* LayerNorm implemented

* LayerNorm implemented all

* names changed: roberta->ibert

* config does not inherit from RoBERTa

* No support for CausalLM

* static quantization added, quantize_model.py removed

* import modules uncommented

* copyrights fixed

* minor bugfix

* quant_modules, quant_utils merged as one file

* import * fixed

* unused runfile removed

* make style run

* configuration.py docstring fixed

* refactoring: comments removed, function name fixed

* unused dependency removed

* typo fixed

* comments(Copied from), assertion string added

* refactoring: super(..) -> super(), etc.

* refactoring

* refactoring

* make style

* refactoring

* cuda -> to(x.device)

* weight initialization removed

* QuantLinear set_param removed

* QuantEmbedding set_param removed

* IntLayerNorm set_param removed

* assert string added

* assertion error message fixed

* is_decoder removed

* enc-dec arguments/functions removed

* Converter removed

* quant_modules docstring fixed

* convert_slow_tokenizer rolled back

* quant_utils docstring fixed

* unused arguments, e.g. use_cache, removed from config

* weight initialization condition fixed

* x_min, x_max initialized with small values to avoid div-zero exceptions

* testing code for ibert

* test emb, linear, gelu, softmax added

* test ln and act added

* style reformatted

* force_dequant added

* error tests overridden

* make style

* Style + Docs

* force dequant tests added

* Fix fast tokenizer in init

* Fix doc

* Remove space

* docstring, IBertConfig, chunk_size

* test_modeling_ibert refactoring

* quant_modules.py refactoring

* e2e integration test added

* tokenizers removed

* IBertConfig added to tokenizer_auto.py

* bugfix

* fix docs & test

* fix style num 2

* final fixes

Co-authored-by: Sehoon Kim <sehoonkim@berkeley.edu>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-02-25 10:06:42 -05:00
Patrick von Platen
cb38ffcc5e [PretrainedFeatureExtractor] + Wav2Vec2FeatureExtractor, Wav2Vec2Processor, Wav2Vec2Tokenizer (#10324)
* push to show

* small improvement

* small improvement

* Update src/transformers/feature_extraction_utils.py

* Update src/transformers/feature_extraction_utils.py

* implement base

* add common tests

* make all tests pass for wav2vec2

* make padding work & add more tests

* finalize feature extractor utils

* add call method to feature extraction

* finalize feature processor

* finish tokenizer

* finish general processor design

* finish tests

* typo

* remove bogus file

* finish docstring

* add docs

* finish docs

* small fix

* correct docs

* save intermediate

* load changes

* apply changes

* apply changes to doc

* change tests

* apply Suraj's recommendations

* final changes

* Apply suggestions from code review

* fix typo

* fix import

* correct docstring
2021-02-25 17:42:46 +03:00
abhishek thakur
9dc7825744 Remove unused variable in example for Q&A (#10392) 2021-02-25 09:18:47 -05:00
mingruimingrui
894db6701e Bugfix: Removal of padding_idx in BartLearnedPositionalEmbedding (#10200)
* Assumption of padding_idx < 2 might not hold

* Use offset instead of 2

* Fix with black

* Change behavior to warning instead for backward compatibility.

* Fix with black

* Remove warning

* Make padding_idx non-required

* padding_idx fix for blenderbot

* padding_idx fix for blenderbot_small

* padding_idx fix for led

* padding_idx fix for mbart

* Remove extra whitespaces

* padding_idx fix for template

* Fix padding_idx passed to nn.Embedding mistake

* Fixed padding_idx passed to positional embedding in template

* Remove padding_idx from pytorch learned positional embeddings

* Remove accidentally added quotes

* Remove padding_idx from tf learned positional embeddings

* Remove zeroing of weights in __init__

Co-authored-by: Wang Ming Rui <mingrui.wang@C02CJTUYMD6M.local>
2021-02-25 14:33:13 +03:00
Lysandre Debut
55fe80d084 Only run model templates tests once (#10388) 2021-02-24 19:48:00 -05:00
Lysandre Debut
22bd047e91 Run GA on every push even on forks (#10383) 2021-02-24 19:23:39 -05:00
Lysandre
3591844306 v4.3.3 docs 2021-02-24 15:19:01 -05:00
Stas Bekman
bdbb2c756b [trainer] move secondary methods into a separate file (#10363)
* move secondary methods into a separate file

* cleanup

* style
2021-02-24 08:32:52 -08:00
Poedator
5f2a3d721c fix deprecated ref to tokenizer.max_len (#10220)
This replaces the deprecated reference to `tokenizer.max_len` with `tokenizer.model_max_length`, similar to [issue 8739](https://github.com/huggingface/transformers/issues/8739) and [PR 8604](https://github.com/huggingface/transformers/pull/8604).
Example [here](https://colab.research.google.com/gist/poedator/f8776349e5c625ce287fc6fcd312fa1e/tokenizer-max_len-error-in-transformers_glue.ipynb). The error happens when `glue_convert_examples_to_features` is called without the `max_length` parameter specified. In that case, line 119, which contains the wrong reference, gets called. This simple fix should do it.
2021-02-24 09:01:28 -05:00
Julien Plu
cdcdd5f03a Rework casts (#10274) 2021-02-24 08:38:29 -05:00
abhishek thakur
2d458b2c7d ConvBERT fix torch <> tf weights conversion (#10314)
* convbert conversion test

* fin

* fin

* fin

* clean up tf<->pt conversion

* remove from_pt

Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
2021-02-24 14:55:34 +03:00