Commit Graph

1074 Commits

Author SHA1 Message Date
Ashwin Geet Dsa
66a5a6fda8 fix to ensure that returned tensors after the tokenization is Long (#7039)
* fix to ensure that returned tensors after the tokenization is Long

* fix to ensure that returned tensors after the tokenization is Long

Co-authored-by: Ashwin Geet Dsa <adsa@grvingt-6.nancy.grid5000.fr>
2020-09-10 11:04:03 -04:00
Sylvain Gugger
15a189049e Add TF Funnel Transformer (#7029)
* Add TF Funnel Transformer

* Proper dummy input

* Formatting

* Update src/transformers/modeling_tf_funnel.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address review comments

* One review comment forgotten

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-09-10 10:41:56 -04:00
Patrick von Platen
7fd1febf38 Add "Leveraging Pretrained Checkpoints for Generation" Seq2Seq models. (#6594)
* add conversion script

* improve conversion script

* make style

* add tryout files

* fix

* update

* add causal bert

* better names

* add tokenizer file as well

* finish causal_bert

* fix small bugs

* improve generate

* change naming

* renaming

* renaming

* renaming

* remove leftover files

* clean files

* add fix tokenizer

* finalize

* correct slow test

* update docs

* small fixes

* fix link

* adapt check repo

* apply sams and sylvains recommendations

* fix import

* implement Lysandres recommendations

* fix logger warn
2020-09-10 16:40:51 +02:00
Yu Liu
762cba3bda Albert pretrain datasets/ datacollator (#6168)
* add dataset for albert pretrain

* datacollator for albert pretrain

* naming, comprehension, file reading change

* data cleaning is no needed after this modification

* delete prints

* fix a bug

* file structure change

* add tests for albert datacollator

* remove random seed

* add back len and get item function

* sample file for testing and test code added

* format change for black

* more format change

* Style

* var assignment issue resolve

* add back wrongly deleted DataCollatorWithPadding in init file

* Style

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-09-10 07:56:29 -04:00
Johann C. Rocholl
49e9be0639 Fix confusing warnings during TF2 import from PyTorch (#6623)
1. Swapped missing_keys and unexpected_keys.

2. Copy&paste error caused these warnings to say "from TF 2.0" when it's actually "from PyTorch".
2020-09-10 05:31:59 -04:00
Stas Bekman
4ee1053dcf add -y to bypass prompt for transformers-cli upload (#7035) 2020-09-10 04:58:29 -04:00
Lysandre Debut
15478c1287 Batch encore plus and overflowing tokens fails when non existing overflowing tokens for a sequence (#6677)
* Patch and test

* Fix tests
2020-09-09 06:55:17 -04:00
Henry Dashwood
9fd11bf1a8 replace torch.triu with onnx compatible code (#6929) 2020-09-09 04:56:40 -04:00
Julien Chaumond
ed71c21d6a [from_pretrained] Allow tokenizer_type ≠ model_type (#6995) 2020-09-09 04:22:59 -04:00
Stas Bekman
03e363f9ae [generation] consistently add eos tokens (#6982)
Currently beam search returns inconsistent outputs - if hypos have different lengths we get eos, if they are the same - we don't.

This PR makes the output consistent.

Also why not also replace:

```
            if sent_lengths[i] < max_length:
                decoded[i, sent_lengths[i]] = eos_token_id
```
with:
```
            decoded[i, sent_lengths[i]] = eos_token_id
```
Shouldn't eos always be there? If the data gets truncated, the caller needs to user a larger `max_length`.

Please correct me if my logic is flawed.
2020-09-09 04:08:36 -04:00
Stas Bekman
d0963486c1 adding TRANSFORMERS_VERBOSITY env var (#6961)
* introduce TRANSFORMERS_VERBOSITY env var + test + test helpers

* cleanup

* remove helper function
2020-09-09 04:08:01 -04:00
Patrick von Platen
120176ea29 [Longformer] Fix longformer documentation (#7016)
* fix longformer

* allow position ids to not be initialized
2020-09-08 18:51:28 +02:00
Lysandre Debut
5c4eb4b1ac Fixing FLOPS merge by checking if torch is available (#7013)
* Should check if `torch` is available

* fixed samples_count error, distributed_concat arguments

* style

* Import torch at beginning of file

Co-authored-by: TevenLeScao <teven.lescao@gmail.com>
2020-09-08 10:51:58 -04:00
Teven
01d340adfa Floating-point operations logging in trainer (#6768)
* neFLOs calculation, logging, and reloading (#1)

* testing distributed consecutive batches

* fixed AttributeError from DataParallel

* removed verbosity

* rotate with use_mtime=True

* removed print

* fixed interaction with gradient accumulation

* indent formatting

* distributed neflo counting

* fixed typo

* fixed typo

* mean distributed losses

* exporting log history

* moved a few functions

* floating_point_ops clarification for transformers with parameter-reuse

* code quality

* double import

* made flo estimation more task-agnostic

* only logging flos if computed

* code quality

* unused import

* Update src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Sylvain review

* Update src/transformers/modeling_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* black

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-09-08 10:00:56 -04:00
Sylvain Gugger
d155b38d6e Funnel transformer (#6908)
* Initial model

* Fix upsampling

* Add special cls token id and test

* Formatting

* Test and fist FunnelTokenizerFast

* Common tests

* Fix the check_repo script and document Funnel

* Doc fixes

* Add all models

* Write doc

* Fix test

* Initial model

* Fix upsampling

* Add special cls token id and test

* Formatting

* Test and fist FunnelTokenizerFast

* Common tests

* Fix the check_repo script and document Funnel

* Doc fixes

* Add all models

* Write doc

* Fix test

* Fix copyright

* Forgot some layers can be repeated

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/modeling_funnel.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address review comments

* Update src/transformers/modeling_funnel.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Address review comments

* Update src/transformers/modeling_funnel.py

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* Slow integration test

* Make small integration test

* Formatting

* Add checkpoint and separate classification head

* Formatting

* Expand list, fix link and add in pretrained models

* Styling

* Add the model in all summaries

* Typo fixes

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-09-08 08:08:08 -04:00
Stuart Mesham
25afb4ea50 fixed trainer tr_loss memory leak (#6999)
* fixed trainer tr_loss memory leak

* detached returned training loss from computation graph in the Trainer class' training_step() method

* Revert "fixed trainer tr_loss memory leak"

This reverts commit 47226e4e
2020-09-08 08:07:33 -04:00
Stas Bekman
c18f5916a0 typo (#7001)
apologies for the tiny PRs, just sending those as I find them.
2020-09-08 01:22:20 -04:00
Jangwon Park
90ec78b514 Add missing arguments for BertWordPieceTokenizer (#5810) 2020-09-07 08:35:41 -04:00
Lysandre Debut
77cd0e13d2 Conversion scripts shouldn't have relative imports (#6991) 2020-09-07 08:31:06 -04:00
Stas Bekman
848fbe1e35 [gen utils] missing else case (#6980)
* [gen utils] missing else case

1. `else` is missing - I hit that case while porting a model. Probably needs to assert there?
2. also the comment on top seems to be outdated (just vocab_size is being set there)

* typo
2020-09-07 07:28:06 -04:00
tznurmin
f7e80721eb Fixed the default number of attention heads in Reformer Configuration (#6973) 2020-09-07 12:12:22 +02:00
Stas Bekman
acfaad74ab [docstring] missing arg (#6933)
* [docstring] missing arg

add the missing `tie_word_embeddings` entry

* cleanup

* Update src/transformers/configuration_reformer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-09-07 05:36:16 -04:00
Stas Bekman
c3317e1f80 typo (#6959)
there is no var `decoder_input_ids`, but there is `input_ids` for decoder :)
2020-09-07 05:16:24 -04:00
Lysandre Debut
9ef9c39728 Cannot index None (#6984) 2020-09-07 04:56:08 -04:00
Sylvain Gugger
08de989a0a Trainer with grad accum (#6930)
* Add warning for gradient accumulation

* Formatting
2020-09-07 04:54:00 -04:00
Boris Dayma
995a958dd1 feat: allow prefix for any generative model (#5885)
* feat: allow padding_text for any generative model

* docs(pipelines.py): correct typo

* Update src/transformers/pipelines.py

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* feat: rename padding_text to prefix

* fix: cannot tokenize empty text

* fix: pass prefix arg to pipeline

* test: add prefix to text-generetation pipeline

* style: fix style

* style: clean code and variable name more explicit

* set arg docstring to optional

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-09-07 03:03:45 -04:00
Stas Bekman
48ff6d5109 [doc] remove the implied defaults to :obj:None, s/True/ :obj:`True/, etc. (#6956)
* remove the implied defaults to :obj:`None`

* fix bug in the original

* replace to :obj:`True`, :obj:`False`
2020-09-04 18:22:25 -04:00
Stas Bekman
eff274d629 typo (#6952) 2020-09-04 16:14:37 -04:00
Stas Bekman
c5d43a872f [docstring] misc arg doc corrections (#6932)
* correct bool types

fix docstring s/int/bool/

* fix description

* fix num_labels to match reality
2020-09-04 10:09:42 -04:00
Yih-Dar
a75e319819 Fix mixed precision issue in TF DistilBert (#6915)
* Remove hard-coded uses of float32 to fix mixed precision use in TF Distilbert

* fix style

* fix gelu dtype issue in TF Distilbert

* fix numeric overflow while using half precision
2020-09-04 14:29:57 +02:00
krfricke
0f360d3d1c move wandb/comet logger init to train() to allow parallel logging (#6850)
* move wandb/comet logger init to train() to allow parallel logging

* Setup wandb/comet loggers on first call to log()
2020-09-03 11:49:14 -04:00
Antonio V Mendoza
ea2c6f1afc Adding the LXMERT pretraining model (MultiModal languageXvision) to HuggingFace's suite of models (#5793)
* added template files for LXMERT and competed the configuration_lxmert.py

* added modeling, tokization, testing, and finishing touched for lxmert [yet to be tested]

* added model card for lxmert

* cleaning up lxmert code

* Update src/transformers/modeling_lxmert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/modeling_tf_lxmert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/modeling_tf_lxmert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/modeling_lxmert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* tested torch lxmert, changed documtention, updated outputs, and other small fixes

* Update src/transformers/convert_pytorch_checkpoint_to_tf2.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/convert_pytorch_checkpoint_to_tf2.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/convert_pytorch_checkpoint_to_tf2.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* renaming, other small issues, did not change TF code in this commit

* added lxmert question answering model in pytorch

* added capability to edit number of qa labels for lxmert

* made answer optional for lxmert question answering

* add option to return hidden_states for lxmert

* changed default qa labels for lxmert

* changed config archive path

* squshing 3 commits: merged UI + testing improvments + more UI and testing

* changed some variable names for lxmert

* TF LXMERT

* Various fixes to LXMERT

* Final touches to LXMERT

* AutoTokenizer order

* Add LXMERT to index.rst and README.md

* Merge commit test fixes + Style update

* TensorFlow 2.3.0 sequential model changes variable names

Remove inherited test

* Update src/transformers/modeling_tf_pytorch_utils.py

* Update docs/source/model_doc/lxmert.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/model_doc/lxmert.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_lxmert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* added suggestions

* Fixes

* Final fixes for TF model

* Fix docs

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-09-03 04:02:25 -04:00
Sylvain Gugger
8f2723caf0 Output attention takes an s (#6903)
* Fix output_attention -> output_attentions

* Formatting

* One unsaved file
2020-09-02 08:11:45 -04:00
Yohei Tamura
485da7222f fix error class instantiation (#6634) 2020-09-02 07:36:32 -04:00
Suraj Patil
4230d30f77 [pipelines] Text2TextGenerationPipeline (#6744)
* add Text2TextGenerationPipeline

* remove max length warning

* remove comments

* remove input_length

* fix typo

* add tests

* use TFAutoModelForSeq2SeqLM

* doc

* typo

* add the doc below TextGenerationPipeline

* doc nit

* style

* delete comment
2020-09-02 07:34:35 -04:00
Prajjwal Bhargava
6b24281229 fix typo in comments (#6838) 2020-09-02 06:55:37 -04:00
Patrick von Platen
8abd7f69fc fix warning for position ids (#6884) 2020-09-02 06:44:51 -04:00
Parthe Pandit
7cb0572c64 Update modeling_bert.py (#6897)
outptus -> outputs in example of BertForPreTraining
2020-09-02 06:39:01 -04:00
Patrick von Platen
1889e96c8c fix QA example for PT (#6890) 2020-09-02 09:53:09 +02:00
Patrick von Platen
4d1a3ffde8 [EncoderDecoder] Add xlm-roberta to encoder decoder (#6878)
* finish xlm-roberta

* finish docs

* expose XLMRobertaForCausalLM
2020-09-01 21:56:39 +02:00
Jin Young (Daniel) Sohn
21d719238c Add cache_dir to save features TextDataset (#6879)
* Add cache_dir to save features TextDataset

This is in case the dataset is in a RO filesystem, for which is the case
in tests (GKE TPU tests).

* style
2020-09-01 11:42:17 -04:00
Lysandre
4b3ee9cbc5 Release: v3.1.0 2020-09-01 14:27:52 +02:00
Patrick von Platen
afc4ece462 [Generate] Facilitate PyTorch generate using ModelOutputs (#6735)
* fix generate for GPT2 Double Head

* fix gpt2 double head model

* fix  bart / t5

* also add for no beam search

* fix no beam search

* fix encoder decoder

* simplify t5

* simplify t5

* fix t5 tests

* fix BART

* fix transfo-xl

* fix conflict

* integrating sylvains and sams comments

* fix tf past_decoder_key_values

* fix enc dec test
2020-09-01 12:38:25 +02:00
Funtowicz Morgan
397f819615 Restore PaddingStrategy.MAX_LENGTH on QAPipeline while no v2. (#6875)
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
2020-09-01 05:35:35 -04:00
Sam Shleifer
a32d85f0d4 delete reinit (#6862) 2020-09-01 03:43:27 -04:00
Sylvain Gugger
d5f1ffa0d8 Logging doc (#6852)
* Add logging doc

* Foamtting

* Update docs/source/main_classes/logging.rst

* Update src/transformers/utils/logging.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-09-01 03:16:34 -04:00
Sam Shleifer
367235ee52 Bart can make decoder_input_ids from labels (#6758) 2020-08-31 16:16:47 -04:00
Sylvain Gugger
a59bcefbb1 Split hp search methods (#6857)
* Split the run_hp_search by backend

* Unused import
2020-08-31 15:16:39 -04:00
krfricke
23f9611c16 Add checkpointing to Ray Tune HPO (#6747)
* Introduce HPO checkpointing for PBT

* Moved checkpoint saving

* Fixed checkpoint subdir pass

* Fixed style

* Enable/disable checkpointing, check conditions for various tune schedulers incl. PBT

* Adjust number of GPUs to number of jobs

* Avoid mode pickling in ray

* Move hp search to integrations
2020-08-31 14:38:46 -04:00
Jin Young (Daniel) Sohn
02d09c8fcc Only access loss tensor every logging_steps (#6802)
* Only access loss tensor every logging_steps

* tensor.item() was being called every step. This must not be done
for XLA:TPU tensors as it's terrible for performance causing TPU<>CPU
communication at each step. On RoBERTa MLM for example, it reduces step
time by 30%, should be larger for smaller step time models/tasks.
* Train batch size was not correct in case a user uses the
`per_gpu_train_batch_size` flag
* Avg reduce loss accross eval shards

* Fix style (#6803)

* t5 model should make decoder_attention_mask (#6800)

* [s2s] Test hub configs in self-scheduled CI (#6809)

* [s2s] round runtime in run_eval (#6798)

* Pegasus finetune script: add --adafactor (#6811)

* [bart] rename self-attention -> attention (#6708)

* [tests] fix typos in inputs (#6818)

* Fixed open in colab link (#6825)

* Add model card for singbert lite. Update widget for singbert and singbert-large. (#6827)

* BR_BERTo model card (#6793)

* clearly indicate shuffle=False (#6312)

* Clarify shuffle

* clarify shuffle

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>

* [s2s README] Add more dataset download instructions (#6737)

* Style

* Patch logging issue

* Set default logging level to `WARNING` instead of `INFO`

* TF Flaubert w/ pre-norm (#6841)

* Dataset and DataCollator for BERT Next Sentence Prediction (NSP) task (#6644)

* add datacollator and dataset for next sentence prediction task

* bug fix (numbers of special tokens & truncate sequences)

* bug fix (+ dict inputs support for data collator)

* add padding for nsp data collator; renamed cached files to avoid conflict.

* add test for nsp data collator

* Style

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>

* Fix in Adafactor docstrings (#6845)

* Fix resuming training for Windows (#6847)

* Only access loss tensor every logging_steps

* tensor.item() was being called every step. This must not be done
for XLA:TPU tensors as it's terrible for performance causing TPU<>CPU
communication at each step. On RoBERTa MLM for example, it reduces step
time by 30%, should be larger for smaller step time models/tasks.
* Train batch size was not correct in case a user uses the
`per_gpu_train_batch_size` flag
* Avg reduce loss accross eval shards

* comments

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Thomas Ashish Cherian <6967017+PandaWhoCodes@users.noreply.github.com>
Co-authored-by: Zane Lim <zyuanlim@gmail.com>
Co-authored-by: Rodolfo De Nadai <rdenadai@gmail.com>
Co-authored-by: xujiaze13 <37360975+xujiaze13@users.noreply.github.com>
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Huang Lianzhe <hlz@pku.edu.cn>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-08-31 11:35:51 -04:00