Commit Graph

61 Commits

Author SHA1 Message Date
Patrick von Platen
3d5dea9bf0 Add example batch size to all commands (#15596) 2022-02-10 08:52:07 -05:00
davidleonfdez
f1a4c4ead5 [WIP] Add preprocess_logits_for_metrics Trainer param (#15473)
* Add preprocess_logits_for_metrics Trainer param

* Compute accuracy in LM examples

* Improve comments
2022-02-03 12:07:20 -05:00
Lysandre
eab338104d Docs for version v4.16.0 2022-01-27 13:11:51 -05:00
Lysandre
f87db5e412 Release: v4.16.0
Some checks failed
Release - Conda / build_and_package (push) Has been cancelled
2022-01-27 13:06:33 -05:00
Patrick von Platen
fa39ff9fc4 Docs for v4.16.0dev0 2021-12-22 20:39:44 +01:00
Patrick von Platen
05fa1a7ac1 Release: v4.15.0
Some checks failed
Release - Conda / build_and_package (push) Has been cancelled
2021-12-22 18:43:15 +01:00
Lysandre
7c9c41f43c Docs for v4.14.0 2021-12-15 18:29:53 +01:00
Lysandre
960d8cb41d Release: v4.14.0
Some checks failed
Release - Conda / build_and_package (push) Has been cancelled
2021-12-15 18:20:35 +01:00
Josué Nascimento
971e36667a Change how to load config of XLNetLMHeadModel (#14746) 2021-12-13 12:34:26 -05:00
Lysandre
ab31b3e41b Docs for v4.14.0dev0 2021-12-09 17:09:23 +01:00
Lysandre
4da3a696e4 Release: v4.13.0
Some checks failed
Release - Conda / build_and_package (push) Has been cancelled
2021-12-09 16:55:21 +01:00
Julien Chaumond
6cdc3a7844 [urls to hub] Replace outdated model tags with their now-canonical pipeline types (#14617)
* Replace outdated model tags with their now-canonical pipeline types

* spam the CI till it's green
2021-12-06 04:35:01 -05:00
Nicholas Broad
69e16abf98 Switch from using sum for flattening lists of lists in group_texts (#14472)
* remove sum for list flattening

* change to chain(*)

* make chain object a list

* delete empty lines

per sgugger's suggestions

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Nicholas Broad <nicholas@nmbroad.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-11-22 16:17:26 -05:00
Stas Bekman
11f65d4158 [test] add test for --config_overrides (#14466)
* add test for --config_overrides

* remove unneeded parts of the test
2021-11-22 11:33:43 -05:00
Sylvain Gugger
08a5f57567 Add new LFS prune API (#14294) 2021-11-05 18:58:51 -04:00
Lysandre
b8fad022a0 v4.13.0.dev0 2021-10-28 12:56:46 -04:00
Lysandre
62bf536631 Release v4.12.0
Some checks failed
Release - Conda / build_and_package (push) Has been cancelled
2021-10-28 12:09:49 -04:00
Emanuel Huber
ebd48c6de5 Replace assertions with ValueError exception (#14142)
Updated masked-language modeling examples in pytorch
with convention defined by #12789
2021-10-26 17:14:29 -04:00
Patrick von Platen
44eb8bdeea map only on one process (#13810) 2021-09-30 18:52:53 +02:00
Lysandre
11c69b8045 Docs for version v4.11.0 2021-09-27 14:19:38 -04:00
Lysandre
dc193c906d Release: v4.11.0
Some checks failed
Release - Conda / build_and_package (push) Has been cancelled
2021-09-27 14:14:09 -04:00
Gunjan Chhablani
38580455de Add model card creation snippet to example scripts (#13730)
* Update run_glue.py

* Update run_glue.py

* Add model creation snippet to other scripts

* Fix style
2021-09-24 15:51:46 +02:00
Sylvain Gugger
27d4639779 Make gradient_checkpointing a training argument (#13657)
* Make gradient_checkpointing a training argument

* Update src/transformers/modeling_utils.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Update src/transformers/configuration_utils.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Fix tests

* Style

* document Gradient Checkpointing as a performance feature

* Small rename

* PoC for not using the config

* Adapt BC to new PoC

* Forgot to save

* Rollout changes to all other models

* Fix typo

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas@stason.org>
2021-09-22 07:51:38 -04:00
Sylvain Gugger
b7d264be0d Add push_to_hub to no_trainer examples (#13659)
* Add push_to_hub to no_trainer examples

* Quality

* Document integration

* Roll out to other examples
2021-09-21 13:13:30 -04:00
Lysandre
5ee67a4412 Docs for v4.10.0 2021-08-31 16:02:31 +02:00
Lysandre
d12bbe4942 Release: v4.10.0
Some checks failed
Release - Conda / build_and_package (push) Has been cancelled
2021-08-31 15:53:10 +02:00
Stefan Schweter
4046e66e40 examples: only use keep_linebreaks when reading TXT files (#13320)
* examples: only use keep_linebreaks when reading TXT files for all CLM examples

* examples: only use keep_linebreaks when reading TXT files for all CLM examples

* examples: only use keep_linebreaks when reading TXT files for all CLM examples
2021-08-28 16:22:29 +02:00
Stefan Schweter
319d840b46 examples: add keep_linebreaks option to CLM examples (#13150)
* examples: add keep_linebreaks option to text dataset loader for all CLM examples

* examples: introduce new keep_linebreaks option as data argument in CLM examples
2021-08-27 11:35:45 +02:00
Allan Lin
91ff480e26 Update namespaces inside torch.utils.data to the latest. (#13167)
* Update torch.utils.data namespaces to the latest.

* Format

* Update Dataloader.

* Style
2021-08-19 14:29:51 +02:00
Sylvain Gugger
7fcee113c1 Tpu tie weights (#13030)
* Fix tied weights on TPU

* Manually tie weights in no trainer examples

* Fix for test

* One last missing

* Gettning owned by my scripts

* Address review comments

* Fix test

* Fix tests

* Fix reformer tests
2021-08-06 20:41:39 +02:00
Sylvain Gugger
303989de0e Add accelerate to examples requirements (#12888) 2021-07-26 09:57:34 -04:00
Lysandre
40de2d5a4f Docs for v4.10.0dev0 2021-07-22 12:52:25 +02:00
Lysandre
72aee83ced Release: v4.9.0
Some checks failed
Release - Conda / build_and_package (push) Has been cancelled
2021-07-22 12:11:55 +02:00
Sylvain Gugger
6f1adc4334 Fix group_lengths for short datasets (#12558) 2021-07-08 07:23:41 -04:00
Souvic Chakraborty
1d6623c6a2 MLM training fails with no validation file(same as #12406 for pytorch now) (#12517)
* Validation split percentage to be used for custom data files also

Issue same as https://github.com/huggingface/transformers/issues/12406 fixed for pytorch branch run_mlm.py

* Validation split added in the right place

* Update run_clm.py

* validation split added for custom files

* Validation split added for custom files

* Update run_plm.py

* fixed validation split for custom files as input for pytorch examples in lm

* Update run_clm_no_trainer.py

* args modified
2021-07-07 09:05:44 -04:00
Bhadresh Savani
04dbea31a9 [Examples] Added context manager to datasets map (#12367)
* added cotext manager to datasets map

* fixed style and spaces

* fixed warning of deprecation

* changed desc
2021-06-28 09:14:00 -07:00
Taha ValizadehAslani
9490d668d2 Update run_mlm.py (#12344)
Before the code could not be used for validation only because of this line:
extension = data_args.train_file.split(".")[-1]
was assuming that extension must be extracted from the training dataset. This line would run regardless of the training or validation options of the user. This would lead to an error if the user only wants to run an evaluation only and does not want to do train (because the training file does not exist). I modified it to extract extension from the training file if the user wants to do train and extract it from the validation file if the user wants to run eval. This way the code can be used for both training and validation separately.
2021-06-28 07:49:22 -04:00
Bhadresh Savani
539ee456d4 [Examples] Replicates the new --log_level feature to all trainer-based pytorch (#12359)
* added log_level

* fix comment

* fixed log_level

* Trigger CI

* Unfied logging

* simplified args for log_level
2021-06-25 14:58:42 -07:00
Stas Bekman
4a872caef4 remove extra white space from log format (#12360) 2021-06-25 13:20:14 -07:00
Sylvain Gugger
2150dfed31 v4.9.0.dev0 2021-06-23 13:31:19 -04:00
Sylvain Gugger
9252a5127f Release: v4.8.0 2021-06-23 13:25:56 -04:00
Bhavitvya Malik
e43e11260f update desc for map in all examples (#12226)
* update desc for map in all examples

* added plm

* suggestions
2021-06-17 15:37:31 -04:00
Lysandre
0daadc1919 Docs for v4.8.0 2021-06-17 18:17:42 +02:00
Lysandre
7a6c9fab8e Release: v4.7.0
Some checks failed
Release - Conda / build_and_package (push) Has been cancelled
2021-06-17 17:57:42 +02:00
Sylvain Gugger
7d7ceca396 Model card defaults (#12122)
* [WIP] Model card defaults

* finetuned_from default value

* Add all mappings to the mapping file

* Be more defensive on finetuned_from arg

* Add default task tag

* Separate tags from tasks

* Edge case for dataset

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-06-15 16:01:37 -04:00
Kumar Abhishek
9de62cfbce [lm examples] Replicate --config_overrides addition to other LM examples (#12135)
* [lm examples] Replicate --config_overrides addition to other LM examples

* Removing no trainer files changes

* Update README

Co-authored-by: Kumar Abhishek <kabhishek@expedia.com>
2021-06-14 08:12:22 -04:00
Nicholas Broad
cd7961b632 Use text_column_name variable instead of "text" (#12132)
* Use text_column_name variable instead of "text"

`text_column_name` was already defined above where I made the changes and it was also used below where I made changes.

This is a very minor change. If a dataset does not use "text" as the column name, then the `tokenize_function` will now use whatever column is assigned to `text_column_name`. `text_column_name` is just the first column name if "text" is not a column name. It makes the function a little more robust, though I would assume that 90% + of datasets use "text" anyway.

* black formatting

* make style

Co-authored-by: Nicholas Broad <nicholas@nmbroad.com>
2021-06-14 08:11:13 -04:00
Sylvain Gugger
b8ab541340 Don't log anything before logging is setup in examples (#12121)
* Don't log anything before logging is setup in examples

* Last example
2021-06-14 08:03:33 -04:00
Sylvain Gugger
fd6902838a Properly indent block_size (#12070) 2021-06-08 10:27:02 -04:00
cdleong
49bee0aea4 Add torch to requirements.txt in language-modeling (#12040)
* Add torch to requirements.txt in language-modeling

* Update examples/pytorch/language-modeling/requirements.txt

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-08 09:02:35 -04:00