Commit Graph

1063 Commits

Author SHA1 Message Date
Yacine Jernite
49c5202522 Eli5 examples (#4968)
* add eli5 examples

* add dense query script

* query_di

* merging

* merging

* add_utils

* adds nearest neighbor wikipedia

* batch queries

* training_retriever

* new notebooks

* moved retriever traiing script

* finished wiki40b

* max_len_fix

* train_s2s

* retriever_batch_checkpointing

* cleanup

* merge

* dim_fix

* fix_indexer

* fix_wiki40b_snippets

* fix_embed_for_r

* fp32 index

* fix_sparse_q

* joint_training

* remove obsolete datasets

* add_passage_nn_results

* add_passage_nn_results

* add_batch_nn

* add_batch_nn

* add_data_scripts

* notebook

* notebook

* notebook

* fix_multi_gpu

* add_app

* full_caching

* full_caching

* notebook

* sparse_done

* images

* notebook

* add_image_gif

* with_Gif

* add_contr_image

* notebook

* notebook

* notebook

* train_functions

* notebook

* min_retrieval_length

* pandas_option

* notebook

* min_retrieval_length

* notebook

* notebook

* eval_Retriever

* notebook

* images

* notebook

* add_example

* add_example

* notebook

* fireworks

* notebook

* notebook

* joe's notebook comments

* app_update

* notebook

* notebook_link

* captions

* notebook

* assing RetriBert model

* add RetriBert to Auto

* change AutoLMHead to AutoSeq2Seq

* notebook downloads from hf models

* style_black

* style_black

* app_update

* app_update

* fix_app_update

* style

* style

* isort

* Delete WikiELI5training.ipynb

* Delete evaluate_eli5.py

* Delete WikiELI5explore.ipynb

* Delete ExploreWikiELI5Support.html

* Delete explainlikeimfive.py

* Delete wiki_snippets.py

* children before parent

* children before parent

* style_black

* style_black_only

* isort

* isort_new

* Update src/transformers/modeling_retribert.py

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* typo fixes

* app_without_asset

* cleanup

* Delete ELI5animation.gif

* Delete ELI5contrastive.svg

* Delete ELI5wiki_index.svg

* Delete choco_bis.svg

* Delete fireworks.gif

* Delete huggingface_logo.jpg

* Delete huggingface_logo.svg

* Delete Long_Form_Question_Answering_with_ELI5_and_Wikipedia.ipynb

* Delete eli5_app.py

* Delete eli5_utils.py

* readme

* Update README.md

* unused imports

* moved_info

* default_beam

* ftuned model

* disclaimer

* Update src/transformers/modeling_retribert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* black

* add_doc

* names

* isort_Examples

* isort_Examples

* Add doc to index

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-06-16 16:36:58 -04:00
Sam Shleifer
c3e607496c [cleanup] examples test_run_squad uses tiny model (#5059) 2020-06-16 14:06:45 -04:00
Sylvain Gugger
d5477baf7d Convert hans to Trainer (#5025)
* Convert hans to Trainer

* Tick box
2020-06-16 08:06:31 -04:00
Anthony MOI
36434220fc [HUGE] Refactoring tokenizers backend - padding - truncation - pre-tokenized pipeline - fast tokenizers - tests (#4510)
* Use tokenizers pre-tokenized pipeline

* failing pretrokenized test

* Fix is_pretokenized in python

* add pretokenized tests

* style and quality

* better tests for batched pretokenized inputs

* tokenizers clean up - new padding_strategy - split the files

* [HUGE] refactoring tokenizers - padding - truncation - tests

* style and quality

* bump up requied tokenizers version to 0.8.0-rc1

* switched padding/truncation API - simpler better backward compat

* updating tests for custom tokenizers

* style and quality - tests on pad

* fix QA pipeline

* fix backward compatibility for max_length only

* style and quality

* Various cleans up - add verbose

* fix tests

* update docstrings

* Fix tests

* Docs reformatted

* __call__ method documented

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-06-15 17:12:51 -04:00
Sylvain Gugger
1affde2f10 Make DataCollator a callable (#5015)
* Make DataCollator a callable

* Update src/transformers/data/data_collator.py

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-15 11:58:33 -04:00
Stefan Schweter
d812e6d76e NER: fix construction of input examples for RoBERTa (#4943)
* utils_ner: do not add extra sep token for RoBERTa model

* run_pl_ner: do not add extra sep token for RoBERTa model
2020-06-15 08:30:40 -04:00
Sylvain Gugger
403d309857 Hans data (#4854)
* Update hans data to be able to use Trainer

* Fixes

* Deal with tokenizer that don't have token_ids

* Clean up things

* Simplify data use

* Fix the input dict

* Formatting + proper path in README
2020-06-13 09:35:13 -04:00
VictorSanh
473808da0d update mvmt-pruning/saving_prunebert (updating torch to 1.5) 2020-06-11 19:42:45 +00:00
Sylvain Gugger
e8db8b845a Remove unused arguments in Multiple Choice example (#4853)
* Remove unused arguments

* Formatting

* Remove second todo comment
2020-06-09 20:05:09 -04:00
songyouwei
29c36e9f36 run_pplm.py bug fix (#4867)
`is_leaf` may become `False` after `.to(device=device)` function call.
2020-06-09 19:14:27 -04:00
Sam Shleifer
f90bc44d9a [examples] Cleanup summarization docs (#4876) 2020-06-09 17:38:28 -04:00
Amil Khare
02e5f79662 [examples] consolidate summarization examples (#4837) 2020-06-09 11:14:12 -04:00
daniel-shan
b6f365a8ed Updates args in tf squad example. (#4820)
Co-authored-by: Daniel Shan <daniel.shan@workday.com>
2020-06-08 05:36:09 -04:00
Mr Ruben
ddf9a3dfc7 Updated path "cd examples/text-generation/pplm" (#4778)
https://github.com/huggingface/transformers/issues/4776
2020-06-05 21:16:48 -04:00
Sam Shleifer
875288b344 [isort] add matplotlib to known 3rd party dependencies (#4800) 2020-06-05 17:27:31 -04:00
Julien Chaumond
b9109f2de1 [doc] Make it clearer that text-generation does not involve training 2020-06-05 14:59:22 +02:00
Stefan Schweter
2a4b9e09c0 NER: Add new WNUT’17 example (#4681)
* ner: add preprocessing script for examples that splits longer sentences

* ner: example shell scripts use local preprocessing now

* ner: add new example section for WNUT’17 NER task. Remove old English CoNLL-03 results

* ner: satisfy black and isort
2020-06-04 19:13:17 -04:00
prajjwal1
48a05026de removed deprecared use of Variable api from pplm example 2020-06-04 18:07:49 -04:00
Jason Phang
492b352ab6 Remove unnecessary model_type arg in example (#4771) 2020-06-04 13:41:24 -04:00
Jin Young Sohn
b231a413f5 Add cache_dir to save features in GLUE + Differentiate match/mismatch for MNLI metrics (#4621)
* Glue task cleaup

* Enable writing cache to cache_dir in case dataset lives in readOnly
filesystem.
* Differentiate match vs mismatch for MNLI metrics.

* Style

* Fix pytype

* Fix type

* Use cache_dir in mnli mismatch eval dataset

* Small Tweaks

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-02 13:40:14 -04:00
Julien Chaumond
b42586ea56 Fix CI after killing archive maps (#4724)
Some checks failed
GitHub-hosted runner / check_code_quality (push) Has been cancelled
* 🐛 Fix model ids for BART and Flaubert
2020-06-02 10:21:09 -04:00
Julien Chaumond
d4c2cb402d Kill model archive maps (#4636)
* Kill model archive maps

* Fixup

* Also kill model_archive_map for MaskedBertPreTrainedModel

* Unhook config_archive_map

* Tokenizers: align with model id changes

* make style && make quality

* Fix CI
2020-06-02 09:39:33 -04:00
Lysandre Debut
88762a2f8c Specify PyTorch versions for examples (#4710) 2020-06-02 04:29:28 -04:00
Victor SANH
bf760c80b5 finish README 2020-06-01 09:23:31 -04:00
Victor SANH
9d7d9b3ae0 weird import 2020-06-01 09:23:31 -04:00
Victor SANH
2a3c88a659 Update examples/movement-pruning/README.md
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-01 09:23:31 -04:00
Victor SANH
4ac462bfb8 Update examples/movement-pruning/README.md
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-01 09:23:31 -04:00
Victor SANH
35fa0bbca0 clarify README 2020-06-01 09:23:31 -04:00
Victor SANH
cc746a5020 flake8 compliance 2020-06-01 09:23:31 -04:00
Victor SANH
b11386e158 less prints in saving prunebert 2020-06-01 09:23:31 -04:00
Victor SANH
8b5d4003ab complete README 2020-06-01 09:23:31 -04:00
Victor SANH
5c8e5b3709 commplying with isort 2020-06-01 09:23:31 -04:00
Victor SANH
db2a3b2e01 space 2020-06-01 09:23:31 -04:00
Victor SANH
5f8f2d849a add floppy bert model notebok 2020-06-01 09:23:31 -04:00
Victor SANH
b41948f5cd add requirements 2020-06-01 09:23:31 -04:00
Victor SANH
fb8f4277b2 add scripts 2020-06-01 09:23:31 -04:00
Victor SANH
d489a6d3d5 add masked_run_* 2020-06-01 09:23:31 -04:00
Victor SANH
e4c07faf0a add sparsity modules 2020-06-01 09:23:31 -04:00
Patrick von Platen
96f57c9ccb [Benchmark] Memory benchmark utils (#4198)
* improve memory benchmarking

* correct typo

* fix current memory

* check torch memory allocated

* better pytorch function

* add total cached gpu memory

* add total gpu required

* improve torch gpu usage

* update memory usage

* finalize memory tracing

* save intermediate benchmark class

* fix conflict

* improve benchmark

* improve benchmark

* finalize

* make style

* improve benchmarking

* correct typo

* make train function more flexible

* fix csv save

* better repr of bytes

* better print

* fix __repr__ bug

* finish plot script

* rename plot file

* delete csv and small improvements

* fix in plot

* fix in plot

* correct usage of timeit

* remove redundant line

* remove redundant line

* fix bug

* add hf parser tests

* add versioning and platform info

* make style

* add gpu information

* ensure backward compatibility

* finish adding all tests

* Update src/transformers/benchmark/benchmark_args.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/benchmark/benchmark_args_utils.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* delete csv files

* fix isort ordering

* add out of memory handling

* add better train memory handling

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-05-27 23:22:16 +02:00
Lysandre Debut
6a17688021 per_device instead of per_gpu/error thrown when argument unknown (#4618)
* per_device instead of per_gpu/error thrown when argument unknown

* [docs] Restore examples.md symlink

* Correct absolute links so that symlink to the doc works correctly

* Update src/transformers/hf_argparser.py

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Warning + reorder

* Docs

* Style

* not for squad

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-27 11:36:55 -04:00
Hao Tan
a9aa7456ac Add back --do_lower_case to uncased models (#4245)
The option `--do_lower_case` is currently required by the uncased models (i.e., bert-base-uncased, bert-large-uncased).

Results:
BERT-BASE without --do_lower_case:  'exact': 73.83, 'f1': 82.22
BERT-BASE with --do_lower_case:  'exact': 81.02, 'f1': 88.34
2020-05-26 21:13:07 -04:00
Antonis Maronikolakis
50d1ce411f add DistilBERT to supported models (#4558) 2020-05-25 14:50:45 -04:00
Zhangyx
49296533ca Adds predict stage for glue tasks, and generate result files which can be submitted to gluebenchmark.com (#4463)
* Adds predict stage for glue tasks, and generate result files which could be submitted to gluebenchmark.com website.

* Use Split enum + always output the label name

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-21 09:17:44 -04:00
Tobias Lee
271bedb485 [examples] fix no grad in second pruning in run_bertology (#4479)
* fix no grad in second pruning and typo

* fix prune heads attention mismatch problem

* fix

* fix

* fix

* run make style

* run make style
2020-05-21 09:17:03 -04:00
Patrick von Platen
aa925a52fa [Tests, GPU, SLOW] fix a bunch of GPU hardcoded tests in Pytorch (#4468)
* fix gpu slow tests in pytorch

* change model to device syntax
2020-05-19 21:35:04 +02:00
Julien Chaumond
5e7fe8b585 Distributed eval: SequentialDistributedSampler + gather all results (#4243)
* Distributed eval: SequentialDistributedSampler + gather all results

* For consistency only write to disk from world_master

Close https://github.com/huggingface/transformers/issues/4272

* Working distributed eval

* Hook into scripts

* Fix #3721 again

* TPU.mesh_reduce: stay in tensor space

Thanks @jysohn23

* Just a small comment

* whitespace

* torch.hub: pip install packaging

* Add test scenarii
2020-05-18 22:02:39 -04:00
Boris Dayma
d9ece8233d fix(run_language_modeling): use arg overwrite_cache (#4407) 2020-05-18 11:37:35 -04:00
Julien Chaumond
757baee846 Fix un-prefixed f-string
see https://github.com/huggingface/transformers/pull/4367#discussion_r426356693

Hat/tip @girishponkiya
2020-05-18 11:20:46 -04:00
Julien Chaumond
15550ce0d1 [skip ci] remove local rank 2020-05-15 17:08:38 -04:00
Lysandre Debut
edf9ac11d4 Should return overflowing information for the log (#4385) 2020-05-15 09:49:11 -04:00