Files

Stas Bekman 1eeb206bef [ported model] FSMT (FairSeq MachineTranslation) (#6940)

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

2020-09-17 11:31:29 -04:00

adversarial

Black 20 release

2020-08-26 17:20:22 +02:00

benchmarking

Black 20 release

2020-08-26 17:20:22 +02:00

bert-loses-patience

[logging] remove no longer needed verbosity override (#7100)

2020-09-15 04:01:14 -04:00

bertology

Black 20 release

2020-08-26 17:20:22 +02:00

contrib

Transformer-XL: Remove unused parameters (#7087)

2020-09-17 06:10:34 -04:00

deebert

[logging] remove no longer needed verbosity override (#7100)

2020-09-15 04:01:14 -04:00

distillation

[logging] remove no longer needed verbosity override (#7100)

2020-09-15 04:01:14 -04:00

language-modeling

Add cache_dir to save features TextDataset (#6879)

2020-09-01 11:42:17 -04:00

longform-qa

Fix CI with change of name of nlp (#7054)

2020-09-10 14:51:08 -04:00

lxmert

Demoing LXMERT with raw images by incorporating the FRCNN model for roi-pooled extraction and bounding-box prediction on the GQA answer set. (#6986)

2020-09-14 10:07:04 -04:00

movement-pruning

[logging] remove no longer needed verbosity override (#7100)

2020-09-15 04:01:14 -04:00

multiple-choice

Black 20 release

2020-08-26 17:20:22 +02:00

question-answering

[logging] remove no longer needed verbosity override (#7100)

2020-09-15 04:01:14 -04:00

seq2seq

[ported model] FSMT (FairSeq MachineTranslation) (#6940)

2020-09-17 11:31:29 -04:00

text-classification

[logging] remove no longer needed verbosity override (#7100)

2020-09-15 04:01:14 -04:00

text-generation

feat: allow prefix for any generative model (#5885)

2020-09-07 03:03:45 -04:00

token-classification

Fix the TF Trainer gradient accumulation and the TF NER example (#6713)

2020-08-27 08:45:34 -04:00

conftest.py

enable easy checkout switch (#5645)

2020-07-31 04:34:46 -04:00

lightning_base.py

clearly indicate shuffle=False (#6312)

2020-08-30 19:26:10 +08:00

README.md

correct pl link in readme (#6364)

2020-08-10 03:08:46 -04:00

requirements.txt

[examples testing] restore code (#7099)

2020-09-14 08:54:23 -04:00

test_examples.py

[examples testing] restore code (#7099)

2020-09-14 08:54:23 -04:00

test_xla_examples.py

Add setup for TPU CI to run every hour. (#6219)

2020-08-07 11:17:07 -04:00

xla_spawn.py

[TPU] Doc, fix xla_spawn.py, only preprocess dataset once (#4223)

2020-05-08 14:10:05 -04:00

README.md

Examples

Version 2.9 of 🤗 Transformers introduces a new Trainer class for PyTorch, and its equivalent TFTrainer for TF 2. Running the examples requires PyTorch 1.3.1+ or TensorFlow 2.2+.
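The minimum versions above can be checked before launching an example. The following is only a minimal sketch: the helper names `version_tuple` and `meets_minimum` are hypothetical and not part of the library, and the parsing simply ignores local or pre-release suffixes such as `+cu101`.

```python
import re


def version_tuple(version):
    """Turn a version string such as '1.6.0+cu101' into (1, 6, 0).

    Only the leading dotted-number part is used; anything after it
    (local build tags, pre-release markers) is ignored.
    """
    match = re.match(r"(\d+(?:\.\d+)*)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = tuple(int(part) for part in match.group(1).split("."))
    # Pad to three components so that '2.2' compares equal to '2.2.0'.
    return parts + (0,) * (3 - len(parts))


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum`."""
    return version_tuple(installed) >= version_tuple(minimum)


# Hypothetical usage, assuming PyTorch is installed:
#   import torch
#   assert meets_minimum(torch.__version__, "1.3.1")
```

The same helper can be pointed at `tensorflow.__version__` with `"2.2"` as the minimum.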

Here is the list of all our examples:

- grouped by task (all official examples work for multiple models),
- with information on whether they are built on top of Trainer/TFTrainer (if not, they still work, they might just lack some features),
- whether they also include examples for pytorch-lightning, which is a great fully-featured, general-purpose training library for PyTorch,
- links to Colab notebooks to walk through the scripts and run them easily,
- links to Cloud deployments to be able to deploy large-scale trainings in the Cloud with little to no setup.

This is still a work-in-progress – in particular documentation is still sparse – so please contribute improvements/pull requests.

The Big Table of Tasks

| Task | Example datasets | Trainer support | TFTrainer support | pytorch-lightning | Colab |
|---|---|---|---|---|---|
| `language-modeling` | Raw text | ✅ | - | - | |
| `text-classification` | GLUE, XNLI | ✅ | ✅ | ✅ | |
| `token-classification` | CoNLL NER | ✅ | ✅ | ✅ | - |
| `multiple-choice` | SWAG, RACE, ARC | ✅ | ✅ | - | |
| `question-answering` | SQuAD | ✅ | ✅ | - | - |
| `text-generation` | - | n/a | n/a | n/a | |
| `distillation` | All | - | - | - | - |
| `summarization` | CNN/Daily Mail | - | - | ✅ | - |
| `translation` | WMT | - | - | ✅ | - |
| `bertology` | - | - | - | - | - |
| `adversarial` | HANS | ✅ | - | - | - |

Important note

Important: to make sure you can successfully run the latest versions of the example scripts, you have to install the library from source and install some example-specific requirements. Execute the following steps in a new virtual environment:

git clone https://github.com/huggingface/transformers
cd transformers
pip install .
pip install -r ./examples/requirements.txt

One-click Deploy to Cloud (wip)

Azure

Running on TPUs

When using TensorFlow, TPUs are supported out of the box as a tf.distribute.Strategy.

When using PyTorch, we support TPUs thanks to pytorch/xla. For more context and information on how to set up your TPU environment, refer to Google's documentation and to the very detailed pytorch/xla README.

In this repo, we provide a very simple launcher script named xla_spawn.py that lets you run our example scripts on multiple TPU cores without any boilerplate. Just pass a --num_cores flag to this script, then your regular training script with its arguments (this is similar to the torch.distributed.launch helper for torch.distributed).

For example, for run_glue:

python examples/xla_spawn.py --num_cores 8 \
	examples/text-classification/run_glue.py \
	--model_name_or_path bert-base-cased \
	--task_name mnli \
	--data_dir ./data/glue_data/MNLI \
	--output_dir ./models/tpu \
	--overwrite_output_dir \
	--do_train \
	--do_eval \
	--num_train_epochs 1 \
	--save_steps 20000
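
The command line above follows a common launcher pattern: the launcher consumes its own flags, then treats the first positional argument as the training script and forwards everything after it untouched. The sketch below illustrates only that argument-splitting step with `argparse`. It is not the actual contents of xla_spawn.py; the function name `parse_launcher_args` is hypothetical, and the TPU process spawning itself is left out.

```python
import argparse


def parse_launcher_args(argv):
    """Split a launcher command line into launcher flags and script arguments.

    Hypothetical helper for illustration; it mirrors the layout
    `launcher.py --num_cores N training_script.py [script args...]`.
    """
    parser = argparse.ArgumentParser(description="Illustrative TPU launcher argument parser")
    # Flag consumed by the launcher itself.
    parser.add_argument("--num_cores", type=int, default=1, help="Number of TPU cores to use")
    # First positional argument: path to the training script.
    parser.add_argument("training_script", type=str)
    # Everything after the script path is passed through unparsed.
    parser.add_argument("training_script_args", nargs=argparse.REMAINDER)
    return parser.parse_args(argv)


args = parse_launcher_args(
    [
        "--num_cores", "8",
        "examples/text-classification/run_glue.py",
        "--model_name_or_path", "bert-base-cased",
        "--do_train",
    ]
)
print(args.num_cores, args.training_script, args.training_script_args)
```

Because the trailing arguments are captured with `argparse.REMAINDER`, flags meant for the training script are never interpreted by the launcher, which is why they can be written exactly as they would be when running the script directly.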

Feedback, additional use cases, and benchmarks involving TPUs are welcome; please share them with the community.

Logging & Experiment tracking

You can easily log and monitor your runs. The following are currently supported:

Weights & Biases

To use Weights & Biases, install the wandb package with:

pip install wandb

Then log in from the command line:

wandb login

If you are in Jupyter or Colab, you should log in with:

import wandb
wandb.login()

Whenever you use Trainer or TFTrainer classes, your losses, evaluation metrics, model topology and gradients (for Trainer only) will automatically be logged.
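
Since logging happens automatically, the run's destination is usually chosen through the environment rather than in code. The following is a minimal sketch, assuming the wandb client reads the `WANDB_PROJECT` environment variable (which Weights & Biases documents); the helper name `configure_wandb_env` and the project name are hypothetical.

```python
import os


def configure_wandb_env(project):
    """Select the Weights & Biases project via an environment variable.

    Hypothetical helper: call it before the Trainer (or wandb itself)
    is initialized so that the setting is picked up.
    """
    os.environ["WANDB_PROJECT"] = project
    return os.environ["WANDB_PROJECT"]


# "my-glue-experiments" is a placeholder project name.
configure_wandb_env("my-glue-experiments")
```

The same effect can be achieved from the shell by exporting the variable before running an example script.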

When using 🤗 Transformers with PyTorch Lightning, runs can be tracked through WandbLogger. Refer to the related documentation and examples.

Comet.ml

To use comet_ml, install the Python package with:

pip install comet_ml

or if in a Conda environment:

conda install -c comet_ml -c anaconda -c conda-forge comet_ml
