Examples reorg (#11350)

* Base move * Examples reorganization * Update references * Put back test data * Move conftest * More fixes * Move test data to test fixtures * Update path * Apply suggestions from code review Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Address review comments and clean Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-04-21 11:11:20 -04:00
parent ca7ff64f5b
commit dabeb15292
105 changed files with 1062 additions and 560 deletions
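The hunks below mostly repoint documentation at the reorganized example directories. Those moves can be summarized as an old-to-new path mapping. The Python sketch below is a hypothetical helper and is not part of this commit or the transformers repository. `PATH_MOVES` and `update_example_paths` are illustrative names. The mapping lists only moves that appear unambiguously in the hunks. `run_glue.py`, `run_xnli.py` and `run_squad.py` are deliberately left out, because different files point them at different directories (`pytorch/` vs `legacy/`).

```python
import re

# Hypothetical helper: pre-reorg example paths mapped to the locations used in
# the updated docs of this commit. Only unambiguous one-to-one moves are listed.
PATH_MOVES = {
    "examples/benchmarking/run_benchmark.py": "examples/pytorch/benchmarking/run_benchmark.py",
    "examples/benchmarking/run_benchmark_tf.py": "examples/tensorflow/benchmarking/run_benchmark_tf.py",
    "examples/seq2seq/run_translation.py": "examples/pytorch/translation/run_translation.py",
    "examples/seq2seq/run_summarization.py": "examples/pytorch/summarization/run_summarization.py",
    "examples/text-classification/run_tf_glue.py": "examples/tensorflow/text-classification/run_tf_glue.py",
    "examples/text-classification/run_tf_text_classification.py": (
        "examples/tensorflow/text-classification/run_tf_text_classification.py"
    ),
    "examples/question-answering/run_qa.py": "examples/pytorch/question-answering/run_qa.py",
    "examples/question-answering/run_tf_squad.py": "examples/tensorflow/question-answering/run_tf_squad.py",
    "examples/language-modeling/run_mlm.py": "examples/pytorch/language-modeling/run_mlm.py",
    "examples/language-modeling/run_clm.py": "examples/pytorch/language-modeling/run_clm.py",
    "examples/token-classification/run_ner.py": "examples/pytorch/token-classification/run_ner.py",
    "examples/text-generation/run_generation.py": "examples/pytorch/text-generation/run_generation.py",
}

# A single alternation regex (longest keys first) rewrites each old path exactly
# once in one pass, so paths that were already updated are left untouched.
_PATTERN = re.compile(
    "|".join(re.escape(old) for old in sorted(PATH_MOVES, key=len, reverse=True))
)


def update_example_paths(text: str) -> str:
    """Return `text` with every known pre-reorg example path replaced."""
    return _PATTERN.sub(lambda match: PATH_MOVES[match.group(0)], text)


if __name__ == "__main__":
    print(update_example_paths("python examples/seq2seq/run_translation.py --help"))
```

Running it on an old command line rewrites only the script path and leaves the flags as they are. A single regex pass is used instead of chained `str.replace` calls, so a replacement can never be rewritten a second time.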
--- a/docs/source/benchmarks.rst
+++ b/docs/source/benchmarks.rst
@@ -65,10 +65,10 @@ respectively.
 .. code-block:: bash

    ## PYTORCH CODE
-    python examples/benchmarking/run_benchmark.py --help
+    python examples/pytorch/benchmarking/run_benchmark.py --help

    ## TENSORFLOW CODE
-    python examples/benchmarking/run_benchmark_tf.py --help
+    python examples/tensorflow/benchmarking/run_benchmark_tf.py --help


 An instantiated benchmark object can then simply be run by calling ``benchmark.run()``.
--- a/docs/source/converting_tensorflow_models.rst
+++ b/docs/source/converting_tensorflow_models.rst
@@ -33,8 +33,8 @@ You can convert any TensorFlow checkpoint for BERT (in particular `the pre-train
 This CLI takes as input a TensorFlow checkpoint (three files starting with ``bert_model.ckpt``\ ) and the associated
 configuration file (\ ``bert_config.json``\ ), and creates a PyTorch model for this configuration, loads the weights
 from the TensorFlow checkpoint in the PyTorch model and saves the resulting model in a standard PyTorch save file that
-can be imported using ``from_pretrained()`` (see example in :doc:`quicktour` , `run_glue.py
-<https://github.com/huggingface/transformers/blob/master/examples/text-classification/run_glue.py>`_\ ).
+can be imported using ``from_pretrained()`` (see example in :doc:`quicktour`, :prefix_link:`run_glue.py
+<examples/pytorch/text-classification/run_glue.py>`).

 You only need to run this conversion script **once** to get a PyTorch model. You can then disregard the TensorFlow
 checkpoint (the three files starting with ``bert_model.ckpt``\ ) but be sure to keep the configuration file (\
--- a/docs/source/installation.md
+++ b/docs/source/installation.md
@@ -168,13 +168,13 @@ Here is an example of how this can be used on a filesystem that is shared betwee
 On the instance with normal network access, run your program, which will download and cache models (and optionally datasets if you use 🤗 Datasets). For example:

 ```
-python examples/seq2seq/run_translation.py --model_name_or_path t5-small --dataset_name wmt16 --dataset_config ro-en ...
+python examples/pytorch/translation/run_translation.py --model_name_or_path t5-small --dataset_name wmt16 --dataset_config ro-en ...
 ```

 and then with the same filesystem you can now run the same program on a firewalled instance:
 ```
 HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 \
-python examples/seq2seq/run_translation.py --model_name_or_path t5-small --dataset_name wmt16 --dataset_config ro-en ...
+python examples/pytorch/translation/run_translation.py --model_name_or_path t5-small --dataset_name wmt16 --dataset_config ro-en ...
 ```
 and it should succeed without hanging while waiting for a timeout.

--- a/docs/source/main_classes/processors.rst
+++ b/docs/source/main_classes/processors.rst
@@ -68,8 +68,8 @@ Additionally, the following method can be used to load values from a data file a
 Example usage
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-An example using these processors is given in the `run_glue.py
-<https://github.com/huggingface/pytorch-transformers/blob/master/examples/text-classification/run_glue.py>`__ script.
+An example using these processors is given in the :prefix_link:`run_glue.py
+<examples/legacy/text-classification/run_glue.py>` script.


 XNLI
@@ -89,8 +89,8 @@ This library hosts the processor to load the XNLI data:

 Please note that since the gold labels are available on the test set, evaluation is performed on the test set.

-An example using these processors is given in the `run_xnli.py
-<https://github.com/huggingface/pytorch-transformers/blob/master/examples/text-classification/run_xnli.py>`__ script.
+An example using these processors is given in the :prefix_link:`run_xnli.py
+<examples/legacy/text-classification/run_xnli.py>` script.


 SQuAD
@@ -169,4 +169,4 @@ Using `tensorflow_datasets` is as easy as using a data file:


 Another example using these processors is given in the :prefix_link:`run_squad.py
-<examples/question-answering/run_squad.py>` script.
+<examples/legacy/question-answering/run_squad.py>` script.
--- a/docs/source/main_classes/trainer.rst
+++ b/docs/source/main_classes/trainer.rst
@@ -338,7 +338,7 @@ For example here is how you could use it for ``run_translation.py`` with 2 GPUs:

 .. code-block:: bash

-    python -m torch.distributed.launch --nproc_per_node=2 examples/seq2seq/run_translation.py \
+    python -m torch.distributed.launch --nproc_per_node=2 examples/pytorch/translation/run_translation.py \
    --model_name_or_path t5-small --per_device_train_batch_size 1   \
    --output_dir output_dir --overwrite_output_dir \
    --do_train --max_train_samples 500 --num_train_epochs 1 \
@@ -363,7 +363,7 @@ For example here is how you could use it for ``run_translation.py`` with 2 GPUs:

 .. code-block:: bash

-    python -m torch.distributed.launch --nproc_per_node=2 examples/seq2seq/run_translation.py \
+    python -m torch.distributed.launch --nproc_per_node=2 examples/pytorch/translation/run_translation.py \
    --model_name_or_path t5-small --per_device_train_batch_size 1   \
    --output_dir output_dir --overwrite_output_dir \
    --do_train --max_train_samples 500 --num_train_epochs 1 \
@@ -540,7 +540,7 @@ Here is an example of running ``run_translation.py`` under DeepSpeed deploying a

 .. code-block:: bash

-    deepspeed examples/seq2seq/run_translation.py \
+    deepspeed examples/pytorch/translation/run_translation.py \
    --deepspeed tests/deepspeed/ds_config.json \
    --model_name_or_path t5-small --per_device_train_batch_size 1   \
    --output_dir output_dir --overwrite_output_dir --fp16 \
@@ -565,7 +565,7 @@ To deploy DeepSpeed with one GPU adjust the :class:`~transformers.Trainer` comma

 .. code-block:: bash

-    deepspeed --num_gpus=1 examples/seq2seq/run_translation.py \
+    deepspeed --num_gpus=1 examples/pytorch/translation/run_translation.py \
    --deepspeed tests/deepspeed/ds_config.json \
    --model_name_or_path t5-small --per_device_train_batch_size 1   \
    --output_dir output_dir --overwrite_output_dir --fp16 \
@@ -617,7 +617,7 @@ Notes:

   .. code-block:: bash

-       deepspeed --include localhost:1 examples/seq2seq/run_translation.py ...
+       deepspeed --include localhost:1 examples/pytorch/translation/run_translation.py ...

  In this example, we tell DeepSpeed to use GPU 1 (the second GPU).

@@ -711,7 +711,7 @@ shell from a cell. For example, to use ``run_translation.py`` you would launch i
 .. code-block::

    !git clone https://github.com/huggingface/transformers
-    !cd transformers; deepspeed examples/seq2seq/run_translation.py ...
+    !cd transformers; deepspeed examples/pytorch/translation/run_translation.py ...

 or with ``%%bash`` magic, where you can write multi-line code for the shell program to run:

@@ -721,7 +721,7 @@ or with ``%%bash`` magic, where you can write a multi-line code for the shell pr

    git clone https://github.com/huggingface/transformers
    cd transformers
-    deepspeed examples/seq2seq/run_translation.py ...
+    deepspeed examples/pytorch/translation/run_translation.py ...

 In such a case, you don't need any of the code presented at the beginning of this section.

--- a/docs/source/model_doc/bart.rst
+++ b/docs/source/model_doc/bart.rst
@@ -43,7 +43,7 @@ Examples
 _______________________________________________________________________________________________________________________

 - Examples and scripts for fine-tuning BART and other models for sequence-to-sequence tasks can be found in
-  :prefix_link:`examples/seq2seq/ <examples/seq2seq/README.md>`.
+  :prefix_link:`examples/pytorch/summarization/ <examples/pytorch/summarization/README.md>`.
 - An example of how to train :class:`~transformers.BartForConditionalGeneration` with a Hugging Face :obj:`datasets`
  object can be found in this `forum discussion
  <https://discuss.huggingface.co/t/train-bart-for-conditional-generation-e-g-summarization/1904>`__.
--- a/docs/source/model_doc/barthez.rst
+++ b/docs/source/model_doc/barthez.rst
@@ -43,7 +43,7 @@ Examples
 _______________________________________________________________________________________________________________________

 - BARThez can be fine-tuned on sequence-to-sequence tasks in a similar way to BART; see:
-  :prefix_link:`examples/seq2seq/ <examples/seq2seq/README.md>`.
+  :prefix_link:`examples/pytorch/summarization/ <examples/pytorch/summarization/README.md>`.


 BarthezTokenizer
--- a/docs/source/model_doc/distilbert.rst
+++ b/docs/source/model_doc/distilbert.rst
@@ -44,8 +44,8 @@ Tips:
 - DistilBERT doesn't have options to select the input positions (:obj:`position_ids` input). This could be added if
  necessary though, just let us know if you need this option.

-This model was contributed by `victorsanh <https://huggingface.co/victorsanh>`__. The original code can be found `here
-<https://github.com/huggingface/transformers/tree/master/examples/distillation>`__.
+This model was contributed by `victorsanh <https://huggingface.co/victorsanh>`__. The original code can be found
+:prefix_link:`here <examples/research_projects/distillation>`.


 DistilBertConfig
--- a/docs/source/model_doc/pegasus.rst
+++ b/docs/source/model_doc/pegasus.rst
@@ -53,7 +53,8 @@ Examples
 _______________________________________________________________________________________________________________________

 - :prefix_link:`Script <examples/research_projects/seq2seq-distillation/finetune_pegasus_xsum.sh>` to fine-tune pegasus
-  on the XSUM dataset. Data download instructions at :prefix_link:`examples/seq2seq/ <examples/seq2seq/README.md>`.
+  on the XSUM dataset. Data download instructions at :prefix_link:`examples/pytorch/summarization/
+  <examples/pytorch/summarization/README.md>`.
 - FP16 is not supported (help/ideas on this appreciated!).
 - The adafactor optimizer is recommended for pegasus fine-tuning.

--- a/docs/source/model_doc/retribert.rst
+++ b/docs/source/model_doc/retribert.rst
@@ -21,7 +21,7 @@ Question Answering <https://yjernite.github.io/lfqa.html>`__. RetriBERT is a sma
 pair of BERT encoders with lower-dimension projection for dense semantic indexing of text.

 This model was contributed by `yjernite <https://huggingface.co/yjernite>`__. Code to train and use the model can be
-found `here <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__.
+found :prefix_link:`here <examples/research_projects/distillation>`.


 RetriBertConfig
--- a/docs/source/model_doc/xlnet.rst
+++ b/docs/source/model_doc/xlnet.rst
@@ -41,7 +41,7 @@ Tips:
  using only a sub-set of the output tokens as target which are selected with the :obj:`target_mapping` input.
 - To use XLNet for sequential decoding (i.e. not in fully bi-directional setting), use the :obj:`perm_mask` and
  :obj:`target_mapping` inputs to control the attention span and outputs (see examples in
-  `examples/text-generation/run_generation.py`)
+  ``examples/pytorch/text-generation/run_generation.py``).
 - XLNet is one of the few models that has no sequence length limit.

 This model was contributed by `thomwolf <https://huggingface.co/thomwolf>`__. The original code can be found `here
--- a/docs/source/model_summary.rst
+++ b/docs/source/model_summary.rst
@@ -682,7 +682,8 @@ The `mbart-large-en-ro checkpoint <https://huggingface.co/facebook/mbart-large-e
 romanian translation.

 The `mbart-large-cc25 <https://huggingface.co/facebook/mbart-large-cc25>`_ checkpoint can be finetuned for other
-translation and summarization tasks, using code in ```examples/seq2seq/``` , but is not very useful without finetuning.
+translation and summarization tasks, using code in ``examples/pytorch/translation/``, but is not very useful without
+fine-tuning.


 ProphetNet
--- a/docs/source/multilingual.rst
+++ b/docs/source/multilingual.rst
@@ -90,8 +90,8 @@ You can then feed it all as input to your model:
    >>> outputs = model(input_ids, langs=langs)


-The example :prefix_link:`run_generation.py <examples/text-generation/run_generation.py>` can generate text using the
-CLM checkpoints from XLM, using the language embeddings.
+The example :prefix_link:`run_generation.py <examples/pytorch/text-generation/run_generation.py>` can generate text
+with the CLM checkpoints from XLM, using their language embeddings.

 XLM without Language Embeddings
 -----------------------------------------------------------------------------------------------------------------------
--- a/docs/source/sagemaker.md
+++ b/docs/source/sagemaker.md
@@ -325,7 +325,7 @@ When you create a `HuggingFace` Estimator, you can specify a [training script th

 If you are using `git_config` to run the [🤗 Transformers example scripts](https://github.com/huggingface/transformers/tree/master/examples), keep in mind that you need to configure the right `'branch'` for your `transformers_version`, e.g. if you use `transformers_version='4.4.2'` you have to use `'branch':'v4.4.2'`.

-As an example to use `git_config` with an [example script from the transformers repository](https://github.com/huggingface/transformers/tree/master/examples/text-classification).
+Here is an example of using `git_config` with an [example script from the transformers repository](https://github.com/huggingface/transformers/tree/master/examples/pytorch/text-classification).

 _Tip: define `output_dir` as `/opt/ml/model` in the hyperparameters so that the script saves your model to S3 after training._

@@ -338,7 +338,7 @@ git_config = {'repo': 'https://github.com/huggingface/transformers.git','branch'
 # create the Estimator
 huggingface_estimator = HuggingFace(
        entry_point='run_glue.py',
-        source_dir='./examples/text-classification',
+        source_dir='./examples/pytorch/text-classification',
        git_config=git_config,
        instance_type='ml.p3.2xlarge',
        instance_count=1,
--- a/docs/source/task_summary.rst
+++ b/docs/source/task_summary.rst
@@ -55,10 +55,10 @@ Sequence Classification
 Sequence classification is the task of classifying sequences according to a given number of classes. An example of
 sequence classification is the GLUE dataset, which is entirely based on that task. If you would like to fine-tune a
 model on a GLUE sequence classification task, you may leverage the :prefix_link:`run_glue.py
-<examples/text-classification/run_glue.py>`, :prefix_link:`run_tf_glue.py
-<examples/text-classification/run_tf_glue.py>`, :prefix_link:`run_tf_text_classification.py
-<examples/text-classification/run_tf_text_classification.py>` or :prefix_link:`run_xnli.py
-<examples/text-classification/run_xnli.py>` scripts.
+<examples/pytorch/text-classification/run_glue.py>`, :prefix_link:`run_tf_glue.py
+<examples/tensorflow/text-classification/run_tf_glue.py>`, :prefix_link:`run_tf_text_classification.py
+<examples/tensorflow/text-classification/run_tf_text_classification.py>` or :prefix_link:`run_xnli.py
+<examples/pytorch/text-classification/run_xnli.py>` scripts.

 Here is an example of using pipelines to do sentiment analysis: identifying if a sequence is positive or negative. It
 leverages a model fine-tuned on SST-2, which is a GLUE task.
@@ -168,8 +168,10 @@ Extractive Question Answering
 Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
 question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune a
 model on a SQuAD task, you may leverage the `run_qa.py
-<https://github.com/huggingface/transformers/tree/master/examples/question-answering/run_qa.py>`__ and `run_tf_squad.py
-<https://github.com/huggingface/transformers/tree/master/examples/question-answering/run_tf_squad.py>`__ scripts.
+<https://github.com/huggingface/transformers/tree/master/examples/pytorch/question-answering/run_qa.py>`__ and
+`run_tf_squad.py
+<https://github.com/huggingface/transformers/tree/master/examples/tensorflow/question-answering/run_tf_squad.py>`__
+scripts.


 Here is an example of using pipelines to do question answering: extracting an answer from a text given a question. It
@@ -184,7 +186,7 @@ leverages a fine-tuned model on SQuAD.
    >>> context = r"""
    ... Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
    ... question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune
-    ... a model on a SQuAD task, you may leverage the examples/question-answering/run_squad.py script.
+    ... a model on a SQuAD task, you may leverage the examples/pytorch/question-answering/run_squad.py script.
    ... """

 This returns an answer extracted from the text, a confidence score, alongside "start" and "end" values, which are the
@@ -325,8 +327,7 @@ fill that mask with an appropriate token. This allows the model to attend to bot
 right of the mask) and the left context (tokens on the left of the mask). Such training creates a strong basis for
 downstream tasks requiring bi-directional context, such as SQuAD (question answering, see `Lewis, Liu, Goyal et al.
 <https://arxiv.org/abs/1910.13461>`__, part 4.2). If you would like to fine-tune a model on a masked language modeling
-task, you may leverage the `run_mlm.py
-<https://github.com/huggingface/transformers/tree/master/examples/language-modeling/run_mlm.py>`__ script.
+task, you may leverage the :prefix_link:`run_mlm.py <examples/pytorch/language-modeling/run_mlm.py>` script.

 Here is an example of using pipelines to replace a mask from a sequence:

@@ -435,7 +436,7 @@ Causal Language Modeling
 Causal language modeling is the task of predicting the token following a sequence of tokens. In this situation, the
 model only attends to the left context (tokens on the left of the mask). Such training is particularly interesting
 for generation tasks. If you would like to fine-tune a model on a causal language modeling task, you may leverage the
-`run_clm.py <https://github.com/huggingface/transformers/tree/master/examples/language-modeling/run_clm.py>`__ script.
+:prefix_link:`run_clm.py <examples/pytorch/language-modeling/run_clm.py>` script.

 Usually, the next token is predicted by sampling from the logits of the last hidden state the model produces from the
 input sequence.
@@ -602,8 +603,7 @@ Named Entity Recognition
 Named Entity Recognition (NER) is the task of classifying tokens according to a class, for example, identifying a token
 as a person, an organisation or a location. An example of a named entity recognition dataset is the CoNLL-2003 dataset,
 which is entirely based on that task. If you would like to fine-tune a model on an NER task, you may leverage the
-`run_ner.py <https://github.com/huggingface/transformers/tree/master/examples/token-classification/run_ner.py>`__
-script.
+:prefix_link:`run_ner.py <examples/pytorch/token-classification/run_ner.py>` script.

 Here is an example of using pipelines to do named entity recognition, specifically, trying to identify tokens as
 belonging to one of 9 classes:
@@ -743,11 +743,12 @@ Summarization

 Summarization is the task of summarizing a document or an article into a shorter text. If you would like to fine-tune a
 model on a summarization task, you may leverage the `run_summarization.py
-<https://github.com/huggingface/transformers/tree/master/examples/seq2seq/run_summarization.py>`__ script.
+<https://github.com/huggingface/transformers/tree/master/examples/pytorch/summarization/run_summarization.py>`__
+script.

 An example of a summarization dataset is the CNN / Daily Mail dataset, which consists of long news articles and was
 created for the task of summarization. For fine-tuning a model on such a task, various
-approaches are described in this :prefix_link:`document <examples/seq2seq/README.md>`.
+approaches are described in this :prefix_link:`document <examples/pytorch/summarization/README.md>`.

 Here is an example of using the pipelines to do summarization. It leverages a Bart model that was fine-tuned on the CNN
 / Daily Mail data set.
@@ -794,7 +795,7 @@ Here is an example of doing summarization using a model and a tokenizer. The pro
 3. Add the T5 specific prefix "summarize: ".
 4. Use the ``PreTrainedModel.generate()`` method to generate the summary.

-In this example we use Google`s T5 model. Even though it was pre-trained only on a multi-task mixed dataset (including
+In this example we use Google's T5 model. Even though it was pre-trained only on a multi-task mixed dataset (including
 CNN / Daily Mail), it yields very good results.

 .. code-block::
@@ -823,11 +824,12 @@ Translation

 Translation is the task of translating a text from one language to another. If you would like to fine-tune a model on a
 translation task, you may leverage the `run_translation.py
-<https://github.com/huggingface/transformers/tree/master/examples/seq2seq/run_translation.py>`__ script.
+<https://github.com/huggingface/transformers/tree/master/examples/pytorch/translation/run_translation.py>`__ script.

 An example of a translation dataset is the WMT English to German dataset, which has sentences in English as the input
 data and the corresponding sentences in German as the target data. If you would like to fine-tune a model on a
-translation task, various approaches are described in this :prefix_link:`document <examples/seq2seq/README.md>`.
+translation task, various approaches are described in this :prefix_link:`document
+<examples/pytorch/translation/README.md>`.

 Here is an example of using the pipelines to do translation. It leverages a T5 model that was only pre-trained on a
 multi-task mixture dataset (including WMT), yet yields impressive translation results.