split seq2seq script into summarization & translation (#10611)
* split seq2seq script, update docs * needless diff * fix readme * remove test diff * s/summarization/translation Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * cr * fix arguments & better mbart/t5 refs * copyright Co-authored-by: Suraj Patil <surajp815@gmail.com> * reword readme Co-authored-by: Suraj Patil <surajp815@gmail.com> * s/summarization/translation * short script names * fix tests * fix isort, include mbart doc * delete old script, update tests * automate source prefix * automate source prefix for translation * s/translation/trans Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> * fix script name (short version) * typos Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> * exact parameter Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> * remove superfluous source_prefix calls in docs * rename scripts & warn for source prefix * black * flake8 Co-authored-by: theo <theo@matussie.re> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Suraj Patil <surajp815@gmail.com> Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
This commit is contained in:
@@ -168,13 +168,13 @@ Here is an example of how this can be used on a filesystem that is shared betwee
|
||||
On the instance with the normal network run your program which will download and cache models (and optionally datasets if you use 🤗 Datasets). For example:
|
||||
|
||||
```
|
||||
python examples/seq2seq/run_seq2seq.py --model_name_or_path t5-small --dataset_name wmt16 --dataset_config ro-en ...
|
||||
python examples/seq2seq/run_translation.py --model_name_or_path t5-small --dataset_name wmt16 --dataset_config ro-en ...
|
||||
```
|
||||
|
||||
and then with the same filesystem you can now run the same program on a firewalled instance:
|
||||
```
|
||||
HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 \
|
||||
python examples/seq2seq/run_seq2seq.py --model_name_or_path t5-small --dataset_name wmt16 --dataset_config ro-en ...
|
||||
python examples/seq2seq/run_translation.py --model_name_or_path t5-small --dataset_name wmt16 --dataset_config ro-en ...
|
||||
```
|
||||
and it should succeed without any hanging waiting to timeout.
|
||||
|
||||
|
||||
@@ -279,16 +279,16 @@ To deploy this feature:
|
||||
and make sure you have added the distributed launcher ``-m torch.distributed.launch
|
||||
--nproc_per_node=NUMBER_OF_GPUS_YOU_HAVE`` if you haven't been using it already.
|
||||
|
||||
For example here is how you could use it for ``run_seq2seq.py`` with 2 GPUs:
|
||||
For example here is how you could use it for ``run_translation.py`` with 2 GPUs:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
python -m torch.distributed.launch --nproc_per_node=2 examples/seq2seq/run_seq2seq.py \
|
||||
python -m torch.distributed.launch --nproc_per_node=2 examples/seq2seq/run_translation.py \
|
||||
--model_name_or_path t5-small --per_device_train_batch_size 1 \
|
||||
--output_dir output_dir --overwrite_output_dir \
|
||||
--do_train --max_train_samples 500 --num_train_epochs 1 \
|
||||
--dataset_name wmt16 --dataset_config "ro-en" \
|
||||
--task translation_en_to_ro --source_prefix "translate English to Romanian: " \
|
||||
--source_lang en --target_lang ro \
|
||||
--fp16 --sharded_ddp simple
|
||||
|
||||
Notes:
|
||||
@@ -304,16 +304,16 @@ Notes:
|
||||
to the command line arguments, and make sure you have added the distributed launcher ``-m torch.distributed.launch
|
||||
--nproc_per_node=NUMBER_OF_GPUS_YOU_HAVE`` if you haven't been using it already.
|
||||
|
||||
For example here is how you could use it for ``run_seq2seq.py`` with 2 GPUs:
|
||||
For example here is how you could use it for ``run_translation.py`` with 2 GPUs:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
python -m torch.distributed.launch --nproc_per_node=2 examples/seq2seq/run_seq2seq.py \
|
||||
python -m torch.distributed.launch --nproc_per_node=2 examples/seq2seq/run_translation.py \
|
||||
--model_name_or_path t5-small --per_device_train_batch_size 1 \
|
||||
--output_dir output_dir --overwrite_output_dir \
|
||||
--do_train --max_train_samples 500 --num_train_epochs 1 \
|
||||
--dataset_name wmt16 --dataset_config "ro-en" \
|
||||
--task translation_en_to_ro --source_prefix "translate English to Romanian: " \
|
||||
--source_lang en --target_lang ro \
|
||||
--fp16 --sharded_ddp zero_dp_2
|
||||
|
||||
:obj:`zero_dp_2` is an optimized version of the simple wrapper, while :obj:`zero_dp_3` fully shards model weights,
|
||||
@@ -333,7 +333,7 @@ Notes:
|
||||
|
||||
Known caveats:
|
||||
|
||||
- This feature is incompatible with :obj:`--predict_with_generate` in the `run_seq2seq.py` script.
|
||||
- This feature is incompatible with :obj:`--predict_with_generate` in the `run_translation.py` script.
|
||||
- Using :obj:`--sharded_ddp zero_dp_3` requires wrapping each layer of the model in the special container
|
||||
:obj:`FullyShardedDataParallelism` of fairscale. It should be used with the option :obj:`auto_wrap` if you are not
|
||||
doing this yourself: :obj:`--sharded_ddp "zero_dp_3 auto_wrap"`.
|
||||
@@ -402,17 +402,17 @@ In fact, you can continue using ``-m torch.distributed.launch`` with DeepSpeed a
|
||||
the ``deepspeed`` launcher. But since in the DeepSpeed documentation it'll be used everywhere, for consistency we will
|
||||
use it here as well.
|
||||
|
||||
Here is an example of running ``run_seq2seq.py`` under DeepSpeed deploying all available GPUs:
|
||||
Here is an example of running ``run_translation.py`` under DeepSpeed deploying all available GPUs:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
deepspeed examples/seq2seq/run_seq2seq.py \
|
||||
deepspeed examples/seq2seq/run_translation.py \
|
||||
--deepspeed examples/tests/deepspeed/ds_config.json \
|
||||
--model_name_or_path t5-small --per_device_train_batch_size 1 \
|
||||
--output_dir output_dir --overwrite_output_dir --fp16 \
|
||||
--do_train --max_train_samples 500 --num_train_epochs 1 \
|
||||
--dataset_name wmt16 --dataset_config "ro-en" \
|
||||
--task translation_en_to_ro --source_prefix "translate English to Romanian: "
|
||||
--source_lang en --target_lang ro
|
||||
|
||||
|
||||
Note that in the DeepSpeed documentation you are likely to see ``--deepspeed --deepspeed_config ds_config.json`` - i.e.
|
||||
@@ -431,13 +431,13 @@ To deploy DeepSpeed with one GPU adjust the :class:`~transformers.Trainer` comma
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
deepspeed --num_gpus=1 examples/seq2seq/run_seq2seq.py \
|
||||
deepspeed --num_gpus=1 examples/seq2seq/run_translation.py \
|
||||
--deepspeed examples/tests/deepspeed/ds_config.json \
|
||||
--model_name_or_path t5-small --per_device_train_batch_size 1 \
|
||||
--output_dir output_dir --overwrite_output_dir --fp16 \
|
||||
--do_train --max_train_samples 500 --num_train_epochs 1 \
|
||||
--dataset_name wmt16 --dataset_config "ro-en" \
|
||||
--task translation_en_to_ro --source_prefix "translate English to Romanian: "
|
||||
--source_lang en --target_lang ro
|
||||
|
||||
This is almost the same as with multiple-GPUs, but here we tell DeepSpeed explicitly to use just one GPU. By default,
|
||||
DeepSpeed deploys all GPUs it can see. If you have only 1 GPU to start with, then you don't need this argument. The
|
||||
@@ -483,7 +483,7 @@ Notes:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
deepspeed --include localhost:1 examples/seq2seq/run_seq2seq.py ...
|
||||
deepspeed --include localhost:1 examples/seq2seq/run_translation.py ...
|
||||
|
||||
In this example, we tell DeepSpeed to use GPU 1 (second gpu).
|
||||
|
||||
@@ -574,7 +574,7 @@ with:
|
||||
|
||||
.. code-block::
|
||||
|
||||
!deepspeed examples/seq2seq/run_seq2seq.py ...
|
||||
!deepspeed examples/seq2seq/run_translation.py ...
|
||||
|
||||
or with bash magic, where you can write a multi-line code for the shell to run:
|
||||
|
||||
@@ -583,7 +583,7 @@ or with bash magic, where you can write a multi-line code for the shell to run:
|
||||
%%bash
|
||||
|
||||
cd /somewhere
|
||||
deepspeed examples/seq2seq/run_seq2seq.py ...
|
||||
deepspeed examples/seq2seq/run_translation.py ...
|
||||
|
||||
|
||||
|
||||
|
||||
@@ -742,8 +742,8 @@ Summarization
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
Summarization is the task of summarizing a document or an article into a shorter text. If you would like to fine-tune a
|
||||
model on a summarization task, you may leverage the `run_seq2seq.py
|
||||
<https://github.com/huggingface/transformers/tree/master/examples/seq2seq/run_seq2seq.py>`__ script.
|
||||
model on a summarization task, you may leverage the `run_summarization.py
|
||||
<https://github.com/huggingface/transformers/tree/master/examples/seq2seq/run_summarization.py>`__ script.
|
||||
|
||||
An example of a summarization dataset is the CNN / Daily Mail dataset, which consists of long news articles and was
|
||||
created for the task of summarization. If you would like to fine-tune a model on a summarization task, various
|
||||
@@ -822,8 +822,8 @@ Translation
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
Translation is the task of translating a text from one language to another. If you would like to fine-tune a model on a
|
||||
translation task, you may leverage the `run_seq2seq.py
|
||||
<https://github.com/huggingface/transformers/tree/master/examples/seq2seq/run_seq2seq.py>`__ script.
|
||||
translation task, you may leverage the `run_translation.py
|
||||
<https://github.com/huggingface/transformers/tree/master/examples/seq2seq/run_translation.py>`__ script.
|
||||
|
||||
An example of a translation dataset is the WMT English to German dataset, which has sentences in English as the input
|
||||
data and the corresponding sentences in German as the target data. If you would like to fine-tune a model on a
|
||||
|
||||
Reference in New Issue
Block a user