Indent code block in the documentation (#11233)
* Indent code block
* Indent code blocks version 2
* Quality
@@ -388,7 +388,7 @@ Next, you can finally start adding new code to 🤗 Transformers. Go into the cl

::

    cd transformers

In the special case that you are adding a model whose architecture exactly matches the model architecture of an
existing model you only have to add a conversion script as described in `this section <#write-a-conversion-script>`__.
@@ -417,27 +417,27 @@ You should do the following:

::

    git checkout -b add_brand_new_bert

2. Commit the automatically generated code:

::

    git add .
    git commit

3. Fetch and rebase to current master

::

    git fetch upstream
    git rebase upstream/master

4. Push the changes to your account using:

::

    git push -u origin a-descriptive-name-for-my-changes

5. Once you are satisfied, go to the webpage of your fork on GitHub. Click on “Pull request”. Make sure to add the
GitHub handle of some members of the Hugging Face team as reviewers, so that the Hugging Face team gets notified for
@@ -451,8 +451,8 @@ time to time by doing:

::

    git fetch upstream
    git merge upstream/master

In general, all questions you might have regarding the model or your implementation should be asked in your PR and
discussed/solved in the PR. This way, the Hugging Face team will always be notified when you are committing new code or

@@ -47,12 +47,12 @@ Here is an example of the conversion process for a pre-trained ``BERT-Base Uncas

.. code-block:: shell

    export BERT_BASE_DIR=/path/to/bert/uncased_L-12_H-768_A-12

    transformers-cli convert --model_type bert \
      --tf_checkpoint $BERT_BASE_DIR/bert_model.ckpt \
      --config $BERT_BASE_DIR/bert_config.json \
      --pytorch_dump_output $BERT_BASE_DIR/pytorch_model.bin

You can download Google's pre-trained models for the conversion `here
<https://github.com/google-research/bert#pre-trained-models>`__.
@@ -72,12 +72,12 @@ Here is an example of the conversion process for the pre-trained ``ALBERT Base``

.. code-block:: shell

    export ALBERT_BASE_DIR=/path/to/albert/albert_base

    transformers-cli convert --model_type albert \
      --tf_checkpoint $ALBERT_BASE_DIR/model.ckpt-best \
      --config $ALBERT_BASE_DIR/albert_config.json \
      --pytorch_dump_output $ALBERT_BASE_DIR/pytorch_model.bin

You can download Google's pre-trained models for the conversion `here
<https://github.com/google-research/albert#pre-trained-models>`__.
@@ -91,13 +91,13 @@ save as the same format than OpenAI pretrained model (see `here <https://github.

.. code-block:: shell

    export OPENAI_GPT_CHECKPOINT_FOLDER_PATH=/path/to/openai/pretrained/numpy/weights

    transformers-cli convert --model_type gpt \
      --tf_checkpoint $OPENAI_GPT_CHECKPOINT_FOLDER_PATH \
      --pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
      [--config OPENAI_GPT_CONFIG] \
      [--finetuning_task_name OPENAI_GPT_FINETUNED_TASK]


OpenAI GPT-2
@@ -108,13 +108,13 @@ Here is an example of the conversion process for a pre-trained OpenAI GPT-2 mode

.. code-block:: shell

    export OPENAI_GPT2_CHECKPOINT_PATH=/path/to/gpt2/pretrained/weights

    transformers-cli convert --model_type gpt2 \
      --tf_checkpoint $OPENAI_GPT2_CHECKPOINT_PATH \
      --pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
      [--config OPENAI_GPT2_CONFIG] \
      [--finetuning_task_name OPENAI_GPT2_FINETUNED_TASK]

Transformer-XL
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -124,13 +124,13 @@ Here is an example of the conversion process for a pre-trained Transformer-XL mo

.. code-block:: shell

    export TRANSFO_XL_CHECKPOINT_FOLDER_PATH=/path/to/transfo/xl/checkpoint

    transformers-cli convert --model_type transfo_xl \
      --tf_checkpoint $TRANSFO_XL_CHECKPOINT_FOLDER_PATH \
      --pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
      [--config TRANSFO_XL_CONFIG] \
      [--finetuning_task_name TRANSFO_XL_FINETUNED_TASK]


XLNet
@@ -140,14 +140,14 @@ Here is an example of the conversion process for a pre-trained XLNet model:

.. code-block:: shell

    export TRANSFO_XL_CHECKPOINT_PATH=/path/to/xlnet/checkpoint
    export TRANSFO_XL_CONFIG_PATH=/path/to/xlnet/config

    transformers-cli convert --model_type xlnet \
      --tf_checkpoint $TRANSFO_XL_CHECKPOINT_PATH \
      --config $TRANSFO_XL_CONFIG_PATH \
      --pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
      [--finetuning_task_name XLNET_FINETUNED_TASK]


XLM
@@ -157,13 +157,13 @@ Here is an example of the conversion process for a pre-trained XLM model:

.. code-block:: shell

    export XLM_CHECKPOINT_PATH=/path/to/xlm/checkpoint

    transformers-cli convert --model_type xlm \
      --tf_checkpoint $XLM_CHECKPOINT_PATH \
      --pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
      [--config XLM_CONFIG] \
      [--finetuning_task_name XLM_FINETUNED_TASK]


T5
@@ -173,9 +173,9 @@ Here is an example of the conversion process for a pre-trained T5 model:

.. code-block:: shell

    export T5=/path/to/t5/uncased_L-12_H-768_A-12

    transformers-cli convert --model_type t5 \
      --tf_checkpoint $T5/t5_model.ckpt \
      --config $T5/t5_config.json \
      --pytorch_dump_output $T5/pytorch_model.bin

@@ -182,7 +182,7 @@ such:

.. code-block::

    >>> # [CLS] SEQUENCE_A [SEP] SEQUENCE_B [SEP]

We can use our tokenizer to automatically generate such a sentence by passing the two sequences to ``tokenizer`` as two
arguments (and not a list, like before) like this:

@@ -293,33 +293,33 @@ with it, you may want to try one of:

.. code-block:: bash

    pip install fairscale --no-build-isolation .

or:

.. code-block:: bash

    git clone https://github.com/facebookresearch/fairscale/
    cd fairscale
    rm -r dist build
    python setup.py bdist_wheel
    pip uninstall -y fairscale
    pip install dist/fairscale-*.whl

``fairscale`` also has issues with building against pytorch-nightly, so if you use it you may have to try one of:

.. code-block:: bash

    pip uninstall -y fairscale; pip install fairscale --pre \
    -f https://download.pytorch.org/whl/nightly/cu110/torch_nightly.html \
    --no-cache --no-build-isolation

or:

.. code-block:: bash

    pip install -v --disable-pip-version-check . \
    -f https://download.pytorch.org/whl/nightly/cu110/torch_nightly.html --pre

Of course, adjust the urls to match the cuda version you use.
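
As a rough sketch of that adjustment, the index URL can be derived from the CUDA version string. The helper name ``nightly_index_url`` is hypothetical, and the ``cu<major><minor>`` pattern is only inferred from the ``cu110`` URL shown above; whether an index actually exists for a given CUDA version is not checked here.

```python
def nightly_index_url(cuda_version: str) -> str:
    """Build a PyTorch nightly wheel index URL for a CUDA version such as "11.0".

    Assumption: the URL follows the same pattern as the cu110 example,
    i.e. the dots are dropped from "<major>.<minor>".
    """
    major, minor = cuda_version.split(".")[:2]
    tag = f"cu{major}{minor}"  # "11.0" -> "cu110"
    return f"https://download.pytorch.org/whl/nightly/{tag}/torch_nightly.html"


if __name__ == "__main__":
    # Reproduces the URL used in the commands above.
    print(nightly_index_url("11.0"))
```

The returned string is what would be passed to ``pip install -f``.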
@@ -447,12 +447,12 @@ To make a local build for DeepSpeed:

.. code-block:: bash

    git clone https://github.com/microsoft/DeepSpeed/
    cd DeepSpeed
    rm -rf build
    TORCH_CUDA_ARCH_LIST="6.1;8.6" DS_BUILD_OPS=1 pip install . \
    --global-option="build_ext" --global-option="-j8" --no-cache -v \
    --disable-pip-version-check 2>&1 | tee build.log

Edit ``TORCH_CUDA_ARCH_LIST`` to insert the code for the architectures of the GPU cards you intend to use.

@@ -460,11 +460,11 @@ Or if you need to use the same setup on multiple machines, make a binary wheel:

.. code-block:: bash

    git clone https://github.com/microsoft/DeepSpeed/
    cd DeepSpeed
    rm -rf build
    TORCH_CUDA_ARCH_LIST="6.1;8.6" DS_BUILD_OPS=1 \
    python setup.py build_ext -j8 bdist_wheel

It will generate something like ``dist/deepspeed-0.3.13+8cd046f-cp38-cp38-linux_x86_64.whl``, which you can now install
with ``pip install deepspeed-0.3.13+8cd046f-cp38-cp38-linux_x86_64.whl`` locally or on any other machine.
@@ -478,20 +478,20 @@ You can check the archs pytorch was built with using:

.. code-block:: bash

    python -c "import torch; print(torch.cuda.get_arch_list())"

Here is how to find out the arch for one of the installed GPUs. For example, for GPU 0:

.. code-block:: bash

    CUDA_VISIBLE_DEVICES=0 python -c "import torch; \
    print(torch.cuda.get_device_properties(torch.device('cuda')))"

If the output is:

.. code-block:: bash

    _CudaDeviceProperties(name='GeForce RTX 3090', major=8, minor=6, total_memory=24268MB, multi_processor_count=82)

then you know that this card's arch is ``8.6``.
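
To illustrate how the ``major`` and ``minor`` fields map to a ``TORCH_CUDA_ARCH_LIST`` value, here is a minimal sketch. The helper names ``arch_string`` and ``arch_list`` are hypothetical; on a real machine the ``(major, minor)`` pairs could come from ``torch.cuda.get_device_capability``, which is not called here so that the snippet runs without a GPU.

```python
def arch_string(major: int, minor: int) -> str:
    """Format a compute capability pair, e.g. (8, 6) -> "8.6"."""
    return f"{major}.{minor}"


def arch_list(capabilities) -> str:
    """Join unique (major, minor) pairs into a TORCH_CUDA_ARCH_LIST value."""
    unique = sorted(set(capabilities))  # drop duplicates, lowest arch first
    return ";".join(arch_string(major, minor) for major, minor in unique)


if __name__ == "__main__":
    # Hypothetical machine with two RTX 3090 cards (8.6) and one 6.1 card.
    print(arch_list([(8, 6), (6, 1), (8, 6)]))  # 6.1;8.6
```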
@@ -591,18 +591,18 @@ with DeepSpeed is to have at least the following configuration in the configurat

.. code-block:: json

    {
        "zero_optimization": {
            "stage": 2,
            "allgather_partitions": true,
            "allgather_bucket_size": 2e8,
            "reduce_scatter": true,
            "reduce_bucket_size": 2e8,
            "overlap_comm": true,
            "contiguous_gradients": true,
            "cpu_offload": true
        }
    }

which enables ``cpu_offload`` and some other important features. You may experiment with the buffer sizes; you will
find more details in the discussion below.
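
One way to avoid hand-editing mistakes such as stray trailing commas is to build the configuration as a Python dictionary and serialize it. This is only a sketch: the helper name ``render_config`` is hypothetical, and the values simply mirror the configuration above.

```python
import json

# Same keys and values as the minimal ZeRO-2 configuration above.
ds_config = {
    "zero_optimization": {
        "stage": 2,
        "allgather_partitions": True,
        "allgather_bucket_size": 2e8,
        "reduce_scatter": True,
        "reduce_bucket_size": 2e8,
        "overlap_comm": True,
        "contiguous_gradients": True,
        "cpu_offload": True,
    }
}


def render_config(config: dict) -> str:
    """Serialize the config to JSON text; json.dumps always emits valid JSON."""
    return json.dumps(config, indent=4)


if __name__ == "__main__":
    text = render_config(ds_config)
    # Round-trip check: parsing the text gives back the same dictionary.
    assert json.loads(text) == ds_config
    print(text)
```

The rendered text can then be written to whatever file name you pass to DeepSpeed.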
@@ -710,18 +710,18 @@ shell from a cell. For example, to use ``run_translation.py`` you would launch i

.. code-block::

    !git clone https://github.com/huggingface/transformers
    !cd transformers; deepspeed examples/seq2seq/run_translation.py ...

or with ``%%bash`` magic, where you can write a multi-line code for the shell program to run:

.. code-block::

    %%bash

    git clone https://github.com/huggingface/transformers
    cd transformers
    deepspeed examples/seq2seq/run_translation.py ...

In such a case you don't need any of the code presented at the beginning of this section.

@@ -743,16 +743,16 @@ repo <https://github.com/microsoft/DeepSpeedExamples>`__:

.. code-block:: bash

    git clone https://github.com/microsoft/DeepSpeedExamples
    cd DeepSpeedExamples
    find . -name '*json'

Continuing the code from above, let's say you're looking to configure the Lamb optimizer. So you can search through the
example ``.json`` files with:

.. code-block:: bash

    grep -i Lamb $(find . -name '*json')

Some more examples are to be found in the `main repo <https://github.com/microsoft/DeepSpeed>`__ as well.
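
The same search can be sketched in Python, for instance when running from a notebook. The function name ``find_json_mentions`` is hypothetical, and it matches ``*.json`` files only, which is slightly narrower than the ``'*json'`` pattern used with ``find`` above.

```python
from pathlib import Path


def find_json_mentions(root, needle):
    """Return sorted paths of *.json files under root that mention needle.

    The match is case-insensitive, like ``grep -i``.
    """
    needle = needle.lower()
    hits = []
    for path in Path(root).rglob("*.json"):
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            # Skip unreadable entries (e.g. a directory named like a file).
            continue
        if needle in text.lower():
            hits.append(path)
    return sorted(hits)
```

Calling ``find_json_mentions(".", "Lamb")`` from inside the cloned examples repository would list the files that the ``grep`` command reports.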
@@ -1020,49 +1020,49 @@ Here is a full ZeRO-2 all-enabled configuration file ``ds_config_zero2.json``:

.. code-block:: json

    {
        "fp16": {
            "enabled": true,
            "loss_scale": 0,
            "loss_scale_window": 1000,
            "initial_scale_power": 16,
            "hysteresis": 2,
            "min_loss_scale": 1
        },

        "zero_optimization": {
            "stage": 2,
            "allgather_partitions": true,
            "allgather_bucket_size": 2e8,
            "overlap_comm": true,
            "reduce_scatter": true,
            "reduce_bucket_size": 2e8,
            "contiguous_gradients": true,
            "cpu_offload": true
        },

        "optimizer": {
            "type": "AdamW",
            "params": {
                "lr": 3e-5,
                "betas": [0.8, 0.999],
                "eps": 1e-8,
                "weight_decay": 3e-7
            }
        },

        "scheduler": {
            "type": "WarmupLR",
            "params": {
                "warmup_min_lr": 0,
                "warmup_max_lr": 3e-5,
                "warmup_num_steps": 500
            }
        },

        "steps_per_print": 2000,
        "wall_clock_breakdown": false
    }


@@ -1073,54 +1073,54 @@ Here is a full ZeRO-3 all-enabled configuration file ``ds_config_zero3.json``:

.. code-block:: json

    {
        "fp16": {
            "enabled": true,
            "loss_scale": 0,
            "loss_scale_window": 1000,
            "initial_scale_power": 16,
            "hysteresis": 2,
            "min_loss_scale": 1
        },

        "zero_optimization": {
            "stage": 3,
            "cpu_offload": true,
            "cpu_offload_params": true,
            "cpu_offload_use_pin_memory": true,
            "overlap_comm": true,
            "contiguous_gradients": true,
            "sub_group_size": 1e14,
            "reduce_bucket_size": 1e6,
            "stage3_prefetch_bucket_size": 0.94e6,
            "stage3_param_persistence_threshold": 1e4,
            "stage3_max_live_parameters": 1e9,
            "stage3_max_reuse_distance": 1e9,
            "stage3_gather_fp16_weights_on_model_save": true
        },

        "optimizer": {
            "type": "AdamW",
            "params": {
                "lr": 3e-5,
                "betas": [0.8, 0.999],
                "eps": 1e-8,
                "weight_decay": 3e-7
            }
        },

        "scheduler": {
            "type": "WarmupLR",
            "params": {
                "warmup_min_lr": 0,
                "warmup_max_lr": 3e-5,
                "warmup_num_steps": 500
            }
        },

        "steps_per_print": 2000,
        "wall_clock_breakdown": false
    }


Optimizer and Scheduler
@@ -1367,26 +1367,26 @@ Let's say your checkpoint folder looks like this:

.. code-block:: bash

    $ ls -l output_dir/checkpoint-1/
    -rw-rw-r-- 1 stas stas 1.4K Mar 27 20:42 config.json
    drwxrwxr-x 2 stas stas 4.0K Mar 25 19:52 global_step1/
    -rw-rw-r-- 1 stas stas   12 Mar 27 13:16 latest
    -rw-rw-r-- 1 stas stas 827K Mar 27 20:42 optimizer.pt
    -rw-rw-r-- 1 stas stas 231M Mar 27 20:42 pytorch_model.bin
    -rw-rw-r-- 1 stas stas  623 Mar 27 20:42 scheduler.pt
    -rw-rw-r-- 1 stas stas 1.8K Mar 27 20:42 special_tokens_map.json
    -rw-rw-r-- 1 stas stas 774K Mar 27 20:42 spiece.model
    -rw-rw-r-- 1 stas stas 1.9K Mar 27 20:42 tokenizer_config.json
    -rw-rw-r-- 1 stas stas  339 Mar 27 20:42 trainer_state.json
    -rw-rw-r-- 1 stas stas 2.3K Mar 27 20:42 training_args.bin
    -rwxrw-r-- 1 stas stas 5.5K Mar 27 13:16 zero_to_fp32.py*

In this example there is just one DeepSpeed checkpoint sub-folder `global_step1`. Therefore to reconstruct the fp32
weights just run:

.. code-block:: bash

    python zero_to_fp32.py global_step1 pytorch_model.bin

The script will automatically handle either ZeRO-2 or ZeRO-3 checkpoint.
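
When a checkpoint folder holds several ``global_step*`` sub-folders, the sub-folder to convert could be looked up programmatically. The sketch below assumes that the ``latest`` file in the listing above contains the name of the most recent sub-folder as plain text; that is an assumption about the file's contents, not something stated here, and both helper names are hypothetical.

```python
from pathlib import Path


def latest_checkpoint_tag(checkpoint_dir) -> str:
    """Read the sub-folder name from the `latest` file.

    Assumption: the file holds a single tag such as "global_step1".
    """
    return (Path(checkpoint_dir) / "latest").read_text().strip()


def conversion_command(checkpoint_dir, output_name="pytorch_model.bin"):
    """Build the argument list for the zero_to_fp32.py call shown above."""
    tag = latest_checkpoint_tag(checkpoint_dir)
    return ["python", "zero_to_fp32.py", tag, output_name]
```

The returned list has the same shape as the command above and could be passed to ``subprocess.run`` with ``cwd`` set to the checkpoint folder.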
@@ -1416,18 +1416,18 @@ be seen in the following example:

.. code-block:: python

    class ModuleZ3(torch.nn.Module):
        def __init__(self, *args):
            # torch.nn.Module.__init__ must not receive ``self`` explicitly
            super().__init__()
            self.layer1 = SomeLayer()
            self.layer2 = OtherLayer()
            deepspeed.zero.register_external_parameter(self, self.layer1.weight)

        def forward(self, input):
            x = self.layer1(input)
            # self.layer1.weight is needed in ModuleZ3.forward
            y = self.layer2(x, self.layer1.weight)
            return y

In general ``transformers`` models don't use this style of referring to other layers' weights, so most likely you won't
need to use it.
@@ -1494,7 +1494,7 @@ Also under ZeRO-3, if you write your own code and run into a model parameter wei

.. code-block:: python

    tensor([1.], device='cuda:0', dtype=torch.float16, requires_grad=True)

stress on ``tensor([1.])``, or if you get an error where it says the parameter is of size ``1``, instead of some much
larger multi-dimensional shape, this means that the parameter is partitioned and what you see is a ZeRO-3 placeholder.

@@ -33,38 +33,38 @@ Example of using a model with MeCab and WordPiece tokenization:

.. code-block::

    >>> import torch
    >>> from transformers import AutoModel, AutoTokenizer

    >>> bertjapanese = AutoModel.from_pretrained("cl-tohoku/bert-base-japanese")
    >>> tokenizer = AutoTokenizer.from_pretrained("cl-tohoku/bert-base-japanese")

    >>> ## Input Japanese Text
    >>> line = "吾輩は猫である。"

    >>> inputs = tokenizer(line, return_tensors="pt")

    >>> print(tokenizer.decode(inputs['input_ids'][0]))
    [CLS] 吾輩 は 猫 で ある 。 [SEP]

    >>> outputs = bertjapanese(**inputs)

Example of using a model with Character tokenization:

.. code-block::

    >>> bertjapanese = AutoModel.from_pretrained("cl-tohoku/bert-base-japanese-char")
    >>> tokenizer = AutoTokenizer.from_pretrained("cl-tohoku/bert-base-japanese-char")

    >>> ## Input Japanese Text
    >>> line = "吾輩は猫である。"

    >>> inputs = tokenizer(line, return_tensors="pt")

    >>> print(tokenizer.decode(inputs['input_ids'][0]))
    [CLS] 吾 輩 は 猫 で あ る 。 [SEP]

    >>> outputs = bertjapanese(**inputs)

Tips:

@@ -38,22 +38,22 @@ Usage:

.. code-block::

    # leverage checkpoints for Bert2Bert model...
    # use BERT's cls token as BOS token and sep token as EOS token
    encoder = BertGenerationEncoder.from_pretrained("bert-large-uncased", bos_token_id=101, eos_token_id=102)
    # add cross attention layers and use BERT's cls token as BOS token and sep token as EOS token
    decoder = BertGenerationDecoder.from_pretrained("bert-large-uncased", add_cross_attention=True, is_decoder=True, bos_token_id=101, eos_token_id=102)
    bert2bert = EncoderDecoderModel(encoder=encoder, decoder=decoder)

    # create tokenizer...
    tokenizer = BertTokenizer.from_pretrained("bert-large-uncased")

    input_ids = tokenizer('This is a long article to summarize', add_special_tokens=False, return_tensors="pt").input_ids
    labels = tokenizer('This is a short summary', return_tensors="pt").input_ids

    # train...
    loss = bert2bert(input_ids=input_ids, decoder_input_ids=labels, labels=labels).loss
    loss.backward()


- Pretrained :class:`~transformers.EncoderDecoderModel` are also directly available in the model hub, e.g.,
@@ -61,15 +61,15 @@ Usage:

.. code-block::

    # instantiate sentence fusion model
    sentence_fuser = EncoderDecoderModel.from_pretrained("google/roberta2roberta_L-24_discofuse")
    tokenizer = AutoTokenizer.from_pretrained("google/roberta2roberta_L-24_discofuse")

    input_ids = tokenizer('This is the first sentence. This is the second sentence.', add_special_tokens=False, return_tensors="pt").input_ids

    outputs = sentence_fuser.generate(input_ids)

    print(tokenizer.decode(outputs[0]))


Tips:

@@ -31,28 +31,28 @@ Example of use:

.. code-block::

    import torch
    from transformers import AutoModel, AutoTokenizer

    bertweet = AutoModel.from_pretrained("vinai/bertweet-base")

    # For transformers v4.x+:
    tokenizer = AutoTokenizer.from_pretrained("vinai/bertweet-base", use_fast=False)

    # For transformers v3.x:
    # tokenizer = AutoTokenizer.from_pretrained("vinai/bertweet-base")

    # INPUT TWEET IS ALREADY NORMALIZED!
    line = "SC has first two presumptive cases of coronavirus , DHEC confirms HTTPURL via @USER :cry:"

    input_ids = torch.tensor([tokenizer.encode(line)])

    with torch.no_grad():
        features = bertweet(input_ids)  # Models outputs are now tuples

    ## With TensorFlow 2.0+:
    # from transformers import TFAutoModel
    # bertweet = TFAutoModel.from_pretrained("vinai/bertweet-base")


The original code can be found `here <https://github.com/VinAIResearch/BERTweet>`__.

@@ -40,20 +40,20 @@ Examples of use:
|
||||
|
||||
.. code-block::
|
||||
|
||||
from transformers import HerbertTokenizer, RobertaModel
|
||||
from transformers import HerbertTokenizer, RobertaModel
|
||||
|
||||
tokenizer = HerbertTokenizer.from_pretrained("allegro/herbert-klej-cased-tokenizer-v1")
|
||||
model = RobertaModel.from_pretrained("allegro/herbert-klej-cased-v1")
|
||||
tokenizer = HerbertTokenizer.from_pretrained("allegro/herbert-klej-cased-tokenizer-v1")
|
||||
model = RobertaModel.from_pretrained("allegro/herbert-klej-cased-v1")
|
||||
|
||||
encoded_input = tokenizer.encode("Kto ma lepszą sztukę, ma lepszy rząd – to jasne.", return_tensors='pt')
|
||||
outputs = model(encoded_input)
|
||||
encoded_input = tokenizer.encode("Kto ma lepszą sztukę, ma lepszy rząd – to jasne.", return_tensors='pt')
|
||||
outputs = model(encoded_input)
|
||||
|
||||
# HerBERT can also be loaded using AutoTokenizer and AutoModel:
|
||||
import torch
|
||||
from transformers import AutoModel, AutoTokenizer
|
||||
# HerBERT can also be loaded using AutoTokenizer and AutoModel:
|
||||
import torch
|
||||
from transformers import AutoModel, AutoTokenizer
|
||||
|
||||
tokenizer = AutoTokenizer.from_pretrained("allegro/herbert-klej-cased-tokenizer-v1")
|
||||
model = AutoModel.from_pretrained("allegro/herbert-klej-cased-v1")
|
||||
tokenizer = AutoTokenizer.from_pretrained("allegro/herbert-klej-cased-tokenizer-v1")
|
||||
model = AutoModel.from_pretrained("allegro/herbert-klej-cased-v1")
|
||||
|
||||
|
||||
The original code can be found `here <https://github.com/allegro/HerBERT>`__.
|
||||
|
||||
@@ -56,24 +56,24 @@ Tips:
|
||||
|
||||
.. code-block::
|
||||
|
||||
def normalize_bbox(bbox, width, height):
|
||||
return [
|
||||
int(1000 * (bbox[0] / width)),
|
||||
int(1000 * (bbox[1] / height)),
|
||||
int(1000 * (bbox[2] / width)),
|
||||
int(1000 * (bbox[3] / height)),
|
||||
]
|
||||
def normalize_bbox(bbox, width, height):
|
||||
return [
|
||||
int(1000 * (bbox[0] / width)),
|
||||
int(1000 * (bbox[1] / height)),
|
||||
int(1000 * (bbox[2] / width)),
|
||||
int(1000 * (bbox[3] / height)),
|
||||
]
|
||||
|
||||
Here, :obj:`width` and :obj:`height` correspond to the width and height of the original document in which the token
|
||||
occurs. Those can be obtained using, for example, the Python Imaging Library (PIL), as follows:
|
||||
|
||||
.. code-block::
|
||||
|
||||
from PIL import Image
|
||||
from PIL import Image
|
||||
|
||||
image = Image.open("name_of_your_document - can be a png file, pdf, etc.")
|
||||
image = Image.open("name_of_your_document - can be a png file, pdf, etc.")
|
||||
|
||||
width, height = image.size
|
||||
width, height = image.size
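As a quick check of the normalization described above, the following sketch applies ``normalize_bbox`` to a made-up bounding box. The page size and box coordinates are hypothetical values chosen for illustration; in practice ``width`` and ``height`` would come from ``image.size``.

```python
def normalize_bbox(bbox, width, height):
    # Scale pixel coordinates (x0, y0, x1, y1) into the 0-1000 range.
    return [
        int(1000 * (bbox[0] / width)),
        int(1000 * (bbox[1] / height)),
        int(1000 * (bbox[2] / width)),
        int(1000 * (bbox[3] / height)),
    ]


# Hypothetical page of 800 x 1000 pixels and a hypothetical token box.
width, height = 800, 1000
bbox = [200, 250, 400, 500]

normalized = normalize_bbox(bbox, width, height)
print(normalized)  # [250, 250, 500, 500]
```

Because the x coordinates are divided by the width and the y coordinates by the height, the result does not depend on the resolution at which the page was scanned.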
|
||||
|
||||
- For a demo which shows how to fine-tune :class:`LayoutLMForTokenClassification` on the `FUNSD dataset
|
||||
<https://guillaumejaume.github.io/FUNSD/>`__ (a collection of annotated forms), see `this notebook
|
||||
|
||||
@@ -53,15 +53,15 @@ BERT-345M-uncased::
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
wget --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/megatron_bert_345m/versions/v0.1_uncased/zip \
|
||||
-O megatron_bert_345m_v0_1_uncased.zip
|
||||
wget --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/megatron_bert_345m/versions/v0.1_uncased/zip \
|
||||
-O megatron_bert_345m_v0_1_uncased.zip
|
||||
|
||||
BERT-345M-cased::
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
wget --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/megatron_bert_345m/versions/v0.1_cased/zip -O \
|
||||
megatron_bert_345m_v0_1_cased.zip
|
||||
wget --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/megatron_bert_345m/versions/v0.1_cased/zip -O \
|
||||
megatron_bert_345m_v0_1_cased.zip
|
||||
|
||||
Once you have obtained the checkpoints from NVIDIA GPU Cloud (NGC), you have to convert them to a format that will
|
||||
easily be loaded by Hugging Face Transformers and our port of the BERT code.
|
||||
@@ -71,11 +71,11 @@ The following commands allow you to do the conversion. We assume that the folder
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
python3 $PATH_TO_TRANSFORMERS/models/megatron_bert/convert_megatron_bert_checkpoint.py megatron_bert_345m_v0_1_uncased.zip
|
||||
python3 $PATH_TO_TRANSFORMERS/models/megatron_bert/convert_megatron_bert_checkpoint.py megatron_bert_345m_v0_1_uncased.zip
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
python3 $PATH_TO_TRANSFORMERS/models/megatron_bert/convert_megatron_bert_checkpoint.py megatron_bert_345m_v0_1_cased.zip
|
||||
python3 $PATH_TO_TRANSFORMERS/models/megatron_bert/convert_megatron_bert_checkpoint.py megatron_bert_345m_v0_1_cased.zip
|
||||
|
||||
The original code can be found `here <https://github.com/NVIDIA/Megatron-LM>`__. That repository contains a multi-GPU
|
||||
and multi-node implementation of the Megatron Language models. In particular, it contains a hybrid model parallel
|
||||
|
||||
@@ -51,8 +51,8 @@ Alternatively, you can directly download the checkpoints using::
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
wget --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/megatron_lm_345m/versions/v0.0/zip -O \
|
||||
megatron_gpt2_345m_v0_0.zip
|
||||
wget --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/megatron_lm_345m/versions/v0.0/zip -O \
|
||||
megatron_gpt2_345m_v0_0.zip
|
||||
|
||||
Once you have obtained the checkpoint from NVIDIA GPU Cloud (NGC), you have to convert it to a format that will easily
|
||||
be loaded by the Hugging Face Transformers GPT2 implementation.
|
||||
@@ -62,7 +62,7 @@ The following command allows you to do the conversion. We assume that the folder
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
python3 $PATH_TO_TRANSFORMERS/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py megatron_gpt2_345m_v0_0.zip
|
||||
python3 $PATH_TO_TRANSFORMERS/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py megatron_gpt2_345m_v0_0.zip
|
||||
|
||||
The original code can be found `here <https://github.com/NVIDIA/Megatron-LM>`__. That repository contains a multi-GPU
|
||||
and multi-node implementation of the Megatron Language models. In particular, it contains a hybrid model parallel
|
||||
|
||||
@@ -31,23 +31,23 @@ Example of use:
|
||||
|
||||
.. code-block::
|
||||
|
||||
import torch
|
||||
from transformers import AutoModel, AutoTokenizer
|
||||
import torch
|
||||
from transformers import AutoModel, AutoTokenizer
|
||||
|
||||
phobert = AutoModel.from_pretrained("vinai/phobert-base")
|
||||
tokenizer = AutoTokenizer.from_pretrained("vinai/phobert-base")
|
||||
phobert = AutoModel.from_pretrained("vinai/phobert-base")
|
||||
tokenizer = AutoTokenizer.from_pretrained("vinai/phobert-base")
|
||||
|
||||
# INPUT TEXT MUST BE ALREADY WORD-SEGMENTED!
|
||||
line = "Tôi là sinh_viên trường đại_học Công_nghệ ."
|
||||
# INPUT TEXT MUST BE ALREADY WORD-SEGMENTED!
|
||||
line = "Tôi là sinh_viên trường đại_học Công_nghệ ."
|
||||
|
||||
input_ids = torch.tensor([tokenizer.encode(line)])
|
||||
input_ids = torch.tensor([tokenizer.encode(line)])
|
||||
|
||||
with torch.no_grad():
|
||||
    features = phobert(input_ids)  # Model outputs are now tuples
|
||||
with torch.no_grad():
|
||||
    features = phobert(input_ids)  # Model outputs are now tuples
|
||||
|
||||
## With TensorFlow 2.0+:
|
||||
# from transformers import TFAutoModel
|
||||
# phobert = TFAutoModel.from_pretrained("vinai/phobert-base")
|
||||
## With TensorFlow 2.0+:
|
||||
# from transformers import TFAutoModel
|
||||
# phobert = TFAutoModel.from_pretrained("vinai/phobert-base")
|
||||
|
||||
|
||||
The original code can be found `here <https://github.com/VinAIResearch/PhoBERT>`__.
|
||||
|
||||
@@ -145,8 +145,8 @@ For training, the :class:`~transformers.ReformerModelWithLMHead` should be used
|
||||
|
||||
.. code-block::
|
||||
|
||||
input_ids = tokenizer.encode('This is a sentence from the training data', return_tensors='pt')
|
||||
loss = model(input_ids, labels=input_ids)[0]
|
||||
input_ids = tokenizer.encode('This is a sentence from the training data', return_tensors='pt')
|
||||
loss = model(input_ids, labels=input_ids)[0]
|
||||
|
||||
|
||||
ReformerConfig
|
||||
|
||||
@@ -73,10 +73,10 @@ token. T5 can be trained / fine-tuned both in a supervised and unsupervised fash
|
||||
|
||||
.. code-block::
|
||||
|
||||
input_ids = tokenizer('The <extra_id_0> walks in <extra_id_1> park', return_tensors='pt').input_ids
|
||||
labels = tokenizer('<extra_id_0> cute dog <extra_id_1> the <extra_id_2>', return_tensors='pt').input_ids
|
||||
# the forward function automatically creates the correct decoder_input_ids
|
||||
loss = model(input_ids=input_ids, labels=labels).loss
|
||||
input_ids = tokenizer('The <extra_id_0> walks in <extra_id_1> park', return_tensors='pt').input_ids
|
||||
labels = tokenizer('<extra_id_0> cute dog <extra_id_1> the <extra_id_2>', return_tensors='pt').input_ids
|
||||
# the forward function automatically creates the correct decoder_input_ids
|
||||
loss = model(input_ids=input_ids, labels=labels).loss
|
||||
|
||||
- Supervised training
|
||||
|
||||
@@ -86,10 +86,10 @@ token. T5 can be trained / fine-tuned both in a supervised and unsupervised fash
|
||||
|
||||
.. code-block::
|
||||
|
||||
input_ids = tokenizer('translate English to German: The house is wonderful.', return_tensors='pt').input_ids
|
||||
labels = tokenizer('Das Haus ist wunderbar.', return_tensors='pt').input_ids
|
||||
# the forward function automatically creates the correct decoder_input_ids
|
||||
loss = model(input_ids=input_ids, labels=labels).loss
|
||||
input_ids = tokenizer('translate English to German: The house is wonderful.', return_tensors='pt').input_ids
|
||||
labels = tokenizer('Das Haus ist wunderbar.', return_tensors='pt').input_ids
|
||||
# the forward function automatically creates the correct decoder_input_ids
|
||||
loss = model(input_ids=input_ids, labels=labels).loss
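The comment about ``decoder_input_ids`` being created automatically can be illustrated with a small sketch. To my understanding, T5 derives the decoder inputs by shifting the labels one position to the right and prepending the decoder start token, which for T5 is the pad token (id 0). The function below is a simplified illustration on plain Python lists, not the library code, and the token ids are made up.

```python
def shift_right(labels, decoder_start_token_id=0, pad_token_id=0):
    """Build decoder inputs from labels (simplified, list-based sketch)."""
    # Prepend the start token and drop the last label.
    shifted = [decoder_start_token_id] + labels[:-1]
    # Positions masked out of the loss (-100) are replaced by the pad id.
    return [pad_token_id if token == -100 else token for token in shifted]


# Hypothetical label ids for a short target sentence, ending in EOS (id 1).
labels = [644, 4598, 229, 1]
decoder_input_ids = shift_right(labels)
print(decoder_input_ids)  # [0, 644, 4598, 229]
```

At each position the decoder therefore sees the previous target token and is trained to predict the current one, which is why only ``input_ids`` and ``labels`` need to be passed.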
|
||||
|
||||
|
||||
T5Config
|
||||
|
||||
@@ -70,19 +70,19 @@ Run all:
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
pytest
|
||||
pytest
|
||||
|
||||
or:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
make test
|
||||
make test
|
||||
|
||||
Note that the latter is defined as:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
python -m pytest -n auto --dist=loadfile -s -v ./tests/
|
||||
python -m pytest -n auto --dist=loadfile -s -v ./tests/
|
||||
|
||||
which tells pytest to:
|
||||
|
||||
@@ -100,13 +100,13 @@ All tests of the test suite:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
pytest --collect-only -q
|
||||
pytest --collect-only -q
|
||||
|
||||
All tests of a given test file:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
pytest tests/test_optimization.py --collect-only -q
|
||||
pytest tests/test_optimization.py --collect-only -q
|
||||
|
||||
|
||||
|
||||
@@ -117,7 +117,7 @@ To run an individual test module:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
pytest tests/test_logging.py
|
||||
pytest tests/test_logging.py
|
||||
|
||||
|
||||
Run specific tests
|
||||
@@ -128,7 +128,7 @@ class containing those tests. For example, it could be:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
pytest tests/test_optimization.py::OptimizationTest::test_adam_w
|
||||
pytest tests/test_optimization.py::OptimizationTest::test_adam_w
|
||||
|
||||
Here:
|
||||
|
||||
@@ -140,7 +140,7 @@ If the file contains multiple classes, you can choose to run only tests of a giv
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
pytest tests/test_optimization.py::OptimizationTest
|
||||
pytest tests/test_optimization.py::OptimizationTest
|
||||
|
||||
|
||||
will run all the tests inside that class.
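The test identifiers used above follow a ``file::class::test`` pattern separated by ``::``. As a small illustration (a sketch for reading such identifiers, not part of pytest), the helper below, whose name is made up, splits an identifier into its file path and the remaining parts.

```python
def split_node_id(node_id):
    """Split a pytest-style identifier into (file path, [class, test, ...])."""
    path, *parts = node_id.split("::")
    return path, parts


print(split_node_id("tests/test_optimization.py::OptimizationTest::test_adam_w"))
# ('tests/test_optimization.py', ['OptimizationTest', 'test_adam_w'])

print(split_node_id("tests/test_optimization.py::OptimizationTest"))
# ('tests/test_optimization.py', ['OptimizationTest'])
```

The fewer parts you give after the file path, the broader the selection: a file alone selects the whole module, a file plus a class selects all tests in that class.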
|
||||
@@ -149,7 +149,7 @@ As mentioned earlier you can see what tests are contained inside the ``Optimizat
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
pytest tests/test_optimization.py::OptimizationTest --collect-only -q
|
||||
pytest tests/test_optimization.py::OptimizationTest --collect-only -q
|
||||
|
||||
You can run tests by keyword expressions.
|
||||
|
||||
@@ -157,7 +157,7 @@ To run only tests whose name contains ``adam``:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
pytest -k adam tests/test_optimization.py
|
||||
pytest -k adam tests/test_optimization.py
|
||||
|
||||
Logical ``and`` and ``or`` can be used to indicate whether all keywords should match or either. ``not`` can be used to
|
||||
negate.
|
||||
@@ -166,19 +166,19 @@ To run all tests except those whose name contains ``adam``:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
pytest -k "not adam" tests/test_optimization.py
|
||||
pytest -k "not adam" tests/test_optimization.py
|
||||
|
||||
And you can combine the two patterns in one:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
pytest -k "ada and not adam" tests/test_optimization.py
|
||||
pytest -k "ada and not adam" tests/test_optimization.py
|
||||
|
||||
For example to run both ``test_adafactor`` and ``test_adam_w`` you can use:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
pytest -k "test_adafactor or test_adam_w" tests/test_optimization.py
|
||||
pytest -k "test_adafactor or test_adam_w" tests/test_optimization.py
|
||||
|
||||
Note that we use ``or`` here: a test is selected if either keyword matches, so both tests are included.
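To make the ``and`` / ``or`` / ``not`` semantics concrete, here is a toy model of keyword selection. It is only a sketch under the assumption that each bare word is a case-insensitive substring test against the test name; it is not pytest's actual implementation, and the function name ``matches`` is made up.

```python
import re


def matches(expression, test_name):
    """Toy model of ``pytest -k``: each bare word is a substring test."""

    def word_to_bool(match):
        word = match.group(0)
        if word in ("and", "or", "not"):
            return word  # keep the logical operators as they are
        return str(word.lower() in test_name.lower())

    # Replace every keyword by "True" or "False", then evaluate the logic.
    boolean_expr = re.sub(r"[A-Za-z_][A-Za-z0-9_]*", word_to_bool, expression)
    return eval(boolean_expr, {"__builtins__": {}})


names = ["test_adam_w", "test_adafactor", "test_get_scheduler"]

print([n for n in names if matches("ada and not adam", n)])
# ['test_adafactor']
print([n for n in names if matches("test_adafactor or test_adam_w", n)])
# ['test_adam_w', 'test_adafactor']
```

With ``or`` each test needs to satisfy only one keyword, so both are kept; with ``and`` a single test name would have to contain both keywords at once.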
|
||||
|
||||
@@ -186,7 +186,7 @@ If you want to include only tests that include both patterns, ``and`` is to be u
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
pytest -k "test and ada" tests/test_optimization.py
|
||||
pytest -k "test and ada" tests/test_optimization.py
|
||||
|
||||
|
||||
|
||||
@@ -251,7 +251,7 @@ example, to run all except ``test_modeling_*.py`` tests:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
pytest `ls -1 tests/*py | grep -v test_modeling`
|
||||
pytest `ls -1 tests/*py | grep -v test_modeling`
|
||||
|
||||
|
||||
Clearing state
|
||||
@@ -292,13 +292,13 @@ Repeat tests
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
pip install pytest-flakefinder
|
||||
pip install pytest-flakefinder
|
||||
|
||||
And then run every test multiple times (50 by default):
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
pytest --flake-finder --flake-runs=5 tests/test_failing_test.py
|
||||
pytest --flake-finder --flake-runs=5 tests/test_failing_test.py
|
||||
|
||||
.. note::
|
||||
This plugin doesn't work with the ``-n`` flag from ``pytest-xdist``.
|
||||
@@ -322,19 +322,19 @@ As explained earlier this allows detection of coupled tests - where one test's s
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
pytest tests
|
||||
[...]
|
||||
Using --random-order-bucket=module
|
||||
Using --random-order-seed=573663
|
||||
pytest tests
|
||||
[...]
|
||||
Using --random-order-bucket=module
|
||||
Using --random-order-seed=573663
|
||||
|
||||
So that if the given particular sequence fails, you can reproduce it by adding that exact seed, e.g.:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
pytest --random-order-seed=573663
|
||||
[...]
|
||||
Using --random-order-bucket=module
|
||||
Using --random-order-seed=573663
|
||||
pytest --random-order-seed=573663
|
||||
[...]
|
||||
Using --random-order-bucket=module
|
||||
Using --random-order-seed=573663
|
||||
|
||||
It will only reproduce the exact order if you use the exact same list of tests (or no list at all). Once you start
|
||||
manually narrowing down the list you can no longer rely on the seed, but have to list them manually in the exact order
|
||||
@@ -342,7 +342,7 @@ they failed and tell pytest to not randomize them instead using ``--random-order
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
pytest --random-order-bucket=none tests/test_a.py tests/test_c.py tests/test_b.py
|
||||
pytest --random-order-bucket=none tests/test_a.py tests/test_c.py tests/test_b.py
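Why a seed reproduces the order only for an identical list of tests can be illustrated with a plain seeded shuffle. This is a sketch of the general principle using Python's ``random`` module, not the plugin's internals; the test names and the helper ``shuffled`` are made up.

```python
import random


def shuffled(tests, seed):
    """Return a copy of ``tests`` shuffled deterministically by ``seed``."""
    order = list(tests)
    random.Random(seed).shuffle(order)
    return order


full = ["test_a", "test_b", "test_c", "test_d"]
subset = ["test_a", "test_c", "test_d"]

# Same seed and same list: the order is reproduced exactly.
print(shuffled(full, 573663) == shuffled(full, 573663))  # True

# Same seed but a shorter list: the shuffle consumes the random stream
# differently, so the relative order of the remaining tests is not
# guaranteed to match the order they had in the full run.
print(shuffled(full, 573663))
print(shuffled(subset, 573663))
```

This is why, after narrowing the list by hand, listing the tests explicitly in the failing order and disabling the shuffling is the reliable way to reproduce the failure.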
|
||||
|
||||
To disable the shuffling for all tests:
|
||||
|
||||
@@ -369,7 +369,7 @@ progressbar, and show tests that fail and the assert instantly. It gets activate
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
pip install pytest-sugar
|
||||
pip install pytest-sugar
|
||||
|
||||
To run tests without it, run:
|
||||
|
||||
@@ -388,7 +388,7 @@ For a single or a group of tests via ``pytest`` (after ``pip install pytest-pspe
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
pytest --pspec tests/test_optimization.py
|
||||
pytest --pspec tests/test_optimization.py
|
||||
|
||||
|
||||
|
||||
@@ -490,8 +490,8 @@ Inside tests:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
from transformers.testing_utils import get_gpu_count
|
||||
n_gpu = get_gpu_count() # works with torch and tf
|
||||
from transformers.testing_utils import get_gpu_count
|
||||
n_gpu = get_gpu_count() # works with torch and tf
|
||||
|
||||
|
||||
|
||||
@@ -514,8 +514,8 @@ You will need at least 2 GPUs to see these tests in action:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
CUDA_VISIBLE_DEVICES="0,1" RUN_SLOW=1 pytest -sv examples/seq2seq/test_finetune_trainer.py \
|
||||
examples/seq2seq/test_seq2seq_examples_multi_gpu.py
|
||||
CUDA_VISIBLE_DEVICES="0,1" RUN_SLOW=1 pytest -sv examples/seq2seq/test_finetune_trainer.py \
|
||||
examples/seq2seq/test_seq2seq_examples_multi_gpu.py
|
||||
|
||||
|
||||
Output capture
|
||||
@@ -528,13 +528,13 @@ To disable output capturing and to get the ``stdout`` and ``stderr`` normally, u
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
pytest -s tests/test_logging.py
|
||||
pytest -s tests/test_logging.py
|
||||
|
||||
To send test results to JUnit format output:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
pytest tests --junitxml=result.xml
|
||||
pytest tests --junitxml=result.xml
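Once a JUnit-style report exists, it can be inspected with the standard library. The sketch below lists the failed test cases; the XML sample is a hand-written, hypothetical report shaped like typical JUnit output (``testsuite`` / ``testcase`` elements with ``failure`` or ``error`` children), so check it against the ``result.xml`` your pytest version actually produces.

```python
import xml.etree.ElementTree as ET

# Hypothetical report content, standing in for the text of result.xml.
SAMPLE = """<testsuites>
  <testsuite name="pytest" tests="3" failures="1" errors="0" skipped="0">
    <testcase classname="tests.test_logging" name="test_a"/>
    <testcase classname="tests.test_logging" name="test_b">
      <failure message="assert 1 == 2">details</failure>
    </testcase>
    <testcase classname="tests.test_logging" name="test_c"/>
  </testsuite>
</testsuites>"""


def failed_tests(xml_text):
    """Return 'classname::name' for every test case with a failure or error."""
    root = ET.fromstring(xml_text)
    failed = []
    for case in root.iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            failed.append(f'{case.get("classname")}::{case.get("name")}')
    return failed


print(failed_tests(SAMPLE))  # ['tests.test_logging::test_b']
```

For a real report, read the file contents first (for example with ``open("result.xml").read()``) and pass them to ``failed_tests``.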
|
||||
|
||||
|
||||
Color control
|
||||
@@ -544,7 +544,7 @@ To have no color (e.g., yellow on white background is not readable):
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
pytest --color=no tests/test_logging.py
|
||||
pytest --color=no tests/test_logging.py
|
||||
|
||||
|
||||
|
||||
@@ -555,7 +555,7 @@ Creating a URL for each test failure:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
pytest --pastebin=failed tests/test_logging.py
|
||||
pytest --pastebin=failed tests/test_logging.py
|
||||
|
||||
This will submit test run information to a remote Paste service and provide a URL for each failure. You may select
|
||||
tests as usual or add, for example, ``-x`` if you only want to send one particular failure.
|
||||
@@ -564,7 +564,7 @@ Creating a URL for a whole test session log:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
pytest --pastebin=all tests/test_logging.py
|
||||
pytest --pastebin=all tests/test_logging.py
|
||||
|
||||
|
||||
|
||||
@@ -606,13 +606,13 @@ and you could run just the ``negative`` and ``integer`` sets of params with:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
pytest -k "negative and integer" tests/test_mytest.py
|
||||
pytest -k "negative and integer" tests/test_mytest.py
|
||||
|
||||
or all but ``negative`` sub-tests, with:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
pytest -k "not negative" tests/test_mytest.py
|
||||
pytest -k "not negative" tests/test_mytest.py
|
||||
|
||||
Besides using the ``-k`` filter that was just mentioned, you can find out the exact name of each sub-test and run any
|
||||
or all of them using their exact names.
|
||||
|
||||