From 521a8ffa53b5fc14e5d64b848827f3200ed3b436 Mon Sep 17 00:00:00 2001 From: Maria Khalusova Date: Fri, 28 Apr 2023 09:24:28 -0400 Subject: [PATCH] [docs] Doc TOC updates (#23049) * first draft of toc restructure * polishing based on feedback --- docs/source/en/_toctree.yml | 132 ++++---- .../en/converting_tensorflow_models.mdx | 162 --------- docs/source/en/migration.mdx | 315 ------------------ 3 files changed, 65 insertions(+), 544 deletions(-) delete mode 100644 docs/source/en/converting_tensorflow_models.mdx delete mode 100644 docs/source/en/migration.mdx diff --git a/docs/source/en/_toctree.yml b/docs/source/en/_toctree.yml index 6f0c0fe399..f3e51bb014 100644 --- a/docs/source/en/_toctree.yml +++ b/docs/source/en/_toctree.yml @@ -8,45 +8,22 @@ title: Get started - sections: - local: pipeline_tutorial - title: Pipelines for inference + title: Run inference with pipelines - local: autoclass_tutorial - title: Load pretrained instances with an AutoClass + title: Write portable code with AutoClass - local: preprocessing - title: Preprocess + title: Preprocess data - local: training title: Fine-tune a pretrained model + - local: run_scripts + title: Train with a script - local: accelerate - title: Distributed training with 🤗 Accelerate + title: Set up distributed training with 🤗 Accelerate - local: model_sharing - title: Share a model + title: Share your model title: Tutorials - sections: - sections: - - local: create_a_model - title: Create a custom architecture - - local: custom_models - title: Sharing custom models - - local: run_scripts - title: Train with a script - - local: sagemaker - title: Run training on Amazon SageMaker - - local: converting_tensorflow_models - title: Converting from TensorFlow checkpoints - - local: serialization - title: Export to ONNX - - local: torchscript - title: Export to TorchScript - - local: troubleshooting - title: Troubleshoot - title: General usage - - sections: - - local: fast_tokenizers - title: Use tokenizers from 🤗 
Tokenizers - - local: multilingual - title: Inference for multilingual models - - local: generation_strategies - title: Text generation strategies - - sections: - local: tasks/sequence_classification title: Text classification - local: tasks/token_classification @@ -63,38 +40,67 @@ title: Summarization - local: tasks/multiple_choice title: Multiple choice - title: Task guides - isExpanded: false title: Natural Language Processing + isExpanded: false - sections: - - local: tasks/audio_classification - title: Audio classification - - local: tasks/asr - title: Automatic speech recognition + - local: tasks/audio_classification + title: Audio classification + - local: tasks/asr + title: Automatic speech recognition title: Audio + isExpanded: false - sections: - - local: tasks/image_classification - title: Image classification - - local: tasks/semantic_segmentation - title: Semantic segmentation - - local: tasks/video_classification - title: Video classification - - local: tasks/object_detection - title: Object detection - - local: tasks/zero_shot_object_detection - title: Zero-shot object detection - - local: tasks/zero_shot_image_classification - title: Zero-shot image classification - - local: tasks/monocular_depth_estimation - title: Depth estimation + - local: tasks/image_classification + title: Image classification + - local: tasks/semantic_segmentation + title: Semantic segmentation + - local: tasks/video_classification + title: Video classification + - local: tasks/object_detection + title: Object detection + - local: tasks/zero_shot_object_detection + title: Zero-shot object detection + - local: tasks/zero_shot_image_classification + title: Zero-shot image classification + - local: tasks/monocular_depth_estimation + title: Depth estimation title: Computer Vision + isExpanded: false - sections: - - local: tasks/image_captioning - title: Image captioning - - local: tasks/document_question_answering - title: Document Question Answering + - local: 
tasks/image_captioning + title: Image captioning + - local: tasks/document_question_answering + title: Document Question Answering title: Multimodal - - sections: + isExpanded: false + title: Task Guides +- sections: + - local: fast_tokenizers + title: Use fast tokenizers from 🤗 Tokenizers + - local: multilingual + title: Run inference with multilingual models + - local: generation_strategies + title: Customize text generation strategy + - local: create_a_model + title: Use model-specific APIs + - local: custom_models + title: Share a custom model + - local: sagemaker + title: Run training on Amazon SageMaker + - local: serialization + title: Export to ONNX + - local: torchscript + title: Export to TorchScript + - local: benchmarks + title: Benchmarks + - local: notebooks + title: Notebooks with examples + - local: community + title: Community resources + - local: troubleshooting + title: Troubleshoot + title: Developer guides +- sections: - local: performance title: Overview - local: perf_train_gpu_one @@ -129,8 +135,8 @@ title: Hyperparameter Search using Trainer API - local: tf_xla title: XLA Integration for TensorFlow Models - title: Performance and scalability - - sections: + title: Performance and scalability +- sections: - local: contributing title: How to contribute to transformers? 
- local: add_new_model @@ -143,16 +149,8 @@ title: Testing - local: pr_checks title: Checks on a Pull Request - title: Contribute - - local: notebooks - title: 🤗 Transformers Notebooks - - local: community - title: Community resources - - local: benchmarks - title: Benchmarks - - local: migration - title: Migrating from previous packages - title: How-to guides + title: Contribute + - sections: - local: philosophy title: Philosophy diff --git a/docs/source/en/converting_tensorflow_models.mdx b/docs/source/en/converting_tensorflow_models.mdx deleted file mode 100644 index 8dc51dd616..0000000000 --- a/docs/source/en/converting_tensorflow_models.mdx +++ /dev/null @@ -1,162 +0,0 @@ - - -# Converting From Tensorflow Checkpoints - -A command-line interface is provided to convert original Bert/GPT/GPT-2/Transformer-XL/XLNet/XLM checkpoints to models -that can be loaded using the `from_pretrained` methods of the library. - - - -Since 2.3.0 the conversion script is now part of the transformers CLI (**transformers-cli**) available in any -transformers >= 2.3.0 installation. - -The documentation below reflects the **transformers-cli convert** command format. - - - -## BERT - -You can convert any TensorFlow checkpoint for BERT (in particular [the pre-trained models released by Google](https://github.com/google-research/bert#pre-trained-models)) in a PyTorch save file by using the -[convert_bert_original_tf_checkpoint_to_pytorch.py](https://github.com/huggingface/transformers/tree/main/src/transformers/models/bert/convert_bert_original_tf_checkpoint_to_pytorch.py) script. 
- -This CLI takes as input a TensorFlow checkpoint (three files starting with `bert_model.ckpt`) and the associated -configuration file (`bert_config.json`), and creates a PyTorch model for this configuration, loads the weights from -the TensorFlow checkpoint in the PyTorch model and saves the resulting model in a standard PyTorch save file that can -be imported using `from_pretrained()` (see example in [quicktour](quicktour) , [run_glue.py](https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-classification/run_glue.py) ). - -You only need to run this conversion script **once** to get a PyTorch model. You can then disregard the TensorFlow -checkpoint (the three files starting with `bert_model.ckpt`) but be sure to keep the configuration file (\ -`bert_config.json`) and the vocabulary file (`vocab.txt`) as these are needed for the PyTorch model too. - -To run this specific conversion script you will need to have TensorFlow and PyTorch installed (`pip install tensorflow`). The rest of the repository only requires PyTorch. - -Here is an example of the conversion process for a pre-trained `BERT-Base Uncased` model: - -```bash -export BERT_BASE_DIR=/path/to/bert/uncased_L-12_H-768_A-12 - -transformers-cli convert --model_type bert \ - --tf_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ - --config $BERT_BASE_DIR/bert_config.json \ - --pytorch_dump_output $BERT_BASE_DIR/pytorch_model.bin -``` - -You can download Google's pre-trained models for the conversion [here](https://github.com/google-research/bert#pre-trained-models). - -## ALBERT - -Convert TensorFlow model checkpoints of ALBERT to PyTorch using the -[convert_albert_original_tf_checkpoint_to_pytorch.py](https://github.com/huggingface/transformers/tree/main/src/transformers/models/albert/convert_albert_original_tf_checkpoint_to_pytorch.py) script. 
- -The CLI takes as input a TensorFlow checkpoint (three files starting with `model.ckpt-best`) and the accompanying -configuration file (`albert_config.json`), then creates and saves a PyTorch model. To run this conversion you will -need to have TensorFlow and PyTorch installed. - -Here is an example of the conversion process for the pre-trained `ALBERT Base` model: - -```bash -export ALBERT_BASE_DIR=/path/to/albert/albert_base - -transformers-cli convert --model_type albert \ - --tf_checkpoint $ALBERT_BASE_DIR/model.ckpt-best \ - --config $ALBERT_BASE_DIR/albert_config.json \ - --pytorch_dump_output $ALBERT_BASE_DIR/pytorch_model.bin -``` - -You can download Google's pre-trained models for the conversion [here](https://github.com/google-research/albert#pre-trained-models). - -## OpenAI GPT - -Here is an example of the conversion process for a pre-trained OpenAI GPT model, assuming that your NumPy checkpoint -save as the same format than OpenAI pretrained model (see [here](https://github.com/openai/finetune-transformer-lm)\ -) - -```bash -export OPENAI_GPT_CHECKPOINT_FOLDER_PATH=/path/to/openai/pretrained/numpy/weights - -transformers-cli convert --model_type gpt \ - --tf_checkpoint $OPENAI_GPT_CHECKPOINT_FOLDER_PATH \ - --pytorch_dump_output $PYTORCH_DUMP_OUTPUT \ - [--config OPENAI_GPT_CONFIG] \ - [--finetuning_task_name OPENAI_GPT_FINETUNED_TASK] \ -``` - -## OpenAI GPT-2 - -Here is an example of the conversion process for a pre-trained OpenAI GPT-2 model (see [here](https://github.com/openai/gpt-2)) - -```bash -export OPENAI_GPT2_CHECKPOINT_PATH=/path/to/gpt2/pretrained/weights - -transformers-cli convert --model_type gpt2 \ - --tf_checkpoint $OPENAI_GPT2_CHECKPOINT_PATH \ - --pytorch_dump_output $PYTORCH_DUMP_OUTPUT \ - [--config OPENAI_GPT2_CONFIG] \ - [--finetuning_task_name OPENAI_GPT2_FINETUNED_TASK] -``` - -## Transformer-XL - -Here is an example of the conversion process for a pre-trained Transformer-XL model (see 
[here](https://github.com/kimiyoung/transformer-xl/tree/master/tf#obtain-and-evaluate-pretrained-sota-models)) - -```bash -export TRANSFO_XL_CHECKPOINT_FOLDER_PATH=/path/to/transfo/xl/checkpoint - -transformers-cli convert --model_type transfo_xl \ - --tf_checkpoint $TRANSFO_XL_CHECKPOINT_FOLDER_PATH \ - --pytorch_dump_output $PYTORCH_DUMP_OUTPUT \ - [--config TRANSFO_XL_CONFIG] \ - [--finetuning_task_name TRANSFO_XL_FINETUNED_TASK] -``` - -## XLNet - -Here is an example of the conversion process for a pre-trained XLNet model: - -```bash -export TRANSFO_XL_CHECKPOINT_PATH=/path/to/xlnet/checkpoint -export TRANSFO_XL_CONFIG_PATH=/path/to/xlnet/config - -transformers-cli convert --model_type xlnet \ - --tf_checkpoint $TRANSFO_XL_CHECKPOINT_PATH \ - --config $TRANSFO_XL_CONFIG_PATH \ - --pytorch_dump_output $PYTORCH_DUMP_OUTPUT \ - [--finetuning_task_name XLNET_FINETUNED_TASK] \ -``` - -## XLM - -Here is an example of the conversion process for a pre-trained XLM model: - -```bash -export XLM_CHECKPOINT_PATH=/path/to/xlm/checkpoint - -transformers-cli convert --model_type xlm \ - --tf_checkpoint $XLM_CHECKPOINT_PATH \ - --pytorch_dump_output $PYTORCH_DUMP_OUTPUT - [--config XML_CONFIG] \ - [--finetuning_task_name XML_FINETUNED_TASK] -``` - -## T5 - -Here is an example of the conversion process for a pre-trained T5 model: - -```bash -export T5=/path/to/t5/uncased_L-12_H-768_A-12 - -transformers-cli convert --model_type t5 \ - --tf_checkpoint $T5/t5_model.ckpt \ - --config $T5/t5_config.json \ - --pytorch_dump_output $T5/pytorch_model.bin -``` diff --git a/docs/source/en/migration.mdx b/docs/source/en/migration.mdx deleted file mode 100644 index 7abf958751..0000000000 --- a/docs/source/en/migration.mdx +++ /dev/null @@ -1,315 +0,0 @@ - - -# Migrating from previous packages - -## Migrating from transformers `v3.x` to `v4.x` - -A couple of changes were introduced when the switch from version 3 to version 4 was done. Below is a summary of the -expected changes: - -#### 1. 
AutoTokenizers and pipelines now use fast (rust) tokenizers by default. - -The python and rust tokenizers have roughly the same API, but the rust tokenizers have a more complete feature set. - -This introduces two breaking changes: -- The handling of overflowing tokens between the python and rust tokenizers is different. -- The rust tokenizers do not accept integers in the encoding methods. - -##### How to obtain the same behavior as v3.x in v4.x - -- The pipelines now contain additional features out of the box. See the [token-classification pipeline with the `grouped_entities` flag](main_classes/pipelines#transformers.TokenClassificationPipeline). -- The auto-tokenizers now return rust tokenizers. In order to obtain the python tokenizers instead, the user may use the `use_fast` flag by setting it to `False`: - -In version `v3.x`: -```py -from transformers import AutoTokenizer - -tokenizer = AutoTokenizer.from_pretrained("bert-base-cased") -``` -to obtain the same in version `v4.x`: -```py -from transformers import AutoTokenizer - -tokenizer = AutoTokenizer.from_pretrained("bert-base-cased", use_fast=False) -``` - -#### 2. SentencePiece is removed from the required dependencies - -The requirement on the SentencePiece dependency has been lifted from the `setup.py`. This is done so that we may have a channel on anaconda cloud without relying on `conda-forge`. This means that the tokenizers that depend on the SentencePiece library will not be available with a standard `transformers` installation. 
- -This includes the **slow** versions of: -- `XLNetTokenizer` -- `AlbertTokenizer` -- `CamembertTokenizer` -- `MBartTokenizer` -- `PegasusTokenizer` -- `T5Tokenizer` -- `ReformerTokenizer` -- `XLMRobertaTokenizer` - -##### How to obtain the same behavior as v3.x in v4.x - -In order to obtain the same behavior as version `v3.x`, you should install `sentencepiece` additionally: - -In version `v3.x`: -```bash -pip install transformers -``` -to obtain the same in version `v4.x`: -```bash -pip install transformers[sentencepiece] -``` -or -```bash -pip install transformers sentencepiece -``` -#### 3. The architecture of the repo has been updated so that each model resides in its folder - -The past and foreseeable addition of new models means that the number of files in the directory `src/transformers` keeps growing and becomes harder to navigate and understand. We made the choice to put each model and the files accompanying it in their own sub-directories. - -This is a breaking change as importing intermediary layers using a model's module directly needs to be done via a different path. - -##### How to obtain the same behavior as v3.x in v4.x - -In order to obtain the same behavior as version `v3.x`, you should update the path used to access the layers. - -In version `v3.x`: -```bash -from transformers.modeling_bert import BertLayer -``` -to obtain the same in version `v4.x`: -```bash -from transformers.models.bert.modeling_bert import BertLayer -``` - -#### 4. Switching the `return_dict` argument to `True` by default - -The [`return_dict` argument](main_classes/output) enables the return of dict-like python objects containing the model outputs, instead of the standard tuples. This object is self-documented as keys can be used to retrieve values, while also behaving as a tuple as users may retrieve objects by index or by slice. - -This is a breaking change as the limitation of that tuple is that it cannot be unpacked: `value0, value1 = outputs` will not work. 
- -##### How to obtain the same behavior as v3.x in v4.x - -In order to obtain the same behavior as version `v3.x`, you should specify the `return_dict` argument to `False`, either in the model configuration or during the forward pass. - -In version `v3.x`: -```bash -model = BertModel.from_pretrained("bert-base-cased") -outputs = model(**inputs) -``` -to obtain the same in version `v4.x`: -```bash -model = BertModel.from_pretrained("bert-base-cased") -outputs = model(**inputs, return_dict=False) -``` -or -```bash -model = BertModel.from_pretrained("bert-base-cased", return_dict=False) -outputs = model(**inputs) -``` - -#### 5. Removed some deprecated attributes - -Attributes that were deprecated have been removed if they had been deprecated for at least a month. The full list of deprecated attributes can be found in [#8604](https://github.com/huggingface/transformers/pull/8604). - -Here is a list of these attributes/methods/arguments and what their replacements should be: - -In several models, the labels become consistent with the other models: -- `masked_lm_labels` becomes `labels` in `AlbertForMaskedLM` and `AlbertForPreTraining`. -- `masked_lm_labels` becomes `labels` in `BertForMaskedLM` and `BertForPreTraining`. -- `masked_lm_labels` becomes `labels` in `DistilBertForMaskedLM`. -- `masked_lm_labels` becomes `labels` in `ElectraForMaskedLM`. -- `masked_lm_labels` becomes `labels` in `LongformerForMaskedLM`. -- `masked_lm_labels` becomes `labels` in `MobileBertForMaskedLM`. -- `masked_lm_labels` becomes `labels` in `RobertaForMaskedLM`. -- `lm_labels` becomes `labels` in `BartForConditionalGeneration`. -- `lm_labels` becomes `labels` in `GPT2DoubleHeadsModel`. -- `lm_labels` becomes `labels` in `OpenAIGPTDoubleHeadsModel`. -- `lm_labels` becomes `labels` in `T5ForConditionalGeneration`. - -In several models, the caching mechanism becomes consistent with the other models: -- `decoder_cached_states` becomes `past_key_values` in all BART-like, FSMT and T5 models. 
-- `decoder_past_key_values` becomes `past_key_values` in all BART-like, FSMT and T5 models. -- `past` becomes `past_key_values` in all CTRL models. -- `past` becomes `past_key_values` in all GPT-2 models. - -Regarding the tokenizer classes: -- The tokenizer attribute `max_len` becomes `model_max_length`. -- The tokenizer attribute `return_lengths` becomes `return_length`. -- The tokenizer encoding argument `is_pretokenized` becomes `is_split_into_words`. - -Regarding the `Trainer` class: -- The `Trainer` argument `tb_writer` is removed in favor of the callback `TensorBoardCallback(tb_writer=...)`. -- The `Trainer` argument `prediction_loss_only` is removed in favor of the class argument `args.prediction_loss_only`. -- The `Trainer` attribute `data_collator` should be a callable. -- The `Trainer` method `_log` is deprecated in favor of `log`. -- The `Trainer` method `_training_step` is deprecated in favor of `training_step`. -- The `Trainer` method `_prediction_loop` is deprecated in favor of `prediction_loop`. -- The `Trainer` method `is_local_master` is deprecated in favor of `is_local_process_zero`. -- The `Trainer` method `is_world_master` is deprecated in favor of `is_world_process_zero`. - -Regarding the `TFTrainer` class: -- The `TFTrainer` argument `prediction_loss_only` is removed in favor of the class argument `args.prediction_loss_only`. -- The `Trainer` method `_log` is deprecated in favor of `log`. -- The `TFTrainer` method `_prediction_loop` is deprecated in favor of `prediction_loop`. -- The `TFTrainer` method `_setup_wandb` is deprecated in favor of `setup_wandb`. -- The `TFTrainer` method `_run_model` is deprecated in favor of `run_model`. - -Regarding the `TrainingArguments` class: -- The `TrainingArguments` argument `evaluate_during_training` is deprecated in favor of `evaluation_strategy`. - -Regarding the Transfo-XL model: -- The Transfo-XL configuration attribute `tie_weight` becomes `tie_words_embeddings`. 
-- The Transfo-XL modeling method `reset_length` becomes `reset_memory_length`. - -Regarding pipelines: -- The `FillMaskPipeline` argument `topk` becomes `top_k`. - - - -## Migrating from pytorch-transformers to 🤗 Transformers - -Here is a quick summary of what you should take care of when migrating from `pytorch-transformers` to 🤗 Transformers. - -### Positional order of some models' keywords inputs (`attention_mask`, `token_type_ids`...) changed - -To be able to use Torchscript (see #1010, #1204 and #1195) the specific order of some models **keywords inputs** (`attention_mask`, `token_type_ids`...) has been changed. - -If you used to call the models with keyword names for keyword arguments, e.g. `model(inputs_ids, attention_mask=attention_mask, token_type_ids=token_type_ids)`, this should not cause any change. - -If you used to call the models with positional inputs for keyword arguments, e.g. `model(inputs_ids, attention_mask, token_type_ids)`, you may have to double check the exact order of input arguments. - -## Migrating from pytorch-pretrained-bert - -Here is a quick summary of what you should take care of when migrating from `pytorch-pretrained-bert` to 🤗 Transformers - -### Models always output `tuples` - -The main breaking change when migrating from `pytorch-pretrained-bert` to 🤗 Transformers is that the models forward method always outputs a `tuple` with various elements depending on the model and the configuration parameters. - -The exact content of the tuples for each model are detailed in the models' docstrings and the [documentation](https://huggingface.co/transformers/). - -In pretty much every case, you will be fine by taking the first element of the output as the output you previously used in `pytorch-pretrained-bert`. 
- -Here is a `pytorch-pretrained-bert` to 🤗 Transformers conversion example for a `BertForSequenceClassification` classification model: - -```python -# Let's load our model -model = BertForSequenceClassification.from_pretrained("bert-base-uncased") - -# If you used to have this line in pytorch-pretrained-bert: -loss = model(input_ids, labels=labels) - -# Now just use this line in 🤗 Transformers to extract the loss from the output tuple: -outputs = model(input_ids, labels=labels) -loss = outputs[0] - -# In 🤗 Transformers you can also have access to the logits: -loss, logits = outputs[:2] - -# And even the attention weights if you configure the model to output them (and other outputs too, see the docstrings and documentation) -model = BertForSequenceClassification.from_pretrained("bert-base-uncased", output_attentions=True) -outputs = model(input_ids, labels=labels) -loss, logits, attentions = outputs -``` - -### Serialization - -Breaking change in the `from_pretrained()`method: - -1. Models are now set in evaluation mode by default when instantiated with the `from_pretrained()` method. To train them don't forget to set them back in training mode (`model.train()`) to activate the dropout modules. - -2. The additional `*inputs` and `**kwargs` arguments supplied to the `from_pretrained()` method used to be directly passed to the underlying model's class `__init__()` method. They are now used to update the model configuration attribute first which can break derived model classes build based on the previous `BertForSequenceClassification` examples. More precisely, the positional arguments `*inputs` provided to `from_pretrained()` are directly forwarded the model `__init__()` method while the keyword arguments `**kwargs` (i) which match configuration class attributes are used to update said attributes (ii) which don't match any configuration class attributes are forwarded to the model `__init__()` method. 
- -Also, while not a breaking change, the serialization methods have been standardized and you probably should switch to the new method `save_pretrained(save_directory)` if you were using any other serialization method before. - -Here is an example: - -```python -### Let's load a model and tokenizer -model = BertForSequenceClassification.from_pretrained("bert-base-uncased") -tokenizer = BertTokenizer.from_pretrained("bert-base-uncased") - -### Do some stuff to our model and tokenizer -# Ex: add new tokens to the vocabulary and embeddings of our model -tokenizer.add_tokens(["[SPECIAL_TOKEN_1]", "[SPECIAL_TOKEN_2]"]) -model.resize_token_embeddings(len(tokenizer)) -# Train our model -train(model) - -### Now let's save our model and tokenizer to a directory -model.save_pretrained("./my_saved_model_directory/") -tokenizer.save_pretrained("./my_saved_model_directory/") - -### Reload the model and the tokenizer -model = BertForSequenceClassification.from_pretrained("./my_saved_model_directory/") -tokenizer = BertTokenizer.from_pretrained("./my_saved_model_directory/") -``` - -### Optimizers: BertAdam & OpenAIAdam are now AdamW, schedules are standard PyTorch schedules - -The two optimizers previously included, `BertAdam` and `OpenAIAdam`, have been replaced by a single `AdamW` optimizer which has a few differences: - -- it only implements weights decay correction, -- schedules are now externals (see below), -- gradient clipping is now also external (see below). - -The new optimizer `AdamW` matches PyTorch `Adam` optimizer API and let you use standard PyTorch or apex methods for the schedule and clipping. - -The schedules are now standard [PyTorch learning rate schedulers](https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate) and not part of the optimizer anymore. 
- -Here is a conversion examples from `BertAdam` with a linear warmup and decay schedule to `AdamW` and the same schedule: - -```python -# Parameters: -lr = 1e-3 -max_grad_norm = 1.0 -num_training_steps = 1000 -num_warmup_steps = 100 -warmup_proportion = float(num_warmup_steps) / float(num_training_steps) # 0.1 - -### Previously BertAdam optimizer was instantiated like this: -optimizer = BertAdam( - model.parameters(), - lr=lr, - schedule="warmup_linear", - warmup=warmup_proportion, - num_training_steps=num_training_steps, -) -### and used like this: -for batch in train_data: - loss = model(batch) - loss.backward() - optimizer.step() - -### In 🤗 Transformers, optimizer and schedules are split and instantiated like this: -optimizer = AdamW( - model.parameters(), lr=lr, correct_bias=False -) # To reproduce BertAdam specific behavior set correct_bias=False -scheduler = get_linear_schedule_with_warmup( - optimizer, num_warmup_steps=num_warmup_steps, num_training_steps=num_training_steps -) # PyTorch scheduler -### and used like this: -for batch in train_data: - loss = model(batch) - loss.backward() - torch.nn.utils.clip_grad_norm_( - model.parameters(), max_grad_norm - ) # Gradient clipping is not in AdamW anymore (so you can use amp without issue) - optimizer.step() - scheduler.step() -```
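The linear warmup and decay schedule referenced in the removed optimizer example can be sketched without any library dependency. This is a minimal illustration, not the library's implementation: the helper name `linear_warmup_decay` is hypothetical, and it assumes the multiplier ramps linearly from 0 to 1 over `num_warmup_steps` and then decays linearly to 0 at `num_training_steps`, which is the shape the example describes.

```python
def linear_warmup_decay(step: int, num_warmup_steps: int, num_training_steps: int) -> float:
    """Hypothetical helper: learning-rate multiplier for a linear warmup/decay schedule.

    Assumption: ramps 0 -> 1 during warmup, then decays linearly to 0 at the last step.
    """
    if step < num_warmup_steps:
        # Warmup phase: fraction of warmup completed so far.
        return step / max(1, num_warmup_steps)
    # Decay phase: fraction of post-warmup steps still remaining, floored at 0.
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))


# Same parameter values as in the removed example.
lr = 1e-3
num_training_steps = 1000
num_warmup_steps = 100

# Effective learning rate at every step, including the final one.
schedule = [
    lr * linear_warmup_decay(step, num_warmup_steps, num_training_steps)
    for step in range(num_training_steps + 1)
]
```

A function of this shape could be passed as `lr_lambda` to PyTorch's `torch.optim.lr_scheduler.LambdaLR` (for instance via `functools.partial` to bind the two step counts), which keeps the schedule outside the optimizer as the removed text recommends.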