[docs] Redesign (#31757)

* toctree * not-doctested.txt * collapse sections * feedback * update * rewrite get started sections * fixes * fix * loading models * fix * customize models * share * fix link * contribute part 1 * contribute pt 2 * fix toctree * tokenization pt 1 * Add new model (#32615) * v1 - working version * fix * fix * fix * fix * rename to correct name * fix title * fixup * rename files * fix * add copied from on tests * rename to `FalconMamba` everywhere and fix bugs * fix quantization + accelerate * fix copies * add `torch.compile` support * fix tests * fix tests and add slow tests * copies on config * merge the latest changes * fix tests * add few lines about instruct * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix * fix tests --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * "to be not" -> "not to be" (#32636) * "to be not" -> "not to be" * Update sam.md * Update trainer.py * Update modeling_utils.py * Update test_modeling_utils.py * Update test_modeling_utils.py * fix hfoption tag * tokenization pt. 2 * image processor * fix toctree * backbones * feature extractor * fix file name * processor * update not-doctested * update * make style * fix toctree * revision * make fixup * fix toctree * fix * make style * fix hfoption tag * pipeline * pipeline gradio * pipeline web server * add pipeline * fix toctree * not-doctested * prompting * llm optims * fix toctree * fixes * cache * text generation * fix * chat pipeline * chat stuff * xla * torch.compile * cpu inference * toctree * gpu inference * agents and tools * gguf/tiktoken * finetune * toctree * trainer * trainer pt 2 * optims * optimizers * accelerate * parallelism * fsdp * update * distributed cpu * hardware training * gpu training * gpu training 2 * peft * distrib debug * deepspeed 1 * deepspeed 2 * chat toctree * quant pt 1 * quant pt 2 * fix toctree * fix * fix * quant pt 3 * quant pt 4 * serialization * torchscript * scripts * tpu * review * model addition timeline * modular * more reviews * reviews * fix toctree * reviews reviews * continue reviews * more reviews * modular transformers * more review * zamba2 * fix * all frameworks * pytorch * supported model frameworks * flashattention * rm check_table * not-doctested.txt * rm check_support_list.py * feedback * updates/feedback * review * feedback * fix * update * feedback * updates * update --------- Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-03-03 10:33:46 -08:00
parent 6aa9888463
commit c0f8d055ce
423 changed files with 10925 additions and 14569 deletions
--- a/docs/source/en/serialization.md
+++ b/docs/source/en/serialization.md
@@ -14,69 +14,41 @@ rendered properly in your Markdown viewer.

 -->

-# Export to ONNX
+# ONNX

-Deploying 🤗 Transformers models in production environments often requires, or can benefit from exporting the models into 
-a serialized format that can be loaded and executed on specialized runtimes and hardware.
+[ONNX](http://onnx.ai) is an open standard that defines a common set of operators and a file format to represent deep learning models in different frameworks, including PyTorch and TensorFlow. When a model is exported to ONNX, the operators construct a computational graph (or *intermediate representation*) which represents the flow of data through the model. Standardized operators and data types makes it easy to switch between frameworks.

-🤗 Optimum is an extension of Transformers that enables exporting models from PyTorch or TensorFlow to serialized formats 
-such as ONNX and TFLite through its `exporters` module. 🤗 Optimum also provides a set of performance optimization tools to train 
-and run models on targeted hardware with maximum efficiency.
+The [Optimum](https://huggingface.co/docs/optimum/index) library exports a model to ONNX with configuration objects which are supported for [many architectures]((https://huggingface.co/docs/optimum/exporters/onnx/overview)) and can be easily extended. If a model isn't supported, feel free to make a [contribution](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/contribute) to Optimum.

-This guide demonstrates how you can export 🤗 Transformers models to ONNX with 🤗 Optimum, for the guide on exporting models to TFLite, 
-please refer to the [Export to TFLite page](tflite).
+The benefits of exporting to ONNX include the following.

-## Export to ONNX 
+- [Graph optimization](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/optimization) and [quantization](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/quantization) for improving inference.
+- Use the [`~optimum.onnxruntime.ORTModel`] API to run a model with [ONNX Runtime](https://onnxruntime.ai/).
+- Use [optimized inference pipelines](https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/pipelines) for ONNX models.

-[ONNX (Open Neural Network eXchange)](http://onnx.ai) is an open standard that defines a common set of operators and a 
-common file format to represent deep learning models in a wide variety of frameworks, including PyTorch and
-TensorFlow. When a model is exported to the ONNX format, these operators are used to
-construct a computational graph (often called an _intermediate representation_) which
-represents the flow of data through the neural network.
+Export a Transformers model to ONNX with the Optimum CLI or the `optimum.onnxruntime` module.

-By exposing a graph with standardized operators and data types, ONNX makes it easy to
-switch between frameworks. For example, a model trained in PyTorch can be exported to
-ONNX format and then imported in TensorFlow (and vice versa).
+## Optimum CLI

-Once exported to ONNX format, a model can be:
- optimized for inference via techniques such as [graph optimization](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/optimization) and [quantization](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/quantization). 
- run with ONNX Runtime via [`ORTModelForXXX` classes](https://huggingface.co/docs/optimum/onnxruntime/package_reference/modeling_ort),
-which follow the same `AutoModel` API as the one you are used to in 🤗 Transformers.
- run with [optimized inference pipelines](https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/pipelines),
-which has the same API as the [`pipeline`] function in 🤗 Transformers. 
-
-🤗 Optimum provides support for the ONNX export by leveraging configuration objects. These configuration objects come 
-ready-made for a number of model architectures, and are designed to be easily extendable to other architectures.
-
-For the list of ready-made configurations, please refer to [🤗 Optimum documentation](https://huggingface.co/docs/optimum/exporters/onnx/overview).
-
-There are two ways to export a 🤗 Transformers model to ONNX, here we show both:
-
- export with 🤗 Optimum via CLI.
- export with 🤗 Optimum with `optimum.onnxruntime`.
-
-### Exporting a 🤗 Transformers model to ONNX with CLI
-
-To export a 🤗 Transformers model to ONNX, first install an extra dependency:
+Run the command below to install Optimum and the [exporters](https://huggingface.co/docs/optimum/exporters/overview) module.

 ```bash
 pip install optimum[exporters]
 ```

-To check out all available arguments, refer to the [🤗 Optimum docs](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model#exporting-a-model-to-onnx-using-the-cli), 
-or view help in command line:
+> [!TIP]
+> Refer to the [Export a model to ONNX with optimum.exporters.onnx](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model#exporting-a-model-to-onnx-using-the-cli) guide for all available arguments or with the command below.
+> ```bash
+> optimum-cli export onnx --help
+> ```

-```bash
-optimum-cli export onnx --help
-```
-
-To export a model's checkpoint from the 🤗 Hub, for example, `distilbert/distilbert-base-uncased-distilled-squad`, run the following command: 
+Set the `--model` argument to export a PyTorch or TensorFlow model from the Hub.

 ```bash
 optimum-cli export onnx --model distilbert/distilbert-base-uncased-distilled-squad distilbert_base_uncased_squad_onnx/
 ```

-You should see the logs indicating progress and showing where the resulting `model.onnx` is saved, like this:
+You should see logs indicating the progress and showing where the resulting `model.onnx` is saved.

 ```bash
 Validating ONNX model distilbert_base_uncased_squad_onnx/model.onnx...
@@ -90,20 +62,13 @@ Validating ONNX model distilbert_base_uncased_squad_onnx/model.onnx...
 The ONNX export succeeded and the exported model was saved at: distilbert_base_uncased_squad_onnx
 ```

-The example above illustrates exporting a checkpoint from 🤗 Hub. When exporting a local model, first make sure that you 
-saved both the model's weights and tokenizer files in the same directory (`local_path`). When using CLI, pass the 
-`local_path` to the `model` argument instead of the checkpoint name on 🤗 Hub and provide the `--task` argument. 
-You can review the list of supported tasks in the [🤗 Optimum documentation](https://huggingface.co/docs/optimum/exporters/task_manager).
-If `task` argument is not provided, it will default to the model architecture without any task specific head.
+For local models, make sure the model weights and tokenizer files are saved in the same directory, for example `local_path`. Pass the directory to the `--model` argument and use `--task` to indicate the [task](https://huggingface.co/docs/optimum/exporters/task_manager) a model can perform. If `--task` isn't provided, the model architecture without a task-specific head is used.

 ```bash
 optimum-cli export onnx --model local_path --task question-answering distilbert_base_uncased_squad_onnx/
 ```

-The resulting `model.onnx` file can then be run on one of the [many
-accelerators](https://onnx.ai/supported-tools.html#deployModel) that support the ONNX
-standard. For example, we can load and run the model with [ONNX
-Runtime](https://onnxruntime.ai/) as follows:
+The `model.onnx` file can be deployed with any [accelerator](https://onnx.ai/supported-tools.html#deployModel) that supports ONNX. The example below demonstrates loading and running a model with ONNX Runtime.

 ```python
 >>> from transformers import AutoTokenizer
@@ -115,16 +80,9 @@ Runtime](https://onnxruntime.ai/) as follows:
 >>> outputs = model(**inputs)
 ```

-The process is identical for TensorFlow checkpoints on the Hub. For instance, here's how you would
-export a pure TensorFlow checkpoint from the [Keras organization](https://huggingface.co/keras-io):
+## optimum.onnxruntime

-```bash
-optimum-cli export onnx --model keras-io/transformers-qa distilbert_base_cased_squad_onnx/
-```
-
-### Exporting a 🤗 Transformers model to ONNX with `optimum.onnxruntime`
-
-Alternative to CLI, you can export a 🤗 Transformers model to ONNX programmatically like so: 
+The `optimum.onnxruntime` module supports programmatically exporting a Transformers model. Instantiate a [`~optimum.onnxruntime.ORTModel`] for a task and set `export=True`. Use [`~OptimizedModel.save_pretrained`] to save the ONNX model.

 ```python
 >>> from optimum.onnxruntime import ORTModelForSequenceClassification
@@ -133,78 +91,9 @@ Alternative to CLI, you can export a 🤗 Transformers model to ONNX programmati
 >>> model_checkpoint = "distilbert/distilbert-base-uncased-distilled-squad"
 >>> save_directory = "onnx/"

->>> # Load a model from transformers and export it to ONNX
 >>> ort_model = ORTModelForSequenceClassification.from_pretrained(model_checkpoint, export=True)
 >>> tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

->>> # Save the onnx model and tokenizer
 >>> ort_model.save_pretrained(save_directory)
 >>> tokenizer.save_pretrained(save_directory)
 ```
-
-### Exporting a model for an unsupported architecture
-
-If you wish to contribute by adding support for a model that cannot be currently exported, you should first check if it is
-supported in [`optimum.exporters.onnx`](https://huggingface.co/docs/optimum/exporters/onnx/overview),
-and if it is not, [contribute to 🤗 Optimum](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/contribute)
-directly.
-
-### Exporting a model with `transformers.onnx`
-
-<Tip warning={true}>
-
-`transformers.onnx` is no longer maintained, please export models with 🤗 Optimum as described above. This section will be removed in the future versions.
-
-</Tip>
-
-To export a 🤗 Transformers model to ONNX with `transformers.onnx`, install extra dependencies:
-
-```bash
-pip install transformers[onnx]
-```
-
-Use `transformers.onnx` package as a Python module to export a checkpoint using a ready-made configuration:
-
-```bash
-python -m transformers.onnx --model=distilbert/distilbert-base-uncased onnx/
-```
-
-This exports an ONNX graph of the checkpoint defined by the `--model` argument. Pass any checkpoint on the 🤗 Hub or one that's stored locally.
-The resulting `model.onnx` file can then be run on one of the many accelerators that support the ONNX standard. For example, 
-load and run the model with ONNX Runtime as follows:
-
-```python
->>> from transformers import AutoTokenizer
->>> from onnxruntime import InferenceSession
-
->>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
->>> session = InferenceSession("onnx/model.onnx")
->>> # ONNX Runtime expects NumPy arrays as input
->>> inputs = tokenizer("Using DistilBERT with ONNX Runtime!", return_tensors="np")
->>> outputs = session.run(output_names=["last_hidden_state"], input_feed=dict(inputs))
-```
-
-The required output names (like `["last_hidden_state"]`) can be obtained by taking a look at the ONNX configuration of 
-each model. For example, for DistilBERT we have:
-
-```python
->>> from transformers.models.distilbert import DistilBertConfig, DistilBertOnnxConfig
-
->>> config = DistilBertConfig()
->>> onnx_config = DistilBertOnnxConfig(config)
->>> print(list(onnx_config.outputs.keys()))
-["last_hidden_state"]
-```
-
-The process is identical for TensorFlow checkpoints on the Hub. For example, export a pure TensorFlow checkpoint like so:
-
-```bash
-python -m transformers.onnx --model=keras-io/transformers-qa onnx/
-```
-
-To export a model that's stored locally, save the model's weights and tokenizer files in the same directory (e.g. `local-pt-checkpoint`), 
-then export it to ONNX by pointing the `--model` argument of the `transformers.onnx` package to the desired directory:
-
-```bash
-python -m transformers.onnx --model=local-pt-checkpoint onnx/
-```