@@ -68,8 +68,7 @@ already reported** (use the search bar on GitHub under Issues). Your issue shoul

Once you've confirmed the bug hasn't already been reported, please include the following information in your issue so we can quickly resolve it:

* Your **OS type and version** and **Python**, **PyTorch** and
**TensorFlow** versions when applicable.
* Your **OS type and version** and **Python**, and **PyTorch** versions when applicable.
* A short, self-contained, code snippet that allows us to reproduce the bug in
less than 30s.
* The *full* traceback if an exception is raised.
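
A minimal sketch for collecting the version details requested above, using only the Python standard library. The `collect_versions` helper and its default package list are illustrative assumptions, not part of Transformers.

```python
import importlib.metadata
import platform


def collect_versions(packages=("torch", "transformers")):
    """Return the OS, Python, and installed package versions as strings."""
    info = {
        "os": platform.platform(),
        "python": platform.python_version(),
    }
    for name in packages:
        try:
            info[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            # Report missing packages instead of failing, since PyTorch is optional here.
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```

Pasting the printed lines into the issue covers the first bullet; the reproduction snippet and traceback still need to be added by hand.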
@@ -165,8 +164,7 @@ You'll need **[Python 3.9](https://github.com/huggingface/transformers/blob/main
mode with the `-e` flag.

Depending on your OS, and since the number of optional dependencies of Transformers is growing, you might get a
failure with this command. If that's the case make sure to install the Deep Learning framework you are working with
(PyTorch, TensorFlow and/or Flax) then do:
failure with this command. If that's the case make sure to install PyTorch then do:

```bash
pip install -e ".[quality]"
@@ -20,7 +20,7 @@ rendered properly in your Markdown viewer.

# Installation

Transformers works with [PyTorch](https://pytorch.org/get-started/locally/), [TensorFlow 2.0](https://www.tensorflow.org/install/pip), and [Flax](https://flax.readthedocs.io/en/latest/). It has been tested on Python 3.9+, PyTorch 2.1+, TensorFlow 2.6+, and Flax 0.4.1+.
Transformers works with [PyTorch](https://pytorch.org/get-started/locally/). It has been tested on Python 3.9+ and PyTorch 2.2+.
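
One way to compare an environment against the tested minimums above is sketched below. The helper names `parse_version` and `meets_minimum` are hypothetical, and the parser is a simplification that reads only the leading numeric components of a version string.

```python
import importlib.metadata
import sys


def parse_version(text):
    """Turn a string such as '2.2.1+cu121' into (2, 2, 1), dropping any suffix."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            # Stop at the first non-numeric component, e.g. 'dev20240101'.
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True when the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    print("Python 3.9+:", sys.version_info >= (3, 9))
    try:
        torch_version = importlib.metadata.version("torch")
        print("PyTorch 2.2+:", meets_minimum(torch_version, "2.2"))
    except importlib.metadata.PackageNotFoundError:
        print("PyTorch is not installed")
```

Comparing tuples of integers rather than raw strings avoids ranking `2.10` below `2.2`.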

## Virtual environment

@@ -74,7 +74,7 @@ uv pip install transformers
</hfoption>
</hfoptions>

For GPU acceleration, install the appropriate CUDA drivers for [PyTorch](https://pytorch.org/get-started/locally) and [TensorFlow](https://www.tensorflow.org/install/pip).
For GPU acceleration, install the appropriate CUDA drivers for [PyTorch](https://pytorch.org/get-started/locally).

Run the command below to check if your system detects an NVIDIA GPU.

@@ -84,42 +84,11 @@ nvidia-smi

To install a CPU-only version of Transformers and a machine learning framework, run the following command.

<hfoptions id="cpu-only">
<hfoption id="PyTorch">

```bash
pip install 'transformers[torch]'
uv pip install 'transformers[torch]'
```

</hfoption>
<hfoption id="TensorFlow">

For Apple M1 hardware, you need to install CMake and pkg-config first.

```bash
brew install cmake
brew install pkg-config
```

Install TensorFlow 2.0.

```bash
pip install 'transformers[tf-cpu]'
uv pip install 'transformers[tf-cpu]'
```

</hfoption>
<hfoption id="Flax">

```bash
pip install 'transformers[flax]'
uv pip install 'transformers[flax]'
```

</hfoption>
</hfoptions>

Test whether the install was successful with the following command. It should return a label and score for the provided text.

```bash
@@ -73,53 +73,9 @@ A model repository also includes an inference [widget](https://hf.co/docs/hub/mo

Check out the Hub [Models](https://hf.co/docs/hub/models) documentation for more information.

## Model framework conversion

Reach a wider audience by making a model available in PyTorch, TensorFlow, and Flax. While users can still load a model if they're using a different framework, it is slower because Transformers needs to convert the checkpoint on the fly. It is faster to convert the checkpoint first.

<hfoptions id="convert">
<hfoption id="PyTorch">

Set `from_tf=True` to convert a checkpoint from TensorFlow to PyTorch and then save it.

```py
from transformers import DistilBertForSequenceClassification

pt_model = DistilBertForSequenceClassification.from_pretrained("path/to/awesome-name-you-picked", from_tf=True)
pt_model.save_pretrained("path/to/awesome-name-you-picked")
```

</hfoption>
<hfoption id="TensorFlow">

Set `from_pt=True` to convert a checkpoint from PyTorch to TensorFlow and then save it.

```py
from transformers import TFDistilBertForSequenceClassification

tf_model = TFDistilBertForSequenceClassification.from_pretrained("path/to/awesome-name-you-picked", from_pt=True)
tf_model.save_pretrained("path/to/awesome-name-you-picked")
```

</hfoption>
<hfoption id="Flax">

Set `from_pt=True` to convert a checkpoint from PyTorch to Flax and then save it.

```py
from transformers import FlaxDistilBertForSequenceClassification
flax_model = FlaxDistilBertForSequenceClassification.from_pretrained(
    "path/to/awesome-name-you-picked", from_pt=True
)
flax_model.save_pretrained("path/to/awesome-name-you-picked")
```

</hfoption>
</hfoptions>

## Uploading a model

There are several ways to upload a model to the Hub depending on your workflow preference. You can push a model with [`Trainer`], a callback for TensorFlow models, call [`~PreTrainedModel.push_to_hub`] directly on a model, or use the Hub web interface.
There are several ways to upload a model to the Hub depending on your workflow preference. You can push a model with [`Trainer`], call [`~PreTrainedModel.push_to_hub`] directly on a model, or use the Hub web interface.

<Youtube id="Z1-XMy-GNLQ"/>

@@ -143,19 +99,6 @@ trainer = Trainer(
trainer.push_to_hub()
```

### PushToHubCallback

For TensorFlow models, add the [`PushToHubCallback`] to the [fit](https://keras.io/api/models/model_training_apis/#fit-method) method.

```py
from transformers import PushToHubCallback

push_to_hub_callback = PushToHubCallback(
    output_dir="./your_model_save_path", tokenizer=tokenizer, hub_model_id="your-username/my-awesome-model"
)
model.fit(tf_train_dataset, validation_data=tf_validation_dataset, epochs=3, callbacks=push_to_hub_callback)
```

### PushToHubMixin

The [`~utils.PushToHubMixin`] provides functionality for pushing a model or tokenizer to the Hub.
@@ -166,7 +109,7 @@ Call [`~utils.PushToHubMixin.push_to_hub`] directly on a model to upload it to t
model.push_to_hub("my-awesome-model")
```

Other objects like a tokenizer or TensorFlow model are also pushed to the Hub in the same way.
Other objects like a tokenizer are also pushed to the Hub in the same way.

```py
tokenizer.push_to_hub("my-awesome-model")
@@ -45,43 +45,6 @@ There are two general types of models you can load:
1. A barebones model, like [`AutoModel`] or [`LlamaModel`], that outputs hidden states.
2. A model with a specific *head* attached, like [`AutoModelForCausalLM`] or [`LlamaForCausalLM`], for performing specific tasks.

For each model type, there is a separate class for each machine learning framework (PyTorch, TensorFlow, Flax). Pick the corresponding prefix for the framework you're using.

<hfoptions id="backend">
<hfoption id="PyTorch">

```py
from transformers import AutoModelForCausalLM, MistralForCausalLM

# load with AutoClass or model-specific class
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1", dtype="auto", device_map="auto")
model = MistralForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1", dtype="auto", device_map="auto")
```

</hfoption>
<hfoption id="TensorFlow">

```py
from transformers import TFAutoModelForCausalLM, TFMistralForCausalLM

# load with AutoClass or model-specific class
model = TFAutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
model = TFMistralForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
```

</hfoption>
<hfoption id="Flax">

```py
from transformers import FlaxAutoModelForCausalLM, FlaxMistralForCausalLM

# load with AutoClass or model-specific class
model = FlaxAutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
model = FlaxMistralForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
```

</hfoption>
</hfoptions>

## Model classes

@@ -34,9 +34,9 @@ The library was designed with two strong goals in mind:
loads the related class instance and associated data (configurations' hyperparameters, tokenizers' vocabulary,
and models' weights) from a pretrained checkpoint provided on [Hugging Face Hub](https://huggingface.co/models) or your own saved checkpoint.
- On top of those three base classes, the library provides two APIs: [`pipeline`] for quickly
using a model for inference on a given task and [`Trainer`] to quickly train or fine-tune a PyTorch model (all TensorFlow models are compatible with `Keras.fit`).
using a model for inference on a given task and [`Trainer`] to quickly train or fine-tune a PyTorch model.
- As a consequence, this library is NOT a modular toolbox of building blocks for neural nets. If you want to
extend or build upon the library, just use regular Python, PyTorch, TensorFlow, Keras modules and inherit from the base
extend or build upon the library, just use regular Python or PyTorch and inherit from the base
classes of the library to reuse functionalities like model loading and saving. If you'd like to learn more about our coding philosophy for models, check out our [Repeat Yourself](https://huggingface.co/blog/transformers-design-philosophy) blog post.

2. Provide state-of-the-art models with performances as close as possible to the original models:
@@ -44,7 +44,7 @@ The library was designed with two strong goals in mind:
- We provide at least one example for each architecture which reproduces a result provided by the official authors
of said architecture.
- The code is usually as close to the original code base as possible which means some PyTorch code may be not as
*pytorchic* as it could be as a result of being converted TensorFlow code and vice versa.
*pytorchic* as it could be as a result of being converted from other Deep Learning frameworks.

A few other goals:

@@ -58,13 +58,11 @@ A few other goals:
- A simple and consistent way to add new tokens to the vocabulary and embeddings for fine-tuning.
- Simple ways to mask and prune Transformer heads.

- Easily switch between PyTorch, TensorFlow 2.0 and Flax, allowing training with one framework and inference with another.

## Main concepts

The library is built around three types of classes for each model:

- **Model classes** can be PyTorch models ([torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module)), Keras models ([tf.keras.Model](https://www.tensorflow.org/api_docs/python/tf/keras/Model)) or JAX/Flax models ([flax.linen.Module](https://flax.readthedocs.io/en/latest/api_reference/flax.linen/module.html)) that work with the pretrained weights provided in the library.
- **Model classes** are PyTorch models ([torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module)).
- **Configuration classes** store the hyperparameters required to build a model (such as the number of layers and hidden size). You don't always need to instantiate these yourself. In particular, if you are using a pretrained model without any modification, creating the model will automatically take care of instantiating the configuration (which is part of the model).
- **Preprocessing classes** convert the raw data into a format accepted by the model. A [tokenizer](main_classes/tokenizer) stores the vocabulary for each model and provides methods for encoding and decoding strings in a list of token embedding indices to be fed to a model. [Image processors](main_classes/image_processor) preprocess vision inputs, [feature extractors](main_classes/feature_extractor) preprocess audio inputs, and a [processor](main_classes/processors) handles multimodal inputs.

@@ -76,4 +74,3 @@ All these classes can be instantiated from pretrained instances, saved locally,
- `save_pretrained()` lets you save a model, configuration, and preprocessing class locally so that it can be reloaded using
`from_pretrained()`.
- `push_to_hub()` lets you share a model, configuration, and a preprocessing class to the Hub, so it is easily accessible to everyone.
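
The save-and-reload round trip described above can be illustrated with a toy class. This is a sketch of the pattern only: `ToyConfig`, its fields, and the `config.json` file name used here are hypothetical stand-ins, not the library's implementation.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class ToyConfig:
    """Hypothetical stand-in for a configuration class; not part of Transformers."""

    num_layers: int = 2
    hidden_size: int = 64

    def save_pretrained(self, directory):
        # Write the hyperparameters to <directory>/config.json.
        path = Path(directory)
        path.mkdir(parents=True, exist_ok=True)
        (path / "config.json").write_text(json.dumps(asdict(self), indent=2))

    @classmethod
    def from_pretrained(cls, directory):
        # Rebuild the object from the file written by save_pretrained().
        data = json.loads((Path(directory) / "config.json").read_text())
        return cls(**data)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ToyConfig(num_layers=4, hidden_size=128).save_pretrained(tmp)
        print(ToyConfig.from_pretrained(tmp))
```

The point of the pattern is that the object returned by `from_pretrained()` is equal to the one that was saved, so a directory (or a Hub repository) is enough to share it.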
@@ -40,7 +40,7 @@ or for an editable install:
pip install -e .[dev]
```

inside the Transformers repo. Since the number of optional dependencies of Transformers has grown a lot, it's possible you don't manage to get all of them. If the dev install fails, make sure to install the Deep Learning framework you are working with (PyTorch, TensorFlow and/or Flax) then do
inside the Transformers repo. Since the number of optional dependencies of Transformers has grown a lot, it's possible you don't manage to get all of them. If the dev install fails, make sure to install PyTorch then do

```bash
pip install transformers[quality]
@@ -55,7 +55,7 @@ pip install -e .[quality]

## Tests

All the jobs that begin with `ci/circleci: run_tests_` run parts of the Transformers testing suite. Each of those jobs focuses on a part of the library in a certain environment: for instance `ci/circleci: run_tests_pipelines_tf` runs the pipelines test in an environment where TensorFlow only is installed.
All the jobs that begin with `ci/circleci: run_tests_` run parts of the Transformers testing suite. Each of those jobs focuses on a part of the library in a certain environment: for instance `ci/circleci: run_tests_pipelines` runs the pipeline tests in an environment where all pipeline-related requirements are installed.

Note that to avoid running tests when there is no real change in the modules they are testing, only part of the test suite is run each time: a utility is run to determine the differences in the library between before and after the PR (what GitHub shows you in the "Files changed" tab) and picks the tests impacted by that diff. That utility can be run locally with:

@@ -16,13 +16,13 @@ rendered properly in your Markdown viewer.

# Training scripts

Transformers provides many example training scripts for deep learning frameworks (PyTorch, TensorFlow, Flax) and tasks in [transformers/examples](https://github.com/huggingface/transformers/tree/main/examples). There are additional scripts in [transformers/research projects](https://github.com/huggingface/transformers-research-projects/) and [transformers/legacy](https://github.com/huggingface/transformers/tree/main/examples/legacy), but these aren't actively maintained and require a specific version of Transformers.
Transformers provides many example training scripts for PyTorch and tasks in [transformers/examples](https://github.com/huggingface/transformers/tree/main/examples). There are additional scripts in [transformers/research projects](https://github.com/huggingface/transformers-research-projects/) and [transformers/legacy](https://github.com/huggingface/transformers/tree/main/examples/legacy), but these aren't actively maintained and require a specific version of Transformers.

Example scripts are only examples and you may need to adapt the script to your use-case. To help you with this, most scripts are very transparent in how data is preprocessed, allowing you to edit it as necessary.

For any feature you'd like to implement in an example script, please discuss it on the [forum](https://discuss.huggingface.co/) or in an [issue](https://github.com/huggingface/transformers/issues) before submitting a pull request. While we welcome contributions, a pull request that adds more functionality at the cost of readability is unlikely to be merged.

This guide will show you how to run an example summarization training script in [PyTorch](https://github.com/huggingface/transformers/tree/main/examples/pytorch/summarization) and [TensorFlow](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/summarization).
This guide will show you how to run an example summarization training script in [PyTorch](https://github.com/huggingface/transformers/tree/main/examples/pytorch/summarization).

## Setup

@@ -58,10 +58,7 @@ Start with a smaller dataset by including the `max_train_samples`, `max_eval_sam

The example below fine-tunes [T5-small](https://huggingface.co/google-t5/t5-small) on the [CNN/DailyMail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset. T5 requires an additional `source_prefix` parameter to prompt it to summarize.
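
As a rough illustration of what a source prefix does, the sketch below prepends a task prompt to each input document before it would be tokenized. The function name `add_source_prefix` and the default value `"summarize: "` are assumptions made for this example, not a description of the script's internals.

```python
def add_source_prefix(documents, source_prefix="summarize: "):
    """Prepend the task prefix to every input document."""
    return [source_prefix + document for document in documents]


if __name__ == "__main__":
    articles = ["The city council met on Tuesday to discuss the new budget."]
    print(add_source_prefix(articles))
```

Because the prefix is plain text in front of the input, the same checkpoint can be prompted for a different task by changing the prefix string.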

<hfoptions id="script">
<hfoption id="PyTorch">

The example script downloads and preprocesses a dataset, and then fine-tunes it with [`Trainer`] with a supported model architecture.

Resuming training from a checkpoint is very useful if training is interrupted because you don't have to start over again. There are two ways to resume training from a checkpoint.

@@ -116,40 +113,6 @@ python xla_spawn.py --num_cores 8 pytorch/summarization/run_summarization.py \
...
```

</hfoption>
<hfoption id="TensorFlow">

```bash
# remove the `max_train_samples`, `max_eval_samples` and `max_predict_samples` if everything works
python examples/tensorflow/summarization/run_summarization.py \
    --model_name_or_path google-t5/t5-small \
    --max_train_samples 50 \
    --max_eval_samples 50 \
    --max_predict_samples 50 \
    --dataset_name cnn_dailymail \
    --dataset_config "3.0.0" \
    --output_dir /tmp/tst-summarization \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 16 \
    --num_train_epochs 3 \
    --do_train \
    --do_eval
```

TensorFlow uses the [MirroredStrategy](https://www.tensorflow.org/guide/distributed_training#mirroredstrategy) for distributed training and doesn't require adding any additional parameters. The script uses multiple GPUs by default if they are available.

For TPU training, TensorFlow scripts use the [TPUStrategy](https://www.tensorflow.org/guide/distributed_training#tpustrategy). Pass the TPU resource name to the `--tpu` parameter.

```bash
python run_summarization.py \
--tpu name_of_tpu_resource \
...
...
```

</hfoption>
</hfoptions>

## Accelerate

[Accelerate](https://huggingface.co/docs/accelerate) is designed to simplify distributed training while offering complete visibility into the PyTorch training loop. If you're planning on training with a script with Accelerate, use the `_no_trainer.py` version of the script.
@@ -160,7 +123,7 @@ Install Accelerate from source to ensure you have the latest version.
pip install git+https://github.com/huggingface/accelerate
```

Run the [accelerate config](https://huggingface.co/docs/accelerate/package_reference/cli#accelerate-config) command to answer a few questions about your training setup. This creates and saves a config file about your system.

```bash
accelerate config