From 5c30f7e390429904ecf0749c2e9fd9a3f29cc714 Mon Sep 17 00:00:00 2001
From: Parag Ekbote
Date: Fri, 11 Jul 2025 23:53:08 +0530
Subject: [PATCH] Update Model Card for Encoder Decoder Model (#39272)

* update model card.

* add back the model contributors for mamba and mamba2.

* update the model card.

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update batches with correct alignment.

* update examples and remove quantization example.

* update the examples.

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update example.

* correct the example.

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---
 docs/source/en/model_doc/encoder-decoder.md | 180 ++++++++++----------
 docs/source/en/model_doc/mamba.md           |   1 +
 docs/source/en/model_doc/mamba2.md          |   1 +
 3 files changed, 94 insertions(+), 88 deletions(-)

diff --git a/docs/source/en/model_doc/encoder-decoder.md b/docs/source/en/model_doc/encoder-decoder.md
index f697c213b7..f01d4c1a67 100644
--- a/docs/source/en/model_doc/encoder-decoder.md
+++ b/docs/source/en/model_doc/encoder-decoder.md
@@ -14,115 +14,88 @@ rendered properly in your Markdown viewer.

 -->

-# Encoder Decoder Models
-
-
-PyTorch
-TensorFlow
-Flax
-SDPA
+
+
+ PyTorch
+ TensorFlow
+ Flax
+ SDPA
+
+

-## Overview
+# Encoder Decoder Models

-The [`EncoderDecoderModel`] can be used to initialize a sequence-to-sequence model with any
-pretrained autoencoding model as the encoder and any pretrained autoregressive model as the decoder.
+[`EncoderDecoderModel`](https://huggingface.co/papers/1706.03762) initializes a sequence-to-sequence model with any pretrained autoencoding model as the encoder and any pretrained autoregressive model as the decoder. It is effective for sequence generation tasks, as demonstrated in [Text Summarization with Pretrained Encoders](https://huggingface.co/papers/1908.08345), which uses [`BertModel`] as both the encoder and the decoder.

-The effectiveness of initializing sequence-to-sequence models with pretrained checkpoints for sequence generation tasks
-was shown in [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://huggingface.co/papers/1907.12461) by
-Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
+> [!TIP]
+> This model was contributed by [thomwolf](https://huggingface.co/thomwolf) and the TensorFlow/Flax version by [ydshieh](https://huggingface.co/ydshieh).
+>
+> Click on the Encoder Decoder models in the right sidebar for more examples of how to apply Encoder Decoder to different language tasks.

-After such an [`EncoderDecoderModel`] has been trained/fine-tuned, it can be saved/loaded just like
-any other models (see the examples for more information).
+The example below demonstrates how to generate text with [`Pipeline`], [`AutoModel`], and from the command line.

-An application of this architecture could be to leverage two pretrained [`BertModel`] as the encoder
-and decoder for a summarization model as was shown in: [Text Summarization with Pretrained Encoders](https://huggingface.co/papers/1908.08345) by Yang Liu and Mirella Lapata.
-
-## Randomly initializing `EncoderDecoderModel` from model configurations.
-
-[`EncoderDecoderModel`] can be randomly initialized from an encoder and a decoder config. In the following example, we show how to do this using the default [`BertModel`] configuration for the encoder and the default [`BertForCausalLM`] configuration for the decoder.
+
+
 ```python
->>> from transformers import BertConfig, EncoderDecoderConfig, EncoderDecoderModel
+from transformers import pipeline

->>> config_encoder = BertConfig()
->>> config_decoder = BertConfig()
+summarizer = pipeline(
+    "summarization",
+    model="patrickvonplaten/bert2bert-cnn_dailymail-fp16",
+    device=0
+)

->>> config = EncoderDecoderConfig.from_encoder_decoder_configs(config_encoder, config_decoder)
->>> model = EncoderDecoderModel(config=config)
+text = "Plants create energy through a process known as photosynthesis. This involves capturing sunlight and converting carbon dioxide and water into glucose and oxygen."
+print(summarizer(text))
 ```

-## Initialising `EncoderDecoderModel` from a pretrained encoder and a pretrained decoder.
-
-[`EncoderDecoderModel`] can be initialized from a pretrained encoder checkpoint and a pretrained decoder checkpoint. Note that any pretrained auto-encoding model, *e.g.* BERT, can serve as the encoder and both pretrained auto-encoding models, *e.g.* BERT, pretrained causal language models, *e.g.* GPT2, as well as the pretrained decoder part of sequence-to-sequence models, *e.g.* decoder of BART, can be used as the decoder.
-Depending on which architecture you choose as the decoder, the cross-attention layers might be randomly initialized.
-Initializing [`EncoderDecoderModel`] from a pretrained encoder and decoder checkpoint requires the model to be fine-tuned on a downstream task, as has been shown in [the *Warm-starting-encoder-decoder blog post*](https://huggingface.co/blog/warm-starting-encoder-decoder).
-To do so, the `EncoderDecoderModel` class provides a [`EncoderDecoderModel.from_encoder_decoder_pretrained`] method.
+
+
 ```python
->>> from transformers import EncoderDecoderModel, BertTokenizer
+import torch
+from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

->>> tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-uncased")
->>> model = EncoderDecoderModel.from_encoder_decoder_pretrained("google-bert/bert-base-uncased", "google-bert/bert-base-uncased")
+tokenizer = AutoTokenizer.from_pretrained("patrickvonplaten/bert2bert-cnn_dailymail-fp16")
+model = AutoModelForSeq2SeqLM.from_pretrained("patrickvonplaten/bert2bert-cnn_dailymail-fp16", torch_dtype=torch.bfloat16, device_map="auto", attn_implementation="sdpa")
+
+text = "Plants create energy through a process known as photosynthesis. This involves capturing sunlight and converting carbon dioxide and water into glucose and oxygen."
+
+inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True).to(model.device)
+
+summary = model.generate(**inputs, max_length=60, num_beams=4, early_stopping=True)
+print(tokenizer.decode(summary[0], skip_special_tokens=True))
 ```

-## Loading an existing `EncoderDecoderModel` checkpoint and perform inference.
+
+

-To load fine-tuned checkpoints of the `EncoderDecoderModel` class, [`EncoderDecoderModel`] provides the `from_pretrained(...)` method just like any other model architecture in Transformers.
+```bash
+echo -e "Plants create energy through a process known as photosynthesis. This involves capturing sunlight and converting carbon dioxide and water into glucose and oxygen." | transformers-cli run --task summarization --model "patrickvonplaten/bert2bert-cnn_dailymail-fp16" --device 0
+```

-To perform inference, one uses the [`generate`] method, which allows to autoregressively generate text. This method supports various forms of decoding, such as greedy, beam search and multinomial sampling.
+
+
+
+## Notes
+
+- [`EncoderDecoderModel`] can be initialized from any pretrained encoder and decoder. Depending on the decoder architecture, however, the cross-attention layers may be randomly initialized.
+
+  Models initialized this way require fine-tuning on a downstream task, as discussed in this [blog post](https://huggingface.co/blog/warm-starting-encoder-decoder). Use [`~EncoderDecoderModel.from_encoder_decoder_pretrained`] to combine encoder and decoder checkpoints.
 ```python
->>> from transformers import AutoTokenizer, EncoderDecoderModel
+from transformers import EncoderDecoderModel, BertTokenizer

->>> # load a fine-tuned seq2seq model and corresponding tokenizer
->>> model = EncoderDecoderModel.from_pretrained("patrickvonplaten/bert2bert_cnn_daily_mail")
->>> tokenizer = AutoTokenizer.from_pretrained("patrickvonplaten/bert2bert_cnn_daily_mail")
-
->>> # let's perform inference on a long piece of text
->>> ARTICLE_TO_SUMMARIZE = (
-...     "PG&E stated it scheduled the blackouts in response to forecasts for high winds "
-...     "amid dry conditions. The aim is to reduce the risk of wildfires. Nearly 800 thousand customers were "
-...     "scheduled to be affected by the shutoffs which were expected to last through at least midday tomorrow."
-... )
->>> input_ids = tokenizer(ARTICLE_TO_SUMMARIZE, return_tensors="pt").input_ids
-
->>> # autoregressively generate summary (uses greedy decoding by default)
->>> generated_ids = model.generate(input_ids)
->>> generated_text = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
->>> print(generated_text)
-nearly 800 thousand customers were affected by the shutoffs. the aim is to reduce the risk of wildfires. nearly 800, 000 customers were expected to be affected by high winds amid dry conditions. pg & e said it scheduled the blackouts to last through at least midday tomorrow.
+tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-uncased")
+model = EncoderDecoderModel.from_encoder_decoder_pretrained(
+    "google-bert/bert-base-uncased",
+    "google-bert/bert-base-uncased"
+)
 ```

-## Loading a PyTorch checkpoint into `TFEncoderDecoderModel`.
-
-[`TFEncoderDecoderModel.from_pretrained`] currently doesn't support initializing the model from a
-pytorch checkpoint. Passing `from_pt=True` to this method will throw an exception. If there are only pytorch
-checkpoints for a particular encoder-decoder model, a workaround is:
-
-```python
->>> # a workaround to load from pytorch checkpoint
->>> from transformers import EncoderDecoderModel, TFEncoderDecoderModel
-
->>> _model = EncoderDecoderModel.from_pretrained("patrickvonplaten/bert2bert-cnn_dailymail-fp16")
-
->>> _model.encoder.save_pretrained("./encoder")
->>> _model.decoder.save_pretrained("./decoder")
-
->>> model = TFEncoderDecoderModel.from_encoder_decoder_pretrained(
-...     "./encoder", "./decoder", encoder_from_pt=True, decoder_from_pt=True
-... )
->>> # This is only for copying some specific attributes of this particular model.
->>> model.config = _model.config
-```
-
-## Training
-
-Once the model is created, it can be fine-tuned similar to BART, T5 or any other encoder-decoder model.
-As you can see, only 2 inputs are required for the model in order to compute a loss: `input_ids` (which are the
-`input_ids` of the encoded input sequence) and `labels` (which are the `input_ids` of the encoded
-target sequence).
+- Encoder Decoder models can be fine-tuned like BART, T5, or any other encoder-decoder model. Only two inputs are required to compute a loss: `input_ids` (the encoded input sequence) and `labels` (the encoded target sequence). Refer to this [notebook](https://colab.research.google.com/drive/1WIk2bxglElfZewOHboPFNj8H44_VAyKE?usp=sharing#scrollTo=ZwQIEhKOrJpl) for a more detailed training example.
 ```python
 >>> from transformers import BertTokenizer, EncoderDecoderModel
@@ -147,11 +120,42 @@ target sequence).
 >>> loss = model(input_ids=input_ids, labels=labels).loss
 ```

-Detailed [colab](https://colab.research.google.com/drive/1WIk2bxglElfZewOHboPFNj8H44_VAyKE?usp=sharing#scrollTo=ZwQIEhKOrJpl) for training.
+- [`EncoderDecoderModel`] can be randomly initialized from an encoder config and a decoder config, as shown below.

-This model was contributed by [thomwolf](https://github.com/thomwolf). This model's TensorFlow and Flax versions
-were contributed by [ydshieh](https://github.com/ydshieh).
+```python
+>>> from transformers import BertConfig, EncoderDecoderConfig, EncoderDecoderModel

+>>> config_encoder = BertConfig()
+>>> config_decoder = BertConfig()
+
+>>> config = EncoderDecoderConfig.from_encoder_decoder_configs(config_encoder, config_decoder)
+>>> model = EncoderDecoderModel(config=config)
+```
+
+- [`EncoderDecoderModel`] can also be used for translation, as shown below.
+
+```python
+from transformers import AutoTokenizer, EncoderDecoderModel
+
+# Load a pretrained English-to-German translation model
+model_name = "google/bert2bert_L-24_wmt_en_de"
+tokenizer = AutoTokenizer.from_pretrained(model_name, pad_token="<pad>", eos_token="</s>", bos_token="<s>")
+model = EncoderDecoderModel.from_pretrained(model_name)
+
+# Input sentence to translate
+input_text = "Plants create energy through a process known as"
+
+# Encode the input text without adding special tokens
+inputs = tokenizer(input_text, return_tensors="pt", add_special_tokens=False).input_ids
+
+# Generate the translated output
+outputs = model.generate(inputs)[0]
+
+# Decode the output tokens to get the translated sentence
+translated_text = tokenizer.decode(outputs, skip_special_tokens=True)
+
+print("Translated text:", translated_text)
+```

 ## EncoderDecoderConfig

diff --git a/docs/source/en/model_doc/mamba.md b/docs/source/en/model_doc/mamba.md
index 9ce98d8516..1e30e9af8b 100644
--- a/docs/source/en/model_doc/mamba.md
+++ b/docs/source/en/model_doc/mamba.md
@@ -28,6 +28,7 @@ You can find all the original Mamba checkpoints under the [State Space Models](h

 > [!TIP]
+> This model was contributed by [Molbap](https://huggingface.co/Molbap) and [AntonV](https://huggingface.co/AntonV).
 > Click on the Mamba models in the right sidebar for more examples of how to apply Mamba to different language tasks.

 The example below demonstrates how to generate text with [`Pipeline`], [`AutoModel`], and from the command line.
diff --git a/docs/source/en/model_doc/mamba2.md b/docs/source/en/model_doc/mamba2.md
index a209407022..4d7de552d4 100644
--- a/docs/source/en/model_doc/mamba2.md
+++ b/docs/source/en/model_doc/mamba2.md
@@ -26,6 +26,7 @@ rendered properly in your Markdown viewer.
 You can find all the original Mamba 2 checkpoints under the [State Space Models](https://huggingface.co/state-spaces) organization, but the examples shown below use [mistralai/Mamba-Codestral-7B-v0.1](https://huggingface.co/mistralai/Mamba-Codestral-7B-v0.1) because a Hugging Face implementation isn't supported yet for the original checkpoints.

 > [!TIP]
+> This model was contributed by [ArthurZ](https://huggingface.co/ArthurZ).
 > Click on the Mamba models in the right sidebar for more examples of how to apply Mamba to different language tasks.

 The example below demonstrates how to generate text with [`Pipeline`], [`AutoModel`], and from the command line.
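
The encoder-decoder training note in this patch says only `input_ids` and `labels` are needed to compute a loss. That works because, when no decoder inputs are passed, the model derives them from `labels` by shifting them one position to the right (the Transformers encoder-decoder code does this with a `shift_tokens_right` helper, assuming `decoder_start_token_id` and `pad_token_id` are set on the model config). The sketch below illustrates that step in plain Python under stated assumptions: `shift_right` is a hypothetical list-based stand-in rather than the library function, and the token ids (101 as the decoder start token, 102 as separator, 0 as padding) are illustrative BERT-style values.

```python
from typing import List

IGNORE_INDEX = -100  # label value skipped by the cross-entropy loss


def shift_right(labels: List[List[int]], pad_token_id: int, decoder_start_token_id: int) -> List[List[int]]:
    """Hypothetical list-based sketch: derive decoder inputs from labels.

    Every row is shifted one position to the right, the decoder start token
    fills the first slot, and ignored positions (-100) are replaced with the
    pad token so that every decoder input is a valid token id.
    """
    decoder_input_ids = []
    for row in labels:
        shifted = [decoder_start_token_id] + row[:-1]
        decoder_input_ids.append([pad_token_id if tok == IGNORE_INDEX else tok for tok in shifted])
    return decoder_input_ids


# Illustrative ids: 101 as decoder start, 102 as separator, 0 as padding;
# the trailing -100 values mark padded label positions ignored by the loss.
labels = [[7592, 2088, 102, IGNORE_INDEX, IGNORE_INDEX]]
decoder_input_ids = shift_right(labels, pad_token_id=0, decoder_start_token_id=101)
print(decoder_input_ids)  # [[101, 7592, 2088, 102, 0]]

# Teacher forcing: at step t the decoder sees decoder_input_ids[:t + 1]
# (causal masking) and is trained to predict labels[t].
for t, target in enumerate(labels[0]):
    if target != IGNORE_INDEX:
        print(decoder_input_ids[0][: t + 1], "->", target)
```

Keeping the shifted sequence the same length as `labels` lets each decoder position line up with exactly one target token, which is why the loss can be computed directly against `labels` without passing `decoder_input_ids`.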