From be7aa3210b4d78f382fd1831680143a4a6846c79 Mon Sep 17 00:00:00 2001
From: RogerSinghChugh <35698080+RogerSinghChugh@users.noreply.github.com>
Date: Wed, 28 May 2025 00:21:41 +0530
Subject: [PATCH] New bart model card (#37858)

* Modified BART documentation wrt to issue #36979.

* Modified BART documentation wrt to issue #36979.

* fixed a typo.

* Update docs/source/en/model_doc/bart.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bart.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bart.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bart.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bart.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bart.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* blank commit.

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---
 docs/source/en/model_doc/bart.md | 151 +++++++++++++------------------
 1 file changed, 61 insertions(+), 90 deletions(-)

diff --git a/docs/source/en/model_doc/bart.md b/docs/source/en/model_doc/bart.md
index b24daa3e6e..d269b391cc 100644
--- a/docs/source/en/model_doc/bart.md
+++ b/docs/source/en/model_doc/bart.md
@@ -14,116 +14,87 @@ rendered properly in your Markdown viewer.
 -->
-# BART
-
-PyTorch
-TensorFlow
-Flax
-FlashAttention
-SDPA
+
+
+ PyTorch
+ TensorFlow
+ Flax
+ FlashAttention
+ SDPA
-## Overview
+# BART
+[BART](https://huggingface.co/papers/1910.13461) is a sequence-to-sequence model that combines a bidirectional encoder, like BERT, with a left-to-right decoder, like GPT. It's pretrained by corrupting text in different ways, such as deleting words, shuffling sentences, or masking tokens, and learning to reconstruct the original. The encoder encodes the corrupted document, and the decoder recovers the original text from it. Learning this reconstruction makes BART effective at both understanding and generating language.
-The Bart model was proposed in [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation,
-Translation, and Comprehension](https://arxiv.org/abs/1910.13461) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan
-Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer on 29 Oct, 2019.
+You can find all the original BART checkpoints under the [AI at Meta](https://huggingface.co/facebook?search_models=bart) organization.
-According to the abstract,
+The example below demonstrates how to predict the `<mask>` token with [`Pipeline`], [`AutoModel`], and from the command line.
-- Bart uses a standard seq2seq/machine translation architecture with a bidirectional encoder (like BERT) and a
-  left-to-right decoder (like GPT).
-- The pretraining task involves randomly shuffling the order of the original sentences and a novel in-filling scheme,
-  where spans of text are replaced with a single mask token.
-- BART is particularly effective when fine tuned for text generation but also works well for comprehension tasks. It
-  matches the performance of RoBERTa with comparable training resources on GLUE and SQuAD, achieves new
-  state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks, with gains
-  of up to 6 ROUGE.
+
+
-This model was contributed by [sshleifer](https://huggingface.co/sshleifer). The authors' code can be found [here](https://github.com/pytorch/fairseq/tree/master/examples/bart).
+```py
+import torch
+from transformers import pipeline
-## Usage tips:
+pipeline = pipeline(
+    task="fill-mask",
+    model="facebook/bart-large",
+    torch_dtype=torch.float16,
+    device=0
+)
+pipeline("Plants create <mask> through a process known as photosynthesis.")
-- BART is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
-  the left.
-- Sequence-to-sequence model with an encoder and a decoder. Encoder is fed a corrupted version of the tokens, decoder is fed the original tokens (but has a mask to hide the future words like a regular transformers decoder). A composition of the following transformations are applied on the pretraining tasks for the encoder:
+```
+
+
-  * mask random tokens (like in BERT)
-  * delete random tokens
-  * mask a span of k tokens with a single mask token (a span of 0 tokens is an insertion of a mask token)
-  * permute sentences
-  * rotate the document to make it start at a specific token
-- The `head_mask` argument is ignored when using all attention implementation other than "eager". If you have a `head_mask` and want it to have effect, load the model with `XXXModel.from_pretrained(model_id, attn_implementation="eager")`
+```py
+import torch
+from transformers import AutoModelForMaskedLM, AutoTokenizer
-## Implementation Notes
+tokenizer = AutoTokenizer.from_pretrained(
+    "facebook/bart-large",
+)
+model = AutoModelForMaskedLM.from_pretrained(
+    "facebook/bart-large",
+    torch_dtype=torch.float16,
+    device_map="auto",
+    attn_implementation="sdpa"
+)
+inputs = tokenizer("Plants create <mask> through a process known as photosynthesis.", return_tensors="pt").to(model.device)
-- Bart doesn't use `token_type_ids` for sequence classification. Use [`BartTokenizer`] or
-  [`~BartTokenizer.encode`] to get the proper splitting.
-- The forward pass of [`BartModel`] will create the `decoder_input_ids` if they are not passed.
-  This is different than some other modeling APIs. A typical use case of this feature is mask filling.
-- Model predictions are intended to be identical to the original implementation when
-  `forced_bos_token_id=0`. This only works, however, if the string you pass to
-  [`fairseq.encode`] starts with a space.
-- [`~generation.GenerationMixin.generate`] should be used for conditional generation tasks like
-  summarization, see the example in that docstrings.
-- Models that load the *facebook/bart-large-cnn* weights will not have a `mask_token_id`, or be able to perform
-  mask-filling tasks.
+with torch.no_grad():
+    outputs = model(**inputs)
+    predictions = outputs.logits
-## Mask Filling
+masked_index = torch.where(inputs['input_ids'] == tokenizer.mask_token_id)[1]
+predicted_token_id = predictions[0, masked_index].argmax(dim=-1)
+predicted_token = tokenizer.decode(predicted_token_id)
-The `facebook/bart-base` and `facebook/bart-large` checkpoints can be used to fill multi-token masks.
-
-```python
-from transformers import BartForConditionalGeneration, BartTokenizer
-
-model = BartForConditionalGeneration.from_pretrained("facebook/bart-large", forced_bos_token_id=0)
-tok = BartTokenizer.from_pretrained("facebook/bart-large")
-example_english_phrase = "UN Chief Says There Is No <mask> in Syria"
-batch = tok(example_english_phrase, return_tensors="pt")
-generated_ids = model.generate(batch["input_ids"])
-assert tok.batch_decode(generated_ids, skip_special_tokens=True) == [
-    "UN Chief Says There Is No Plan to Stop Chemical Weapons in Syria"
-]
+print(f"The predicted token is: {predicted_token}")
 ```
-## Resources
+
+
-A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with BART. If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
+```bash
+echo -e "Plants create <mask> through a process known as photosynthesis." | transformers-cli run --task fill-mask --model facebook/bart-large --device 0
+```
-
+
+
-- A blog post on [Distributed Training: Train BART/T5 for Summarization using 🤗 Transformers and Amazon SageMaker](https://huggingface.co/blog/sagemaker-distributed-training-seq2seq).
-- A notebook on how to [finetune BART for summarization with fastai using blurr](https://colab.research.google.com/github/ohmeow/ohmeow_website/blob/master/posts/2021-05-25-mbart-sequence-classification-with-blurr.ipynb). 🌎
-- A notebook on how to [finetune BART for summarization in two languages with Trainer class](https://colab.research.google.com/github/elsanns/xai-nlp-notebooks/blob/master/fine_tune_bart_summarization_two_langs.ipynb). 🌎
-- [`BartForConditionalGeneration`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/summarization) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/summarization.ipynb).
-- [`TFBartForConditionalGeneration`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/summarization) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/summarization-tf.ipynb).
-- [`FlaxBartForConditionalGeneration`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/flax/summarization).
-- An example of how to train [`BartForConditionalGeneration`] with a Hugging Face `datasets` object can be found in this [forum discussion](https://discuss.huggingface.co/t/train-bart-for-conditional-generation-e-g-summarization/1904)
-- [Summarization](https://huggingface.co/course/chapter7/5?fw=pt#summarization) chapter of the 🤗 Hugging Face course.
-- [Summarization task guide](../tasks/summarization)
+## Notes
-
-
-- [`BartForConditionalGeneration`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/language-modeling#robertabertdistilbert-and-masked-language-modeling) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling.ipynb).
-- [`TFBartForConditionalGeneration`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/language-modeling#run_mlmpy) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling-tf.ipynb).
-- [`FlaxBartForConditionalGeneration`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/flax/language-modeling#masked-language-modeling) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/masked_language_modeling_flax.ipynb).
-- [Masked language modeling](https://huggingface.co/course/chapter7/3?fw=pt) chapter of the 🤗 Hugging Face Course.
-- [Masked language modeling task guide](../tasks/masked_language_modeling)
-
-
-
-- A notebook on how to [finetune mBART using Seq2SeqTrainer for Hindi to English translation](https://colab.research.google.com/github/vasudevgupta7/huggingface-tutorials/blob/main/translation_training.ipynb). 🌎
-- [`BartForConditionalGeneration`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/translation) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/translation.ipynb).
-- [`TFBartForConditionalGeneration`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/translation) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/translation-tf.ipynb).
-- [Translation task guide](../tasks/translation)
-
-See also:
-- [Text classification task guide](../tasks/sequence_classification)
-- [Question answering task guide](../tasks/question_answering)
-- [Causal language modeling task guide](../tasks/language_modeling)
-- [Distilled checkpoints](https://huggingface.co/models?search=distilbart) are described in this [paper](https://arxiv.org/abs/2010.13002).
+- Inputs should be padded on the right because BART uses absolute position embeddings.
+- The [facebook/bart-large-cnn](https://huggingface.co/facebook/bart-large-cnn) checkpoint doesn't include a `mask_token_id`, which means it can't perform mask-filling tasks.
+- BART doesn't use `token_type_ids` for sequence classification. Use [`BartTokenizer`] or [`~PreTrainedTokenizerBase.encode`] to get the proper splitting.
+- The forward pass of [`BartModel`] creates the `decoder_input_ids` if they're not passed. This differs from some other model APIs, but it's useful for mask-filling tasks.
+- Model predictions are intended to be identical to the original implementation when `forced_bos_token_id=0`. This only works if the text passed to `fairseq.encode` begins with a space.
+- [`~GenerationMixin.generate`] should be used for conditional generation tasks like summarization.
 ## BartConfig
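The note on `decoder_input_ids` can be made concrete with a small sketch. When no `decoder_input_ids` are passed, [`BartModel`] derives them from `input_ids` by shifting each sequence one position to the right and putting the decoder start token first (in the Transformers source this is the `shift_tokens_right` helper in `modeling_bart.py`). The function below, `shift_tokens_right_sketch`, is a hypothetical pure-Python illustration on plain lists, not the library's implementation. It assumes `pad_token_id=1` and `decoder_start_token_id=2`, taken here to match the `BartConfig` defaults, and the token ids in the example are arbitrary.

```python
from typing import List

# Assumed to match the BartConfig defaults (illustrative values only)
PAD_TOKEN_ID = 1
DECODER_START_TOKEN_ID = 2


def shift_tokens_right_sketch(
    input_ids: List[List[int]],
    pad_token_id: int = PAD_TOKEN_ID,
    decoder_start_token_id: int = DECODER_START_TOKEN_ID,
) -> List[List[int]]:
    """Hypothetical list-based sketch of how BART builds decoder inputs.

    Each row is shifted one position to the right: the decoder start token
    is placed first and the row's last token is dropped, so a non-empty row
    keeps its length.
    """
    shifted = []
    for row in input_ids:
        new_row = [decoder_start_token_id] + row[:-1]
        # -100 marks label positions ignored by the loss; turn them into padding
        new_row = [pad_token_id if tok == -100 else tok for tok in new_row]
        shifted.append(new_row)
    return shifted


if __name__ == "__main__":
    # Arbitrary ids; 0 and 2 stand in for <s> and </s> under the assumed defaults
    batch = [[0, 11, 22, 33, 2]]
    print(shift_tokens_right_sketch(batch))  # [[2, 0, 11, 22, 33]]
```

Because decoder position i receives original token i - 1, the logits at position i are a prediction for input token i. That alignment is what lets the `AutoModel` example index `outputs.logits` with the position of `<mask>` in `input_ids`.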