Generate: move generation_*.py src files into generation/*.py (#20096)
* move generation_*.py src files into generation/*.py
* populate generation.__init__ with lazy loading
* move imports and references from generation.xxx.object to generation.object
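The second bullet mentions lazy loading in `generation.__init__`. The following is a minimal sketch of that general idea, not the code from this commit: the `LazyModule` class, the `generation_demo` name, and the import structure are all hypothetical, and standard-library modules stand in for the `generation/*.py` submodules so the snippet runs on its own.

```python
import importlib
import types


class LazyModule(types.ModuleType):
    """Hypothetical sketch: resolve exported names on first attribute access."""

    def __init__(self, name, import_structure):
        super().__init__(name)
        # Map each exported object name to the module that defines it.
        self._object_to_module = {
            obj: module_name
            for module_name, objects in import_structure.items()
            for obj in objects
        }
        self.__all__ = sorted(self._object_to_module)

    def __getattr__(self, name):
        # Python calls __getattr__ only after normal attribute lookup fails.
        mapping = self.__dict__.get("_object_to_module", {})
        if name not in mapping:
            raise AttributeError(f"module {self.__name__!r} has no attribute {name!r}")
        # Import the defining module only now, then cache the object on self
        # so later lookups bypass __getattr__ entirely.
        value = getattr(importlib.import_module(mapping[name]), name)
        setattr(self, name, value)
        return value


# Assumption: stdlib modules play the role of the generation/*.py submodules.
generation = LazyModule(
    "generation_demo",
    {"fractions": ["Fraction"], "collections": ["OrderedDict"]},
)
```

With this shape, `generation.Fraction` is looked up through the flat namespace rather than through the submodule path, which mirrors the `generation.xxx.object` to `generation.object` change described in the third bullet.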
@@ -12,22 +12,22 @@ specific language governing permissions and limitations under the License.

 # Utilities for Generation

-This page lists all the utility functions used by [`~generation_utils.GenerationMixin.generate`],
-[`~generation_utils.GenerationMixin.greedy_search`],
-[`~generation_utils.GenerationMixin.contrastive_search`],
-[`~generation_utils.GenerationMixin.sample`],
-[`~generation_utils.GenerationMixin.beam_search`],
-[`~generation_utils.GenerationMixin.beam_sample`],
-[`~generation_utils.GenerationMixin.group_beam_search`], and
-[`~generation_utils.GenerationMixin.constrained_beam_search`].
+This page lists all the utility functions used by [`~generation.GenerationMixin.generate`],
+[`~generation.GenerationMixin.greedy_search`],
+[`~generation.GenerationMixin.contrastive_search`],
+[`~generation.GenerationMixin.sample`],
+[`~generation.GenerationMixin.beam_search`],
+[`~generation.GenerationMixin.beam_sample`],
+[`~generation.GenerationMixin.group_beam_search`], and
+[`~generation.GenerationMixin.constrained_beam_search`].

 Most of those are only useful if you are studying the code of the generate methods in the library.

 ## Generate Outputs

-The output of [`~generation_utils.GenerationMixin.generate`] is an instance of a subclass of
+The output of [`~generation.GenerationMixin.generate`] is an instance of a subclass of
 [`~utils.ModelOutput`]. This output is a data structure containing all the information returned
-by [`~generation_utils.GenerationMixin.generate`], but that can also be used as tuple or dictionary.
+by [`~generation.GenerationMixin.generate`], but that can also be used as tuple or dictionary.

 Here's an example:
@@ -41,7 +41,7 @@ inputs = tokenizer("Hello, my dog is cute and ", return_tensors="pt")
 generation_output = model.generate(**inputs, return_dict_in_generate=True, output_scores=True)
 ```

-The `generation_output` object is a [`~generation_utils.GreedySearchDecoderOnlyOutput`], as we can
+The `generation_output` object is a [`~generation.GreedySearchDecoderOnlyOutput`], as we can
 see in the documentation of that class below, it means it has the following attributes:

 - `sequences`: the generated sequences of tokens
@@ -73,31 +73,31 @@ We document here all output types.

 ### GreedySearchOutput

-[[autodoc]] generation_utils.GreedySearchDecoderOnlyOutput
+[[autodoc]] generation.GreedySearchDecoderOnlyOutput

-[[autodoc]] generation_utils.GreedySearchEncoderDecoderOutput
+[[autodoc]] generation.GreedySearchEncoderDecoderOutput

-[[autodoc]] generation_flax_utils.FlaxGreedySearchOutput
+[[autodoc]] generation.FlaxGreedySearchOutput

 ### SampleOutput

-[[autodoc]] generation_utils.SampleDecoderOnlyOutput
+[[autodoc]] generation.SampleDecoderOnlyOutput

-[[autodoc]] generation_utils.SampleEncoderDecoderOutput
+[[autodoc]] generation.SampleEncoderDecoderOutput

-[[autodoc]] generation_flax_utils.FlaxSampleOutput
+[[autodoc]] generation.FlaxSampleOutput

 ### BeamSearchOutput

-[[autodoc]] generation_utils.BeamSearchDecoderOnlyOutput
+[[autodoc]] generation.BeamSearchDecoderOnlyOutput

-[[autodoc]] generation_utils.BeamSearchEncoderDecoderOutput
+[[autodoc]] generation.BeamSearchEncoderDecoderOutput

 ### BeamSampleOutput

-[[autodoc]] generation_utils.BeamSampleDecoderOnlyOutput
+[[autodoc]] generation.BeamSampleDecoderOnlyOutput

-[[autodoc]] generation_utils.BeamSampleEncoderDecoderOutput
+[[autodoc]] generation.BeamSampleEncoderDecoderOutput

 ## LogitsProcessor
@@ -25,9 +25,9 @@ are common among all the models to:

 The other methods that are common to each model are defined in [`~modeling_utils.ModuleUtilsMixin`]
 (for the PyTorch models) and [`~modeling_tf_utils.TFModuleUtilsMixin`] (for the TensorFlow models) or
-for text generation, [`~generation_utils.GenerationMixin`] (for the PyTorch models),
-[`~generation_tf_utils.TFGenerationMixin`] (for the TensorFlow models) and
-[`~generation_flax_utils.FlaxGenerationMixin`] (for the Flax/JAX models).
+for text generation, [`~generation.GenerationMixin`] (for the PyTorch models),
+[`~generation.TFGenerationMixin`] (for the TensorFlow models) and
+[`~generation.FlaxGenerationMixin`] (for the Flax/JAX models).


 ## PreTrainedModel
@@ -14,13 +14,13 @@ specific language governing permissions and limitations under the License.

 Each framework has a generate method for auto-regressive text generation implemented in their respective `GenerationMixin` class:

-- PyTorch [`~generation_utils.GenerationMixin.generate`] is implemented in [`~generation_utils.GenerationMixin`].
-- TensorFlow [`~generation_tf_utils.TFGenerationMixin.generate`] is implemented in [`~generation_tf_utils.TFGenerationMixin`].
-- Flax/JAX [`~generation_flax_utils.FlaxGenerationMixin.generate`] is implemented in [`~generation_flax_utils.FlaxGenerationMixin`].
+- PyTorch [`~generation.GenerationMixin.generate`] is implemented in [`~generation.GenerationMixin`].
+- TensorFlow [`~generation.TFGenerationMixin.generate`] is implemented in [`~generation.TFGenerationMixin`].
+- Flax/JAX [`~generation.FlaxGenerationMixin.generate`] is implemented in [`~generation.FlaxGenerationMixin`].

 ## GenerationMixin

-[[autodoc]] generation_utils.GenerationMixin
+[[autodoc]] generation.GenerationMixin
 - generate
 - greedy_search
 - sample
@@ -32,10 +32,10 @@ Each framework has a generate method for auto-regressive text generation impleme

 ## TFGenerationMixin

-[[autodoc]] generation_tf_utils.TFGenerationMixin
+[[autodoc]] generation.TFGenerationMixin
 - generate

 ## FlaxGenerationMixin

-[[autodoc]] generation_flax_utils.FlaxGenerationMixin
+[[autodoc]] generation.FlaxGenerationMixin
 - generate
@@ -58,7 +58,7 @@ This model was contributed by [sshleifer](https://huggingface.co/sshleifer). The
 - Model predictions are intended to be identical to the original implementation when
 `forced_bos_token_id=0`. This only works, however, if the string you pass to
 [`fairseq.encode`] starts with a space.
-- [`~generation_utils.GenerationMixin.generate`] should be used for conditional generation tasks like
+- [`~generation.GenerationMixin.generate`] should be used for conditional generation tasks like
 summarization, see the example in that docstrings.
 - Models that load the *facebook/bart-large-cnn* weights will not have a `mask_token_id`, or be able to perform
 mask-filling tasks.
@@ -188,4 +188,4 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
 ## FlaxBartForCausalLM

 [[autodoc]] FlaxBartForCausalLM
-- __call__
+- __call__
@@ -23,7 +23,7 @@ The abstract from the paper is the following:
 *Understanding document images (e.g., invoices) is a core but challenging task since it requires complex functions such as reading text and a holistic understanding of the document. Current Visual Document Understanding (VDU) methods outsource the task of reading text to off-the-shelf Optical Character Recognition (OCR) engines and focus on the understanding task with the OCR outputs. Although such OCR-based approaches have shown promising performance, they suffer from 1) high computational costs for using OCR; 2) inflexibility of OCR models on languages or types of document; 3) OCR error propagation to the subsequent process. To address these issues, in this paper, we introduce a novel OCR-free VDU model named Donut, which stands for Document understanding transformer. As the first step in OCR-free VDU research, we propose a simple architecture (i.e., Transformer) with a pre-training objective (i.e., cross-entropy loss). Donut is conceptually simple yet effective. Through extensive experiments and analyses, we show a simple OCR-free VDU model, Donut, achieves state-of-the-art performances on various VDU tasks in terms of both speed and accuracy. In addition, we offer a synthetic data generator that helps the model pre-training to be flexible in various languages and domains.*

 <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/donut_architecture.jpg"
-alt="drawing" width="600"/>
+alt="drawing" width="600"/>

 <small> Donut high-level overview. Taken from the <a href="https://arxiv.org/abs/2111.15664">original paper</a>. </small>
@@ -40,7 +40,7 @@ Tips:
 ## Inference

 Donut's [`VisionEncoderDecoder`] model accepts images as input and makes use of
-[`~generation_utils.GenerationMixin.generate`] to autoregressively generate text given the input image.
+[`~generation.GenerationMixin.generate`] to autoregressively generate text given the input image.

 The [`DonutFeatureExtractor`] class is responsible for preprocessing the input image and
 [`XLMRobertaTokenizer`/`XLMRobertaTokenizerFast`] decodes the generated target tokens to the target string. The
@@ -211,4 +211,4 @@ We refer to the [tutorial notebooks](https://github.com/NielsRogge/Transformers-
 ## DonutSwinModel

 [[autodoc]] DonutSwinModel
-- forward
+- forward
@@ -53,7 +53,7 @@ Tips:

 ### Generation

-The [`~generation_utils.GenerationMixin.generate`] method can be used to generate text using GPT-J
+The [`~generation.GenerationMixin.generate`] method can be used to generate text using GPT-J
 model.

 ```python
@@ -38,7 +38,7 @@ Tips:
 ## Inference

 Speech2Text2's [`SpeechEncoderDecoderModel`] model accepts raw waveform input values from speech and
-makes use of [`~generation_utils.GenerationMixin.generate`] to translate the input speech
+makes use of [`~generation.GenerationMixin.generate`] to translate the input speech
 autoregressively to the target language.

 The [`Wav2Vec2FeatureExtractor`] class is responsible for preprocessing the input speech and
@@ -225,7 +225,7 @@ batch) leads to very slow training on TPU.

 ## Inference

-At inference time, it is recommended to use [`~generation_utils.GenerationMixin.generate`]. This
+At inference time, it is recommended to use [`~generation.GenerationMixin.generate`]. This
 method takes care of encoding the input and feeding the encoded hidden states via cross-attention layers to the decoder
 and auto-regressively generates the decoder output. Check out [this blog post](https://huggingface.co/blog/how-to-generate) to know all the details about generating text with Transformers.
 There's also [this blog post](https://huggingface.co/blog/encoder-decoder#encoder-decoder) which explains how
@@ -244,7 +244,7 @@ Das Haus ist wunderbar.
 ```

 Note that T5 uses the `pad_token_id` as the `decoder_start_token_id`, so when doing generation without using
-[`~generation_utils.GenerationMixin.generate`], make sure you start it with the `pad_token_id`.
+[`~generation.GenerationMixin.generate`], make sure you start it with the `pad_token_id`.

 The example above only shows a single example. You can also do batched inference, like so:
@@ -30,7 +30,7 @@ show that the TrOCR model outperforms the current state-of-the-art models on bot
 tasks.*

 <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/trocr_architecture.jpg"
-alt="drawing" width="600"/>
+alt="drawing" width="600"/>

 <small> TrOCR architecture. Taken from the <a href="https://arxiv.org/abs/2109.10282">original paper</a>. </small>
@@ -53,7 +53,7 @@ Tips:
 ## Inference

 TrOCR's [`VisionEncoderDecoder`] model accepts images as input and makes use of
-[`~generation_utils.GenerationMixin.generate`] to autoregressively generate text given the input image.
+[`~generation.GenerationMixin.generate`] to autoregressively generate text given the input image.

 The [`ViTFeatureExtractor`/`DeiTFeatureExtractor`] class is responsible for preprocessing the input image and
 [`RobertaTokenizer`/`XLMRobertaTokenizer`] decodes the generated target tokens to the target string. The
@@ -64,20 +64,20 @@ into a single instance to both extract the input features and decode the predict

 ``` py
 >>> from transformers import TrOCRProcessor, VisionEncoderDecoderModel
->>> import requests
+>>> import requests
 >>> from PIL import Image

->>> processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
+>>> processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
 >>> model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten")

->>> # load image from the IAM dataset
->>> url = "https://fki.tic.heia-fr.ch/static/img/a01-122-02.jpg"
+>>> # load image from the IAM dataset
+>>> url = "https://fki.tic.heia-fr.ch/static/img/a01-122-02.jpg"
 >>> image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

->>> pixel_values = processor(image, return_tensors="pt").pixel_values
+>>> pixel_values = processor(image, return_tensors="pt").pixel_values
 >>> generated_ids = model.generate(pixel_values)

->>> generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
+>>> generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
 ```

 See the [model hub](https://huggingface.co/models?filter=trocr) to look for TrOCR checkpoints.
@@ -24,7 +24,7 @@ The abstract from the paper is the following:
 Tips:

 - The model usually performs well without requiring any finetuning.
-- The architecture follows a classic encoder-decoder architecture, which means that it relies on the [`~generation_utils.GenerationMixin.generate`] function for inference.
+- The architecture follows a classic encoder-decoder architecture, which means that it relies on the [`~generation.GenerationMixin.generate`] function for inference.
 - Inference is currently only implemented for short-form i.e. audio is pre-segmented into <=30s segments. Long-form (including timestamps) will be implemented in a future release.
 - One can use [`WhisperProcessor`] to prepare audio for the model, and decode the predicted ID's back into text.
@@ -56,7 +56,7 @@ If you have more than one input, pass your input as a list:
 ... ) # doctest: +SKIP
 ```

-Any additional parameters for your task can also be included in the [`pipeline`]. The `text-generation` task has a [`~generation_utils.GenerationMixin.generate`] method with several parameters for controlling the output. For example, if you want to generate more than one output, set the `num_return_sequences` parameter:
+Any additional parameters for your task can also be included in the [`pipeline`]. The `text-generation` task has a [`~generation.GenerationMixin.generate`] method with several parameters for controlling the output. For example, if you want to generate more than one output, set the `num_return_sequences` parameter:

 ```py
 >>> generator(
@@ -544,7 +544,7 @@ Hugging Face is based in DUMBO, New York City, and ...
 This outputs a (hopefully) coherent next token following the original sequence, which in our case is the word *is* or
 *features*.

-In the next section, we show how [`generation_utils.GenerationMixin.generate`] can be used to
+In the next section, we show how [`generation.GenerationMixin.generate`] can be used to
 generate multiple tokens up to a specified length instead of one token at a time.

 ### Text Generation
@@ -1094,10 +1094,10 @@ The following examples demonstrate how to use a [`pipeline`] and a model and tok
 ... images="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"
 ... )
 >>> print("\n".join([f"Class {d['label']} with score {round(d['score'], 4)}" for d in result]))
-Class lynx, catamount with score 0.4335
+Class lynx, catamount with score 0.4335
 Class cougar, puma, catamount, mountain lion, painter, panther, Felis concolor with score 0.0348
-Class snow leopard, ounce, Panthera uncia with score 0.0324
-Class Egyptian cat with score 0.0239
+Class snow leopard, ounce, Panthera uncia with score 0.0324
+Class Egyptian cat with score 0.0239
 Class tiger cat with score 0.0229
 ```