Generate: move generation_*.py src files into generation/*.py (#20096)

* move generation_*.py src files into generation/*.py
* populate generation.__init__ with lazy loading
* move imports and references from generation.xxx.object to generation.object
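
For readers unfamiliar with the "lazy loading" mentioned above, here is a minimal, self-contained sketch of the general pattern: a package-level module object that maps public names to the submodules defining them and imports a submodule only on first access. It is an illustration under stated assumptions, not the code added by this commit. The `LazyModule` class and the `lazy_demo` name are hypothetical, and standard-library modules stand in for the `generation/*.py` files.

```python
import importlib
import types


class LazyModule(types.ModuleType):
    """Hypothetical helper: resolve exported names to their defining modules on first access."""

    def __init__(self, name, import_structure):
        super().__init__(name)
        # Invert {module: [objects]} into {object: module} for quick lookup.
        self._object_to_module = {
            obj: module for module, objects in import_structure.items() for obj in objects
        }
        self.__all__ = list(self._object_to_module)

    def __getattr__(self, attr):
        # Python calls this only when normal attribute lookup has failed.
        if attr not in self._object_to_module:
            raise AttributeError(f"module {self.__name__!r} has no attribute {attr!r}")
        module = importlib.import_module(self._object_to_module[attr])
        value = getattr(module, attr)
        setattr(self, attr, value)  # cache so later lookups skip __getattr__
        return value


# Assumption: stdlib modules stand in for the real submodules of the package.
lazy = LazyModule("lazy_demo", {"json": ["dumps", "loads"], "math": ["sqrt"]})
```

With this sketch, `lazy.sqrt(9.0)` imports `math` on first use and caches `sqrt` on the module object, so a flat access such as `generation.object` can work without eagerly importing every backend-specific file.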
2022-11-09 15:34:08 +00:00
parent bac2d29a80
commit f270b960d6
116 changed files with 9471 additions and 9095 deletions
--- a/docs/source/en/internal/generation_utils.mdx
+++ b/docs/source/en/internal/generation_utils.mdx
@@ -12,22 +12,22 @@ specific language governing permissions and limitations under the License.

 # Utilities for Generation

-This page lists all the utility functions used by [`~generation_utils.GenerationMixin.generate`],
-[`~generation_utils.GenerationMixin.greedy_search`],
-[`~generation_utils.GenerationMixin.contrastive_search`],
-[`~generation_utils.GenerationMixin.sample`],
-[`~generation_utils.GenerationMixin.beam_search`],
-[`~generation_utils.GenerationMixin.beam_sample`],
-[`~generation_utils.GenerationMixin.group_beam_search`], and
-[`~generation_utils.GenerationMixin.constrained_beam_search`].
+This page lists all the utility functions used by [`~generation.GenerationMixin.generate`],
+[`~generation.GenerationMixin.greedy_search`],
+[`~generation.GenerationMixin.contrastive_search`],
+[`~generation.GenerationMixin.sample`],
+[`~generation.GenerationMixin.beam_search`],
+[`~generation.GenerationMixin.beam_sample`],
+[`~generation.GenerationMixin.group_beam_search`], and
+[`~generation.GenerationMixin.constrained_beam_search`].

 Most of those are only useful if you are studying the code of the generate methods in the library.

 ## Generate Outputs

-The output of [`~generation_utils.GenerationMixin.generate`] is an instance of a subclass of
+The output of [`~generation.GenerationMixin.generate`] is an instance of a subclass of
 [`~utils.ModelOutput`]. This output is a data structure containing all the information returned
-by [`~generation_utils.GenerationMixin.generate`], but that can also be used as tuple or dictionary.
+by [`~generation.GenerationMixin.generate`], but that can also be used as tuple or dictionary.

 Here's an example:

@@ -41,7 +41,7 @@ inputs = tokenizer("Hello, my dog is cute and ", return_tensors="pt")
 generation_output = model.generate(**inputs, return_dict_in_generate=True, output_scores=True)
 ```

-The `generation_output` object is a [`~generation_utils.GreedySearchDecoderOnlyOutput`], as we can
+The `generation_output` object is a [`~generation.GreedySearchDecoderOnlyOutput`], as we can
 see in the documentation of that class below, it means it has the following attributes:

 - `sequences`: the generated sequences of tokens
@@ -73,31 +73,31 @@ We document here all output types.

 ### GreedySearchOutput

-[[autodoc]] generation_utils.GreedySearchDecoderOnlyOutput
+[[autodoc]] generation.GreedySearchDecoderOnlyOutput

-[[autodoc]] generation_utils.GreedySearchEncoderDecoderOutput
+[[autodoc]] generation.GreedySearchEncoderDecoderOutput

-[[autodoc]] generation_flax_utils.FlaxGreedySearchOutput
+[[autodoc]] generation.FlaxGreedySearchOutput

 ### SampleOutput

-[[autodoc]] generation_utils.SampleDecoderOnlyOutput
+[[autodoc]] generation.SampleDecoderOnlyOutput

-[[autodoc]] generation_utils.SampleEncoderDecoderOutput
+[[autodoc]] generation.SampleEncoderDecoderOutput

-[[autodoc]] generation_flax_utils.FlaxSampleOutput
+[[autodoc]] generation.FlaxSampleOutput

 ### BeamSearchOutput

-[[autodoc]] generation_utils.BeamSearchDecoderOnlyOutput
+[[autodoc]] generation.BeamSearchDecoderOnlyOutput

-[[autodoc]] generation_utils.BeamSearchEncoderDecoderOutput
+[[autodoc]] generation.BeamSearchEncoderDecoderOutput

 ### BeamSampleOutput

-[[autodoc]] generation_utils.BeamSampleDecoderOnlyOutput
+[[autodoc]] generation.BeamSampleDecoderOnlyOutput

-[[autodoc]] generation_utils.BeamSampleEncoderDecoderOutput
+[[autodoc]] generation.BeamSampleEncoderDecoderOutput

 ## LogitsProcessor

--- a/docs/source/en/main_classes/model.mdx
+++ b/docs/source/en/main_classes/model.mdx
@@ -25,9 +25,9 @@ are common among all the models to:

 The other methods that are common to each model are defined in [`~modeling_utils.ModuleUtilsMixin`]
 (for the PyTorch models) and [`~modeling_tf_utils.TFModuleUtilsMixin`] (for the TensorFlow models) or
-for text generation, [`~generation_utils.GenerationMixin`] (for the PyTorch models),
-[`~generation_tf_utils.TFGenerationMixin`] (for the TensorFlow models) and
-[`~generation_flax_utils.FlaxGenerationMixin`] (for the Flax/JAX models).
+for text generation, [`~generation.GenerationMixin`] (for the PyTorch models),
+[`~generation.TFGenerationMixin`] (for the TensorFlow models) and
+[`~generation.FlaxGenerationMixin`] (for the Flax/JAX models).


 ## PreTrainedModel
--- a/docs/source/en/main_classes/text_generation.mdx
+++ b/docs/source/en/main_classes/text_generation.mdx
@@ -14,13 +14,13 @@ specific language governing permissions and limitations under the License.

 Each framework has a generate method for auto-regressive text generation implemented in their respective `GenerationMixin` class:

- PyTorch [`~generation_utils.GenerationMixin.generate`] is implemented in [`~generation_utils.GenerationMixin`].
- TensorFlow [`~generation_tf_utils.TFGenerationMixin.generate`] is implemented in [`~generation_tf_utils.TFGenerationMixin`].
- Flax/JAX [`~generation_flax_utils.FlaxGenerationMixin.generate`] is implemented in [`~generation_flax_utils.FlaxGenerationMixin`].
+- PyTorch [`~generation.GenerationMixin.generate`] is implemented in [`~generation.GenerationMixin`].
+- TensorFlow [`~generation.TFGenerationMixin.generate`] is implemented in [`~generation.TFGenerationMixin`].
+- Flax/JAX [`~generation.FlaxGenerationMixin.generate`] is implemented in [`~generation.FlaxGenerationMixin`].

 ## GenerationMixin

-[[autodoc]] generation_utils.GenerationMixin
+[[autodoc]] generation.GenerationMixin
 	- generate
 	- greedy_search
 	- sample
@@ -32,10 +32,10 @@ Each framework has a generate method for auto-regressive text generation impleme

 ## TFGenerationMixin

-[[autodoc]] generation_tf_utils.TFGenerationMixin
+[[autodoc]] generation.TFGenerationMixin
 	- generate

 ## FlaxGenerationMixin

-[[autodoc]] generation_flax_utils.FlaxGenerationMixin
+[[autodoc]] generation.FlaxGenerationMixin
 	- generate
--- a/docs/source/en/model_doc/bart.mdx
+++ b/docs/source/en/model_doc/bart.mdx
@@ -58,7 +58,7 @@ This model was contributed by [sshleifer](https://huggingface.co/sshleifer). The
 - Model predictions are intended to be identical to the original implementation when
  `forced_bos_token_id=0`. This only works, however, if the string you pass to
  [`fairseq.encode`] starts with a space.
- [`~generation_utils.GenerationMixin.generate`] should be used for conditional generation tasks like
+- [`~generation.GenerationMixin.generate`] should be used for conditional generation tasks like
  summarization, see the example in that docstrings.
 - Models that load the *facebook/bart-large-cnn* weights will not have a `mask_token_id`, or be able to perform
  mask-filling tasks.
@@ -188,4 +188,4 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
 ## FlaxBartForCausalLM

 [[autodoc]] FlaxBartForCausalLM
-    - __call__
+    - __call__
--- a/docs/source/en/model_doc/donut.mdx
+++ b/docs/source/en/model_doc/donut.mdx
@@ -23,7 +23,7 @@ The abstract from the paper is the following:
 *Understanding document images (e.g., invoices) is a core but challenging task since it requires complex functions such as reading text and a holistic understanding of the document. Current Visual Document Understanding (VDU) methods outsource the task of reading text to off-the-shelf Optical Character Recognition (OCR) engines and focus on the understanding task with the OCR outputs. Although such OCR-based approaches have shown promising performance, they suffer from 1) high computational costs for using OCR; 2) inflexibility of OCR models on languages or types of document; 3) OCR error propagation to the subsequent process. To address these issues, in this paper, we introduce a novel OCR-free VDU model named Donut, which stands for Document understanding transformer. As the first step in OCR-free VDU research, we propose a simple architecture (i.e., Transformer) with a pre-training objective (i.e., cross-entropy loss). Donut is conceptually simple yet effective. Through extensive experiments and analyses, we show a simple OCR-free VDU model, Donut, achieves state-of-the-art performances on various VDU tasks in terms of both speed and accuracy. In addition, we offer a synthetic data generator that helps the model pre-training to be flexible in various languages and domains.*

 <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/donut_architecture.jpg"
-alt="drawing" width="600"/> 
+alt="drawing" width="600"/>

 <small> Donut high-level overview. Taken from the <a href="https://arxiv.org/abs/2111.15664">original paper</a>. </small>

@@ -40,7 +40,7 @@ Tips:
 ## Inference

 Donut's [`VisionEncoderDecoder`] model accepts images as input and makes use of
-[`~generation_utils.GenerationMixin.generate`] to autoregressively generate text given the input image.
+[`~generation.GenerationMixin.generate`] to autoregressively generate text given the input image.

 The [`DonutFeatureExtractor`] class is responsible for preprocessing the input image and
 [`XLMRobertaTokenizer`/`XLMRobertaTokenizerFast`] decodes the generated target tokens to the target string. The
@@ -211,4 +211,4 @@ We refer to the [tutorial notebooks](https://github.com/NielsRogge/Transformers-
 ## DonutSwinModel

 [[autodoc]] DonutSwinModel
-    - forward
+    - forward
--- a/docs/source/en/model_doc/gptj.mdx
+++ b/docs/source/en/model_doc/gptj.mdx
@@ -53,7 +53,7 @@ Tips:

 ### Generation

-The [`~generation_utils.GenerationMixin.generate`] method can be used to generate text using GPT-J
+The [`~generation.GenerationMixin.generate`] method can be used to generate text using GPT-J
 model.

 ```python
--- a/docs/source/en/model_doc/speech_to_text_2.mdx
+++ b/docs/source/en/model_doc/speech_to_text_2.mdx
@@ -38,7 +38,7 @@ Tips:
 ## Inference

 Speech2Text2's [`SpeechEncoderDecoderModel`] model accepts raw waveform input values from speech and
-makes use of [`~generation_utils.GenerationMixin.generate`] to translate the input speech
+makes use of [`~generation.GenerationMixin.generate`] to translate the input speech
 autoregressively to the target language.

 The [`Wav2Vec2FeatureExtractor`] class is responsible for preprocessing the input speech and
--- a/docs/source/en/model_doc/t5.mdx
+++ b/docs/source/en/model_doc/t5.mdx
@@ -225,7 +225,7 @@ batch) leads to very slow training on TPU.

 ## Inference

-At inference time, it is recommended to use [`~generation_utils.GenerationMixin.generate`]. This
+At inference time, it is recommended to use [`~generation.GenerationMixin.generate`]. This
 method takes care of encoding the input and feeding the encoded hidden states via cross-attention layers to the decoder
 and auto-regressively generates the decoder output. Check out [this blog post](https://huggingface.co/blog/how-to-generate) to know all the details about generating text with Transformers.
 There's also [this blog post](https://huggingface.co/blog/encoder-decoder#encoder-decoder) which explains how
@@ -244,7 +244,7 @@ Das Haus ist wunderbar.
 ```

 Note that T5 uses the `pad_token_id` as the `decoder_start_token_id`, so when doing generation without using
-[`~generation_utils.GenerationMixin.generate`], make sure you start it with the `pad_token_id`.
+[`~generation.GenerationMixin.generate`], make sure you start it with the `pad_token_id`.

 The example above only shows a single example. You can also do batched inference, like so:

--- a/docs/source/en/model_doc/trocr.mdx
+++ b/docs/source/en/model_doc/trocr.mdx
@@ -30,7 +30,7 @@ show that the TrOCR model outperforms the current state-of-the-art models on bot
 tasks.*

 <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/trocr_architecture.jpg"
-alt="drawing" width="600"/> 
+alt="drawing" width="600"/>

 <small> TrOCR architecture. Taken from the <a href="https://arxiv.org/abs/2109.10282">original paper</a>. </small>

@@ -53,7 +53,7 @@ Tips:
 ## Inference

 TrOCR's [`VisionEncoderDecoder`] model accepts images as input and makes use of
-[`~generation_utils.GenerationMixin.generate`] to autoregressively generate text given the input image.
+[`~generation.GenerationMixin.generate`] to autoregressively generate text given the input image.

 The [`ViTFeatureExtractor`/`DeiTFeatureExtractor`] class is responsible for preprocessing the input image and
 [`RobertaTokenizer`/`XLMRobertaTokenizer`] decodes the generated target tokens to the target string. The
@@ -64,20 +64,20 @@ into a single instance to both extract the input features and decode the predict

 ``` py
 >>> from transformers import TrOCRProcessor, VisionEncoderDecoderModel
->>> import requests 
+>>> import requests
 >>> from PIL import Image

->>> processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten") 
+>>> processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
 >>> model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten")

->>> # load image from the IAM dataset 
->>> url = "https://fki.tic.heia-fr.ch/static/img/a01-122-02.jpg" 
+>>> # load image from the IAM dataset
+>>> url = "https://fki.tic.heia-fr.ch/static/img/a01-122-02.jpg"
 >>> image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

->>> pixel_values = processor(image, return_tensors="pt").pixel_values 
+>>> pixel_values = processor(image, return_tensors="pt").pixel_values
 >>> generated_ids = model.generate(pixel_values)

->>> generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0] 
+>>> generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
 ```

 See the [model hub](https://huggingface.co/models?filter=trocr) to look for TrOCR checkpoints.
--- a/docs/source/en/model_doc/whisper.mdx
+++ b/docs/source/en/model_doc/whisper.mdx
@@ -24,7 +24,7 @@ The abstract from the paper is the following:
 Tips:

 - The model usually performs well without requiring any finetuning.
- The architecture follows a classic encoder-decoder architecture, which means that it relies on the [`~generation_utils.GenerationMixin.generate`] function for inference.
+- The architecture follows a classic encoder-decoder architecture, which means that it relies on the [`~generation.GenerationMixin.generate`] function for inference.
 - Inference is currently only implemented for short-form i.e. audio is pre-segmented into <=30s segments. Long-form (including timestamps) will be implemented in a future release.
 - One can use [`WhisperProcessor`] to prepare audio for the model, and decode the predicted ID's back into text.

--- a/docs/source/en/pipeline_tutorial.mdx
+++ b/docs/source/en/pipeline_tutorial.mdx
@@ -56,7 +56,7 @@ If you have more than one input, pass your input as a list:
 ... )  # doctest: +SKIP
 ```

-Any additional parameters for your task can also be included in the [`pipeline`]. The `text-generation` task has a [`~generation_utils.GenerationMixin.generate`] method with several parameters for controlling the output. For example, if you want to generate more than one output, set the `num_return_sequences` parameter:
+Any additional parameters for your task can also be included in the [`pipeline`]. The `text-generation` task has a [`~generation.GenerationMixin.generate`] method with several parameters for controlling the output. For example, if you want to generate more than one output, set the `num_return_sequences` parameter:

 ```py
 >>> generator(
--- a/docs/source/en/task_summary.mdx
+++ b/docs/source/en/task_summary.mdx
@@ -544,7 +544,7 @@ Hugging Face is based in DUMBO, New York City, and ...
 This outputs a (hopefully) coherent next token following the original sequence, which in our case is the word *is* or
 *features*.

-In the next section, we show how [`generation_utils.GenerationMixin.generate`] can be used to
+In the next section, we show how [`generation.GenerationMixin.generate`] can be used to
 generate multiple tokens up to a specified length instead of one token at a time.

 ### Text Generation
@@ -1094,10 +1094,10 @@ The following examples demonstrate how to use a [`pipeline`] and a model and tok
 ...     images="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"
 ... )
 >>> print("\n".join([f"Class {d['label']} with score {round(d['score'], 4)}" for d in result]))
-Class lynx, catamount with score 0.4335                                                    
+Class lynx, catamount with score 0.4335
 Class cougar, puma, catamount, mountain lion, painter, panther, Felis concolor with score 0.0348
-Class snow leopard, ounce, Panthera uncia with score 0.0324          
-Class Egyptian cat with score 0.0239                                                       
+Class snow leopard, ounce, Panthera uncia with score 0.0324
+Class Egyptian cat with score 0.0239
 Class tiger cat with score 0.0229
 ```