Add a check regarding the number of occurrences of ``` (#18389)

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Yih-Dar
2022-08-01 14:23:02 +02:00
committed by GitHub
parent 1cd7c6f154
commit bd6d1b4300
26 changed files with 77 additions and 75 deletions
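
The check named in the commit title is not shown in the hunks below. As a rough sketch only, assuming it amounts to counting triple-backtick markers in each docstring and flagging an odd count, it could look like this. The names `has_unbalanced_fences` and `find_unbalanced` are hypothetical and not taken from the repository.

```python
from typing import Dict, List

# Build the marker from single backticks so this example does not itself
# contain a literal triple-backtick sequence.
FENCE = "`" * 3


def has_unbalanced_fences(docstring: str) -> bool:
    """Return True when the triple-backtick marker occurs an odd number of times."""
    return docstring.count(FENCE) % 2 == 1


def find_unbalanced(docstrings: Dict[str, str]) -> List[str]:
    """Return the names (hypothetical keys) of docstrings that fail the check."""
    return [name for name, doc in docstrings.items() if has_unbalanced_fences(doc)]


# Old text from one of the hunks: two backticks open, three close.
broken = "``tf.Variable" + FENCE + " module of the model without doing anything."
# Corrected text: single backticks on both sides.
fixed = "`tf.Variable` module of the model without doing anything."
# A well-formed fenced block has an opening and a closing marker.
code_block = FENCE + "python\nprint(1)\n" + FENCE

print(find_unbalanced({"broken": broken, "fixed": fixed, "code_block": code_block}))
```

Under this assumption, every typo fixed in the hunks below (a stray triple backtick where a single backtick was meant) leaves an odd count, which is why a parity check would catch them.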

@@ -1879,7 +1879,7 @@ class TFPreTrainedModel(tf.keras.Model, TFModelUtilsMixin, TFGenerationMixin, Pu
Increasing the size will add newly initialized vectors at the end. Reducing the size will remove Increasing the size will add newly initialized vectors at the end. Reducing the size will remove
vectors from the end. If not provided or `None`, just returns a pointer to the input tokens vectors from the end. If not provided or `None`, just returns a pointer to the input tokens
``tf.Variable``` module of the model without doing anything. `tf.Variable` module of the model without doing anything.
Return: Return:
`tf.Variable`: Pointer to the resized Embedding Module or the old Embedding Module if `new_num_tokens` is `tf.Variable`: Pointer to the resized Embedding Module or the old Embedding Module if `new_num_tokens` is

@@ -1221,7 +1221,7 @@ class PreTrainedModel(nn.Module, ModuleUtilsMixin, GenerationMixin, PushToHubMix
Increasing the size will add newly initialized vectors at the end. Reducing the size will remove Increasing the size will add newly initialized vectors at the end. Reducing the size will remove
vectors from the end. If not provided or `None`, just returns a pointer to the input tokens vectors from the end. If not provided or `None`, just returns a pointer to the input tokens
``torch.nn.Embedding``` module of the model without doing anything. `torch.nn.Embedding` module of the model without doing anything.
Return: Return:
`torch.nn.Embedding`: Pointer to the resized Embedding Module or the old Embedding Module if `torch.nn.Embedding`: Pointer to the resized Embedding Module or the old Embedding Module if
@@ -1285,9 +1285,9 @@ class PreTrainedModel(nn.Module, ModuleUtilsMixin, GenerationMixin, PushToHubMix
Increasing the size will add newly initialized vectors at the end. Reducing the size will remove Increasing the size will add newly initialized vectors at the end. Reducing the size will remove
vectors from the end. If not provided or `None`, just returns a pointer to the input tokens vectors from the end. If not provided or `None`, just returns a pointer to the input tokens
``torch.nn.Linear``` module of the model without doing anything. transposed (`bool`, *optional*, `torch.nn.Linear` module of the model without doing anything. transposed (`bool`, *optional*, defaults
defaults to `False`): Whether `old_lm_head` is transposed or not. If True `old_lm_head.size()` is to `False`): Whether `old_lm_head` is transposed or not. If True `old_lm_head.size()` is `lm_head_dim,
`lm_head_dim, vocab_size` else `vocab_size, lm_head_dim`. vocab_size` else `vocab_size, lm_head_dim`.
Return: Return:
`torch.nn.Linear`: Pointer to the resized Linear Module or the old Linear Module if `new_num_tokens` is `torch.nn.Linear`: Pointer to the resized Linear Module or the old Linear Module if `new_num_tokens` is

@@ -910,11 +910,11 @@ class TFBartDecoder(tf.keras.layers.Layer):
If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
all ``decoder_input_ids``` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of shape
shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids`
`input_ids` you can choose to directly pass an embedded representation. This is useful if you want more you can choose to directly pass an embedded representation. This is useful if you want more control
control over how to convert `input_ids` indices into associated vectors than the model's internal over how to convert `input_ids` indices into associated vectors than the model's internal embedding
embedding lookup matrix. lookup matrix.
output_attentions (`bool`, *optional*): output_attentions (`bool`, *optional*):
Whether or not to return the attentions tensors of all attention layers. See `attentions` under Whether or not to return the attentions tensors of all attention layers. See `attentions` under
returned tensors for more detail. returned tensors for more detail.

@@ -894,11 +894,11 @@ class TFBlenderbotDecoder(tf.keras.layers.Layer):
If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
all ``decoder_input_ids``` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of shape
shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids`
`input_ids` you can choose to directly pass an embedded representation. This is useful if you want more you can choose to directly pass an embedded representation. This is useful if you want more control
control over how to convert `input_ids` indices into associated vectors than the model's internal over how to convert `input_ids` indices into associated vectors than the model's internal embedding
embedding lookup matrix. lookup matrix.
output_attentions (`bool`, *optional*): output_attentions (`bool`, *optional*):
Whether or not to return the attentions tensors of all attention layers. See `attentions` under Whether or not to return the attentions tensors of all attention layers. See `attentions` under
returned tensors for more detail. This argument can be used only in eager mode, in graph mode the value returned tensors for more detail. This argument can be used only in eager mode, in graph mode the value

@@ -898,11 +898,11 @@ class TFBlenderbotSmallDecoder(tf.keras.layers.Layer):
If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
all ``decoder_input_ids``` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of shape
shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids`
`input_ids` you can choose to directly pass an embedded representation. This is useful if you want more you can choose to directly pass an embedded representation. This is useful if you want more control
control over how to convert `input_ids` indices into associated vectors than the model's internal over how to convert `input_ids` indices into associated vectors than the model's internal embedding
embedding lookup matrix. lookup matrix.
output_attentions (`bool`, *optional*): output_attentions (`bool`, *optional*):
Whether or not to return the attentions tensors of all attention layers. See `attentions` under Whether or not to return the attentions tensors of all attention layers. See `attentions` under
returned tensors for more detail. This argument can be used only in eager mode, in graph mode the value returned tensors for more detail. This argument can be used only in eager mode, in graph mode the value

@@ -825,7 +825,7 @@ DEBERTA_START_DOCSTRING = r"""
This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass. This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass.
Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage
and behavior.``` and behavior.
Parameters: Parameters:

@@ -920,7 +920,7 @@ DEBERTA_START_DOCSTRING = r"""
This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass. This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass.
Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage
and behavior.``` and behavior.
Parameters: Parameters:

@@ -297,7 +297,7 @@ class CustomDPRReaderTokenizerMixin:
spans in the same passage. It corresponds to the sum of the start and end logits of the span. spans in the same passage. It corresponds to the sum of the start and end logits of the span.
- **relevance_score**: `float` that corresponds to the score of the each passage to answer the question, - **relevance_score**: `float` that corresponds to the score of the each passage to answer the question,
compared to all the other passages. It corresponds to the output of the QA classifier of the DPRReader. compared to all the other passages. It corresponds to the output of the QA classifier of the DPRReader.
- **doc_id**: ``int``` the id of the passage. - **start_index**: `int` the start index of the span - **doc_id**: `int` the id of the passage. - **start_index**: `int` the start index of the span
(inclusive). - **end_index**: `int` the end index of the span (inclusive). (inclusive). - **end_index**: `int` the end index of the span (inclusive).
Examples: Examples:

@@ -297,7 +297,7 @@ class CustomDPRReaderTokenizerMixin:
spans in the same passage. It corresponds to the sum of the start and end logits of the span. spans in the same passage. It corresponds to the sum of the start and end logits of the span.
- **relevance_score**: `float` that corresponds to the score of the each passage to answer the question, - **relevance_score**: `float` that corresponds to the score of the each passage to answer the question,
compared to all the other passages. It corresponds to the output of the QA classifier of the DPRReader. compared to all the other passages. It corresponds to the output of the QA classifier of the DPRReader.
- **doc_id**: ``int``` the id of the passage. - ***start_index**: `int` the start index of the span - **doc_id**: `int` the id of the passage. - ***start_index**: `int` the start index of the span
(inclusive). - **end_index**: `int` the end index of the span (inclusive). (inclusive). - **end_index**: `int` the end index of the span (inclusive).
Examples: Examples:

@@ -2009,8 +2009,8 @@ class LEDDecoder(LEDPreTrainedModel):
If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
all ``decoder_input_ids``` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of
of shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
`input_ids` you can choose to directly pass an embedded representation. This is useful if you want more `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more
control over how to convert `input_ids` indices into associated vectors than the model's internal control over how to convert `input_ids` indices into associated vectors than the model's internal
embedding lookup matrix. embedding lookup matrix.

@@ -1991,7 +1991,7 @@ class TFLEDDecoder(tf.keras.layers.Layer):
Contains precomputed key and value hidden-states of the attention blocks. Can be used to speed up Contains precomputed key and value hidden-states of the attention blocks. Can be used to speed up
decoding. If `past_key_values` are used, the user can optionally input only the last decoding. If `past_key_values` are used, the user can optionally input only the last
`decoder_input_ids` (those that don't have their past key value states given to this model) of shape `decoder_input_ids` (those that don't have their past key value states given to this model) of shape
`(batch_size, 1)` instead of all ``decoder_input_ids``` of shape `(batch_size, sequence_length)`. `(batch_size, 1)` instead of all `decoder_input_ids` of shape `(batch_size, sequence_length)`.
inputs_embeds (`tf.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*): inputs_embeds (`tf.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
This is useful if you want more control over how to convert `input_ids` indices into associated vectors This is useful if you want more control over how to convert `input_ids` indices into associated vectors

@@ -646,11 +646,10 @@ M2M_100_INPUTS_DOCSTRING = r"""
If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those that If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those that
don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of all don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of all
``decoder_input_ids``` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of shape
shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids` `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids` you
you can choose to directly pass an embedded representation. This is useful if you want more control over can choose to directly pass an embedded representation. This is useful if you want more control over how to
how to convert `input_ids` indices into associated vectors than the model's internal embedding lookup convert `input_ids` indices into associated vectors than the model's internal embedding lookup matrix.
matrix.
decoder_inputs_embeds (`torch.FloatTensor` of shape `(batch_size, target_sequence_length, hidden_size)`, *optional*): decoder_inputs_embeds (`torch.FloatTensor` of shape `(batch_size, target_sequence_length, hidden_size)`, *optional*):
Optionally, instead of passing `decoder_input_ids` you can choose to directly pass an embedded Optionally, instead of passing `decoder_input_ids` you can choose to directly pass an embedded
representation. If `past_key_values` is used, optionally only the last `decoder_inputs_embeds` have to be representation. If `past_key_values` is used, optionally only the last `decoder_inputs_embeds` have to be
@@ -952,8 +951,8 @@ class M2M100Decoder(M2M100PreTrainedModel):
If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
all ``decoder_input_ids``` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of
of shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
`input_ids` you can choose to directly pass an embedded representation. This is useful if you want more `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more
control over how to convert `input_ids` indices into associated vectors than the model's internal control over how to convert `input_ids` indices into associated vectors than the model's internal
embedding lookup matrix. embedding lookup matrix.

@@ -937,11 +937,11 @@ class TFMarianDecoder(tf.keras.layers.Layer):
If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
all ``decoder_input_ids``` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of shape
shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids`
`input_ids` you can choose to directly pass an embedded representation. This is useful if you want more you can choose to directly pass an embedded representation. This is useful if you want more control
control over how to convert `input_ids` indices into associated vectors than the model's internal over how to convert `input_ids` indices into associated vectors than the model's internal embedding
embedding lookup matrix. lookup matrix.
output_attentions (`bool`, *optional*): output_attentions (`bool`, *optional*):
Whether or not to return the attentions tensors of all attention layers. See `attentions` under Whether or not to return the attentions tensors of all attention layers. See `attentions` under
returned tensors for more detail. This argument can be used only in eager mode, in graph mode the value returned tensors for more detail. This argument can be used only in eager mode, in graph mode the value

@@ -927,11 +927,11 @@ class TFMBartDecoder(tf.keras.layers.Layer):
If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
all ``decoder_input_ids``` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of shape
shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids`
`input_ids` you can choose to directly pass an embedded representation. This is useful if you want more you can choose to directly pass an embedded representation. This is useful if you want more control
control over how to convert `input_ids` indices into associated vectors than the model's internal over how to convert `input_ids` indices into associated vectors than the model's internal embedding
embedding lookup matrix. lookup matrix.
output_attentions (`bool`, *optional*): output_attentions (`bool`, *optional*):
Whether or not to return the attentions tensors of all attention layers. See `attentions` under Whether or not to return the attentions tensors of all attention layers. See `attentions` under
returned tensors for more detail. This argument can be used only in eager mode, in graph mode the value returned tensors for more detail. This argument can be used only in eager mode, in graph mode the value

@@ -57,8 +57,8 @@ class MBartTokenizer(PreTrainedTokenizer):
Adapted from [`RobertaTokenizer`] and [`XLNetTokenizer`]. Based on Adapted from [`RobertaTokenizer`] and [`XLNetTokenizer`]. Based on
[SentencePiece](https://github.com/google/sentencepiece). [SentencePiece](https://github.com/google/sentencepiece).
The tokenization method is `<tokens> <eos> <language code>` for source language documents, and ``<language code> The tokenization method is `<tokens> <eos> <language code>` for source language documents, and `<language code>
<tokens> <eos>``` for target language documents. <tokens> <eos>` for target language documents.
Examples: Examples:

@@ -68,8 +68,8 @@ class MBartTokenizerFast(PreTrainedTokenizerFast):
This tokenizer inherits from [`PreTrainedTokenizerFast`] which contains most of the main methods. Users should This tokenizer inherits from [`PreTrainedTokenizerFast`] which contains most of the main methods. Users should
refer to this superclass for more information regarding those methods. refer to this superclass for more information regarding those methods.
The tokenization method is `<tokens> <eos> <language code>` for source language documents, and ``<language code> The tokenization method is `<tokens> <eos> <language code>` for source language documents, and `<language code>
<tokens> <eos>``` for target language documents. <tokens> <eos>` for target language documents.
Examples: Examples:

@@ -598,7 +598,7 @@ class TFOPTDecoder(tf.keras.layers.Layer):
If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
all ``decoder_input_ids``` of shape `(batch_size, sequence_length)`. all `decoder_input_ids` of shape `(batch_size, sequence_length)`.
inputs_embeds (`tf.Tensor` of inputs_embeds (`tf.Tensor` of
shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
`input_ids` you can choose to directly pass an embedded representation. This is useful if you want more `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more

@@ -943,11 +943,11 @@ class TFPegasusDecoder(tf.keras.layers.Layer):
If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
all ``decoder_input_ids``` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of shape
shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids`
`input_ids` you can choose to directly pass an embedded representation. This is useful if you want more you can choose to directly pass an embedded representation. This is useful if you want more control
control over how to convert `input_ids` indices into associated vectors than the model's internal over how to convert `input_ids` indices into associated vectors than the model's internal embedding
embedding lookup matrix. lookup matrix.
output_attentions (`bool`, *optional*): output_attentions (`bool`, *optional*):
Whether or not to return the attentions tensors of all attention layers. See `attentions` under Whether or not to return the attentions tensors of all attention layers. See `attentions` under
returned tensors for more detail. This argument can be used only in eager mode, in graph mode the value returned tensors for more detail. This argument can be used only in eager mode, in graph mode the value

@@ -100,8 +100,8 @@ class PLBartTokenizer(PreTrainedTokenizer):
Adapted from [`RobertaTokenizer`] and [`XLNetTokenizer`]. Based on Adapted from [`RobertaTokenizer`] and [`XLNetTokenizer`]. Based on
[SentencePiece](https://github.com/google/sentencepiece). [SentencePiece](https://github.com/google/sentencepiece).
The tokenization method is `<tokens> <eos> <language code>` for source language documents, and ``<language code> The tokenization method is `<tokens> <eos> <language code>` for source language documents, and `<language code>
<tokens> <eos>``` for target language documents. <tokens> <eos>` for target language documents.
Args: Args:
vocab_file (`str`): vocab_file (`str`):

@@ -201,7 +201,7 @@ class RetriBertModel(RetriBertPreTrainedModel):
Indices of input sequence tokens in the vocabulary for the documents in a batch. Indices of input sequence tokens in the vocabulary for the documents in a batch.
attention_mask_doc (`torch.FloatTensor` of shape `(batch_size, sequence_length)`, *optional*): attention_mask_doc (`torch.FloatTensor` of shape `(batch_size, sequence_length)`, *optional*):
Mask to avoid performing attention on documents padding token indices. Mask to avoid performing attention on documents padding token indices.
checkpoint_batch_size (`int`, *optional*, defaults to ```-1`): checkpoint_batch_size (`int`, *optional*, defaults to `-1`):
If greater than 0, uses gradient checkpointing to only compute sequence representation on If greater than 0, uses gradient checkpointing to only compute sequence representation on
`checkpoint_batch_size` examples at a time on the GPU. All query representations are still compared to `checkpoint_batch_size` examples at a time on the GPU. All query representations are still compared to
all document representations in the batch. all document representations in the batch.

@@ -663,8 +663,8 @@ SPEECH_TO_TEXT_INPUTS_DOCSTRING = r"""
If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those that If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those that
don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of all don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of all
``decoder_input_ids``` of shape `(batch_size, sequence_length)`. decoder_inputs_embeds (`torch.FloatTensor` `decoder_input_ids` of shape `(batch_size, sequence_length)`. decoder_inputs_embeds (`torch.FloatTensor` of
of shape `(batch_size, target_sequence_length, hidden_size)`, *optional*): Optionally, instead of passing shape `(batch_size, target_sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
`decoder_input_ids` you can choose to directly pass an embedded representation. If `past_key_values` is `decoder_input_ids` you can choose to directly pass an embedded representation. If `past_key_values` is
used, optionally only the last `decoder_inputs_embeds` have to be input (see `past_key_values`). This is used, optionally only the last `decoder_inputs_embeds` have to be input (see `past_key_values`). This is
useful if you want more control over how to convert `decoder_input_ids` indices into associated vectors useful if you want more control over how to convert `decoder_input_ids` indices into associated vectors
@@ -965,8 +965,8 @@ class Speech2TextDecoder(Speech2TextPreTrainedModel):
If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
all ``decoder_input_ids``` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of
of shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
`input_ids` you can choose to directly pass an embedded representation. This is useful if you want more `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more
control over how to convert `input_ids` indices into associated vectors than the model's internal control over how to convert `input_ids` indices into associated vectors than the model's internal
embedding lookup matrix. embedding lookup matrix.

@@ -1002,11 +1002,11 @@ class TFSpeech2TextDecoder(tf.keras.layers.Layer):
If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
all ``decoder_input_ids``` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of shape
shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids`
`input_ids` you can choose to directly pass an embedded representation. This is useful if you want more you can choose to directly pass an embedded representation. This is useful if you want more control
control over how to convert `input_ids` indices into associated vectors than the model's internal over how to convert `input_ids` indices into associated vectors than the model's internal embedding
embedding lookup matrix. lookup matrix.
output_attentions (`bool`, *optional*): output_attentions (`bool`, *optional*):
Whether or not to return the attentions tensors of all attention layers. See `attentions` under Whether or not to return the attentions tensors of all attention layers. See `attentions` under
returned tensors for more detail. returned tensors for more detail.

@@ -572,8 +572,8 @@ class Speech2Text2Decoder(Speech2Text2PreTrainedModel):
If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
all ``decoder_input_ids``` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of
of shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
`input_ids` you can choose to directly pass an embedded representation. This is useful if you want more `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more
control over how to convert `input_ids` indices into associated vectors than the model's internal control over how to convert `input_ids` indices into associated vectors than the model's internal
embedding lookup matrix. embedding lookup matrix.

@@ -90,11 +90,11 @@ XGLM_INPUTS_DOCSTRING = r"""
blocks) that can be used (see `past_key_values` input) to speed up sequential decoding. blocks) that can be used (see `past_key_values` input) to speed up sequential decoding.
If `past_key_values` are used, the user can optionally input only the last `input_ids` (those that don't If `past_key_values` are used, the user can optionally input only the last `input_ids` (those that don't
have their past key value states given to this model) of shape `(batch_size, 1)` instead of all have their past key value states given to this model) of shape `(batch_size, 1)` instead of all `input_ids`
``input_ids``` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of shape of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of shape `(batch_size,
`(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids` you sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids` you can choose to
can choose to directly pass an embedded representation. This is useful if you want more control over how to directly pass an embedded representation. This is useful if you want more control over how to convert
convert `input_ids` indices into associated vectors than the model's internal embedding lookup matrix. `input_ids` indices into associated vectors than the model's internal embedding lookup matrix.
inputs_embeds (`torch.FloatTensor` of shape `(batch_size, target_sequence_length, hidden_size)`, *optional*): inputs_embeds (`torch.FloatTensor` of shape `(batch_size, target_sequence_length, hidden_size)`, *optional*):
Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. If Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. If
`past_key_values` is used, optionally only the last `inputs_embeds` have to be input (see `past_key_values` is used, optionally only the last `inputs_embeds` have to be input (see

@@ -2136,7 +2136,7 @@ class {{cookiecutter.camelcase_modelname}}PreTrainedModel(PreTrainedModel):
If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids`
(those that don't have their past key value states given to this model) of shape `(batch_size, 1)` (those that don't have their past key value states given to this model) of shape `(batch_size, 1)`
instead of all ``decoder_input_ids``` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert `input_ids` indices into associated instead of all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert `input_ids` indices into associated
vectors than the model's internal embedding lookup matrix. vectors than the model's internal embedding lookup matrix.
decoder_inputs_embeds (`torch.FloatTensor` of shape `(batch_size, target_sequence_length, hidden_size)`, *optional*): decoder_inputs_embeds (`torch.FloatTensor` of shape `(batch_size, target_sequence_length, hidden_size)`, *optional*):
Optionally, instead of passing `decoder_input_ids` you can choose to directly pass an embedded Optionally, instead of passing `decoder_input_ids` you can choose to directly pass an embedded
@@ -2483,7 +2483,7 @@ class {{cookiecutter.camelcase_modelname}}Decoder({{cookiecutter.camelcase_model
If `past_key_values` are used, the user can optionally input only the last If `past_key_values` are used, the user can optionally input only the last
`decoder_input_ids` (those that don't have their past key value states given to this model) of `decoder_input_ids` (those that don't have their past key value states given to this model) of
shape `(batch_size, 1)` instead of all ``decoder_input_ids``` of shape `(batch_size, shape `(batch_size, 1)` instead of all `decoder_input_ids` of shape `(batch_size,
sequence_length)`. inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert `input_ids` indices sequence_length)`. inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert `input_ids` indices
into associated vectors than the model's internal embedding lookup matrix. into associated vectors than the model's internal embedding lookup matrix.
output_attentions (`bool`, *optional*): output_attentions (`bool`, *optional*):

@@ -92,6 +92,9 @@ def process_doc_file(code_file, add_new_line=True):
# fmt: off # fmt: off
splits = code.split("```") splits = code.split("```")
if len(splits) % 2 != 1:
raise ValueError("The number of occurrences of ``` should be an even number.")
splits = [s if i % 2 == 0 else process_code_block(s, add_new_line=add_new_line) for i, s in enumerate(splits)] splits = [s if i % 2 == 0 else process_code_block(s, add_new_line=add_new_line) for i, s in enumerate(splits)]
clean_code = "```".join(splits) clean_code = "```".join(splits)
# fmt: on # fmt: on
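
The check added in `process_doc_file` relies on a property of `str.split`: splitting on the fence marker yields one more piece than there are markers, so an even number of markers always gives an odd number of pieces. A minimal standalone sketch of that idea follows. The names `split_on_fences` and `is_balanced` are hypothetical helpers for illustration, not part of the repository, and the sample strings are made up to mirror the docstring typo fixed in the hunks above.

```python
FENCE = "`" * 3  # three backticks, built this way so the marker never appears literally here


def split_on_fences(text):
    """Split text on the fence marker, rejecting an odd number of markers.

    Hypothetical helper mirroring the check in the diff: pieces at even
    indexes are prose, pieces at odd indexes are code block bodies.
    """
    parts = text.split(FENCE)
    # n markers produce n + 1 pieces, so balanced fences give an odd count.
    if len(parts) % 2 != 1:
        raise ValueError("The number of occurrences of " + FENCE + " should be an even number.")
    return parts


def is_balanced(text):
    """Return True when every opening fence has a matching closing fence."""
    try:
        split_on_fences(text)
    except ValueError:
        return False
    return True


# A properly fenced snippet: two markers, three pieces.
balanced = "intro\n" + FENCE + "\nprint('hi')\n" + FENCE + "\noutro"

# The kind of typo the docstring hunks fix: two opening backticks, three closing.
stray = "instead of all ``decoder_input_ids" + FENCE + " of shape"

print(is_balanced(balanced))
print(is_balanced(stray))
```

The stray closing marker in the second string counts as a single fence, which is why such docstrings had to be corrected before the check could be enabled.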