Add a check regarding the number of occurrences of ``` (#18389)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
@@ -1879,7 +1879,7 @@ class TFPreTrainedModel(tf.keras.Model, TFModelUtilsMixin, TFGenerationMixin, Pu

 Increasing the size will add newly initialized vectors at the end. Reducing the size will remove
 vectors from the end. If not provided or `None`, just returns a pointer to the input tokens
-``tf.Variable``` module of the model without doing anything.
+`tf.Variable` module of the model without doing anything.

 Return:
 `tf.Variable`: Pointer to the resized Embedding Module or the old Embedding Module if `new_num_tokens` is
@@ -1221,7 +1221,7 @@ class PreTrainedModel(nn.Module, ModuleUtilsMixin, GenerationMixin, PushToHubMix

 Increasing the size will add newly initialized vectors at the end. Reducing the size will remove
 vectors from the end. If not provided or `None`, just returns a pointer to the input tokens
-``torch.nn.Embedding``` module of the model without doing anything.
+`torch.nn.Embedding` module of the model without doing anything.

 Return:
 `torch.nn.Embedding`: Pointer to the resized Embedding Module or the old Embedding Module if
@@ -1285,9 +1285,9 @@ class PreTrainedModel(nn.Module, ModuleUtilsMixin, GenerationMixin, PushToHubMix

 Increasing the size will add newly initialized vectors at the end. Reducing the size will remove
 vectors from the end. If not provided or `None`, just returns a pointer to the input tokens
-``torch.nn.Linear``` module of the model without doing anything. transposed (`bool`, *optional*,
-defaults to `False`): Whether `old_lm_head` is transposed or not. If True `old_lm_head.size()` is
-`lm_head_dim, vocab_size` else `vocab_size, lm_head_dim`.
+`torch.nn.Linear` module of the model without doing anything. transposed (`bool`, *optional*, defaults
+to `False`): Whether `old_lm_head` is transposed or not. If True `old_lm_head.size()` is `lm_head_dim,
+vocab_size` else `vocab_size, lm_head_dim`.

 Return:
 `torch.nn.Linear`: Pointer to the resized Linear Module or the old Linear Module if `new_num_tokens` is
@@ -910,11 +910,11 @@ class TFBartDecoder(tf.keras.layers.Layer):

 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
 that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
-all ``decoder_input_ids``` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of
-shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
-`input_ids` you can choose to directly pass an embedded representation. This is useful if you want more
-control over how to convert `input_ids` indices into associated vectors than the model's internal
-embedding lookup matrix.
+all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of shape
+`(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids`
+you can choose to directly pass an embedded representation. This is useful if you want more control
+over how to convert `input_ids` indices into associated vectors than the model's internal embedding
+lookup matrix.
 output_attentions (`bool`, *optional*):
 Whether or not to return the attentions tensors of all attention layers. See `attentions` under
 returned tensors for more detail.
@@ -894,11 +894,11 @@ class TFBlenderbotDecoder(tf.keras.layers.Layer):

 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
 that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
-all ``decoder_input_ids``` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of
-shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
-`input_ids` you can choose to directly pass an embedded representation. This is useful if you want more
-control over how to convert `input_ids` indices into associated vectors than the model's internal
-embedding lookup matrix.
+all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of shape
+`(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids`
+you can choose to directly pass an embedded representation. This is useful if you want more control
+over how to convert `input_ids` indices into associated vectors than the model's internal embedding
+lookup matrix.
 output_attentions (`bool`, *optional*):
 Whether or not to return the attentions tensors of all attention layers. See `attentions` under
 returned tensors for more detail. This argument can be used only in eager mode, in graph mode the value
@@ -898,11 +898,11 @@ class TFBlenderbotSmallDecoder(tf.keras.layers.Layer):

 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
 that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
-all ``decoder_input_ids``` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of
-shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
-`input_ids` you can choose to directly pass an embedded representation. This is useful if you want more
-control over how to convert `input_ids` indices into associated vectors than the model's internal
-embedding lookup matrix.
+all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of shape
+`(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids`
+you can choose to directly pass an embedded representation. This is useful if you want more control
+over how to convert `input_ids` indices into associated vectors than the model's internal embedding
+lookup matrix.
 output_attentions (`bool`, *optional*):
 Whether or not to return the attentions tensors of all attention layers. See `attentions` under
 returned tensors for more detail. This argument can be used only in eager mode, in graph mode the value
@@ -825,7 +825,7 @@ DEBERTA_START_DOCSTRING = r"""

 This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass.
 Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage
-and behavior.```
+and behavior.


 Parameters:
@@ -920,7 +920,7 @@ DEBERTA_START_DOCSTRING = r"""

 This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass.
 Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage
-and behavior.```
+and behavior.


 Parameters:
@@ -297,7 +297,7 @@ class CustomDPRReaderTokenizerMixin:
 spans in the same passage. It corresponds to the sum of the start and end logits of the span.
 - **relevance_score**: `float` that corresponds to the score of the each passage to answer the question,
 compared to all the other passages. It corresponds to the output of the QA classifier of the DPRReader.
-- **doc_id**: ``int``` the id of the passage. - **start_index**: `int` the start index of the span
+- **doc_id**: `int` the id of the passage. - **start_index**: `int` the start index of the span
 (inclusive). - **end_index**: `int` the end index of the span (inclusive).

 Examples:
@@ -297,7 +297,7 @@ class CustomDPRReaderTokenizerMixin:
 spans in the same passage. It corresponds to the sum of the start and end logits of the span.
 - **relevance_score**: `float` that corresponds to the score of the each passage to answer the question,
 compared to all the other passages. It corresponds to the output of the QA classifier of the DPRReader.
-- **doc_id**: ``int``` the id of the passage. - ***start_index**: `int` the start index of the span
+- **doc_id**: `int` the id of the passage. - ***start_index**: `int` the start index of the span
 (inclusive). - **end_index**: `int` the end index of the span (inclusive).

 Examples:
@@ -2009,8 +2009,8 @@ class LEDDecoder(LEDPreTrainedModel):

 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
 that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
-all ``decoder_input_ids``` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor`
-of shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
+all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of
+shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
 `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more
 control over how to convert `input_ids` indices into associated vectors than the model's internal
 embedding lookup matrix.
@@ -1991,7 +1991,7 @@ class TFLEDDecoder(tf.keras.layers.Layer):
 Contains precomputed key and value hidden-states of the attention blocks. Can be used to speed up
 decoding. If `past_key_values` are used, the user can optionally input only the last
 `decoder_input_ids` (those that don't have their past key value states given to this model) of shape
-`(batch_size, 1)` instead of all ``decoder_input_ids``` of shape `(batch_size, sequence_length)`.
+`(batch_size, 1)` instead of all `decoder_input_ids` of shape `(batch_size, sequence_length)`.
 inputs_embeds (`tf.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
 Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
 This is useful if you want more control over how to convert `input_ids` indices into associated vectors
@@ -646,11 +646,10 @@ M2M_100_INPUTS_DOCSTRING = r"""

 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those that
 don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of all
-``decoder_input_ids``` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of
-shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids`
-you can choose to directly pass an embedded representation. This is useful if you want more control over
-how to convert `input_ids` indices into associated vectors than the model's internal embedding lookup
-matrix.
+`decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of shape
+`(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids` you
+can choose to directly pass an embedded representation. This is useful if you want more control over how to
+convert `input_ids` indices into associated vectors than the model's internal embedding lookup matrix.
 decoder_inputs_embeds (`torch.FloatTensor` of shape `(batch_size, target_sequence_length, hidden_size)`, *optional*):
 Optionally, instead of passing `decoder_input_ids` you can choose to directly pass an embedded
 representation. If `past_key_values` is used, optionally only the last `decoder_inputs_embeds` have to be
@@ -952,8 +951,8 @@ class M2M100Decoder(M2M100PreTrainedModel):

 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
 that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
-all ``decoder_input_ids``` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor`
-of shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
+all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of
+shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
 `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more
 control over how to convert `input_ids` indices into associated vectors than the model's internal
 embedding lookup matrix.
@@ -937,11 +937,11 @@ class TFMarianDecoder(tf.keras.layers.Layer):

 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
 that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
-all ``decoder_input_ids``` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of
-shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
-`input_ids` you can choose to directly pass an embedded representation. This is useful if you want more
-control over how to convert `input_ids` indices into associated vectors than the model's internal
-embedding lookup matrix.
+all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of shape
+`(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids`
+you can choose to directly pass an embedded representation. This is useful if you want more control
+over how to convert `input_ids` indices into associated vectors than the model's internal embedding
+lookup matrix.
 output_attentions (`bool`, *optional*):
 Whether or not to return the attentions tensors of all attention layers. See `attentions` under
 returned tensors for more detail. This argument can be used only in eager mode, in graph mode the value
@@ -927,11 +927,11 @@ class TFMBartDecoder(tf.keras.layers.Layer):

 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
 that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
-all ``decoder_input_ids``` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of
-shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
-`input_ids` you can choose to directly pass an embedded representation. This is useful if you want more
-control over how to convert `input_ids` indices into associated vectors than the model's internal
-embedding lookup matrix.
+all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of shape
+`(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids`
+you can choose to directly pass an embedded representation. This is useful if you want more control
+over how to convert `input_ids` indices into associated vectors than the model's internal embedding
+lookup matrix.
 output_attentions (`bool`, *optional*):
 Whether or not to return the attentions tensors of all attention layers. See `attentions` under
 returned tensors for more detail. This argument can be used only in eager mode, in graph mode the value
@@ -57,8 +57,8 @@ class MBartTokenizer(PreTrainedTokenizer):
 Adapted from [`RobertaTokenizer`] and [`XLNetTokenizer`]. Based on
 [SentencePiece](https://github.com/google/sentencepiece).

-The tokenization method is `<tokens> <eos> <language code>` for source language documents, and ``<language code>
-<tokens> <eos>``` for target language documents.
+The tokenization method is `<tokens> <eos> <language code>` for source language documents, and `<language code>
+<tokens> <eos>` for target language documents.

 Examples:

@@ -68,8 +68,8 @@ class MBartTokenizerFast(PreTrainedTokenizerFast):
 This tokenizer inherits from [`PreTrainedTokenizerFast`] which contains most of the main methods. Users should
 refer to this superclass for more information regarding those methods.

-The tokenization method is `<tokens> <eos> <language code>` for source language documents, and ``<language code>
-<tokens> <eos>``` for target language documents.
+The tokenization method is `<tokens> <eos> <language code>` for source language documents, and `<language code>
+<tokens> <eos>` for target language documents.

 Examples:

@@ -598,7 +598,7 @@ class TFOPTDecoder(tf.keras.layers.Layer):

 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
 that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
-all ``decoder_input_ids``` of shape `(batch_size, sequence_length)`.
+all `decoder_input_ids` of shape `(batch_size, sequence_length)`.
 inputs_embeds (`tf.Tensor` of
 shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
 `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more
@@ -943,11 +943,11 @@ class TFPegasusDecoder(tf.keras.layers.Layer):

 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
 that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
-all ``decoder_input_ids``` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of
-shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
-`input_ids` you can choose to directly pass an embedded representation. This is useful if you want more
-control over how to convert `input_ids` indices into associated vectors than the model's internal
-embedding lookup matrix.
+all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of shape
+`(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids`
+you can choose to directly pass an embedded representation. This is useful if you want more control
+over how to convert `input_ids` indices into associated vectors than the model's internal embedding
+lookup matrix.
 output_attentions (`bool`, *optional*):
 Whether or not to return the attentions tensors of all attention layers. See `attentions` under
 returned tensors for more detail. This argument can be used only in eager mode, in graph mode the value
@@ -100,8 +100,8 @@ class PLBartTokenizer(PreTrainedTokenizer):
 Adapted from [`RobertaTokenizer`] and [`XLNetTokenizer`]. Based on
 [SentencePiece](https://github.com/google/sentencepiece).

-The tokenization method is `<tokens> <eos> <language code>` for source language documents, and ``<language code>
-<tokens> <eos>``` for target language documents.
+The tokenization method is `<tokens> <eos> <language code>` for source language documents, and `<language code>
+<tokens> <eos>` for target language documents.

 Args:
 vocab_file (`str`):
@@ -201,7 +201,7 @@ class RetriBertModel(RetriBertPreTrainedModel):
 Indices of input sequence tokens in the vocabulary for the documents in a batch.
 attention_mask_doc (`torch.FloatTensor` of shape `(batch_size, sequence_length)`, *optional*):
 Mask to avoid performing attention on documents padding token indices.
-checkpoint_batch_size (`int`, *optional*, defaults to ```-1`):
+checkpoint_batch_size (`int`, *optional*, defaults to `-1`):
 If greater than 0, uses gradient checkpointing to only compute sequence representation on
 `checkpoint_batch_size` examples at a time on the GPU. All query representations are still compared to
 all document representations in the batch.
@@ -663,8 +663,8 @@ SPEECH_TO_TEXT_INPUTS_DOCSTRING = r"""

 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those that
 don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of all
-``decoder_input_ids``` of shape `(batch_size, sequence_length)`. decoder_inputs_embeds (`torch.FloatTensor`
-of shape `(batch_size, target_sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
+`decoder_input_ids` of shape `(batch_size, sequence_length)`. decoder_inputs_embeds (`torch.FloatTensor` of
+shape `(batch_size, target_sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
 `decoder_input_ids` you can choose to directly pass an embedded representation. If `past_key_values` is
 used, optionally only the last `decoder_inputs_embeds` have to be input (see `past_key_values`). This is
 useful if you want more control over how to convert `decoder_input_ids` indices into associated vectors
@@ -965,8 +965,8 @@ class Speech2TextDecoder(Speech2TextPreTrainedModel):

 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
 that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
-all ``decoder_input_ids``` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor`
-of shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
+all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of
+shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
 `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more
 control over how to convert `input_ids` indices into associated vectors than the model's internal
 embedding lookup matrix.
@@ -1002,11 +1002,11 @@ class TFSpeech2TextDecoder(tf.keras.layers.Layer):

 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
 that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
-all ``decoder_input_ids``` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of
-shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
-`input_ids` you can choose to directly pass an embedded representation. This is useful if you want more
-control over how to convert `input_ids` indices into associated vectors than the model's internal
-embedding lookup matrix.
+all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of shape
+`(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids`
+you can choose to directly pass an embedded representation. This is useful if you want more control
+over how to convert `input_ids` indices into associated vectors than the model's internal embedding
+lookup matrix.
 output_attentions (`bool`, *optional*):
 Whether or not to return the attentions tensors of all attention layers. See `attentions` under
 returned tensors for more detail.
@@ -572,8 +572,8 @@ class Speech2Text2Decoder(Speech2Text2PreTrainedModel):

 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
 that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
-all ``decoder_input_ids``` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor`
-of shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
+all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of
+shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
 `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more
 control over how to convert `input_ids` indices into associated vectors than the model's internal
 embedding lookup matrix.
@@ -90,11 +90,11 @@ XGLM_INPUTS_DOCSTRING = r"""
 blocks) that can be used (see `past_key_values` input) to speed up sequential decoding.

 If `past_key_values` are used, the user can optionally input only the last `input_ids` (those that don't
-have their past key value states given to this model) of shape `(batch_size, 1)` instead of all
-``input_ids``` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of shape
-`(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids` you
-can choose to directly pass an embedded representation. This is useful if you want more control over how to
-convert `input_ids` indices into associated vectors than the model's internal embedding lookup matrix.
+have their past key value states given to this model) of shape `(batch_size, 1)` instead of all `input_ids`
+of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of shape `(batch_size,
+sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids` you can choose to
+directly pass an embedded representation. This is useful if you want more control over how to convert
+`input_ids` indices into associated vectors than the model's internal embedding lookup matrix.
 inputs_embeds (`torch.FloatTensor` of shape `(batch_size, target_sequence_length, hidden_size)`, *optional*):
 Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. If
 `past_key_values` is used, optionally only the last `inputs_embeds` have to be input (see
@@ -2136,7 +2136,7 @@ class {{cookiecutter.camelcase_modelname}}PreTrainedModel(PreTrainedModel):

 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids`
 (those that don't have their past key value states given to this model) of shape `(batch_size, 1)`
-instead of all ``decoder_input_ids``` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert `input_ids` indices into associated
+instead of all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert `input_ids` indices into associated
 vectors than the model's internal embedding lookup matrix.
 decoder_inputs_embeds (`torch.FloatTensor` of shape `(batch_size, target_sequence_length, hidden_size)`, *optional*):
 Optionally, instead of passing `decoder_input_ids` you can choose to directly pass an embedded
@@ -2483,7 +2483,7 @@ class {{cookiecutter.camelcase_modelname}}Decoder({{cookiecutter.camelcase_model

 If `past_key_values` are used, the user can optionally input only the last
 `decoder_input_ids` (those that don't have their past key value states given to this model) of
-shape `(batch_size, 1)` instead of all ``decoder_input_ids``` of shape `(batch_size,
+shape `(batch_size, 1)` instead of all `decoder_input_ids` of shape `(batch_size,
 sequence_length)`. inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert `input_ids` indices
 into associated vectors than the model's internal embedding lookup matrix.
 output_attentions (`bool`, *optional*):
@@ -92,6 +92,9 @@ def process_doc_file(code_file, add_new_line=True):

     # fmt: off
     splits = code.split("```")
+    if len(splits) % 2 != 1:
+        raise ValueError("The number of occurrences of ``` should be an even number.")
+
     splits = [s if i % 2 == 0 else process_code_block(s, add_new_line=add_new_line) for i, s in enumerate(splits)]
     clean_code = "```".join(splits)
     # fmt: on
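The parity check added to `process_doc_file` can be tried in isolation. The sketch below is illustrative only: `split_on_fences` and `FENCE` are hypothetical names, not part of the repository, and the fence string is built with `"`" * 3` purely to keep a literal fence out of the example. Splitting on the fence yields an odd number of pieces exactly when the fences are balanced, so an even count means one fence is stray, as in the docstring typos this commit fixes.

```python
# Hypothetical stand-alone version of the check; names are illustrative.
FENCE = "`" * 3  # three backticks, built this way to avoid a literal fence


def split_on_fences(text):
    """Split text on code fences, raising if the fences are unbalanced."""
    splits = text.split(FENCE)
    # n fences produce n + 1 pieces, so a balanced (even) fence count
    # always gives an odd number of pieces.
    if len(splits) % 2 != 1:
        raise ValueError("The number of fence occurrences should be an even number.")
    return splits


# Balanced: prose, code, prose.
pieces = split_on_fences("intro " + FENCE + "x = 1" + FENCE + " outro")

# Unbalanced: a stray fence after an inline literal, like the fixed typos.
try:
    split_on_fences("``torch.nn.Linear" + FENCE + " module")
    stray_detected = False
except ValueError:
    stray_detected = True
```

Pieces at odd indices are the code blocks, which is why the surrounding code only passes those to `process_code_block`.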