Add a check regarding the number of occurrences of ``` (#18389)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
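Each hunk below removes stray backticks that left a docstring with an odd number of triple-backtick markers. The following is a minimal sketch of such a check, not the code added by this commit: the names `FENCE`, `count_fences`, and `check_backtick_balance` and the error message are assumptions made for illustration. It relies on the fact that splitting a text on the marker yields an odd number of pieces exactly when the marker occurs an even number of times.

```python
# Hypothetical helper names; this is an illustrative sketch, not the project's code.
FENCE = "`" * 3  # the triple-backtick marker, built programmatically


def count_fences(text: str) -> int:
    """Return how many times the triple-backtick marker occurs in `text`."""
    return text.count(FENCE)


def check_backtick_balance(text: str) -> None:
    """Raise ValueError if the marker occurs an odd number of times."""
    pieces = text.split(FENCE)
    # n occurrences produce n + 1 pieces, so a balanced text gives an odd count.
    if len(pieces) % 2 != 1:
        raise ValueError(
            f"Found {len(pieces) - 1} occurrences of {FENCE}; expected an even number."
        )
```

Under these assumptions, a removed line such as the old `tf.Variable` docstring line contains one marker and would fail the check, while its replacement contains none and would pass.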
@@ -1879,7 +1879,7 @@ class TFPreTrainedModel(tf.keras.Model, TFModelUtilsMixin, TFGenerationMixin, Pu
 
 Increasing the size will add newly initialized vectors at the end. Reducing the size will remove
 vectors from the end. If not provided or `None`, just returns a pointer to the input tokens
-``tf.Variable``` module of the model without doing anything.
+`tf.Variable` module of the model without doing anything.
 
 Return:
 `tf.Variable`: Pointer to the resized Embedding Module or the old Embedding Module if `new_num_tokens` is
@@ -1221,7 +1221,7 @@ class PreTrainedModel(nn.Module, ModuleUtilsMixin, GenerationMixin, PushToHubMix
 
 Increasing the size will add newly initialized vectors at the end. Reducing the size will remove
 vectors from the end. If not provided or `None`, just returns a pointer to the input tokens
-``torch.nn.Embedding``` module of the model without doing anything.
+`torch.nn.Embedding` module of the model without doing anything.
 
 Return:
 `torch.nn.Embedding`: Pointer to the resized Embedding Module or the old Embedding Module if
@@ -1285,9 +1285,9 @@ class PreTrainedModel(nn.Module, ModuleUtilsMixin, GenerationMixin, PushToHubMix
 
 Increasing the size will add newly initialized vectors at the end. Reducing the size will remove
 vectors from the end. If not provided or `None`, just returns a pointer to the input tokens
-``torch.nn.Linear``` module of the model without doing anything. transposed (`bool`, *optional*,
-defaults to `False`): Whether `old_lm_head` is transposed or not. If True `old_lm_head.size()` is
-`lm_head_dim, vocab_size` else `vocab_size, lm_head_dim`.
+`torch.nn.Linear` module of the model without doing anything. transposed (`bool`, *optional*, defaults
+to `False`): Whether `old_lm_head` is transposed or not. If True `old_lm_head.size()` is `lm_head_dim,
+vocab_size` else `vocab_size, lm_head_dim`.
 
 Return:
 `torch.nn.Linear`: Pointer to the resized Linear Module or the old Linear Module if `new_num_tokens` is
@@ -910,11 +910,11 @@ class TFBartDecoder(tf.keras.layers.Layer):
 
 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
 that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
-all ``decoder_input_ids``` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of
-shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
-`input_ids` you can choose to directly pass an embedded representation. This is useful if you want more
-control over how to convert `input_ids` indices into associated vectors than the model's internal
-embedding lookup matrix.
+all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of shape
+`(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids`
+you can choose to directly pass an embedded representation. This is useful if you want more control
+over how to convert `input_ids` indices into associated vectors than the model's internal embedding
+lookup matrix.
 output_attentions (`bool`, *optional*):
 Whether or not to return the attentions tensors of all attention layers. See `attentions` under
 returned tensors for more detail.
@@ -894,11 +894,11 @@ class TFBlenderbotDecoder(tf.keras.layers.Layer):
 
 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
 that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
-all ``decoder_input_ids``` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of
-shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
-`input_ids` you can choose to directly pass an embedded representation. This is useful if you want more
-control over how to convert `input_ids` indices into associated vectors than the model's internal
-embedding lookup matrix.
+all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of shape
+`(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids`
+you can choose to directly pass an embedded representation. This is useful if you want more control
+over how to convert `input_ids` indices into associated vectors than the model's internal embedding
+lookup matrix.
 output_attentions (`bool`, *optional*):
 Whether or not to return the attentions tensors of all attention layers. See `attentions` under
 returned tensors for more detail. This argument can be used only in eager mode, in graph mode the value
@@ -898,11 +898,11 @@ class TFBlenderbotSmallDecoder(tf.keras.layers.Layer):
 
 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
 that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
-all ``decoder_input_ids``` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of
-shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
-`input_ids` you can choose to directly pass an embedded representation. This is useful if you want more
-control over how to convert `input_ids` indices into associated vectors than the model's internal
-embedding lookup matrix.
+all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of shape
+`(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids`
+you can choose to directly pass an embedded representation. This is useful if you want more control
+over how to convert `input_ids` indices into associated vectors than the model's internal embedding
+lookup matrix.
 output_attentions (`bool`, *optional*):
 Whether or not to return the attentions tensors of all attention layers. See `attentions` under
 returned tensors for more detail. This argument can be used only in eager mode, in graph mode the value
@@ -825,7 +825,7 @@ DEBERTA_START_DOCSTRING = r"""
 
 This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass.
 Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage
-and behavior.```
+and behavior.
 
 
 Parameters:
@@ -920,7 +920,7 @@ DEBERTA_START_DOCSTRING = r"""
 
 This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass.
 Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage
-and behavior.```
+and behavior.
 
 
 Parameters:
@@ -297,7 +297,7 @@ class CustomDPRReaderTokenizerMixin:
 spans in the same passage. It corresponds to the sum of the start and end logits of the span.
 - **relevance_score**: `float` that corresponds to the score of the each passage to answer the question,
 compared to all the other passages. It corresponds to the output of the QA classifier of the DPRReader.
-- **doc_id**: ``int``` the id of the passage. - **start_index**: `int` the start index of the span
+- **doc_id**: `int` the id of the passage. - **start_index**: `int` the start index of the span
 (inclusive). - **end_index**: `int` the end index of the span (inclusive).
 
 Examples:
@@ -297,7 +297,7 @@ class CustomDPRReaderTokenizerMixin:
 spans in the same passage. It corresponds to the sum of the start and end logits of the span.
 - **relevance_score**: `float` that corresponds to the score of the each passage to answer the question,
 compared to all the other passages. It corresponds to the output of the QA classifier of the DPRReader.
-- **doc_id**: ``int``` the id of the passage. - ***start_index**: `int` the start index of the span
+- **doc_id**: `int` the id of the passage. - ***start_index**: `int` the start index of the span
 (inclusive). - **end_index**: `int` the end index of the span (inclusive).
 
 Examples:
@@ -2009,8 +2009,8 @@ class LEDDecoder(LEDPreTrainedModel):
 
 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
 that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
-all ``decoder_input_ids``` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor`
-of shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
+all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of
+shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
 `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more
 control over how to convert `input_ids` indices into associated vectors than the model's internal
 embedding lookup matrix.
@@ -1991,7 +1991,7 @@ class TFLEDDecoder(tf.keras.layers.Layer):
 Contains precomputed key and value hidden-states of the attention blocks. Can be used to speed up
 decoding. If `past_key_values` are used, the user can optionally input only the last
 `decoder_input_ids` (those that don't have their past key value states given to this model) of shape
-`(batch_size, 1)` instead of all ``decoder_input_ids``` of shape `(batch_size, sequence_length)`.
+`(batch_size, 1)` instead of all `decoder_input_ids` of shape `(batch_size, sequence_length)`.
 inputs_embeds (`tf.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
 Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
 This is useful if you want more control over how to convert `input_ids` indices into associated vectors
@@ -646,11 +646,10 @@ M2M_100_INPUTS_DOCSTRING = r"""
 
 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those that
 don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of all
-``decoder_input_ids``` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of
-shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids`
-you can choose to directly pass an embedded representation. This is useful if you want more control over
-how to convert `input_ids` indices into associated vectors than the model's internal embedding lookup
-matrix.
+`decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of shape
+`(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids` you
+can choose to directly pass an embedded representation. This is useful if you want more control over how to
+convert `input_ids` indices into associated vectors than the model's internal embedding lookup matrix.
 decoder_inputs_embeds (`torch.FloatTensor` of shape `(batch_size, target_sequence_length, hidden_size)`, *optional*):
 Optionally, instead of passing `decoder_input_ids` you can choose to directly pass an embedded
 representation. If `past_key_values` is used, optionally only the last `decoder_inputs_embeds` have to be
@@ -952,8 +951,8 @@ class M2M100Decoder(M2M100PreTrainedModel):
 
 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
 that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
-all ``decoder_input_ids``` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor`
-of shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
+all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of
+shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
 `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more
 control over how to convert `input_ids` indices into associated vectors than the model's internal
 embedding lookup matrix.
@@ -937,11 +937,11 @@ class TFMarianDecoder(tf.keras.layers.Layer):
 
 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
 that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
-all ``decoder_input_ids``` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of
-shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
-`input_ids` you can choose to directly pass an embedded representation. This is useful if you want more
-control over how to convert `input_ids` indices into associated vectors than the model's internal
-embedding lookup matrix.
+all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of shape
+`(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids`
+you can choose to directly pass an embedded representation. This is useful if you want more control
+over how to convert `input_ids` indices into associated vectors than the model's internal embedding
+lookup matrix.
 output_attentions (`bool`, *optional*):
 Whether or not to return the attentions tensors of all attention layers. See `attentions` under
 returned tensors for more detail. This argument can be used only in eager mode, in graph mode the value
@@ -927,11 +927,11 @@ class TFMBartDecoder(tf.keras.layers.Layer):
 
 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
 that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
-all ``decoder_input_ids``` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of
-shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
-`input_ids` you can choose to directly pass an embedded representation. This is useful if you want more
-control over how to convert `input_ids` indices into associated vectors than the model's internal
-embedding lookup matrix.
+all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of shape
+`(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids`
+you can choose to directly pass an embedded representation. This is useful if you want more control
+over how to convert `input_ids` indices into associated vectors than the model's internal embedding
+lookup matrix.
 output_attentions (`bool`, *optional*):
 Whether or not to return the attentions tensors of all attention layers. See `attentions` under
 returned tensors for more detail. This argument can be used only in eager mode, in graph mode the value
@@ -57,8 +57,8 @@ class MBartTokenizer(PreTrainedTokenizer):
 Adapted from [`RobertaTokenizer`] and [`XLNetTokenizer`]. Based on
 [SentencePiece](https://github.com/google/sentencepiece).
 
-The tokenization method is `<tokens> <eos> <language code>` for source language documents, and ``<language code>
-<tokens> <eos>``` for target language documents.
+The tokenization method is `<tokens> <eos> <language code>` for source language documents, and `<language code>
+<tokens> <eos>` for target language documents.
 
 Examples:
 
@@ -68,8 +68,8 @@ class MBartTokenizerFast(PreTrainedTokenizerFast):
 This tokenizer inherits from [`PreTrainedTokenizerFast`] which contains most of the main methods. Users should
 refer to this superclass for more information regarding those methods.
 
-The tokenization method is `<tokens> <eos> <language code>` for source language documents, and ``<language code>
-<tokens> <eos>``` for target language documents.
+The tokenization method is `<tokens> <eos> <language code>` for source language documents, and `<language code>
+<tokens> <eos>` for target language documents.
 
 Examples:
 
@@ -598,7 +598,7 @@ class TFOPTDecoder(tf.keras.layers.Layer):
 
 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
 that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
-all ``decoder_input_ids``` of shape `(batch_size, sequence_length)`.
+all `decoder_input_ids` of shape `(batch_size, sequence_length)`.
 inputs_embeds (`tf.Tensor` of
 shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
 `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more
@@ -943,11 +943,11 @@ class TFPegasusDecoder(tf.keras.layers.Layer):
 
 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
 that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
-all ``decoder_input_ids``` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of
-shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
-`input_ids` you can choose to directly pass an embedded representation. This is useful if you want more
-control over how to convert `input_ids` indices into associated vectors than the model's internal
-embedding lookup matrix.
+all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of shape
+`(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids`
+you can choose to directly pass an embedded representation. This is useful if you want more control
+over how to convert `input_ids` indices into associated vectors than the model's internal embedding
+lookup matrix.
 output_attentions (`bool`, *optional*):
 Whether or not to return the attentions tensors of all attention layers. See `attentions` under
 returned tensors for more detail. This argument can be used only in eager mode, in graph mode the value
@@ -100,8 +100,8 @@ class PLBartTokenizer(PreTrainedTokenizer):
 Adapted from [`RobertaTokenizer`] and [`XLNetTokenizer`]. Based on
 [SentencePiece](https://github.com/google/sentencepiece).
 
-The tokenization method is `<tokens> <eos> <language code>` for source language documents, and ``<language code>
-<tokens> <eos>``` for target language documents.
+The tokenization method is `<tokens> <eos> <language code>` for source language documents, and `<language code>
+<tokens> <eos>` for target language documents.
 
 Args:
 vocab_file (`str`):
@@ -201,7 +201,7 @@ class RetriBertModel(RetriBertPreTrainedModel):
 Indices of input sequence tokens in the vocabulary for the documents in a batch.
 attention_mask_doc (`torch.FloatTensor` of shape `(batch_size, sequence_length)`, *optional*):
 Mask to avoid performing attention on documents padding token indices.
-checkpoint_batch_size (`int`, *optional*, defaults to ```-1`):
+checkpoint_batch_size (`int`, *optional*, defaults to `-1`):
 If greater than 0, uses gradient checkpointing to only compute sequence representation on
 `checkpoint_batch_size` examples at a time on the GPU. All query representations are still compared to
 all document representations in the batch.
@@ -663,8 +663,8 @@ SPEECH_TO_TEXT_INPUTS_DOCSTRING = r"""
 
 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those that
 don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of all
-``decoder_input_ids``` of shape `(batch_size, sequence_length)`. decoder_inputs_embeds (`torch.FloatTensor`
-of shape `(batch_size, target_sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
+`decoder_input_ids` of shape `(batch_size, sequence_length)`. decoder_inputs_embeds (`torch.FloatTensor` of
+shape `(batch_size, target_sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
 `decoder_input_ids` you can choose to directly pass an embedded representation. If `past_key_values` is
 used, optionally only the last `decoder_inputs_embeds` have to be input (see `past_key_values`). This is
 useful if you want more control over how to convert `decoder_input_ids` indices into associated vectors
@@ -965,8 +965,8 @@ class Speech2TextDecoder(Speech2TextPreTrainedModel):
 
 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
 that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
-all ``decoder_input_ids``` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor`
-of shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
+all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of
+shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
 `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more
 control over how to convert `input_ids` indices into associated vectors than the model's internal
 embedding lookup matrix.
@@ -1002,11 +1002,11 @@ class TFSpeech2TextDecoder(tf.keras.layers.Layer):
 
  If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
  that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
- all ``decoder_input_ids``` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of
- shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
- `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more
- control over how to convert `input_ids` indices into associated vectors than the model's internal
- embedding lookup matrix.
+ all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of shape
+ `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids`
+ you can choose to directly pass an embedded representation. This is useful if you want more control
+ over how to convert `input_ids` indices into associated vectors than the model's internal embedding
+ lookup matrix.
  output_attentions (`bool`, *optional*):
  Whether or not to return the attentions tensors of all attention layers. See `attentions` under
  returned tensors for more detail.
@@ -572,8 +572,8 @@ class Speech2Text2Decoder(Speech2Text2PreTrainedModel):
 
  If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
  that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
- all ``decoder_input_ids``` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor`
- of shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
+ all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of
+ shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
  `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more
  control over how to convert `input_ids` indices into associated vectors than the model's internal
  embedding lookup matrix.
@@ -90,11 +90,11 @@ XGLM_INPUTS_DOCSTRING = r"""
  blocks) that can be used (see `past_key_values` input) to speed up sequential decoding.
 
  If `past_key_values` are used, the user can optionally input only the last `input_ids` (those that don't
- have their past key value states given to this model) of shape `(batch_size, 1)` instead of all
- ``input_ids``` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of shape
- `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids` you
- can choose to directly pass an embedded representation. This is useful if you want more control over how to
- convert `input_ids` indices into associated vectors than the model's internal embedding lookup matrix.
+ have their past key value states given to this model) of shape `(batch_size, 1)` instead of all `input_ids`
+ of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of shape `(batch_size,
+ sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids` you can choose to
+ directly pass an embedded representation. This is useful if you want more control over how to convert
+ `input_ids` indices into associated vectors than the model's internal embedding lookup matrix.
  inputs_embeds (`torch.FloatTensor` of shape `(batch_size, target_sequence_length, hidden_size)`, *optional*):
  Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. If
  `past_key_values` is used, optionally only the last `inputs_embeds` have to be input (see
@@ -2136,7 +2136,7 @@ class {{cookiecutter.camelcase_modelname}}PreTrainedModel(PreTrainedModel):
 
  If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids`
  (those that don't have their past key value states given to this model) of shape `(batch_size, 1)`
- instead of all ``decoder_input_ids``` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert `input_ids` indices into associated
+ instead of all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert `input_ids` indices into associated
  vectors than the model's internal embedding lookup matrix.
  decoder_inputs_embeds (`torch.FloatTensor` of shape `(batch_size, target_sequence_length, hidden_size)`, *optional*):
  Optionally, instead of passing `decoder_input_ids` you can choose to directly pass an embedded
@@ -2483,7 +2483,7 @@ class {{cookiecutter.camelcase_modelname}}Decoder({{cookiecutter.camelcase_model
 
  If `past_key_values` are used, the user can optionally input only the last
  `decoder_input_ids` (those that don't have their past key value states given to this model) of
- shape `(batch_size, 1)` instead of all ``decoder_input_ids``` of shape `(batch_size,
+ shape `(batch_size, 1)` instead of all `decoder_input_ids` of shape `(batch_size,
  sequence_length)`. inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert `input_ids` indices
  into associated vectors than the model's internal embedding lookup matrix.
  output_attentions (`bool`, *optional*):
@@ -92,6 +92,9 @@ def process_doc_file(code_file, add_new_line=True):
 
  # fmt: off
  splits = code.split("```")
+ if len(splits) % 2 != 1:
+     raise ValueError("The number of occurrences of ``` should be an even number.")
+
  splits = [s if i % 2 == 0 else process_code_block(s, add_new_line=add_new_line) for i, s in enumerate(splits)]
  clean_code = "```".join(splits)
  # fmt: on
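For reviewers, here is a minimal standalone sketch of why the new check tests for an odd number of parts. Splitting on N fence markers yields N + 1 parts, so balanced (even) fences always give an odd count, and a stray marker such as the malformed docstring references fixed in this commit gives an even one. The helper names `split_balanced_fences` and `code_blocks` are hypothetical and not part of the repository; only the split-and-parity logic mirrors the change in `process_doc_file`.

```python
from __future__ import annotations

# Three backticks, built by repetition so this example never contains a literal fence.
FENCE = "`" * 3


def split_balanced_fences(text: str) -> list[str]:
    """Split text on the fence marker, raising if the markers are unbalanced."""
    parts = text.split(FENCE)
    # N markers produce N + 1 parts, so an even N means an odd number of parts.
    if len(parts) % 2 != 1:
        raise ValueError(
            f"Found {len(parts) - 1} occurrences of {FENCE}; expected an even number."
        )
    return parts


def code_blocks(text: str) -> list[str]:
    """Return the odd-indexed parts, i.e. the text between opening and closing fences."""
    return split_balanced_fences(text)[1::2]


if __name__ == "__main__":
    balanced = "intro\n" + FENCE + "python\nx = 1\n" + FENCE + "\noutro"
    print(code_blocks(balanced))

    # Two backticks before the name and three after it: one stray fence marker.
    stray = "all ``decoder_input_ids``" + "`" + " of shape"
    try:
        split_balanced_fences(stray)
    except ValueError as err:
        print(f"rejected: {err}")
```

Without the parity check, a single stray marker would shift every later part by one position, so prose would be handed to the code-block processor and code would be left untouched.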