Removed attention_mask from GPT-2 and GPT documentation. Corrected multiple_choice_labels to actual name mc_labels
This commit is contained in:
@@ -408,10 +408,6 @@ GPT2_INPUTS_DOCSTRING = r""" Inputs:
|
|||||||
list of ``torch.FloatTensor`` (one for each layer):
|
list of ``torch.FloatTensor`` (one for each layer):
|
||||||
that contains pre-computed hidden-states (key and values in the attention blocks) as computed by the model
|
that contains pre-computed hidden-states (key and values in the attention blocks) as computed by the model
|
||||||
(see `past` output below). Can be used to speed up sequential decoding.
|
(see `past` output below). Can be used to speed up sequential decoding.
|
||||||
**attention_mask**: (`optional`) ``torch.FloatTensor`` of shape ``(batch_size, sequence_length)``:
|
|
||||||
Mask to avoid performing attention on padding token indices.
|
|
||||||
Mask values selected in ``[0, 1]``:
|
|
||||||
``1`` for tokens that are NOT MASKED, ``0`` for MASKED tokens.
|
|
||||||
**head_mask**: (`optional`) ``torch.FloatTensor`` of shape ``(num_heads,)`` or ``(num_layers, num_heads)``:
|
**head_mask**: (`optional`) ``torch.FloatTensor`` of shape ``(num_heads,)`` or ``(num_layers, num_heads)``:
|
||||||
Mask to nullify selected heads of the self-attention modules.
|
Mask to nullify selected heads of the self-attention modules.
|
||||||
Mask values selected in ``[0, 1]``:
|
Mask values selected in ``[0, 1]``:
|
||||||
@@ -642,10 +638,6 @@ class GPT2DoubleHeadsModel(GPT2PreTrainedModel):
|
|||||||
list of ``torch.FloatTensor`` (one for each layer):
|
list of ``torch.FloatTensor`` (one for each layer):
|
||||||
that contains pre-computed hidden-states (key and values in the attention blocks) as computed by the model
|
that contains pre-computed hidden-states (key and values in the attention blocks) as computed by the model
|
||||||
(see `past` output below). Can be used to speed up sequential decoding.
|
(see `past` output below). Can be used to speed up sequential decoding.
|
||||||
**attention_mask**: (`optional`) ``torch.FloatTensor`` of shape ``(batch_size, num_choices, sequence_length)``:
|
|
||||||
Mask to avoid performing attention on padding token indices.
|
|
||||||
Mask values selected in ``[0, 1]``:
|
|
||||||
``1`` for tokens that are NOT MASKED, ``0`` for MASKED tokens.
|
|
||||||
**head_mask**: (`optional`) ``torch.FloatTensor`` of shape ``(num_heads,)`` or ``(num_layers, num_heads)``:
|
**head_mask**: (`optional`) ``torch.FloatTensor`` of shape ``(num_heads,)`` or ``(num_layers, num_heads)``:
|
||||||
Mask to nullify selected heads of the self-attention modules.
|
Mask to nullify selected heads of the self-attention modules.
|
||||||
Mask values selected in ``[0, 1]``:
|
Mask values selected in ``[0, 1]``:
|
||||||
@@ -656,7 +648,7 @@ class GPT2DoubleHeadsModel(GPT2PreTrainedModel):
|
|||||||
Indices are selected in ``[-1, 0, ..., config.vocab_size]``
|
Indices are selected in ``[-1, 0, ..., config.vocab_size]``
|
||||||
All labels set to ``-1`` are ignored (masked), the loss is only
|
All labels set to ``-1`` are ignored (masked), the loss is only
|
||||||
computed for labels in ``[0, ..., config.vocab_size]``
|
computed for labels in ``[0, ..., config.vocab_size]``
|
||||||
**multiple_choice_labels**: (`optional`) ``torch.LongTensor`` of shape ``(batch_size)``:
|
**mc_labels**: (`optional`) ``torch.LongTensor`` of shape ``(batch_size)``:
|
||||||
Labels for computing the multiple choice classification loss.
|
Labels for computing the multiple choice classification loss.
|
||||||
Indices should be in ``[0, ..., num_choices]`` where `num_choices` is the size of the second dimension
|
Indices should be in ``[0, ..., num_choices]`` where `num_choices` is the size of the second dimension
|
||||||
of the input tensors. (see `input_ids` above)
|
of the input tensors. (see `input_ids` above)
|
||||||
|
|||||||
@@ -415,11 +415,7 @@ OPENAI_GPT_INPUTS_DOCSTRING = r""" Inputs:
|
|||||||
**token_type_ids**: (`optional`) ``torch.LongTensor`` of shape ``(batch_size, sequence_length)``:
|
**token_type_ids**: (`optional`) ``torch.LongTensor`` of shape ``(batch_size, sequence_length)``:
|
||||||
A parallel sequence of tokens (can be used to indicate various portions of the inputs).
|
A parallel sequence of tokens (can be used to indicate various portions of the inputs).
|
||||||
The embeddings from these tokens will be summed with the respective token embeddings.
|
The embeddings from these tokens will be summed with the respective token embeddings.
|
||||||
Indices are selected in the vocabulary (unlike BERT which has a specific vocabulary for segment indices).
|
Indices are selected in the vocabulary (unlike BERT which has a specific vocabulary for segment indices)
|
||||||
**attention_mask**: (`optional`) ``torch.FloatTensor`` of shape ``(batch_size, sequence_length)``:
|
|
||||||
Mask to avoid performing attention on padding token indices.
|
|
||||||
Mask values selected in ``[0, 1]``:
|
|
||||||
``1`` for tokens that are NOT MASKED, ``0`` for MASKED tokens.
|
|
||||||
**head_mask**: (`optional`) ``torch.FloatTensor`` of shape ``(num_heads,)`` or ``(num_layers, num_heads)``:
|
**head_mask**: (`optional`) ``torch.FloatTensor`` of shape ``(num_heads,)`` or ``(num_layers, num_heads)``:
|
||||||
Mask to nullify selected heads of the self-attention modules.
|
Mask to nullify selected heads of the self-attention modules.
|
||||||
Mask values selected in ``[0, 1]``:
|
Mask values selected in ``[0, 1]``:
|
||||||
@@ -626,10 +622,6 @@ class OpenAIGPTDoubleHeadsModel(OpenAIGPTPreTrainedModel):
|
|||||||
A parallel sequence of tokens (can be used to indicate various portions of the inputs).
|
A parallel sequence of tokens (can be used to indicate various portions of the inputs).
|
||||||
The embeddings from these tokens will be summed with the respective token embeddings.
|
The embeddings from these tokens will be summed with the respective token embeddings.
|
||||||
Indices are selected in the vocabulary (unlike BERT which has a specific vocabulary for segment indices).
|
Indices are selected in the vocabulary (unlike BERT which has a specific vocabulary for segment indices).
|
||||||
**attention_mask**: (`optional`) ``torch.FloatTensor`` of shape ``(batch_size, num_choices, sequence_length)``:
|
|
||||||
Mask to avoid performing attention on padding token indices.
|
|
||||||
Mask values selected in ``[0, 1]``:
|
|
||||||
``1`` for tokens that are NOT MASKED, ``0`` for MASKED tokens.
|
|
||||||
**head_mask**: (`optional`) ``torch.FloatTensor`` of shape ``(num_heads,)`` or ``(num_layers, num_heads)``:
|
**head_mask**: (`optional`) ``torch.FloatTensor`` of shape ``(num_heads,)`` or ``(num_layers, num_heads)``:
|
||||||
Mask to nullify selected heads of the self-attention modules.
|
Mask to nullify selected heads of the self-attention modules.
|
||||||
Mask values selected in ``[0, 1]``:
|
Mask values selected in ``[0, 1]``:
|
||||||
@@ -640,7 +632,7 @@ class OpenAIGPTDoubleHeadsModel(OpenAIGPTPreTrainedModel):
|
|||||||
Indices are selected in ``[-1, 0, ..., config.vocab_size]``
|
Indices are selected in ``[-1, 0, ..., config.vocab_size]``
|
||||||
All labels set to ``-1`` are ignored (masked), the loss is only
|
All labels set to ``-1`` are ignored (masked), the loss is only
|
||||||
computed for labels in ``[0, ..., config.vocab_size]``
|
computed for labels in ``[0, ..., config.vocab_size]``
|
||||||
**multiple_choice_labels**: (`optional`) ``torch.LongTensor`` of shape ``(batch_size)``:
|
**mc_labels**: (`optional`) ``torch.LongTensor`` of shape ``(batch_size)``:
|
||||||
Labels for computing the multiple choice classification loss.
|
Labels for computing the multiple choice classification loss.
|
||||||
Indices should be in ``[0, ..., num_choices]`` where `num_choices` is the size of the second dimension
|
Indices should be in ``[0, ..., num_choices]`` where `num_choices` is the size of the second dimension
|
||||||
of the input tensors. (see `input_ids` above)
|
of the input tensors. (see `input_ids` above)
|
||||||
|
|||||||
Reference in New Issue
Block a user