Clarify the use of past in GPT2 and CTRL
@@ -220,7 +220,8 @@ CTRL_INPUTS_DOCSTRING = r""" Inputs:
  **past**:
  list of ``torch.FloatTensor`` (one for each layer):
  that contains pre-computed hidden-states (key and values in the attention blocks) as computed by the model
- (see `past` output below). Can be used to speed up sequential decoding.
+ (see `past` output below). Can be used to speed up sequential decoding. The token ids which have their past given to this model
+ should not be passed as input ids as they have already been computed.
  **attention_mask**: (`optional`) ``torch.FloatTensor`` of shape ``(batch_size, sequence_length)``:
  Mask to avoid performing attention on padding token indices.
  Mask values selected in ``[0, 1]``:
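The rule added here (pass only the ids that are not yet covered by `past`) can be illustrated with a toy, pure-Python sketch. It assumes a single attention head with made-up 2-d embeddings and projection matrices, and it represents `past` as a plain list of (key, value) pairs. The names `forward`, `incremental_outputs` and `recomputed_outputs` are hypothetical and are not the library's API. Feeding one new token per step together with the cache returned by the previous step gives the same outputs as recomputing every prefix from scratch, while each token's key and value are computed only once.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Past = List[Tuple[Vector, Vector]]  # hypothetical: one (key, value) pair per cached token

# Toy 2-d embeddings and projections; all values are made up for illustration.
EMBED: Dict[int, Vector] = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.5, 0.5], 3: [-1.0, 0.25]}
W_Q = [[1.0, 0.2], [0.0, 1.0]]
W_K = [[0.5, 0.0], [0.3, 1.0]]
W_V = [[1.0, -0.5], [0.2, 0.7]]


def project(x: Vector, w: List[Vector]) -> Vector:
    """Matrix-vector product w @ x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def attend(query: Vector, past: Past) -> Vector:
    """Scaled dot-product attention of one query over every cached (key, value)."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key, _ in past]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(past[0][1])
    return [sum(w * value[i] for w, (_, value) in zip(weights, past)) / total
            for i in range(dim)]


def forward(input_ids: List[int], past: Past) -> Tuple[List[Vector], Past]:
    """Process only `input_ids`; return their outputs and the extended cache."""
    cache = list(past)  # copy so the caller's cache is not mutated
    outputs = []
    for tok in input_ids:
        x = EMBED[tok]
        cache.append((project(x, W_K), project(x, W_V)))
        # Causal attention: each token sees the cached tokens plus itself.
        outputs.append(attend(project(x, W_Q), cache))
    return outputs, cache


def incremental_outputs(ids: List[int]) -> Tuple[List[Vector], Past]:
    """Feed one id per step, passing back the cache from the previous step."""
    past: Past = []
    outputs = []
    for tok in ids:
        out, past = forward([tok], past)  # only the new id; earlier ids live in past
        outputs.append(out[0])
    return outputs, past


def recomputed_outputs(ids: List[int]) -> List[Vector]:
    """Recompute keys/values for the whole prefix at every step (no cache)."""
    outputs = []
    for t in range(1, len(ids) + 1):
        out, _ = forward(ids[:t], [])
        outputs.append(out[-1])
    return outputs


ids = [2, 0, 3, 1]
cached, past = incremental_outputs(ids)
assert len(past) == len(ids)  # one (key, value) pair per token
assert all(math.isclose(a, b)
           for u, v in zip(cached, recomputed_outputs(ids)) for a, b in zip(u, v))
```

The cached path does constant work per new token for the projections, whereas the recomputing path redoes the projections for the whole prefix at every step; both see exactly the same keys and values, which is why the results match.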
@@ -252,7 +253,8 @@ class CTRLModel(CTRLPreTrainedModel):
  **past**:
  list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
  that contains pre-computed hidden-states (key and values in the attention blocks).
- Can be used (see `past` input) to speed up sequential decoding.
+ Can be used (see `past` input) to speed up sequential decoding. The token ids which have their past given to this model
+ should not be passed as input ids as they have already been computed.
  **hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)
  list of ``torch.FloatTensor`` (one for the output of each layer + the output of the embeddings)
  of shape ``(batch_size, sequence_length, hidden_size)``:
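Seen from the caller's side, the new sentence is a bookkeeping rule. The following is a minimal sketch, assuming a toy cache that records one (position, token id) entry per processed token; `toy_forward` and `ids_to_feed` are hypothetical helpers, not part of the library. Slicing off the ids already covered by the returned `past` keeps exactly one entry per token, while re-feeding the full sequence caches the prompt a second time at shifted positions.

```python
from typing import List, Tuple

# Hypothetical stand-in for a layer cache: one (position, token_id) entry per
# token whose keys/values have already been computed.
Past = List[Tuple[int, int]]


def toy_forward(input_ids: List[int], past: Past) -> Past:
    """Append one cache entry per input id, numbering positions after the cache."""
    start = len(past)
    return past + [(start + i, tok) for i, tok in enumerate(input_ids)]


def ids_to_feed(all_ids: List[int], past: Past) -> List[int]:
    """Drop the ids that are already represented in `past`."""
    return all_ids[len(past):]


prompt = [5, 9, 2]
past = toy_forward(prompt, [])      # first call: no past yet
sequence = prompt + [7]             # one newly generated token

good = toy_forward(ids_to_feed(sequence, past), past)
bad = toy_forward(sequence, past)   # re-feeds ids whose past was already given

print(good)      # [(0, 5), (1, 9), (2, 2), (3, 7)]
print(len(bad))  # 7: the prompt is cached twice, at shifted positions
```

In a real model the duplicated entries would be attended to as if they were extra earlier tokens, so the outputs change silently rather than raising an error.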
@@ -437,7 +439,8 @@ class CTRLLMHeadModel(CTRLPreTrainedModel):
  **past**:
  list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
  that contains pre-computed hidden-states (key and values in the attention blocks).
- Can be used (see `past` input) to speed up sequential decoding.
+ Can be used (see `past` input) to speed up sequential decoding. The token ids which have their past given to this model
+ should not be passed as input ids as they have already been computed.
  **hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)
  list of ``torch.FloatTensor`` (one for the output of each layer + the output of the embeddings)
  of shape ``(batch_size, sequence_length, hidden_size)``:
@@ -298,7 +298,8 @@ GPT2_INPUTS_DOCSTRING = r""" Inputs:
  **past**:
  list of ``torch.FloatTensor`` (one for each layer):
  that contains pre-computed hidden-states (key and values in the attention blocks) as computed by the model
- (see `past` output below). Can be used to speed up sequential decoding.
+ (see `past` output below). Can be used to speed up sequential decoding. The token ids which have their past given to this model
+ should not be passed as input ids as they have already been computed.
  **attention_mask**: (`optional`) ``torch.FloatTensor`` of shape ``(batch_size, sequence_length)``:
  Mask to avoid performing attention on padding token indices.
  Mask values selected in ``[0, 1]``:
@@ -330,7 +331,8 @@ class GPT2Model(GPT2PreTrainedModel):
  **past**:
  list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
  that contains pre-computed hidden-states (key and values in the attention blocks).
- Can be used (see `past` input) to speed up sequential decoding.
+ Can be used (see `past` input) to speed up sequential decoding. The token ids which have their past given to this model
+ should not be passed as input ids as they have already been computed.
  **hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)
  list of ``torch.FloatTensor`` (one for the output of each layer + the output of the embeddings)
  of shape ``(batch_size, sequence_length, hidden_size)``:
@@ -503,7 +505,8 @@ class GPT2LMHeadModel(GPT2PreTrainedModel):
  **past**:
  list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
  that contains pre-computed hidden-states (key and values in the attention blocks).
- Can be used (see `past` input) to speed up sequential decoding.
+ Can be used (see `past` input) to speed up sequential decoding. The token ids which have their past given to this model
+ should not be passed as input ids as they have already been computed.
  **hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)
  list of ``torch.FloatTensor`` (one for the output of each layer + the output of the embeddings)
  of shape ``(batch_size, sequence_length, hidden_size)``:
@@ -595,7 +598,8 @@ class GPT2DoubleHeadsModel(GPT2PreTrainedModel):
  **past**:
  list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
  that contains pre-computed hidden-states (key and values in the attention blocks).
- Can be used (see `past` input) to speed up sequential decoding.
+ Can be used (see `past` input) to speed up sequential decoding. The token ids which have their past given to this model
+ should not be passed as input ids as they have already been computed.
  **hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)
  list of ``torch.FloatTensor`` (one for the output of each layer + the output of the embeddings)
  of shape ``(batch_size, sequence_length, hidden_size)``: