corrected documentation for past tensor shape for ctrl and gpt2 model
This commit is contained in:
@@ -252,7 +252,7 @@ class CTRLModel(CTRLPreTrainedModel):
|
|||||||
**last_hidden_state**: ``torch.FloatTensor`` of shape ``(batch_size, sequence_length, hidden_size)``
|
**last_hidden_state**: ``torch.FloatTensor`` of shape ``(batch_size, sequence_length, hidden_size)``
|
||||||
Sequence of hidden-states at the last layer of the model.
|
Sequence of hidden-states at the last layer of the model.
|
||||||
**past**:
|
**past**:
|
||||||
list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
|
list of ``torch.FloatTensor`` (one for each layer) of shape ``(2, batch_size, num_heads, sequence_length, embed_size_per_head)``:
|
||||||
that contains pre-computed hidden-states (key and values in the attention blocks).
|
that contains pre-computed hidden-states (key and values in the attention blocks).
|
||||||
Can be used (see `past` input) to speed up sequential decoding. The token ids which have their past given to this model
|
Can be used (see `past` input) to speed up sequential decoding. The token ids which have their past given to this model
|
||||||
should not be passed as input ids as they have already been computed.
|
should not be passed as input ids as they have already been computed.
|
||||||
@@ -438,7 +438,7 @@ class CTRLLMHeadModel(CTRLPreTrainedModel):
|
|||||||
**prediction_scores**: ``torch.FloatTensor`` of shape ``(batch_size, sequence_length, config.vocab_size)``
|
**prediction_scores**: ``torch.FloatTensor`` of shape ``(batch_size, sequence_length, config.vocab_size)``
|
||||||
Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
|
Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
|
||||||
**past**:
|
**past**:
|
||||||
list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
|
list of ``torch.FloatTensor`` (one for each layer) of shape ``(2, batch_size, num_heads, sequence_length, embed_size_per_head)``:
|
||||||
that contains pre-computed hidden-states (key and values in the attention blocks).
|
that contains pre-computed hidden-states (key and values in the attention blocks).
|
||||||
Can be used (see `past` input) to speed up sequential decoding. The token ids which have their past given to this model
|
Can be used (see `past` input) to speed up sequential decoding. The token ids which have their past given to this model
|
||||||
should not be passed as input ids as they have already been computed.
|
should not be passed as input ids as they have already been computed.
|
||||||
|
|||||||
@@ -329,7 +329,7 @@ class GPT2Model(GPT2PreTrainedModel):
|
|||||||
**last_hidden_state**: ``torch.FloatTensor`` of shape ``(batch_size, sequence_length, hidden_size)``
|
**last_hidden_state**: ``torch.FloatTensor`` of shape ``(batch_size, sequence_length, hidden_size)``
|
||||||
Sequence of hidden-states at the last layer of the model.
|
Sequence of hidden-states at the last layer of the model.
|
||||||
**past**:
|
**past**:
|
||||||
list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
|
list of ``torch.FloatTensor`` (one for each layer) of shape ``(2, batch_size, num_heads, sequence_length, embed_size_per_head)``:
|
||||||
that contains pre-computed hidden-states (key and values in the attention blocks).
|
that contains pre-computed hidden-states (key and values in the attention blocks).
|
||||||
Can be used (see `past` input) to speed up sequential decoding. The token ids which have their past given to this model
|
Can be used (see `past` input) to speed up sequential decoding. The token ids which have their past given to this model
|
||||||
should not be passed as input ids as they have already been computed.
|
should not be passed as input ids as they have already been computed.
|
||||||
@@ -503,7 +503,7 @@ class GPT2LMHeadModel(GPT2PreTrainedModel):
|
|||||||
**prediction_scores**: ``torch.FloatTensor`` of shape ``(batch_size, sequence_length, config.vocab_size)``
|
**prediction_scores**: ``torch.FloatTensor`` of shape ``(batch_size, sequence_length, config.vocab_size)``
|
||||||
Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
|
Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
|
||||||
**past**:
|
**past**:
|
||||||
list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
|
list of ``torch.FloatTensor`` (one for each layer) of shape ``(2, batch_size, num_heads, sequence_length, embed_size_per_head)``:
|
||||||
that contains pre-computed hidden-states (key and values in the attention blocks).
|
that contains pre-computed hidden-states (key and values in the attention blocks).
|
||||||
Can be used (see `past` input) to speed up sequential decoding. The token ids which have their past given to this model
|
Can be used (see `past` input) to speed up sequential decoding. The token ids which have their past given to this model
|
||||||
should not be passed as input ids as they have already been computed.
|
should not be passed as input ids as they have already been computed.
|
||||||
@@ -596,7 +596,7 @@ class GPT2DoubleHeadsModel(GPT2PreTrainedModel):
|
|||||||
**mc_prediction_scores**: ``torch.FloatTensor`` of shape ``(batch_size, num_choices)``
|
**mc_prediction_scores**: ``torch.FloatTensor`` of shape ``(batch_size, num_choices)``
|
||||||
Prediction scores of the multiplechoice classification head (scores for each choice before SoftMax).
|
Prediction scores of the multiplechoice classification head (scores for each choice before SoftMax).
|
||||||
**past**:
|
**past**:
|
||||||
list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
|
list of ``torch.FloatTensor`` (one for each layer) of shape ``(2, batch_size, num_heads, sequence_length, embed_size_per_head)``:
|
||||||
that contains pre-computed hidden-states (key and values in the attention blocks).
|
that contains pre-computed hidden-states (key and values in the attention blocks).
|
||||||
Can be used (see `past` input) to speed up sequential decoding. The token ids which have their past given to this model
|
Can be used (see `past` input) to speed up sequential decoding. The token ids which have their past given to this model
|
||||||
should not be passed as input ids as they have already been computed.
|
should not be passed as input ids as they have already been computed.
|
||||||
|
|||||||
@@ -400,7 +400,7 @@ class TFCTRLModel(TFCTRLPreTrainedModel):
|
|||||||
**last_hidden_state**: ``tf.Tensor`` of shape ``(batch_size, sequence_length, hidden_size)``
|
**last_hidden_state**: ``tf.Tensor`` of shape ``(batch_size, sequence_length, hidden_size)``
|
||||||
Sequence of hidden-states at the last layer of the model.
|
Sequence of hidden-states at the last layer of the model.
|
||||||
**past**:
|
**past**:
|
||||||
list of ``tf.Tensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
|
list of ``tf.Tensor`` (one for each layer) of shape ``(2, batch_size, num_heads, sequence_length, embed_size_per_head)``:
|
||||||
that contains pre-computed hidden-states (key and values in the attention blocks).
|
that contains pre-computed hidden-states (key and values in the attention blocks).
|
||||||
Can be used (see `past` input) to speed up sequential decoding.
|
Can be used (see `past` input) to speed up sequential decoding.
|
||||||
**hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)
|
**hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)
|
||||||
@@ -462,7 +462,7 @@ class TFCTRLLMHeadModel(TFCTRLPreTrainedModel):
|
|||||||
**prediction_scores**: ``torch.FloatTensor`` of shape ``(batch_size, sequence_length, config.vocab_size)``
|
**prediction_scores**: ``torch.FloatTensor`` of shape ``(batch_size, sequence_length, config.vocab_size)``
|
||||||
Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
|
Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
|
||||||
**past**:
|
**past**:
|
||||||
list of ``tf.Tensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
|
list of ``tf.Tensor`` (one for each layer) of shape ``(2, batch_size, num_heads, sequence_length, embed_size_per_head)``:
|
||||||
that contains pre-computed hidden-states (key and values in the attention blocks).
|
that contains pre-computed hidden-states (key and values in the attention blocks).
|
||||||
Can be used (see `past` input) to speed up sequential decoding.
|
Can be used (see `past` input) to speed up sequential decoding.
|
||||||
**hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)
|
**hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)
|
||||||
|
|||||||
@@ -436,7 +436,7 @@ class TFGPT2Model(TFGPT2PreTrainedModel):
|
|||||||
**last_hidden_state**: ``tf.Tensor`` of shape ``(batch_size, sequence_length, hidden_size)``
|
**last_hidden_state**: ``tf.Tensor`` of shape ``(batch_size, sequence_length, hidden_size)``
|
||||||
Sequence of hidden-states at the last layer of the model.
|
Sequence of hidden-states at the last layer of the model.
|
||||||
**past**:
|
**past**:
|
||||||
list of ``tf.Tensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
|
list of ``tf.Tensor`` (one for each layer) of shape ``(2, batch_size, num_heads, sequence_length, embed_size_per_head)``:
|
||||||
that contains pre-computed hidden-states (key and values in the attention blocks).
|
that contains pre-computed hidden-states (key and values in the attention blocks).
|
||||||
Can be used (see `past` input) to speed up sequential decoding.
|
Can be used (see `past` input) to speed up sequential decoding.
|
||||||
**hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)
|
**hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)
|
||||||
@@ -476,7 +476,7 @@ class TFGPT2LMHeadModel(TFGPT2PreTrainedModel):
|
|||||||
**prediction_scores**: `tf.Tensor`` of shape ``(batch_size, sequence_length, config.vocab_size)``
|
**prediction_scores**: `tf.Tensor`` of shape ``(batch_size, sequence_length, config.vocab_size)``
|
||||||
Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
|
Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
|
||||||
**past**:
|
**past**:
|
||||||
list of `tf.Tensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
|
list of `tf.Tensor`` (one for each layer) of shape ``(2, batch_size, num_heads, sequence_length, embed_size_per_head)``:
|
||||||
that contains pre-computed hidden-states (key and values in the attention blocks).
|
that contains pre-computed hidden-states (key and values in the attention blocks).
|
||||||
Can be used (see `past` input) to speed up sequential decoding.
|
Can be used (see `past` input) to speed up sequential decoding.
|
||||||
**hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)
|
**hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)
|
||||||
@@ -535,7 +535,7 @@ class TFGPT2DoubleHeadsModel(TFGPT2PreTrainedModel):
|
|||||||
**mc_prediction_scores**: `tf.Tensor`` of shape ``(batch_size, num_choices)``
|
**mc_prediction_scores**: `tf.Tensor`` of shape ``(batch_size, num_choices)``
|
||||||
Prediction scores of the multiplechoice classification head (scores for each choice before SoftMax).
|
Prediction scores of the multiplechoice classification head (scores for each choice before SoftMax).
|
||||||
**past**:
|
**past**:
|
||||||
list of `tf.Tensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
|
list of `tf.Tensor`` (one for each layer) of shape ``(2, batch_size, num_heads, sequence_length, embed_size_per_head)``:
|
||||||
that contains pre-computed hidden-states (key and values in the attention blocks).
|
that contains pre-computed hidden-states (key and values in the attention blocks).
|
||||||
Can be used (see `past` input) to speed up sequential decoding.
|
Can be used (see `past` input) to speed up sequential decoding.
|
||||||
**hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)
|
**hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)
|
||||||
|
|||||||
Reference in New Issue
Block a user