fix docstrings
@@ -626,13 +626,13 @@ class BertModel(BertPreTrainedModel):
 Outputs: `Tuple` comprising various elements depending on the configuration (config) and inputs:
 **last_hidden_state**: ``torch.FloatTensor`` of shape ``(batch_size, sequence_length, hidden_size)``
 Sequence of hidden-states at the last layer of the model.
-**attentions**: (`optional`, returned when ``config.output_attentions=True``)
-list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
-Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
 **hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)
 list of ``torch.FloatTensor`` (one for the output of each layer + the output of the embeddings)
 of shape ``(batch_size, sequence_length, hidden_size)``:
 Hidden-states of the model at the output of each layer plus the initial embedding outputs.
+**attentions**: (`optional`, returned when ``config.output_attentions=True``)
+list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
+Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
 
 Examples::
 
@@ -737,13 +737,13 @@ class BertForPreTraining(BertPreTrainedModel):
 Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
 **seq_relationship_scores**: ``torch.FloatTensor`` of shape ``(batch_size, sequence_length, 2)``
 Prediction scores of the next sequence prediction (classification) head (scores of True/False continuation before SoftMax).
-**attentions**: (`optional`, returned when ``config.output_attentions=True``)
-list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
-Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
 **hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)
 list of ``torch.FloatTensor`` (one for the output of each layer + the output of the embeddings)
 of shape ``(batch_size, sequence_length, hidden_size)``:
 Hidden-states of the model at the output of each layer plus the initial embedding outputs.
+**attentions**: (`optional`, returned when ``config.output_attentions=True``)
+list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
+Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
 
 Examples::
 
@@ -807,13 +807,13 @@ class BertForMaskedLM(BertPreTrainedModel):
 Masked language modeling loss.
 **prediction_scores**: ``torch.FloatTensor`` of shape ``(batch_size, sequence_length, config.vocab_size)``
 Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
-**attentions**: (`optional`, returned when ``config.output_attentions=True``)
-list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
-Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
 **hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)
 list of ``torch.FloatTensor`` (one for the output of each layer + the output of the embeddings)
 of shape ``(batch_size, sequence_length, hidden_size)``:
 Hidden-states of the model at the output of each layer plus the initial embedding outputs.
+**attentions**: (`optional`, returned when ``config.output_attentions=True``)
+list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
+Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
 
 Examples::
 
@@ -874,13 +874,13 @@ class BertForNextSentencePrediction(BertPreTrainedModel):
 Next sequence prediction (classification) loss.
 **seq_relationship_scores**: ``torch.FloatTensor`` of shape ``(batch_size, sequence_length, 2)``
 Prediction scores of the next sequence prediction (classification) head (scores of True/False continuation before SoftMax).
-**attentions**: (`optional`, returned when ``config.output_attentions=True``)
-list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
-Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
 **hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)
 list of ``torch.FloatTensor`` (one for the output of each layer + the output of the embeddings)
 of shape ``(batch_size, sequence_length, hidden_size)``:
 Hidden-states of the model at the output of each layer plus the initial embedding outputs.
+**attentions**: (`optional`, returned when ``config.output_attentions=True``)
+list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
+Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
 
 Examples::
 
@@ -934,13 +934,13 @@ class BertForSequenceClassification(BertPreTrainedModel):
 Classification (or regression if config.num_labels==1) loss.
 **logits**: ``torch.FloatTensor`` of shape ``(batch_size, config.num_labels)``
 Classification (or regression if config.num_labels==1) scores (before SoftMax).
-**attentions**: (`optional`, returned when ``config.output_attentions=True``)
-list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
-Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
 **hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)
 list of ``torch.FloatTensor`` (one for the output of each layer + the output of the embeddings)
 of shape ``(batch_size, sequence_length, hidden_size)``:
 Hidden-states of the model at the output of each layer plus the initial embedding outputs.
+**attentions**: (`optional`, returned when ``config.output_attentions=True``)
+list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
+Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
 
 Examples::
 
@@ -1040,13 +1040,13 @@ class BertForMultipleChoice(BertPreTrainedModel):
 **classification_scores**: ``torch.FloatTensor`` of shape ``(batch_size, num_choices)`` where `num_choices` is the size of the second dimension
 of the input tensors. (see `input_ids` above).
 Classification scores (before SoftMax).
-**attentions**: (`optional`, returned when ``config.output_attentions=True``)
-list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
-Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
 **hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)
 list of ``torch.FloatTensor`` (one for the output of each layer + the output of the embeddings)
 of shape ``(batch_size, sequence_length, hidden_size)``:
 Hidden-states of the model at the output of each layer plus the initial embedding outputs.
+**attentions**: (`optional`, returned when ``config.output_attentions=True``)
+list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
+Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
 
 Examples::
 
@@ -1110,13 +1110,13 @@ class BertForTokenClassification(BertPreTrainedModel):
 Classification loss.
 **scores**: ``torch.FloatTensor`` of shape ``(batch_size, sequence_length, config.num_labels)``
 Classification scores (before SoftMax).
-**attentions**: (`optional`, returned when ``config.output_attentions=True``)
-list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
-Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
 **hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)
 list of ``torch.FloatTensor`` (one for the output of each layer + the output of the embeddings)
 of shape ``(batch_size, sequence_length, hidden_size)``:
 Hidden-states of the model at the output of each layer plus the initial embedding outputs.
+**attentions**: (`optional`, returned when ``config.output_attentions=True``)
+list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
+Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
 
 Examples::
 
@@ -1186,13 +1186,13 @@ class BertForQuestionAnswering(BertPreTrainedModel):
 Span-start scores (before SoftMax).
 **end_scores**: ``torch.FloatTensor`` of shape ``(batch_size, sequence_length,)``
 Span-end scores (before SoftMax).
-**attentions**: (`optional`, returned when ``config.output_attentions=True``)
-list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
-Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
 **hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)
 list of ``torch.FloatTensor`` (one for the output of each layer + the output of the embeddings)
 of shape ``(batch_size, sequence_length, hidden_size)``:
 Hidden-states of the model at the output of each layer plus the initial embedding outputs.
+**attentions**: (`optional`, returned when ``config.output_attentions=True``)
+list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
+Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
 
 Examples::
 
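Note on the reordering in the hunks above: the docstrings now list hidden_states before attentions. The sketch below illustrates why that order matters to a caller who indexes the output tuple. It is a minimal plain-Python illustration, assuming the tuple is assembled in the order the updated docstrings list (main output first, then hidden_states, then attentions); `assemble_outputs` and the placeholder string values are hypothetical and are not part of the library.

```python
def assemble_outputs(last_hidden_state, hidden_states, attentions,
                     output_hidden_states=False, output_attentions=False):
    """Hypothetical helper: build an output tuple in the documented order.

    Assumption: optional elements are appended only when the matching
    config flag is set, hidden_states first, attentions second.
    """
    outputs = (last_hidden_state,)
    if output_hidden_states:
        outputs = outputs + (hidden_states,)
    if output_attentions:
        outputs = outputs + (attentions,)
    return outputs


# Placeholder values stand in for tensors so the sketch runs without torch.
both = assemble_outputs("last", ["h0", "h1"], ["a1"],
                        output_hidden_states=True, output_attentions=True)
only_attentions = assemble_outputs("last", ["h0", "h1"], ["a1"],
                                   output_attentions=True)

# With both flags set, attentions is the final element, after hidden_states.
print(both[1], both[2])
# With only attentions requested, it moves up to index 1.
print(only_attentions[1])
```

Because the position of each optional element depends on which flags are set, reading attentions from the end of the tuple (index -1) is the steadier choice under this assumed ordering.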
|
|||||||
@@ -423,13 +423,13 @@ class GPT2Model(GPT2PreTrainedModel):
 list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
 that contains pre-computed hidden-states (key and values in the attention blocks).
 Can be used (see `past` input) to speed up sequential decoding.
-**attentions**: (`optional`, returned when ``config.output_attentions=True``)
-list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
-Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
 **hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)
 list of ``torch.FloatTensor`` (one for the output of each layer + the output of the embeddings)
 of shape ``(batch_size, sequence_length, hidden_size)``:
 Hidden-states of the model at the output of each layer plus the initial embedding outputs.
+**attentions**: (`optional`, returned when ``config.output_attentions=True``)
+list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
+Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
 
 Examples::
 
@@ -557,13 +557,13 @@ class GPT2LMHeadModel(GPT2PreTrainedModel):
 list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
 that contains pre-computed hidden-states (key and values in the attention blocks).
 Can be used (see `past` input) to speed up sequential decoding.
-**attentions**: (`optional`, returned when ``config.output_attentions=True``)
-list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
-Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
 **hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)
 list of ``torch.FloatTensor`` (one for the output of each layer + the output of the embeddings)
 of shape ``(batch_size, sequence_length, hidden_size)``:
 Hidden-states of the model at the output of each layer plus the initial embedding outputs.
+**attentions**: (`optional`, returned when ``config.output_attentions=True``)
+list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
+Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
 
 Examples::
 
@@ -673,13 +673,13 @@ class GPT2DoubleHeadsModel(GPT2PreTrainedModel):
 list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
 that contains pre-computed hidden-states (key and values in the attention blocks).
 Can be used (see `past` input) to speed up sequential decoding.
-**attentions**: (`optional`, returned when ``config.output_attentions=True``)
-list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
-Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
 **hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)
 list of ``torch.FloatTensor`` (one for the output of each layer + the output of the embeddings)
 of shape ``(batch_size, sequence_length, hidden_size)``:
 Hidden-states of the model at the output of each layer plus the initial embedding outputs.
+**attentions**: (`optional`, returned when ``config.output_attentions=True``)
+list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
+Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
 
 Examples::
 
|
|||||||
@@ -429,13 +429,13 @@ class OpenAIGPTModel(OpenAIGPTPreTrainedModel):
 Outputs: `Tuple` comprising various elements depending on the configuration (config) and inputs:
 **last_hidden_state**: ``torch.FloatTensor`` of shape ``(batch_size, sequence_length, hidden_size)``
 Sequence of hidden-states at the last layer of the model.
-**attentions**: (`optional`, returned when ``config.output_attentions=True``)
-list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
-Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
 **hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)
 list of ``torch.FloatTensor`` (one for the output of each layer + the output of the embeddings)
 of shape ``(batch_size, sequence_length, hidden_size)``:
 Hidden-states of the model at the output of each layer plus the initial embedding outputs.
+**attentions**: (`optional`, returned when ``config.output_attentions=True``)
+list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
+Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
 
 Examples::
 
@@ -548,13 +548,13 @@ class OpenAIGPTLMHeadModel(OpenAIGPTPreTrainedModel):
 Language modeling loss.
 **prediction_scores**: ``torch.FloatTensor`` of shape ``(batch_size, sequence_length, config.vocab_size)``
 Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
-**attentions**: (`optional`, returned when ``config.output_attentions=True``)
-list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
-Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
 **hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)
 list of ``torch.FloatTensor`` (one for the output of each layer + the output of the embeddings)
 of shape ``(batch_size, sequence_length, hidden_size)``:
 Hidden-states of the model at the output of each layer plus the initial embedding outputs.
+**attentions**: (`optional`, returned when ``config.output_attentions=True``)
+list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
+Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
 
 Examples::
 
@@ -655,13 +655,13 @@ class OpenAIGPTDoubleHeadsModel(OpenAIGPTPreTrainedModel):
 Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
 **mc_prediction_scores**: ``torch.FloatTensor`` of shape ``(batch_size, num_choices)``
 Prediction scores of the multiplechoice classification head (scores for each choice before SoftMax).
-**attentions**: (`optional`, returned when ``config.output_attentions=True``)
-list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
-Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
 **hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)
 list of ``torch.FloatTensor`` (one for the output of each layer + the output of the embeddings)
 of shape ``(batch_size, sequence_length, hidden_size)``:
 Hidden-states of the model at the output of each layer plus the initial embedding outputs.
+**attentions**: (`optional`, returned when ``config.output_attentions=True``)
+list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
+Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
 
 Examples::
 
|
|||||||
@@ -958,13 +958,13 @@ class TransfoXLModel(TransfoXLPreTrainedModel):
 list of ``torch.FloatTensor`` (one for each layer):
 that contains pre-computed hidden-states (key and values in the attention blocks) as computed by the model
 (see `mems` input above). Can be used to speed up sequential decoding and attend to longer context.
-**attentions**: (`optional`, returned when ``config.output_attentions=True``)
-list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
-Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
 **hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)
 list of ``torch.FloatTensor`` (one for the output of each layer + the output of the embeddings)
 of shape ``(batch_size, sequence_length, hidden_size)``:
 Hidden-states of the model at the output of each layer plus the initial embedding outputs.
+**attentions**: (`optional`, returned when ``config.output_attentions=True``)
+list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
+Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
 
 Examples::
 
@@ -1274,13 +1274,13 @@ class TransfoXLLMHeadModel(TransfoXLPreTrainedModel):
|
|||||||
list of ``torch.FloatTensor`` (one for each layer):
|
list of ``torch.FloatTensor`` (one for each layer):
|
||||||
that contains pre-computed hidden-states (key and values in the attention blocks) as computed by the model
|
that contains pre-computed hidden-states (key and values in the attention blocks) as computed by the model
|
||||||
(see `mems` input above). Can be used to speed up sequential decoding and attend to longer context.
|
(see `mems` input above). Can be used to speed up sequential decoding and attend to longer context.
|
||||||
**attentions**: (`optional`, returned when ``config.output_attentions=True``)
|
|
||||||
list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
|
|
||||||
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
|
|
||||||
**hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)
|
**hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)
|
||||||
list of ``torch.FloatTensor`` (one for the output of each layer + the output of the embeddings)
|
list of ``torch.FloatTensor`` (one for the output of each layer + the output of the embeddings)
|
||||||
of shape ``(batch_size, sequence_length, hidden_size)``:
|
of shape ``(batch_size, sequence_length, hidden_size)``:
|
||||||
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
|
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
|
||||||
|
**attentions**: (`optional`, returned when ``config.output_attentions=True``)
|
||||||
|
list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
|
||||||
|
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
|
||||||
|
|
||||||
Examples::
|
Examples::
|
||||||
|
|

@@ -462,13 +462,13 @@ class XLMModel(XLMPreTrainedModel):
 Outputs: `Tuple` comprising various elements depending on the configuration (config) and inputs:
 **last_hidden_state**: ``torch.FloatTensor`` of shape ``(batch_size, sequence_length, hidden_size)``
 Sequence of hidden-states at the last layer of the model.
-**attentions**: (`optional`, returned when ``config.output_attentions=True``)
-list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
-Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
 **hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)
 list of ``torch.FloatTensor`` (one for the output of each layer + the output of the embeddings)
 of shape ``(batch_size, sequence_length, hidden_size)``:
 Hidden-states of the model at the output of each layer plus the initial embedding outputs.
+**attentions**: (`optional`, returned when ``config.output_attentions=True``)
+list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
+Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
 
 Examples::
 
@@ -735,13 +735,13 @@ class XLMWithLMHeadModel(XLMPreTrainedModel):
 Language modeling loss.
 **prediction_scores**: ``torch.FloatTensor`` of shape ``(batch_size, sequence_length, config.vocab_size)``
 Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
-**attentions**: (`optional`, returned when ``config.output_attentions=True``)
-list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
-Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
 **hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)
 list of ``torch.FloatTensor`` (one for the output of each layer + the output of the embeddings)
 of shape ``(batch_size, sequence_length, hidden_size)``:
 Hidden-states of the model at the output of each layer plus the initial embedding outputs.
+**attentions**: (`optional`, returned when ``config.output_attentions=True``)
+list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
+Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
 
 Examples::
 
@@ -795,13 +795,13 @@ class XLMForSequenceClassification(XLMPreTrainedModel):
 Classification (or regression if config.num_labels==1) loss.
 **logits**: ``torch.FloatTensor`` of shape ``(batch_size, config.num_labels)``
 Classification (or regression if config.num_labels==1) scores (before SoftMax).
-**attentions**: (`optional`, returned when ``config.output_attentions=True``)
-list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
-Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
 **hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)
 list of ``torch.FloatTensor`` (one for the output of each layer + the output of the embeddings)
 of shape ``(batch_size, sequence_length, hidden_size)``:
 Hidden-states of the model at the output of each layer plus the initial embedding outputs.
+**attentions**: (`optional`, returned when ``config.output_attentions=True``)
+list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
+Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
 
 Examples::
 
@@ -875,13 +875,13 @@ class XLMForQuestionAnswering(XLMPreTrainedModel):
 Span-start scores (before SoftMax).
 **end_scores**: ``torch.FloatTensor`` of shape ``(batch_size, sequence_length,)``
 Span-end scores (before SoftMax).
-**attentions**: (`optional`, returned when ``config.output_attentions=True``)
-list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
-Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
 **hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)
 list of ``torch.FloatTensor`` (one for the output of each layer + the output of the embeddings)
 of shape ``(batch_size, sequence_length, hidden_size)``:
 Hidden-states of the model at the output of each layer plus the initial embedding outputs.
+**attentions**: (`optional`, returned when ``config.output_attentions=True``)
+list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
+Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
 
 Examples::
 

@@ -702,13 +702,13 @@ class XLNetModel(XLNetPreTrainedModel):
 list of ``torch.FloatTensor`` (one for each layer):
 that contains pre-computed hidden-states (key and values in the attention blocks) as computed by the model
 (see `mems` input above). Can be used to speed up sequential decoding and attend to longer context.
-**attentions**: (`optional`, returned when ``config.output_attentions=True``)
-list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
-Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
 **hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)
 list of ``torch.FloatTensor`` (one for the output of each layer + the output of the embeddings)
 of shape ``(batch_size, sequence_length, hidden_size)``:
 Hidden-states of the model at the output of each layer plus the initial embedding outputs.
+**attentions**: (`optional`, returned when ``config.output_attentions=True``)
+list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
+Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
 
 Examples::
 
@@ -1009,13 +1009,13 @@ class XLNetLMHeadModel(XLNetPreTrainedModel):
 list of ``torch.FloatTensor`` (one for each layer):
 that contains pre-computed hidden-states (key and values in the attention blocks) as computed by the model
 (see `mems` input above). Can be used to speed up sequential decoding and attend to longer context.
-**attentions**: (`optional`, returned when ``config.output_attentions=True``)
-list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
-Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
 **hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)
 list of ``torch.FloatTensor`` (one for the output of each layer + the output of the embeddings)
 of shape ``(batch_size, sequence_length, hidden_size)``:
 Hidden-states of the model at the output of each layer plus the initial embedding outputs.
+**attentions**: (`optional`, returned when ``config.output_attentions=True``)
+list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
+Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
 
 Examples::
 
@@ -1090,13 +1090,13 @@ class XLNetForSequenceClassification(XLNetPreTrainedModel):
 list of ``torch.FloatTensor`` (one for each layer):
 that contains pre-computed hidden-states (key and values in the attention blocks) as computed by the model
 (see `mems` input above). Can be used to speed up sequential decoding and attend to longer context.
-**attentions**: (`optional`, returned when ``config.output_attentions=True``)
-list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
-Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
 **hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)
 list of ``torch.FloatTensor`` (one for the output of each layer + the output of the embeddings)
 of shape ``(batch_size, sequence_length, hidden_size)``:
 Hidden-states of the model at the output of each layer plus the initial embedding outputs.
+**attentions**: (`optional`, returned when ``config.output_attentions=True``)
+list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
+Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
 
 Examples::
 
@@ -1190,13 +1190,13 @@ class XLNetForQuestionAnswering(XLNetPreTrainedModel):
 list of ``torch.FloatTensor`` (one for each layer):
 that contains pre-computed hidden-states (key and values in the attention blocks) as computed by the model
 (see `mems` input above). Can be used to speed up sequential decoding and attend to longer context.
-**attentions**: (`optional`, returned when ``config.output_attentions=True``)
-list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
-Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
 **hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)
 list of ``torch.FloatTensor`` (one for the output of each layer + the output of the embeddings)
 of shape ``(batch_size, sequence_length, hidden_size)``:
 Hidden-states of the model at the output of each layer plus the initial embedding outputs.
+**attentions**: (`optional`, returned when ``config.output_attentions=True``)
+list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
+Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
 
 Examples::
 