Clean RAG docs and template docs (#7348)
* Clean RAG docs and template docs * Fix typo * Better doc
@@ -4,11 +4,14 @@ RAG
|
||||
Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Retrieval-augmented generation ("RAG") models combine the powers of pretrained dense retrieval (DPR) and Seq2Seq models.
|
||||
RAG models retrieve docs, pass them to a seq2seq model, then marginalize to generate outputs.
|
||||
The retriever and seq2seq modules are initialized from pretrained models, and fine-tuned jointly, allowing both retrieval and generation to adapt to downstream tasks.
|
||||
Retrieval-augmented generation ("RAG") models combine the powers of pretrained dense retrieval (DPR) and
|
||||
sequence-to-sequence models. RAG models retrieve documents, pass them to a seq2seq model, then marginalize to generate
|
||||
outputs. The retriever and seq2seq modules are initialized from pretrained models, and fine-tuned jointly, allowing
|
||||
both retrieval and generation to adapt to downstream tasks.
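The "marginalize" step in the overview can be illustrated with a small sketch. It assumes the sequence-level form log p(y|x) = log sum_z p(z|x) p(y|x, z), where p(z|x) is a softmax over retrieval scores; the function names are hypothetical and the library's own implementation works on batched tensors rather than Python lists.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over a list of floats."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def marginalize(doc_scores, seq_logprobs_per_doc):
    """Return log p(y|x) = log sum_z p(z|x) * p(y|x, z).

    doc_scores: unnormalized retrieval scores, one per retrieved document.
    seq_logprobs_per_doc: log p(y|x, z) from the generator, one per document.
    """
    doc_logprobs = log_softmax(doc_scores)
    joint = [d + s for d, s in zip(doc_logprobs, seq_logprobs_per_doc)]
    # log-sum-exp over documents, shifted by the max for stability.
    m = max(joint)
    return m + math.log(sum(math.exp(j - m) for j in joint))


# Two retrieved documents with equal retrieval scores (illustrative values).
log_p = marginalize([0.0, 0.0], [math.log(0.8), math.log(0.2)])
```

With equal retrieval scores each document gets weight 0.5, so the marginal probability is the plain average of the two generator probabilities.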
|
||||
|
||||
It is based on the paper `Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks <https://arxiv.org/abs/2005.11401>`__ by Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela.
|
||||
It is based on the paper `Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
|
||||
<https://arxiv.org/abs/2005.11401>`__ by Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir
|
||||
Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
@@ -47,7 +50,7 @@ RagTokenizer
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.RagTokenizer
|
||||
:members:
|
||||
:members: prepare_seq2seq_batch
|
||||
|
||||
|
||||
Rag specific outputs
|
||||
|
||||
@@ -38,35 +38,39 @@ RAG_CONFIG_DOC = r"""
|
||||
retrieval_vector_size (:obj:`int`, `optional`, defaults to 768):
|
||||
Dimensionality of the document embeddings indexed by :class:`~transformers.RagRetriever`.
|
||||
retrieval_batch_size (:obj:`int`, `optional`, defaults to 8):
|
||||
Retrieval batch size, defined as the number of queries issued concurrently to the faiss index encapsulated by :class:`~transformers.RagRetriever`.
|
||||
Retrieval batch size, defined as the number of queries issued concurrently to the faiss index encapsulated by
|
||||
:class:`~transformers.RagRetriever`.
|
||||
dataset (:obj:`str`, `optional`, defaults to :obj:`"wiki_dpr"`):
|
||||
A datatset identifier of the indexed dataset on HuggingFace AWS bucket (list all available datasets and ids using :obj:`datasets.list_datasets()`).
|
||||
dataset_split (:obj:`str`, `optional`, defaults to :obj:`train`)
|
||||
Which split of the ``dataset`` to load.
|
||||
index_name (:obj:`str`, `optional`, defaults to :obj:`compressed`)
|
||||
The index_name of the index associated with the :obj:`dataset`. One can choose between :obj:`legacy`, :obj:`exact` and :obj:`compressed`.
|
||||
A dataset identifier of the indexed dataset on HuggingFace AWS bucket (list all available datasets and
|
||||
ids using :obj:`datasets.list_datasets()`).
|
||||
dataset_split (:obj:`str`, `optional`, defaults to :obj:`"train"`)
|
||||
Which split of the :obj:`dataset` to load.
|
||||
index_name (:obj:`str`, `optional`, defaults to :obj:`"compressed"`)
|
||||
The index name of the index associated with the :obj:`dataset`. One can choose between :obj:`"legacy"`,
|
||||
:obj:`"exact"` and :obj:`"compressed"`.
|
||||
index_path (:obj:`str`, `optional`)
|
||||
The path to the serialized faiss index on disk.
|
||||
passages_path (:obj:`str`, `optional`):
|
||||
A path to text passages compatible with the faiss index. Required if using :class:`~transformers.retrieval_rag.LegacyIndex`
|
||||
A path to text passages compatible with the faiss index. Required if using
|
||||
:class:`~transformers.retrieval_rag.LegacyIndex`
|
||||
use_dummy_dataset (:obj:`bool`, `optional`, defaults to ``False``)
|
||||
Whether to load a "dummy" variant of the dataset specified by :obj:`dataset`.
|
||||
label_smoothing (:obj:`float`, `optional`, defaults to 0.0):
|
||||
Only relevant if ``return_loss`` is set to :obj:`True`. Controls the ``epsilon`` parameter value for label smoothing in the loss calculation.
|
||||
If set to ``0.0``, no label smoothing is performed.
|
||||
Only relevant if ``return_loss`` is set to :obj:`True`. Controls the ``epsilon`` parameter value for label
|
||||
smoothing in the loss calculation. If set to 0, no label smoothing is performed.
|
||||
do_marginalize (:obj:`bool`, `optional`, defaults to :obj:`False`):
|
||||
If :obj:`True`, the logits are marginalized over all documents
|
||||
by making use of ``torch.nn.functional.log_softmax``.
|
||||
reduce_loss (:obj:`bool`, `optional`, defaults to :obj:`False`):
|
||||
If :obj:`True`, the NLL loss is reduced using the ``torch.Tensor.sum`` operation.
|
||||
Whether or not to reduce the NLL loss using the ``torch.Tensor.sum`` operation.
|
||||
do_deduplication (:obj:`bool`, `optional`, defaults to :obj:`True`):
|
||||
Controls whether we want to deduplicate the generations from different context documents for a given input.
|
||||
Whether or not to deduplicate the generations from different context documents for a given input.
|
||||
Has to be set to :obj:`False` if used while training with distributed backend.
|
||||
exclude_bos_score (:obj:`bool`, `optional`, defaults to :obj:`False`):
|
||||
If :obj:`True`, the score of the BOS token is disregarded when computing
|
||||
the loss.
|
||||
Whether or not to disregard the BOS token when computing the loss.
|
||||
output_retrieved (:obj:`bool`, `optional`, defaults to :obj:`False`):
|
||||
If set to ``True``, :obj:`retrieved_doc_embeds`, :obj:`retrieved_doc_ids`, :obj:`context_input_ids` and :obj:`context_attention_mask` are returned. See returned tensors for more detail.
|
||||
If set to ``True``, :obj:`retrieved_doc_embeds`, :obj:`retrieved_doc_ids`, :obj:`context_input_ids` and
|
||||
:obj:`context_attention_mask` are returned. See returned tensors for more detail.
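To make the ``label_smoothing`` parameter concrete, here is a minimal sketch of one common label-smoothed NLL formulation for a single token. The function name and the exact weighting are assumptions for illustration; the loss computed inside the library may differ in detail.

```python
import math


def label_smoothed_nll(log_probs, target, epsilon=0.0):
    """Label-smoothed negative log-likelihood for one token.

    log_probs: log-probabilities over the vocabulary (list of floats).
    target: index of the gold token.
    epsilon: smoothing weight; 0.0 reduces to the plain NLL.
    """
    nll = -log_probs[target]
    # Uniform-prior term: average negative log-probability over the vocabulary.
    smooth = -sum(log_probs) / len(log_probs)
    return (1.0 - epsilon) * nll + epsilon * smooth


# Illustrative three-token vocabulary.
log_probs = [math.log(0.7), math.log(0.2), math.log(0.1)]
plain = label_smoothed_nll(log_probs, target=0)
smoothed = label_smoothed_nll(log_probs, target=0, epsilon=0.1)
```

With ``epsilon=0.0`` the result is the ordinary NLL of the gold token; a positive ``epsilon`` mixes in a penalty for putting very low probability on the other tokens.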
|
||||
"""
|
||||
|
||||
|
||||
|
||||
@@ -45,66 +45,63 @@ class RetrievAugLMMarginOutput(ModelOutput):
|
||||
Prediction scores of the language modeling head.
|
||||
The score is possibly marginalized over all documents for each vocabulary token.
|
||||
doc_scores (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, config.n_docs)`):
|
||||
Score between each retrieved document embeddigs
|
||||
(see :obj:`retrieved_doc_embeds`) and :obj:`question_encoder_last_hidden_state`.
|
||||
Score between each retrieved document embeddings (see :obj:`retrieved_doc_embeds`) and
|
||||
:obj:`question_encoder_last_hidden_state`.
|
||||
past_key_values (:obj:`List[torch.FloatTensor]`, `optional`, returned when ``use_cache=True`` is passed or when ``config.use_cache=True``):
|
||||
List of :obj:`torch.FloatTensor` of length :obj:`config.n_layers`, with each tensor of shape
|
||||
:obj:`(2, batch_size, num_heads, sequence_length, embed_size_per_head)`.
|
||||
|
||||
Contains pre-computed hidden-states (key and values in the attention blocks)
|
||||
of the decoder that can be used (see ``past_key_values`` input) to
|
||||
speed up sequential decoding.
|
||||
Contains precomputed hidden-states (key and values in the attention blocks) of the decoder that can be used
|
||||
(see ``past_key_values`` input) to speed up sequential decoding.
|
||||
retrieved_doc_embeds (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, config.n_docs, hidden_size)`, `optional`, returned when `output_retrieved=True`):
|
||||
Embedded documents retrieved by the retriever.
|
||||
Is used with ``question_encoder_last_hidden_state`` to compute
|
||||
the ``doc_scores``.
|
||||
Is used with ``question_encoder_last_hidden_state`` to compute the ``doc_scores``.
|
||||
retrieved_doc_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size, config.n_docs)`, `optional`, returned when `output_retrieved=True`):
|
||||
The indexes of the embedded documents retrieved by the retriever.
|
||||
context_input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size * config.n_docs, config.max_combined_length)`, `optional`, returned when `output_retrieved=True`):
|
||||
Input ids post-processed from the retrieved documents
|
||||
and the question encoder input_ids by the retriever.
|
||||
context_attention_mask (:obj:`torch.LongTensor` of shape :obj:`(batch_size * config.n_docs, config.max_combined_length)`, `optional`, returned when `output_retrieved=True`):
|
||||
Attention mask post-processed from the retrieved documents
|
||||
and the question encoder input_ids by the retriever.
|
||||
Attention mask post-processed from the retrieved documents and the question encoder :obj:`input_ids` by the
|
||||
retriever.
|
||||
question_encoder_last_hidden_state (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`, `optional`):
|
||||
Sequence of hidden-states at the output of the last layer
|
||||
of the question encoder pooled output of the model.
|
||||
Sequence of hidden states at the output of the last layer of the question encoder pooled output of the
|
||||
model.
|
||||
question_enc_hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_hidden_states=True`` is passed or when ``config.output_hidden_states=True``):
|
||||
Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings
|
||||
+ one for the output of each layer)
|
||||
of shape :obj:`(batch_size, sequence_length, hidden_size)`.
|
||||
Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings and one for the output of each
|
||||
layer) of shape :obj:`(batch_size, sequence_length, hidden_size)`.
|
||||
|
||||
Hidden-states of the question encoder at the output of each layer plus the initial embedding outputs.
|
||||
Hidden states of the question encoder at the output of each layer plus the initial embedding outputs.
|
||||
question_enc_attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_attentions=True`` is passed or when ``config.output_attentions=True``):
|
||||
Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
|
||||
:obj:`(batch_size, num_heads, sequence_length, sequence_length)`.
|
||||
|
||||
Attention weights of the question encoder, after the attention softmax, used to compute the weighted average in the
|
||||
self-attention heads.
|
||||
Attention weights of the question encoder, after the attention softmax, used to compute the weighted
|
||||
average in the self-attention heads.
|
||||
generator_enc_last_hidden_state (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`, `optional`):
|
||||
Sequence of hidden-states at the output of the last layer of the generator encoder of the model.
|
||||
generator_enc_hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_hidden_states=True`` is passed or when ``config.output_hidden_states=True``):
|
||||
Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings + one for the output of each layer)
|
||||
of shape :obj:`(batch_size, sequence_length, hidden_size)`.
|
||||
Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings and one for the output of each
|
||||
layer) of shape :obj:`(batch_size, sequence_length, hidden_size)`.
|
||||
|
||||
Hidden-states of the generator encoder at the output of each layer plus the initial embedding outputs.
|
||||
Hidden states of the generator encoder at the output of each layer plus the initial embedding outputs.
|
||||
generator_enc_attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_attentions=True`` is passed or when ``config.output_attentions=True``):
|
||||
Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
|
||||
:obj:`(batch_size, num_heads, sequence_length, sequence_length)`.
|
||||
|
||||
Attention weights of the generator encoder, after the attention softmax, used to compute the weighted average in the
|
||||
self-attention heads.
|
||||
Attention weights of the generator encoder, after the attention softmax, used to compute the weighted
|
||||
average in the self-attention heads.
|
||||
generator_dec_hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_hidden_states=True`` is passed or when ``config.output_hidden_states=True``):
|
||||
Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings + one for the output of each layer)
|
||||
of shape :obj:`(batch_size, sequence_length, hidden_size)`.
|
||||
Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings and one for the output of each
|
||||
layer) of shape :obj:`(batch_size, sequence_length, hidden_size)`.
|
||||
|
||||
Hidden-states of the generator decoder at the output of each layer plus the initial embedding outputs.
|
||||
Hidden states of the generator decoder at the output of each layer plus the initial embedding outputs.
|
||||
generator_dec_attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_attentions=True`` is passed or when ``config.output_attentions=True``):
|
||||
Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
|
||||
:obj:`(batch_size, num_heads, sequence_length, sequence_length)`.
|
||||
|
||||
Attention weights of the generator decoder, after the attention softmax, used to compute the weighted average in the
|
||||
self-attention heads.
|
||||
Attention weights of the generator decoder, after the attention softmax, used to compute the weighted
|
||||
average in the self-attention heads.
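The relationship between ``doc_scores``, ``retrieved_doc_embeds`` and ``question_encoder_last_hidden_state`` can be sketched as follows. This assumes the score is an inner product between the pooled question vector and each document embedding, as is usual in dense retrieval; the helper name is hypothetical and plain lists stand in for tensors.

```python
def compute_doc_scores(question_hidden, retrieved_doc_embeds):
    """Inner product between each query vector and its retrieved documents.

    question_hidden: list of batch_size vectors, each of length hidden_size.
    retrieved_doc_embeds: nested list of shape (batch_size, n_docs, hidden_size).
    Returns a nested list of shape (batch_size, n_docs).
    """
    return [
        [sum(q * d for q, d in zip(query, doc)) for doc in docs]
        for query, docs in zip(question_hidden, retrieved_doc_embeds)
    ]


scores = compute_doc_scores(
    [[1.0, 0.0]],                # one query, hidden_size = 2
    [[[2.0, 5.0], [0.5, 1.0]]],  # two retrieved documents for that query
)
```

The result has one row per query and one column per retrieved document, matching the documented ``(batch_size, config.n_docs)`` shape.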
|
||||
"""
|
||||
|
||||
loss: Optional[torch.FloatTensor] = None
|
||||
@@ -133,14 +130,14 @@ class RetrievAugLMOutput(ModelOutput):
|
||||
Prediction scores of the language modeling head.
|
||||
The score is possibly marginalized over all documents for each vocabulary token.
|
||||
doc_scores (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, config.n_docs)`):
|
||||
Score between each retrieved document embeddigs (see :obj:`retrieved_doc_embeds`) and :obj:`question_encoder_last_hidden_state`.
|
||||
Score between each retrieved document embeddings (see :obj:`retrieved_doc_embeds`) and
|
||||
:obj:`question_encoder_last_hidden_state`.
|
||||
past_key_values (:obj:`List[torch.FloatTensor]`, `optional`, returned when ``use_cache=True`` is passed or when ``config.use_cache=True``):
|
||||
List of :obj:`torch.FloatTensor` of length :obj:`config.n_layers`,
|
||||
with each tensor of shape
|
||||
List of :obj:`torch.FloatTensor` of length :obj:`config.n_layers`, with each tensor of shape
|
||||
:obj:`(2, batch_size, num_heads, sequence_length, embed_size_per_head)`.
|
||||
Contains pre-computed hidden-states (key and values in the attention blocks)
|
||||
of the decoder that can be used (see ``past_key_values`` input) to
|
||||
speed up sequential decoding.
|
||||
|
||||
Contains precomputed hidden-states (key and values in the attention blocks) of the decoder that can be used
|
||||
(see ``past_key_values`` input) to speed up sequential decoding.
|
||||
retrieved_doc_embeds (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, config.n_docs, hidden_size)`, `optional`, returned when `output_retrieved=True`):
|
||||
Embedded documents retrieved by the retriever.
|
||||
Is used with ``question_encoder_last_hidden_state`` to compute the ``doc_scores``.
|
||||
@@ -150,48 +147,46 @@ class RetrievAugLMOutput(ModelOutput):
|
||||
Input ids post-processed from the retrieved documents
|
||||
and the question encoder input_ids by the retriever.
|
||||
context_attention_mask (:obj:`torch.LongTensor` of shape :obj:`(batch_size * config.n_docs, config.max_combined_length)`, `optional`, returned when `output_retrieved=True`):
|
||||
Attention mask post-processed from the retrieved
|
||||
documents and the question encoder input_ids by the retriever.
|
||||
Attention mask post-processed from the retrieved documents and the question encoder :obj:`input_ids` by the
|
||||
retriever.
|
||||
question_encoder_last_hidden_state (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`, `optional`):
|
||||
Sequence of hidden-states at the output of the last layer
|
||||
of the question encoder pooled output of the model.
|
||||
Sequence of hidden states at the output of the last layer of the question encoder pooled output of the
|
||||
model.
|
||||
question_enc_hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_hidden_states=True`` is passed or when ``config.output_hidden_states=True``):
|
||||
Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings + one for the output of each layer)
|
||||
of shape :obj:`(batch_size, sequence_length, hidden_size)`.
|
||||
Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings and one for the output of each
|
||||
layer) of shape :obj:`(batch_size, sequence_length, hidden_size)`.
|
||||
|
||||
Hidden-states of the question encoder at the output of each
|
||||
layer plus the initial embedding outputs.
|
||||
Hidden states of the question encoder at the output of each layer plus the initial embedding outputs.
|
||||
question_enc_attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_attentions=True`` is passed or when ``config.output_attentions=True``):
|
||||
Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
|
||||
:obj:`(batch_size, num_heads, sequence_length, sequence_length)`.
|
||||
|
||||
Attention weights of the question encoder, after the attention softmax, used to compute the weighted average in the
|
||||
self-attention heads.
|
||||
Attention weights of the question encoder, after the attention softmax, used to compute the weighted
|
||||
average in the self-attention heads.
|
||||
generator_enc_last_hidden_state (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`, `optional`):
|
||||
Sequence of hidden-states at the output of the last layer of the generator encoder of the model.
|
||||
generator_enc_hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_hidden_states=True`` is passed or when ``config.output_hidden_states=True``):
|
||||
Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings + one for the output of each layer)
|
||||
of shape :obj:`(batch_size, sequence_length, hidden_size)`.
|
||||
Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings and one for the output of each
|
||||
layer) of shape :obj:`(batch_size, sequence_length, hidden_size)`.
|
||||
|
||||
Hidden-states of the generator encoder at the output
|
||||
of each layer plus the initial embedding outputs.
|
||||
Hidden states of the generator encoder at the output of each layer plus the initial embedding outputs.
|
||||
generator_enc_attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_attentions=True`` is passed or when ``config.output_attentions=True``):
|
||||
Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
|
||||
:obj:`(batch_size, num_heads, sequence_length, sequence_length)`.
|
||||
|
||||
Attention weights of the generator encoder, after the attention softmax, used to compute the weighted average in the
|
||||
self-attention heads.
|
||||
Attention weights of the generator encoder, after the attention softmax, used to compute the weighted
|
||||
average in the self-attention heads.
|
||||
generator_dec_hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_hidden_states=True`` is passed or when ``config.output_hidden_states=True``):
|
||||
Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings + one for the output of each layer)
|
||||
of shape :obj:`(batch_size, sequence_length, hidden_size)`.
|
||||
Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings and one for the output of each
|
||||
layer) of shape :obj:`(batch_size, sequence_length, hidden_size)`.
|
||||
|
||||
Hidden-states of the generator decoder at the output of each layer plus the initial embedding outputs.
|
||||
Hidden states of the generator decoder at the output of each layer plus the initial embedding outputs.
|
||||
generator_dec_attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_attentions=True`` is passed or when ``config.output_attentions=True``):
|
||||
Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
|
||||
:obj:`(batch_size, num_heads, sequence_length, sequence_length)`.
|
||||
|
||||
Attention weights of the generator decoder, after the attention softmax, used to compute the weighted average in the
|
||||
self-attention heads.
|
||||
Attention weights of the generator decoder, after the attention softmax, used to compute the weighted
|
||||
average in the self-attention heads.
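The ``(batch_size * config.n_docs, ...)`` leading dimension of ``context_input_ids`` and ``context_attention_mask`` comes from pairing every question with each of its retrieved documents. The sketch below shows that flattening at the string level; the helper name and the separator strings are assumptions for illustration, and the real retriever goes on to tokenize and truncate each string to ``config.max_combined_length``.

```python
def build_context_inputs(questions, retrieved_docs, title_sep=" / ", doc_sep=" // "):
    """Pair every question with each of its retrieved documents.

    questions: list of batch_size strings.
    retrieved_docs: per question, a list of n_docs (title, text) tuples.
    Returns a flat list of batch_size * n_docs strings, document text first.
    """
    return [
        title + title_sep + text + doc_sep + question
        for question, docs in zip(questions, retrieved_docs)
        for title, text in docs
    ]


contexts = build_context_inputs(
    ["who wrote hamlet"],
    [[("Hamlet", "A tragedy by Shakespeare."), ("Macbeth", "Another tragedy.")]],
)
```

One question with two retrieved documents yields two context strings, which is why the generator sees a batch that is ``n_docs`` times larger than the question batch.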
|
||||
"""
|
||||
|
||||
logits: torch.FloatTensor = None
|
||||
@@ -213,10 +208,11 @@ class RetrievAugLMOutput(ModelOutput):
|
||||
|
||||
class RagPreTrainedModel(PreTrainedModel):
|
||||
r"""
|
||||
RAG models were released with the paper `Retrieval-Augmented Generation for
|
||||
Knowledge-Intensive NLP Tasks <https://arxiv.org/abs/2005.11401>`_ by Patrick Lewis, Ethan Perez, Aleksandra Piktus et al.
|
||||
RAG models were released with the paper `Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
|
||||
<https://arxiv.org/abs/2005.11401>`_ by Patrick Lewis, Ethan Perez, Aleksandra Piktus et al.
|
||||
|
||||
RAG is a retriever-augmented model that encapsulates three components: a question encoder, a dataset retriever and a generator. The encoder and generator are trainable while the retriever is just an indexed dataset.
|
||||
RAG is a retriever-augmented model that encapsulates three components: a question encoder, a dataset retriever and a
|
||||
generator. The encoder and generator are trainable while the retriever is just an indexed dataset.
|
||||
|
||||
"""
|
||||
config_class = RagConfig
|
||||
@@ -232,40 +228,56 @@ class RagPreTrainedModel(PreTrainedModel):
|
||||
*model_args,
|
||||
**kwargs
|
||||
) -> PreTrainedModel:
|
||||
r"""Instantiates an question_encoder and a generator from one or two base classes of the library from pre-trained model checkpoints.
|
||||
r"""
|
||||
Instantiates a question encoder and a generator from one or two base classes of the library from pretrained
|
||||
model checkpoints.
|
||||
|
||||
|
||||
The model is set in evaluation mode by default using `model.eval()` (Dropout modules are deactivated).
|
||||
To train the model, you need to first set it back in training mode with `model.train()`.
|
||||
The model is set in evaluation mode by default using :obj:`model.eval()` (Dropout modules are deactivated).
|
||||
To train the model, you need to first set it back in training mode with :obj:`model.train()`.
|
||||
|
||||
Params:
|
||||
question_encoder_pretrained_model_name_or_path (:obj:`str`, `optional`, defaults to `None`):
|
||||
information necessary to initiate the question_encoder. Either:
|
||||
Information necessary to initiate the question encoder. Can be either:
|
||||
|
||||
- a string with the `shortcut name` of a pre-trained model to load from cache or download, e.g.: ``bert-base-uncased``.
|
||||
- a string with the `identifier name` of a pre-trained model that was user-uploaded to our S3, e.g.: ``dbmdz/bert-base-german-cased``.
|
||||
- a path to a `directory` containing model weights saved using :func:`~transformers.PreTrainedModel.save_pretrained`, e.g.: ``./my_model_directory/question_encoder``.
|
||||
- a path or url to a `tensorflow index checkpoint file` (e.g. `./tf_model/model.ckpt.index`). In this case, ``from_tf`` should be set to True and a configuration object should be provided as ``config`` argument. This loading path is slower than converting the TensorFlow checkpoint in a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
|
||||
- A string with the `shortcut name` of a pretrained model to load from cache or download, e.g.,
|
||||
``bert-base-uncased``.
|
||||
- A string with the `identifier name` of a pretrained model that was user-uploaded to our S3, e.g.,
|
||||
``dbmdz/bert-base-german-cased``.
|
||||
- A path to a `directory` containing model weights saved using
|
||||
:func:`~transformers.PreTrainedModel.save_pretrained`, e.g., ``./my_model_directory/``.
|
||||
- A path or url to a `tensorflow index checkpoint file` (e.g., ``./tf_model/model.ckpt.index``). In
|
||||
this case, ``from_tf`` should be set to :obj:`True` and a configuration object should be provided
|
||||
as ``config`` argument. This loading path is slower than converting the TensorFlow checkpoint in
|
||||
a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
|
||||
|
||||
generator_pretrained_model_name_or_path (:obj:`str`, `optional`, defaults to `None`):
|
||||
information necessary to initiate the generator. Either:
|
||||
Information necessary to initiate the generator. Can be either:
|
||||
|
||||
- a string with the `shortcut name` of a pre-trained model to load from cache or download, e.g.: ``bert-base-uncased``.
|
||||
- a string with the `identifier name` of a pre-trained model that was user-uploaded to our S3, e.g.: ``dbmdz/bert-base-german-cased``.
|
||||
- a path to a `directory` containing model weights saved using :func:`~transformers.PreTrainedModel.save_pretrained`, e.g.: ``./my_model_directory/generator``.
|
||||
- a path or url to a `tensorflow index checkpoint file` (e.g. `./tf_model/model.ckpt.index`). In this case, ``from_tf`` should be set to True and a configuration object should be provided as ``config`` argument. This loading path is slower than converting the TensorFlow checkpoint in a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
|
||||
- A string with the `shortcut name` of a pretrained model to load from cache or download, e.g.,
|
||||
``bert-base-uncased``.
|
||||
- A string with the `identifier name` of a pretrained model that was user-uploaded to our S3, e.g.,
|
||||
``dbmdz/bert-base-german-cased``.
|
||||
- A path to a `directory` containing model weights saved using
|
||||
:func:`~transformers.PreTrainedModel.save_pretrained`, e.g., ``./my_model_directory/``.
|
||||
- A path or url to a `tensorflow index checkpoint file` (e.g., ``./tf_model/model.ckpt.index``). In
|
||||
this case, ``from_tf`` should be set to :obj:`True` and a configuration object should be provided
|
||||
as ``config`` argument. This loading path is slower than converting the TensorFlow checkpoint in
|
||||
a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
|
||||
|
||||
model_args: (`optional`) Sequence of positional arguments:
|
||||
All remaining positional arguments will be passed to the underlying model's ``__init__`` method
|
||||
model_args (remaining positional arguments, `optional`):
|
||||
All remaining positional arguments will be passed to the underlying model's ``__init__`` method.
|
||||
retriever (:class:`~transformers.RagRetriever`, `optional`):
|
||||
The retriever to use.
|
||||
kwargs (remaining dictionary of keyword arguments, `optional`):
|
||||
Can be used to update the configuration object (after it being loaded) and initiate the model
|
||||
(e.g., ``output_attentions=True``).
|
||||
|
||||
retriever: (`optional`, ``RagRetriever``) An instance of a :class:`~transformers.RagRetriever` to use as a retriever.
|
||||
- To update the question_encoder configuration, use the prefix `question_encoder_` for each
|
||||
configuration parameter.
|
||||
- To update the generator configuration, use the prefix `generator_` for each configuration parameter.
|
||||
- To update the parent model configuration, do not use a prefix for each configuration parameter.
|
||||
|
||||
kwargs: (`optional`) Remaining dictionary of keyword arguments.
|
||||
Can be used to update the configuration object (after it being loaded) and initiate the model. (e.g. ``output_attentions=True``).
|
||||
- To update the question_encoder configuration, use the prefix `question_encoder_` for each configuration parameter
|
||||
- To update the generator configuration, use the prefix `generator_` for each configuration parameter
|
||||
- To update the parent model configuration, do not use a prefix for each configuration parameter
|
||||
Behave differently depending on whether a :obj:`config` is provided or automatically loaded.
|
||||
Behaves differently depending on whether a :obj:`config` is provided or automatically loaded.
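The prefix convention for keyword arguments described above can be sketched as a small routing function. ``split_prefixed_kwargs`` is a hypothetical helper written for illustration, not the library's own code, and the example keys are placeholders.

```python
def split_prefixed_kwargs(kwargs):
    """Route keyword arguments by prefix (hypothetical helper).

    Keys starting with ``question_encoder_`` or ``generator_`` go to the
    matching sub-model with the prefix stripped; everything else is kept
    for the parent model configuration.
    """
    prefixes = {"question_encoder_": {}, "generator_": {}}
    parent = {}
    for key, value in kwargs.items():
        for prefix, bucket in prefixes.items():
            if key.startswith(prefix):
                bucket[key[len(prefix):]] = value
                break
        else:
            # No prefix matched: the argument belongs to the parent config.
            parent[key] = value
    return prefixes["question_encoder_"], prefixes["generator_"], parent


q_enc, gen, parent = split_prefixed_kwargs(
    {"question_encoder_output_attentions": True, "generator_use_cache": False, "n_docs": 5}
)
```

Each sub-model receives its options under their unprefixed names, while unprefixed keys stay with the parent configuration.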
|
||||
|
||||
Example::
|
||||
|
||||
@@ -345,23 +357,33 @@ class RagPreTrainedModel(PreTrainedModel):
|
||||
|
||||
|
||||
RAG_START_DOCSTRING = r"""
|
||||
|
||||
RAG is a seq2seq model which encapsulates two core components: a question encoder and a generator.
|
||||
During a forward pass, we encode the input with the question encoder and pass it
|
||||
to the retriever to extract relevant context documents. The documents are then prepended to the input.
|
||||
Such contextualized inputs are passed to the generator.
|
||||
|
||||
The question encoder can be any `autoencoding` model, preferably :obj:`~transformers.DPRQuestionEncoder`, and the generator can be any `seq2seq` model, preferably :obj:`~transformers.BartForConditionalGeneration`.
|
||||
The question encoder can be any `autoencoding` model, preferably :class:`~transformers.DPRQuestionEncoder`, and the
|
||||
generator can be any `seq2seq` model, preferably :class:`~transformers.BartForConditionalGeneration`.
|
||||
|
||||
The model can be initialized with a :obj:`~transformers.RagRetriever` for end-to-end generation or used in combination with the outputs of a retriever in multiple steps - see examples for more details.
|
||||
The model is compatible with any `autoencoding` model as the ``question_encoder`` and any `seq2seq` model with language model head as the ``generator``.
|
||||
The model has been tested with :class:`~transformers.DPRQuestionEncoder` as the ``question_encoder`` and :class:`~transformers.BartForConditionalGeneration` or :class:`~transformers.T5ForConditionalGeneration` as the ``generator``.
|
||||
The model can be initialized with a :class:`~transformers.RagRetriever` for end-to-end generation or used in
|
||||
combination with the outputs of a retriever in multiple steps---see examples for more details.
|
||||
The model is compatible with any `autoencoding` model as the ``question_encoder`` and any `seq2seq` model with language
|
||||
model head as the ``generator``. It has been tested with :class:`~transformers.DPRQuestionEncoder` as the
|
||||
``question_encoder`` and :class:`~transformers.BartForConditionalGeneration` or
|
||||
:class:`~transformers.T5ForConditionalGeneration` as the ``generator``.
|
||||
|
||||
This model is a PyTorch `torch.nn.Module <https://pytorch.org/docs/stable/nn.html#torch.nn.Module>`_ sub-class.
|
||||
This model inherits from :class:`~transformers.PreTrainedModel`. Check the superclass documentation for the generic
|
||||
methods the library implements for all its models (such as downloading or saving, resizing the input embeddings,
|
||||
pruning heads etc.)
|
||||
|
||||
This model is also a PyTorch `torch.nn.Module <https://pytorch.org/docs/stable/nn.html#torch.nn.Module>`__ subclass.
|
||||
Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general
|
||||
usage and behavior.
|
||||
|
||||
Args:
|
||||
config (:class:`~transformers.RagConfig`): Model configuration class with all the parameters of the model.
|
||||
config (:class:`~transformers.RagConfig`):
|
||||
Model configuration class with all the parameters of the model.
|
||||
Initializing with a config file does not load the weights associated with the model, only the configuration.
|
||||
Check out the :meth:`~transformers.PreTrainedModel.from_pretrained` method to load the model weights.
|
||||
question_encoder (:class:`transformers.PreTrainedModel`):
|
||||
@@ -377,44 +399,65 @@ RAG_FORWARD_INPUTS_DOCSTRING = r"""
|
||||
Args:
|
||||
input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`):
|
||||
Indices of input sequence tokens in the vocabulary.
|
||||
:class:`~transformers.RagConfig`, used to initialize the model, specifies which generator to use, it also specifies a compatible
|
||||
generator tokenizer. Use that tokenizer class to obtain the indices.
|
||||
attention_mask (:obj:`torch.Tensor` of shape :obj:`(batch_size, sequence_length)`, `optional`, defaults to :obj:`None`):
|
||||
Mask to avoid performing attention on padding token indices in input_ids.
|
||||
:class:`~transformers.RagConfig`, used to initialize the model, specifies which generator to use, it also
|
||||
specifies a compatible generator tokenizer. Use that tokenizer class to obtain the indices.
|
||||
attention_mask (:obj:`torch.Tensor` of shape :obj:`(batch_size, sequence_length)`, `optional`):
|
||||
Mask to avoid performing attention on padding token indices.
|
||||
Mask values selected in ``[0, 1]``:
|
||||
``1`` for tokens that are NOT MASKED, ``0`` for MASKED tokens.
|
||||
|
||||
- 1 for tokens that are **not masked**,
|
||||
- 0 for tokens that are **masked**.
|
||||
|
||||
`What are attention masks? <../glossary.html#attention-mask>`__
|
||||
encoder_outputs (:obj:`tuple(tuple(torch.FloatTensor))`, `optional`):
|
||||
Tuple consists of (:obj:`last_hidden_state`, `optional`: :obj:`hidden_states`, `optional`: :obj:`attentions`)
|
||||
`last_hidden_state` of shape :obj:`(batch_size, n_docs * sequence_length, hidden_size)` is a sequence of hidden-states at the output of the last layer of the encoder.
|
||||
`doc_scores` of shape :obj:`(batch_size, n_docs)` store retrieval scores of documents retrieved for each input in the batch.
|
||||
Used by the (:class:`~transformers.RagTokenForGeneration`) model during decoding.
|
||||
decoder_input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size, target_sequence_length)`, `optional`, defaults to :obj:`None`):
|
||||
Provide for generation tasks. `None` by default, construct as per the instructions for the generator model you're using with your RAG instance.
|
||||
Provide for generation tasks. `None` by default, construct as per the instructions for the generator model you're using with your RAG instance.
|
||||
Tuple consists of (:obj:`generator_enc_last_hidden_state`, `optional`: :obj:`generator_enc_hidden_states`,
|
||||
`optional`: :obj:`generator_enc_attentions`). :obj:`generator_enc_last_hidden_state` of shape
|
||||
:obj:`(batch_size, n_docs * sequence_length, hidden_size)` is a sequence of hidden-states at the output of
|
||||
the last layer of the generator's encoder.
|
||||
|
||||
Used by the (:class:`~transformers.RagModel`) model during decoding.
|
||||
decoder_input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size, target_sequence_length)`, `optional`):
|
||||
Provide for generation tasks. `None` by default, construct as per the instructions for the generator model
|
||||
you're using with your RAG instance.
|
||||
decoder_attention_mask (:obj:`torch.BoolTensor` of shape :obj:`(batch_size, target_sequence_length)`, `optional`):
|
||||
Default behavior: generate a tensor that ignores pad tokens in decoder_input_ids. Causal mask will also be used by default.
|
||||
Default behavior: generate a tensor that ignores pad tokens in :obj:`decoder_input_ids`. Causal mask will
|
||||
also be used by default.
|
||||
past_key_values (:obj:`tuple(tuple(torch.FloatTensor))`):
|
||||
Tuple consists of two elements: :obj:`encoder_outputs` of the RAG model (see :obj:`encoder_outputs`) and :obj:`past_key_values` of the underlying generator.
|
||||
Can be used to speed up decoding. :obj:`past_key_values` are used in the (:class:`~transformers.RagTokenForGeneration`)
|
||||
model during decoding.
|
||||
Tuple consists of two elements: :obj:`encoder_outputs` of the RAG model (see :obj:`encoder_outputs`) and
|
||||
:obj:`past_key_values` of the underlying generator.
|
||||
Can be used to speed up decoding. :obj:`past_key_values` are used in the
|
||||
(:class:`~transformers.RagTokenForGeneration`) model during decoding.
|
||||
doc_scores (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, config.n_docs)`):
|
||||
Score between each retrieved document embeddings (see :obj:`retrieved_doc_embeds`) and :obj:`question_encoder_last_hidden_state`.
|
||||
If the model is not initialized with a ``retriever``, :obj:`doc_scores` has to be provided to the forward pass. :obj:`doc_scores` can be computed via :obj:`question_encoder_last_hidden_state` and :obj:`retrieved_doc_embeds`, see examples for more information.
|
||||
Score between each retrieved document embeddings (see :obj:`retrieved_doc_embeds`) and
|
||||
:obj:`question_encoder_last_hidden_state`.
|
||||
If the model is not initialized with a ``retriever``, :obj:`doc_scores` has to be provided to the
|
||||
forward pass. :obj:`doc_scores` can be computed via :obj:`question_encoder_last_hidden_state` and
|
||||
:obj:`retrieved_doc_embeds`, see examples for more information.
|
||||
context_input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size * config.n_docs, config.max_combined_length)`, `optional`, returned when `output_retrieved=True`):
|
||||
Input ids post-processed from the retrieved documents and the question encoder input_ids by the retriever.
|
||||
If the model is not initialized with a ``retriever``, :obj:`context_input_ids` has to be provided to the forward pass. :obj:`context_input_ids` are returned by :meth:`~transformers.RagRetriever.__call__`.
|
||||
Input IDs post-processed from the retrieved documents and the question encoder :obj:`input_ids` by the
|
||||
retriever.
|
||||
|
||||
If the model is not initialized with a ``retriever``, :obj:`context_input_ids` has to be provided to the
|
||||
forward pass. :obj:`context_input_ids` are returned by :meth:`~transformers.RagRetriever.__call__`.
|
||||
context_attention_mask (:obj:`torch.LongTensor` of shape :obj:`(batch_size * config.n_docs, config.max_combined_length)`, `optional`, returned when `output_retrieved=True`):
|
||||
Attention mask post-processed from the retrieved documents and the question encoder input_ids by the retriever.
|
||||
If the model is not initialized with a ``retriever``, :obj:`context_attention_mask` has to be provided to the forward pass. :obj:`context_attention_mask` are returned by :meth:`~transformers.RagRetriever.__call__`.
|
||||
Attention mask post-processed from the retrieved documents and the question encoder :obj:`input_ids` by the
|
||||
retriever.
|
||||
|
||||
If the model is not initialized with a ``retriever``, :obj:`context_attention_mask` has to be provided
|
||||
to the forward pass. :obj:`context_attention_mask` are returned by
|
||||
:meth:`~transformers.RagRetriever.__call__`.
|
||||
use_cache (:obj:`bool`, `optional`, defaults to :obj:`True`):
|
||||
If `use_cache` is True, ``past_key_values`` are returned and can be used to speed up decoding (see
|
||||
``past_key_values``).
|
||||
If set to :obj:`True`, ``past_key_values`` key value states are returned and can be used to speed up
|
||||
decoding (see ``past_key_values``).
|
||||
output_attentions (:obj:`bool`, `optional`):
|
||||
If set to ``True``, the attentions tensors of all attention layers are returned. See ``attentions`` under returned tensors for more detail.
|
||||
Whether or not to return the attentions tensors of all attention layers. See ``attentions`` under returned
|
||||
tensors for more detail.
|
||||
output_hidden_states (:obj:`bool`, `optional`):
|
||||
If set to ``True``, the hidden states of all layers are returned. See ``hidden_states`` under returned tensors for more detail.
|
||||
Whether or not to return the hidden states of all layers. See ``hidden_states`` under returned tensors for
|
||||
more detail.
|
||||
output_retrieved (:obj:`bool`, `optional`):
|
||||
If set to ``True``, :obj:`retrieved_doc_embeds`, :obj:`retrieved_doc_ids`, :obj:`context_input_ids` and :obj:`context_attention_mask` are returned. See returned tensors for more detail.
|
||||
Whether or not to return the :obj:`retrieved_doc_embeds`, :obj:`retrieved_doc_ids`,
|
||||
:obj:`context_input_ids` and :obj:`context_attention_mask`. See returned tensors for more detail.
|
||||
"""
|
||||
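The relationship between :obj:`doc_scores`, :obj:`question_encoder_last_hidden_state` and :obj:`retrieved_doc_embeds` described above can be sketched as follows. This is only an illustration under stated assumptions: the score is assumed to be an inner product (as is usual for dense retrieval), plain Python lists stand in for tensors, and the helper name ``compute_doc_scores`` is hypothetical rather than part of the library.

```python
from typing import List


def compute_doc_scores(
    question_hidden_states: List[List[float]],
    retrieved_doc_embeds: List[List[List[float]]],
) -> List[List[float]]:
    """Return inner-product scores of shape (batch_size, n_docs).

    question_hidden_states has shape (batch_size, dim).
    retrieved_doc_embeds has shape (batch_size, n_docs, dim).
    """
    scores = []
    for question, docs in zip(question_hidden_states, retrieved_doc_embeds):
        # One dot product per retrieved document for this question.
        scores.append([sum(q * d for q, d in zip(question, doc)) for doc in docs])
    return scores


# Toy batch: 2 questions, 2 retrieved documents each, embedding dimension 2.
question_hidden_states = [[1.0, 0.0], [0.5, 0.5]]
retrieved_doc_embeds = [
    [[2.0, 1.0], [0.0, 3.0]],
    [[4.0, 0.0], [2.0, 2.0]],
]
doc_scores = compute_doc_scores(question_hidden_states, retrieved_doc_embeds)
```

With tensors, the same computation would typically be written as a single batched matrix multiplication instead of Python loops.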
|
||||
|
||||
@@ -662,15 +705,15 @@ class RagSequenceForGeneration(RagPreTrainedModel):
|
||||
**kwargs # needs kwargs for generation
|
||||
):
|
||||
r"""
|
||||
exclude_bos_score (:obj:`bool`, `optional`):
|
||||
Only relevant if ``labels`` is passed.
|
||||
If :obj:`True`, the score of the BOS token is disregarded when computing
|
||||
the loss.
|
||||
reduce_loss (:obj:`bool`, `optional`):
|
||||
Only relevant if ``labels`` is passed.
|
||||
If :obj:`True`, the NLL loss is reduced using the ``torch.Tensor.sum`` operation.
|
||||
kwargs (:obj:`Dict[str, any]`, optional, defaults to `{}`):
|
||||
Legacy dictionary, which is required so that model can use `generate()` function.
|
||||
exclude_bos_score (:obj:`bool`, `optional`):
|
||||
Only relevant if ``labels`` is passed.
|
||||
If :obj:`True`, the score of the BOS token is disregarded when computing
|
||||
the loss.
|
||||
reduce_loss (:obj:`bool`, `optional`):
|
||||
Only relevant if ``labels`` is passed.
|
||||
If :obj:`True`, the NLL loss is reduced using the ``torch.Tensor.sum`` operation.
|
||||
kwargs (:obj:`Dict[str, Any]`, `optional`, defaults to `{}`):
|
||||
Legacy dictionary, which is required so that the model can use the `generate()` function.
|
||||
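To make the effect of ``exclude_bos_score`` and ``reduce_loss`` concrete, here is a minimal sketch. It assumes the per-token log-probabilities of the labels are already available and ignores the marginalization over retrieved documents; the function name ``sequence_nll`` and its inputs are hypothetical and are not the library's implementation.

```python
from typing import List, Union


def sequence_nll(
    token_log_probs: List[List[float]],
    exclude_bos_score: bool = False,
    reduce_loss: bool = False,
) -> Union[float, List[float]]:
    """Negative log-likelihood per sequence, optionally summed over the batch."""
    losses = []
    for log_probs in token_log_probs:
        if exclude_bos_score:
            # Disregard the score of the first (BOS) position.
            log_probs = log_probs[1:]
        losses.append(-sum(log_probs))
    # reduce_loss=True collapses the per-sequence losses with a sum.
    return sum(losses) if reduce_loss else losses


token_log_probs = [[-0.5, -1.0, -0.25], [-0.5, -2.0]]
per_sequence = sequence_nll(token_log_probs)
without_bos = sequence_nll(token_log_probs, exclude_bos_score=True)
reduced = sequence_nll(token_log_probs, exclude_bos_score=True, reduce_loss=True)
```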
|
||||
Returns:
|
||||
|
||||
@@ -780,28 +823,31 @@ class RagSequenceForGeneration(RagPreTrainedModel):
|
||||
):
|
||||
"""
|
||||
Implements RAG sequence "thorough" decoding.
|
||||
Read the :meth:`~transformers.PreTrainedModel.generate` documentation for more information on how to set other generate input parameters.
|
||||
Read the :meth:`~transformers.PreTrainedModel.generate` documentation for more information on how to set other
|
||||
generate input parameters.
|
||||
|
||||
Args:
|
||||
input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`):
|
||||
The sequence used as a prompt for the generation. If :obj:`input_ids` is not passed, then :obj:`context_input_ids` has to be provided.
|
||||
The sequence used as a prompt for the generation. If :obj:`input_ids` is not passed, then
|
||||
:obj:`context_input_ids` has to be provided.
|
||||
context_input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size * config.n_docs, config.max_combined_length)`, `optional`, returned when `output_retrieved=True`):
|
||||
Input ids post-processed from the retrieved documents and the question encoder input_ids by the retriever.
|
||||
Input IDs post-processed from the retrieved documents and the question encoder input_ids by the
|
||||
retriever.
|
||||
do_deduplication (:obj:`bool`, `optional`):
|
||||
Controls whether we want to deduplicate the generations from different context documents for a given input.
|
||||
Whether or not to deduplicate the generations from different context documents for a given input.
|
||||
Has to be set to :obj:`False` if used while training with distributed backend.
|
||||
num_return_sequences (:obj:`int`, `optional`, defaults to 1):
|
||||
The number of independently computed returned sequences for each element in the batch. Note that this is not the value
|
||||
we pass to the ``generator``'s `:func:`~transformers.PreTrainedModel.generate`` function, where we set ``num_return_sequences``
|
||||
to `num_beams`.
|
||||
The number of independently computed returned sequences for each element in the batch. Note that this
|
||||
is not the value we pass to the ``generator``'s :func:`~transformers.PreTrainedModel.generate`
|
||||
function, where we set ``num_return_sequences`` to :obj:`num_beams`.
|
||||
num_beams (:obj:`int`, `optional`, defaults to 1):
|
||||
Number of beams for beam search. 1 means no beam search.
|
||||
kwargs:
|
||||
Additional kwargs will be passed to :meth:`~transformers.PreTrainedModel.generate``.
|
||||
Return:
|
||||
Additional kwargs will be passed to :meth:`~transformers.PreTrainedModel.generate`.
|
||||
|
||||
Return:
|
||||
:obj:`torch.LongTensor` of shape :obj:`(batch_size * num_return_sequences, sequence_length)`:
|
||||
The generated sequences. The second dimension (sequence_length) is either equal to :obj:`max_length` or
|
||||
The generated sequences. The second dimension (sequence length) is either equal to :obj:`max_length` or
|
||||
shorter if all batches finished early due to the :obj:`eos_token_id`.
|
||||
"""
|
||||
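The ``do_deduplication`` option can be pictured with the following sketch, which removes repeated hypotheses produced from different context documents for one input. It is an illustration only: token id lists stand in for tensors, and ``deduplicate_hypotheses`` is a hypothetical helper, not the function used inside the library.

```python
from typing import Dict, List, Tuple


def deduplicate_hypotheses(hypotheses: List[List[int]]) -> List[List[int]]:
    """Keep the first occurrence of each generated sequence, preserving order."""
    unique: Dict[Tuple[int, ...], List[int]] = {}
    for token_ids in hypotheses:
        # Tuples are hashable, so they can serve as dictionary keys.
        unique.setdefault(tuple(token_ids), token_ids)
    return list(unique.values())


# Two context documents led to the same generation for this input.
hypotheses = [[0, 5, 7, 2], [0, 9, 2], [0, 5, 7, 2]]
deduplicated = deduplicate_hypotheses(hypotheses)
```

Because the number of surviving hypotheses then varies per input, this kind of step is hard to combine with code that expects identical tensor shapes on every worker, which is consistent with the note above about distributed training.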
|
||||
@@ -1033,14 +1079,15 @@ class RagTokenForGeneration(RagPreTrainedModel):
|
||||
**kwargs # needs kwargs for generation
|
||||
):
|
||||
r"""
|
||||
do_marginalize (:obj:`bool`, `optional`):
|
||||
If :obj:`True`, the logits are marginalized over all documents
|
||||
by making use of ``torch.nn.functional.log_softmax``.
|
||||
reduce_loss (:obj:`bool`, `optional`):
|
||||
Only relevant if ``labels`` is passed.
|
||||
If :obj:`True`, the NLL loss is reduced using the ``torch.Tensor.sum`` operation.
|
||||
kwargs (:obj:`Dict[str, any]`, optional, defaults to `{}`):
|
||||
Legacy dictionary, which is required so that model can use `generate()` function.
|
||||
do_marginalize (:obj:`bool`, `optional`):
|
||||
If :obj:`True`, the logits are marginalized over all documents
|
||||
by making use of ``torch.nn.functional.log_softmax``.
|
||||
reduce_loss (:obj:`bool`, `optional`):
|
||||
Only relevant if ``labels`` is passed.
|
||||
If :obj:`True`, the NLL loss is reduced using the ``torch.Tensor.sum`` operation.
|
||||
kwargs (:obj:`Dict[str, Any]`, `optional`, defaults to `{}`):
|
||||
Legacy dictionary, which is required so that the model can use the `generate()` function.
|
||||
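The marginalization controlled by ``do_marginalize`` can be sketched for a single example and a single decoding position as follows. This is a simplified illustration in pure Python, assuming that document scores and token logits are both normalized with a log-softmax and then combined with a log-sum-exp over documents; the helper names are hypothetical and the real model operates on batched tensors.

```python
import math
from typing import List


def log_softmax(values: List[float]) -> List[float]:
    """Numerically stable log-softmax over a list of scores."""
    shift = max(values)
    log_z = shift + math.log(sum(math.exp(v - shift) for v in values))
    return [v - log_z for v in values]


def logsumexp(values: List[float]) -> float:
    """Numerically stable log of a sum of exponentials."""
    shift = max(values)
    return shift + math.log(sum(math.exp(v - shift) for v in values))


def marginalize(token_logits_per_doc: List[List[float]], doc_scores: List[float]) -> List[float]:
    """Return log p(token) marginalized over documents, one value per vocabulary entry."""
    doc_log_probs = log_softmax(doc_scores)
    token_log_probs = [log_softmax(logits) for logits in token_logits_per_doc]
    vocab_size = len(token_logits_per_doc[0])
    return [
        # log sum_d p(doc d) * p(token v | doc d)
        logsumexp([doc_log_probs[d] + token_log_probs[d][v] for d in range(len(doc_scores))])
        for v in range(vocab_size)
    ]


# Two retrieved documents, vocabulary of three tokens.
marginal = marginalize([[2.0, 0.0, 1.0], [0.0, 3.0, 0.0]], [1.0, 0.5])
```

The resulting values are log-probabilities, so exponentiating them gives a distribution over the vocabulary that sums to one.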
|
||||
Returns:
|
||||
|
||||
Example::
|
||||
@@ -1156,23 +1203,35 @@ class RagTokenForGeneration(RagPreTrainedModel):
|
||||
|
||||
Args:
|
||||
input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`):
|
||||
The sequence used as a prompt for the generation. If :obj:`input_ids` is not passed, then :obj:`context_input_ids` has to be provided.
|
||||
The sequence used as a prompt for the generation. If :obj:`input_ids` is not passed, then
|
||||
:obj:`context_input_ids` has to be provided.
|
||||
context_input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size * config.n_docs, config.max_combined_length)`, `optional`, returned when `output_retrieved=True`):
|
||||
Input ids post-processed from the retrieved documents and the question encoder input_ids by the retriever.
|
||||
If the model is not initialized with a ``retriever``, :obj:`context_input_ids` has to be provided to the forward pass. :obj:`context_input_ids` are returned by :meth:`~transformers.RagRetriever.__call__`.
|
||||
context_attention_mask (:obj:`torch.LongTensor` of shape :obj:`(batch_size * config.n_docs, config.max_combined_length)`, `optional`, returned when `output_retrieved=True`):
|
||||
Attention mask post-processed from the retrieved documents and the question encoder input_ids by the retriever.
|
||||
If the model is not initialized with a ``retriever``, :obj:`context_attention_mask` has to be provided to the forward pass. :obj:`context_attention_mask` are returned by :meth:`~transformers.RagRetriever.__call__`.
|
||||
doc_scores (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, config.n_docs)`):
|
||||
Score between each retrieved document embeddings (see :obj:`retrieved_doc_embeds`) and :obj:`question_encoder_last_hidden_state`.
|
||||
If the model is not initialized with a ``retriever``, :obj:`doc_scores` has to be provided to the forward pass. :obj:`doc_scores` can be computed via :obj:`question_encoder_last_hidden_state` and :obj:`retrieved_doc_embeds`, see examples for more information.
|
||||
Input IDs post-processed from the retrieved documents and the question encoder :obj:`input_ids` by the
|
||||
retriever.
|
||||
|
||||
If the model is not initialized with a ``retriever``, :obj:`context_input_ids` has to be provided
|
||||
to the forward pass. :obj:`context_input_ids` are returned by
|
||||
:meth:`~transformers.RagRetriever.__call__`.
|
||||
context_attention_mask (:obj:`torch.LongTensor` of shape :obj:`(batch_size * config.n_docs, config.max_combined_length)`, `optional`, returned when `output_retrieved=True`):
|
||||
Attention mask post-processed from the retrieved documents and the question encoder :obj:`input_ids` by
|
||||
the retriever.
|
||||
|
||||
If the model is not initialized with a ``retriever``, :obj:`context_attention_mask` has to be provided
|
||||
to the forward pass. :obj:`context_attention_mask` are returned by
|
||||
:meth:`~transformers.RagRetriever.__call__`.
|
||||
doc_scores (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, config.n_docs)`):
|
||||
Score between each retrieved document embeddings (see :obj:`retrieved_doc_embeds`) and
|
||||
:obj:`question_encoder_last_hidden_state`.
|
||||
|
||||
If the model is not initialized with a ``retriever``, :obj:`doc_scores` has to be provided to the
|
||||
forward pass. :obj:`doc_scores` can be computed via :obj:`question_encoder_last_hidden_state` and
|
||||
:obj:`retrieved_doc_embeds`, see examples for more information.
|
||||
max_length (:obj:`int`, `optional`, defaults to 20):
|
||||
The maximum length of the sequence to be generated.
|
||||
min_length (:obj:`int`, `optional`, defaults to 10):
|
||||
The minimum length of the sequence to be generated.
|
||||
early_stopping (:obj:`bool`, `optional`, defaults to :obj:`False`):
|
||||
Whether to stop the beam search when at least ``num_beams`` sentences are finished per batch or not.
|
||||
Whether or not to stop the beam search when at least ``num_beams`` sentences are finished per batch.
|
||||
use_cache (:obj:`bool`, `optional`, defaults to :obj:`True`):
|
||||
Whether or not the model should use the past last key/values attentions (if applicable to the model) to
|
||||
speed up decoding.
|
||||
@@ -1195,14 +1254,13 @@ class RagTokenForGeneration(RagPreTrainedModel):
|
||||
num_beams (:obj:`int`, `optional`, defaults to 1):
|
||||
Number of beams for beam search. 1 means no beam search.
|
||||
num_return_sequences (:obj:`int`, `optional`, defaults to 1):
|
||||
The number of independently computed returned sequences for each element in the batch. Note that this is not the value
|
||||
we pass to the ``generator``'s `:func:`~transformers.PreTrainedModel.generate`` function, where we set ``num_return_sequences``
|
||||
to `num_beams`.
|
||||
The number of independently computed returned sequences for each element in the batch. Note that this
|
||||
is not the value we pass to the ``generator``'s :func:`~transformers.PreTrainedModel.generate`
|
||||
function, where we set ``num_return_sequences`` to :obj:`num_beams`.
|
||||
decoder_start_token_id (:obj:`int`, `optional`):
|
||||
If an encoder-decoder model starts decoding with a different token than `bos`, the id of that token.
|
||||
|
||||
Return:
|
||||
|
||||
:obj:`torch.LongTensor` of shape :obj:`(batch_size * num_return_sequences, sequence_length)`:
|
||||
The generated sequences. The second dimension (sequence_length) is either equal to :obj:`max_length` or
|
||||
shorter if all batches finished early due to the :obj:`eos_token_id`.
|
||||
|
||||
@@ -399,12 +399,14 @@ class RagRetriever:
|
||||
The number of docs retrieved per query.
|
||||
|
||||
Return:
|
||||
retrieved_doc_embeds (:obj:`np.ndarray` of shape :obj:`(batch_size, n_docs, dim)`
|
||||
The retrieval embeddings of the retrieved docs per query.
|
||||
doc_ids (:obj:`np.ndarray` of shape :obj:`batch_size, n_docs`)
|
||||
The ids of the documents in the index
|
||||
doc_dicts (:obj:`List[dict]`):
|
||||
The retrieved_doc_embeds examples per query.
|
||||
:obj:`Tuple[np.ndarray, np.ndarray, List[dict]]`:
|
||||
A tuple with the following objects:
|
||||
|
||||
- **retrieved_doc_embeds** (:obj:`np.ndarray` of shape :obj:`(batch_size, n_docs, dim)`) -- The
|
||||
retrieval embeddings of the retrieved docs per query.
|
||||
- **doc_ids** (:obj:`np.ndarray` of shape :obj:`(batch_size, n_docs)`) -- The ids of the documents in the
|
||||
index
|
||||
- **doc_dicts** (:obj:`List[dict]`) -- The :obj:`retrieved_doc_embeds` examples per query.
|
||||
"""
|
||||
|
||||
doc_ids, retrieved_doc_embeds = self._main_retrieve(question_hidden_states, n_docs)
|
||||
|
||||
@@ -17,7 +17,8 @@ import os
|
||||
from typing import List, Optional
|
||||
|
||||
from .configuration_rag import RagConfig
|
||||
from .tokenization_utils_base import BatchEncoding
|
||||
from .file_utils import add_start_docstrings
|
||||
from .tokenization_utils_base import PREPARE_SEQ2SEQ_BATCH_DOCSTRING, BatchEncoding
|
||||
from .utils import logging
|
||||
|
||||
|
||||
@@ -60,6 +61,7 @@ class RagTokenizer:
|
||||
def batch_decode(self, *args, **kwargs):
|
||||
return self.generator.batch_decode(*args, **kwargs)
|
||||
|
||||
@add_start_docstrings(PREPARE_SEQ2SEQ_BATCH_DOCSTRING)
|
||||
def prepare_seq2seq_batch(
|
||||
self,
|
||||
src_texts: List[str],
|
||||
@@ -71,66 +73,6 @@ class RagTokenizer:
|
||||
truncation=True,
|
||||
**kwargs,
|
||||
) -> BatchEncoding:
|
||||
r"""
|
||||
|
||||
Prepare a batch that can be passed directly to an instance of :class:`~transformers.RagModel`.
|
||||
|
||||
Args:
|
||||
src_texts: (:obj:`List[str]`):
|
||||
List of documents to summarize or source language texts.
|
||||
tgt_texts: (:obj:`List[str]`, `optional`):
|
||||
List of summaries or target language texts.
|
||||
max_length (:obj:`int`, `optional`):
|
||||
Controls the maximum length for encoder inputs (documents to summarize or source language texts).
|
||||
If left unset or set to :obj:`None`, this will use the predefined model maximum length if a maximum
|
||||
length is required by one of the truncation/padding parameters. If the model has no specific maximum
|
||||
input length (like XLNet) truncation/padding to a maximum length will be deactivated.
|
||||
max_target_length (:obj:`int`, `optional`):
|
||||
Controls the maximum length of decoder inputs (target language texts or summaries).
|
||||
If left unset or set to :obj:`None`, this will use the max_length value.
|
||||
padding (:obj:`bool`, :obj:`str` or :class:`~transformers.tokenization_utils_base.PaddingStrategy`, `optional`, defaults to :obj:`False`):
|
||||
Activates and controls padding. Accepts the following values:
|
||||
|
||||
* :obj:`True` or :obj:`'longest'`: Pad to the longest sequence in the batch (or no padding if only a
|
||||
single sequence is provided).
|
||||
* :obj:`'max_length'`: Pad to a maximum length specified with the argument :obj:`max_length` or to the
|
||||
maximum acceptable input length for the model if that argument is not provided.
|
||||
* :obj:`False` or :obj:`'do_not_pad'` (default): No padding (i.e., can output a batch with sequences of
|
||||
different lengths).
|
||||
return_tensors (:obj:`str` or :class:`~transformers.tokenization_utils_base.TensorType`, `optional`, defaults to "pt"):
|
||||
If set, will return tensors instead of list of python integers. Acceptable values are:
|
||||
|
||||
* :obj:`'tf'`: Return TensorFlow :obj:`tf.constant` objects.
|
||||
* :obj:`'pt'`: Return PyTorch :obj:`torch.Tensor` objects.
|
||||
* :obj:`'np'`: Return Numpy :obj:`np.ndarray` objects.
|
||||
truncation (:obj:`bool`, :obj:`str` or :class:`~transformers.tokenization_utils_base.TruncationStrategy`, `optional`, defaults to :obj:`True`):
|
||||
Activates and controls truncation. Accepts the following values:
|
||||
|
||||
* :obj:`True` or :obj:`'longest_first'`: Truncate to a maximum length specified with the argument
|
||||
:obj:`max_length` or to the maximum acceptable input length for the model if that argument is not
|
||||
provided. This will truncate token by token, removing a token from the longest sequence in the pair
|
||||
if a pair of sequences (or a batch of pairs) is provided.
|
||||
* :obj:`'only_first'`: Truncate to a maximum length specified with the argument :obj:`max_length` or to
|
||||
the maximum acceptable input length for the model if that argument is not provided. This will only
|
||||
truncate the first sequence of a pair if a pair of sequences (or a batch of pairs) is provided.
|
||||
* :obj:`'only_second'`: Truncate to a maximum length specified with the argument :obj:`max_length` or
|
||||
to the maximum acceptable input length for the model if that argument is not provided. This will only
|
||||
truncate the second sequence of a pair if a pair of sequences (or a batch of pairs) is provided.
|
||||
* :obj:`False` or :obj:`'do_not_truncate'` (default): No truncation (i.e., can output batch with
|
||||
sequence lengths greater than the model maximum admissible input size).
|
||||
**kwargs:
|
||||
Additional keyword arguments passed along to :obj:`self.__call__`.
|
||||
|
||||
Returns:
|
||||
:class:`~transformers.BatchEncoding`: A :class:`~transformers.BatchEncoding` with the following fields:
|
||||
|
||||
- **input_ids** -- List of token ids to be fed to the encoder.
|
||||
- **attention_mask** -- List of indices specifying which tokens should be attended to by the model.
|
||||
- **labels** -- List of token ids for tgt_texts
|
||||
|
||||
The full set of keys ``[input_ids, attention_mask, labels]``,
|
||||
will only be returned if tgt_texts is passed. Otherwise, input_ids, attention_mask will be the only keys.
|
||||
"""
|
||||
if max_length is None:
|
||||
max_length = self.question_encoder.model_max_length
|
||||
model_inputs: BatchEncoding = self.question_encoder(
|
||||
|
||||
@@ -31,10 +31,10 @@ XXX_PRETRAINED_CONFIG_ARCHIVE_MAP = {
|
||||
|
||||
class XxxConfig(PretrainedConfig):
|
||||
r"""
|
||||
This is the configuration class to store the configuration of a :class:`~transformers.XXXModel`.
|
||||
It is used to instantiate a XXX model according to the specified arguments, defining the model
|
||||
architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of
|
||||
the XXX `xxx-base-uncased <https://huggingface.co/xxx/xxx-base-uncased>`__ architecture.
|
||||
This is the configuration class to store the configuration of a :class:`~transformers.XxxModel` or a
|
||||
:class:`~transformers.TFXxxModel`. It is used to instantiate a XXX model according to the specified
|
||||
arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar
|
||||
configuration to that of the XXX `xxx-base-uncased <https://huggingface.co/xxx/xxx-base-uncased>`__ architecture.
|
||||
|
||||
Configuration objects inherit from :class:`~transformers.PretrainedConfig` and can be used
|
||||
to control the model outputs. Read the documentation from :class:`~transformers.PretrainedConfig`
|
||||
@@ -42,33 +42,35 @@ class XxxConfig(PretrainedConfig):
|
||||
|
||||
|
||||
Args:
|
||||
vocab_size (:obj:`int`, optional, defaults to 30522):
|
||||
Vocabulary size of the XXX model. Defines the different tokens that
|
||||
can be represented by the `inputs_ids` passed to the forward method of :class:`~transformers.XXXModel`.
|
||||
hidden_size (:obj:`int`, optional, defaults to 768):
|
||||
vocab_size (:obj:`int`, `optional`, defaults to 30522):
|
||||
Vocabulary size of the XXX model. Defines the number of different tokens that can be represented by the
|
||||
:obj:`input_ids` passed when calling :class:`~transformers.XxxModel` or
|
||||
:class:`~transformers.TFXxxModel`.
|
||||
hidden_size (:obj:`int`, `optional`, defaults to 768):
|
||||
Dimensionality of the encoder layers and the pooler layer.
|
||||
num_hidden_layers (:obj:`int`, optional, defaults to 12):
|
||||
num_hidden_layers (:obj:`int`, `optional`, defaults to 12):
|
||||
Number of hidden layers in the Transformer encoder.
|
||||
num_attention_heads (:obj:`int`, optional, defaults to 12):
|
||||
num_attention_heads (:obj:`int`, `optional`, defaults to 12):
|
||||
Number of attention heads for each attention layer in the Transformer encoder.
|
||||
hidden_act (:obj:`str` or :obj:`function`, optional, defaults to :obj:`"gelu"`):
|
||||
hidden_act (:obj:`str` or :obj:`Callable`, `optional`, defaults to :obj:`"gelu"`):
|
||||
The non-linear activation function (function or string) in the encoder and pooler.
|
||||
|
||||
If string, :obj:`"gelu"`, :obj:`"relu"`, :obj:`"swish"` and :obj:`"gelu_new"` are supported.
|
||||
hidden_dropout_prob (:obj:`float`, optional, defaults to 0.1):
|
||||
hidden_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
|
||||
The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
|
||||
attention_probs_dropout_prob (:obj:`float`, optional, defaults to 0.1):
|
||||
attention_probs_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
|
||||
The dropout ratio for the attention probabilities.
|
||||
max_position_embeddings (:obj:`int`, optional, defaults to 512):
|
||||
max_position_embeddings (:obj:`int`, `optional`, defaults to 512):
|
||||
The maximum sequence length that this model might ever be used with.
|
||||
Typically set this to something large just in case (e.g., 512 or 1024 or 2048).
|
||||
type_vocab_size (:obj:`int`, optional, defaults to 2):
|
||||
The vocabulary size of the `token_type_ids` passed into :class:`~transformers.BertModel`.
|
||||
initializer_range (:obj:`float`, optional, defaults to 0.02):
|
||||
type_vocab_size (:obj:`int`, `optional`, defaults to 2):
|
||||
The vocabulary size of the :obj:`token_type_ids` passed when calling :class:`~transformers.XxxModel` or
|
||||
:class:`~transformers.TFXxxModel`.
|
||||
initializer_range (:obj:`float`, `optional`, defaults to 0.02):
|
||||
The standard deviation of the :obj:`truncated_normal_initializer` for initializing all weight matrices.
|
||||
layer_norm_eps (:obj:`float`, optional, defaults to 1e-5):
|
||||
layer_norm_eps (:obj:`float`, `optional`, defaults to 1e-5):
|
||||
The epsilon used by the layer normalization layers.
|
||||
gradient_checkpointing (:obj:`bool`, optional, defaults to :obj:`False`):
|
||||
gradient_checkpointing (:obj:`bool`, `optional`, defaults to :obj:`False`):
|
||||
If :obj:`True`, use gradient checkpointing to save memory at the expense of slower backward pass.
|
||||
kwargs:
|
||||
Additional arguments for common configurations, passed to :class:`~transformers.PretrainedConfig`.
|
||||
|
||||
@@ -257,32 +257,37 @@ class TFXxxPreTrainedModel(TFPreTrainedModel):
|
||||
|
||||
|
||||
XXX_START_DOCSTRING = r"""
|
||||
|
||||
The XXX model was proposed in
|
||||
`XXX: Pre-training of Deep Bidirectional Transformers for Language Understanding
|
||||
<https://arxiv.org/abs/1810.04805>`__ by....
|
||||
|
||||
This model is a `tf.keras.Model <https://www.tensorflow.org/api_docs/python/tf/keras/Model>`__ sub-class.
|
||||
Use it as a regular TF 2.0 Keras Model and
|
||||
refer to the TF 2.0 documentation for all matter related to general usage and behavior.
|
||||
This model inherits from :class:`~transformers.TFPreTrainedModel`. Check the superclass documentation for the
|
||||
generic methods the library implements for all its models (such as downloading or saving, resizing the input
|
||||
embeddings, pruning heads, etc.).
|
||||
|
||||
This model is also a `tf.keras.Model <https://www.tensorflow.org/api_docs/python/tf/keras/Model>`__ subclass.
|
||||
Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matters related to general
|
||||
usage and behavior.
|
||||
|
||||
.. note::
|
||||
|
||||
TF 2.0 models accept two formats as inputs:
|
||||
|
||||
- having all inputs as keyword arguments (like PyTorch models), or
|
||||
- having all inputs as a list, tuple or dict in the first positional arguments.
|
||||
- having all inputs as keyword arguments (like PyTorch models), or
|
||||
- having all inputs as a list, tuple or dict in the first positional arguments.
|
||||
|
||||
This second option is useful when using :obj:`tf.keras.Model.fit()` method which currently requires having
|
||||
This second option is useful when using :meth:`tf.keras.Model.fit` method which currently requires having
|
||||
all the tensors in the first argument of the model call function: :obj:`model(inputs)`.
|
||||
|
||||
If you choose this second option, there are three possibilities you can use to gather all the input Tensors
|
||||
in the first positional argument :
|
||||
|
||||
- a single Tensor with input_ids only and nothing else: :obj:`model(inputs_ids)`
|
||||
- a single Tensor with :obj:`input_ids` only and nothing else: :obj:`model(input_ids)`
|
||||
- a list of varying length with one or several input Tensors IN THE ORDER given in the docstring:
|
||||
:obj:`model([input_ids, attention_mask])` or :obj:`model([input_ids, attention_mask, token_type_ids])`
|
||||
- a dictionary with one or several input Tensors associated to the input names given in the docstring:
|
||||
:obj:`model({'input_ids': input_ids, 'token_type_ids': token_type_ids})`
|
||||
:obj:`model({"input_ids": input_ids, "token_type_ids": token_type_ids})`
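The three first-argument formats listed above can be illustrated with a small, framework-free sketch. The names `FakeTensor`, `INPUT_ORDER` and `normalize_inputs` are hypothetical and invented for this illustration; they are not part of the library, and the real model call logic may differ. The sketch only shows how a single tensor, an ordered list, and a dictionary could each be mapped onto named inputs.

```python
# Hypothetical illustration only: this is not the library's implementation.
# INPUT_ORDER mirrors the order in which the docstring lists the inputs.
INPUT_ORDER = ("input_ids", "attention_mask", "token_type_ids")


class FakeTensor:
    """Stand-in for a tensor so the sketch runs without TensorFlow."""

    def __init__(self, values):
        self.values = values


def normalize_inputs(inputs):
    """Map the first positional argument onto a dict of named inputs."""
    if isinstance(inputs, FakeTensor):
        # Single tensor: treated as input_ids and nothing else.
        return {"input_ids": inputs}
    if isinstance(inputs, dict):
        # Dictionary: tensors are matched by the names used in the docstring.
        unknown = set(inputs) - set(INPUT_ORDER)
        if unknown:
            raise ValueError(f"Unexpected input names: {sorted(unknown)}")
        return dict(inputs)
    if isinstance(inputs, (list, tuple)):
        # List or tuple: tensors are matched by position, in docstring order.
        if len(inputs) > len(INPUT_ORDER):
            raise ValueError("More tensors than known input names")
        return dict(zip(INPUT_ORDER, inputs))
    raise TypeError(f"Unsupported input type: {type(inputs).__name__}")


ids = FakeTensor([[101, 7592, 102]])
mask = FakeTensor([[1, 1, 1]])
segments = FakeTensor([[0, 0, 0]])

single = normalize_inputs(ids)
as_list = normalize_inputs([ids, mask])
as_dict = normalize_inputs({"input_ids": ids, "token_type_ids": segments})
```

With the list format the position decides the name, which is why the docstring insists on the order given; the dictionary format removes that constraint and lets you skip `attention_mask` while still passing `token_type_ids`.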
|
||||
|
||||
Parameters:
|
||||
config (:class:`~transformers.XxxConfig`): Model configuration class with all the parameters of the model.
|
||||
@@ -292,27 +297,31 @@ XXX_START_DOCSTRING = r"""
|
||||
|
||||
XXX_INPUTS_DOCSTRING = r"""
|
||||
Args:
|
||||
input_ids (:obj:`Numpy array` or :obj:`tf.Tensor` of shape :obj:`{0}`):
|
||||
input_ids (:obj:`Numpy array` or :obj:`tf.Tensor` of shape :obj:`({0})`):
|
||||
Indices of input sequence tokens in the vocabulary.
|
||||
|
||||
Indices can be obtained using :class:`transformers.XxxTokenizer`.
|
||||
See :func:`transformers.PreTrainedTokenizer.encode` and
|
||||
:func:`transformers.PreTrainedTokenizer.__call__` for details.
|
||||
Indices can be obtained using :class:`~transformers.XxxTokenizer`.
|
||||
See :func:`transformers.PreTrainedTokenizer.__call__` and
|
||||
:func:`transformers.PreTrainedTokenizer.encode` for details.
|
||||
|
||||
`What are input IDs? <../glossary.html#input-ids>`__
|
||||
attention_mask (:obj:`Numpy array` or :obj:`tf.Tensor` of shape :obj:`{0}`, `optional`):
|
||||
attention_mask (:obj:`Numpy array` or :obj:`tf.Tensor` of shape :obj:`({0})`, `optional`):
|
||||
Mask to avoid performing attention on padding token indices.
|
||||
Mask values selected in ``[0, 1]``:
|
||||
``1`` for tokens that are NOT MASKED, ``0`` for MASKED tokens.
|
||||
|
||||
- 1 for tokens that are **not masked**,
|
||||
- 0 for tokens that are **masked**.
|
||||
|
||||
`What are attention masks? <../glossary.html#attention-mask>`__
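As a minimal sketch of the mask convention described above, the following builds an attention mask from a padded batch of token ids. The helper name `build_attention_mask` and the padding id `0` are assumptions made for this illustration; in practice the tokenizer usually returns the mask for you.

```python
def build_attention_mask(batch_input_ids, pad_token_id=0):
    """Return 1 for real tokens and 0 for padding tokens.

    pad_token_id=0 is an assumption for this sketch; check the
    tokenizer in use for the actual padding id.
    """
    return [
        [0 if token_id == pad_token_id else 1 for token_id in sequence]
        for sequence in batch_input_ids
    ]


# Two sequences padded to the same length (the second one has one pad token).
batch = [
    [101, 7592, 2088, 102],
    [101, 7592, 102, 0],
]
mask = build_attention_mask(batch)
```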
|
||||
token_type_ids (:obj:`Numpy array` or :obj:`tf.Tensor` of shape :obj:`{0}`, `optional`):
|
||||
token_type_ids (:obj:`Numpy array` or :obj:`tf.Tensor` of shape :obj:`({0})`, `optional`):
|
||||
Segment token indices to indicate first and second portions of the inputs.
|
||||
Indices are selected in ``[0, 1]``: ``0`` corresponds to a `sentence A` token, ``1``
|
||||
corresponds to a `sentence B` token
|
||||
Indices are selected in ``[0, 1]``:
|
||||
|
||||
- 0 corresponds to a `sentence A` token,
|
||||
- 1 corresponds to a `sentence B` token.
|
||||
|
||||
`What are token type IDs? <../glossary.html#token-type-ids>`__
|
||||
position_ids (:obj:`Numpy array` or :obj:`tf.Tensor` of shape :obj:`{0}`, `optional`):
|
||||
position_ids (:obj:`Numpy array` or :obj:`tf.Tensor` of shape :obj:`({0})`, `optional`):
|
||||
Indices of positions of each input sequence token in the position embeddings.
|
||||
Selected in the range ``[0, config.max_position_embeddings - 1]``.
|
||||
|
||||
@@ -320,21 +329,25 @@ XXX_INPUTS_DOCSTRING = r"""
|
||||
head_mask (:obj:`Numpy array` or :obj:`tf.Tensor` of shape :obj:`(num_heads,)` or :obj:`(num_layers, num_heads)`, `optional`):
|
||||
Mask to nullify selected heads of the self-attention modules.
|
||||
Mask values selected in ``[0, 1]``:
|
||||
:obj:`1` indicates the head is **not masked**, :obj:`0` indicates the head is **masked**.
|
||||
inputs_embeds (:obj:`Numpy array` or :obj:`tf.Tensor` of shape :obj:`(batch_size, sequence_length, embedding_dim)`, `optional`):
|
||||
|
||||
- 1 indicates the head is **not masked**,
|
||||
- 0 indicates the head is **masked**.
|
||||
|
||||
inputs_embeds (:obj:`tf.Tensor` of shape :obj:`({0}, hidden_size)`, `optional`):
|
||||
Optionally, instead of passing :obj:`input_ids` you can choose to directly pass an embedded representation.
|
||||
This is useful if you want more control over how to convert `input_ids` indices into associated vectors
|
||||
than the model's internal embedding lookup matrix.
|
||||
training (:obj:`boolean`, `optional`, defaults to :obj:`False`):
|
||||
Whether to activate dropout modules (if set to :obj:`True`) during training or to de-activate them
|
||||
(if set to :obj:`False`) for evaluation.
|
||||
This is useful if you want more control over how to convert :obj:`input_ids` indices into associated
|
||||
vectors than the model's internal embedding lookup matrix.
|
||||
output_attentions (:obj:`bool`, `optional`):
|
||||
If set to ``True``, the attentions tensors of all attention layers are returned. See ``attentions`` under returned tensors for more detail.
|
||||
Whether or not to return the attentions tensors of all attention layers. See ``attentions`` under returned
|
||||
tensors for more detail.
|
||||
output_hidden_states (:obj:`bool`, `optional`):
|
||||
If set to ``True``, the hidden states of all layers are returned. See ``hidden_states`` under returned tensors for more detail.
|
||||
Whether or not to return the hidden states of all layers. See ``hidden_states`` under returned tensors for
|
||||
more detail.
|
||||
return_dict (:obj:`bool`, `optional`):
|
||||
If set to ``True``, the model will return a :class:`~transformers.file_utils.ModelOutput` instead of a
|
||||
plain tuple.
|
||||
Whether or not to return a :class:`~transformers.file_utils.ModelOutput` instead of a plain tuple.
|
||||
training (:obj:`bool`, `optional`, defaults to :obj:`False`):
|
||||
Whether or not to use the model in training mode (some modules like dropout modules have different
|
||||
behaviors between training and evaluation).
|
||||
"""
|
||||
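The hunk above moves the parentheses out of the `format` argument and into the template itself, so the same placeholder can be reused inside a longer shape such as `({0}, hidden_size)`. The sketch below reproduces that idea with a shortened, hypothetical template (`INPUTS_DOC_TEMPLATE` and `render` are illustrative names, not the library's).

```python
# Shortened, hypothetical stand-in for the inputs docstring template.
INPUTS_DOC_TEMPLATE = (
    "input_ids (tensor of shape ({0})): token indices.\n"
    "inputs_embeds (tensor of shape ({0}, hidden_size), optional): embedded inputs.\n"
)


def render(shape):
    """Fill every {0} placeholder with the bare, unparenthesized shape."""
    return INPUTS_DOC_TEMPLATE.format(shape)


standard_doc = render("batch_size, sequence_length")
multiple_choice_doc = render("batch_size, num_choices, sequence_length")
```

Passing the shape without parentheses is what makes `({0}, hidden_size)` expand to a single well-formed tuple; with the old convention the same placeholder would have produced nested parentheses.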
|
||||
|
||||
@@ -347,7 +360,7 @@ class TFXxxModel(TFXxxPreTrainedModel):
|
||||
super().__init__(config, *inputs, **kwargs)
|
||||
self.transformer = TFXxxMainLayer(config, name="transformer")
|
||||
|
||||
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("(batch_size, sequence_length)"))
|
||||
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
|
||||
@add_code_sample_docstrings(
|
||||
tokenizer_class=_TOKENIZER_FOR_DOC,
|
||||
checkpoint="xxx-base-cased",
|
||||
@@ -370,7 +383,7 @@ class TFXxxForMaskedLM(TFXxxPreTrainedModel, TFMaskedLanguageModelingLoss):
|
||||
self.transformer = TFXxxMainLayer(config, name="transformer")
|
||||
self.mlm = TFXxxMLMHead(config, self.transformer.embeddings, name="mlm")
|
||||
|
||||
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("(batch_size, sequence_length)"))
|
||||
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
|
||||
@add_code_sample_docstrings(
|
||||
tokenizer_class=_TOKENIZER_FOR_DOC,
|
||||
checkpoint="xxx-base-cased",
|
||||
@@ -452,7 +465,7 @@ class TFXxxForSequenceClassification(TFXxxPreTrainedModel, TFSequenceClassificat
|
||||
config.num_labels, kernel_initializer=get_initializer(config.initializer_range), name="classifier"
|
||||
)
|
||||
|
||||
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING)
|
||||
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
|
||||
@add_code_sample_docstrings(
|
||||
tokenizer_class=_TOKENIZER_FOR_DOC,
|
||||
checkpoint="xxx-base-cased",
|
||||
@@ -544,7 +557,7 @@ class TFXxxForMultipleChoice(TFXxxPreTrainedModel, TFMultipleChoiceLoss):
|
||||
"""
|
||||
return {"input_ids": tf.constant(MULTIPLE_CHOICE_DUMMY_INPUTS)}
|
||||
|
||||
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("(batch_size, num_choices, sequence_length)"))
|
||||
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("batch_size, num_choices, sequence_length"))
|
||||
@add_code_sample_docstrings(
|
||||
tokenizer_class=_TOKENIZER_FOR_DOC,
|
||||
checkpoint="xxx-base-cased",
|
||||
@@ -568,8 +581,8 @@ class TFXxxForMultipleChoice(TFXxxPreTrainedModel, TFMultipleChoiceLoss):
|
||||
r"""
|
||||
labels (:obj:`tf.Tensor` of shape :obj:`(batch_size,)`, `optional`):
|
||||
Labels for computing the multiple choice classification loss.
|
||||
Indices should be in ``[0, ..., num_choices]`` where `num_choices` is the size of the second dimension
|
||||
of the input tensors. (see `input_ids` above)s after the attention softmax, used to compute the weighted average in the self-attention
|
||||
Indices should be in ``[0, ..., num_choices-1]`` where :obj:`num_choices` is the size of the second dimension
|
||||
of the input tensors. (See :obj:`input_ids` above)
|
||||
heads.
|
||||
"""
|
||||
if isinstance(inputs, (tuple, list)):
|
||||
@@ -667,7 +680,7 @@ class TFXxxForTokenClassification(TFXxxPreTrainedModel, TFTokenClassificationLos
|
||||
config.num_labels, kernel_initializer=get_initializer(config.initializer_range), name="classifier"
|
||||
)
|
||||
|
||||
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING)
|
||||
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
|
||||
@add_code_sample_docstrings(
|
||||
tokenizer_class=_TOKENIZER_FOR_DOC,
|
||||
checkpoint="xxx-base-cased",
|
||||
@@ -734,8 +747,8 @@ class TFXxxForTokenClassification(TFXxxPreTrainedModel, TFTokenClassificationLos
|
||||
|
||||
|
||||
@add_start_docstrings(
|
||||
"""XXX Model with a span classification head on top for extractive question-answering tasks like SQuAD (a linear layers on top of
|
||||
the hidden-states output to compute `span start logits` and `span end logits`). """,
|
||||
"""XXX Model with a span classification head on top for extractive question-answering tasks like SQuAD (a linear
|
||||
layer on top of the hidden-states output to compute `span start logits` and `span end logits`). """,
|
||||
XXX_START_DOCSTRING,
|
||||
)
|
||||
class TFXxxForQuestionAnswering(TFXxxPreTrainedModel, TFQuestionAnsweringLoss):
|
||||
@@ -748,7 +761,7 @@ class TFXxxForQuestionAnswering(TFXxxPreTrainedModel, TFQuestionAnsweringLoss):
|
||||
config.num_labels, kernel_initializer=get_initializer(config.initializer_range), name="qa_outputs"
|
||||
)
|
||||
|
||||
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING)
|
||||
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
|
||||
@add_code_sample_docstrings(
|
||||
tokenizer_class=_TOKENIZER_FOR_DOC,
|
||||
checkpoint="xxx-base-cased",
|
||||
@@ -773,11 +786,11 @@ class TFXxxForQuestionAnswering(TFXxxPreTrainedModel, TFQuestionAnsweringLoss):
|
||||
r"""
|
||||
start_positions (:obj:`tf.Tensor` of shape :obj:`(batch_size,)`, `optional`):
|
||||
Labels for position (index) of the start of the labelled span for computing the token classification loss.
|
||||
Positions are clamped to the length of the sequence (`sequence_length`).
|
||||
Positions are clamped to the length of the sequence (:obj:`sequence_length`).
|
||||
Positions outside of the sequence are not taken into account for computing the loss.
|
||||
end_positions (:obj:`tf.Tensor` of shape :obj:`(batch_size,)`, `optional`):
|
||||
Labels for position (index) of the end of the labelled span for computing the token classification loss.
|
||||
Positions are clamped to the length of the sequence (`sequence_length`).
|
||||
Positions are clamped to the length of the sequence (:obj:`sequence_length`).
|
||||
Positions outside of the sequence are not taken into account for computing the loss.
|
||||
"""
|
||||
return_dict = return_dict if return_dict is not None else self.transformer.return_dict
|
||||
|
||||
@@ -209,11 +209,16 @@ class XxxPreTrainedModel(PreTrainedModel):
|
||||
module.bias.data.zero_()
|
||||
|
||||
|
||||
XXX_START_DOCSTRING = r""" The XXX model was proposed in
|
||||
`XXX: Pre-training of Deep Bidirectional Transformers for Language Understanding
|
||||
XXX_START_DOCSTRING = r"""
|
||||
|
||||
The XXX model was proposed in `XXX: Pre-training of Deep Bidirectional Transformers for Language Understanding
|
||||
<https://arxiv.org/abs/1810.04805>`__ by....
|
||||
|
||||
This model is a PyTorch `torch.nn.Module <https://pytorch.org/docs/stable/nn.html#torch.nn.Module>`_ sub-class.
|
||||
This model inherits from :class:`~transformers.PreTrainedModel`. Check the superclass documentation for the generic
|
||||
methods the library implements for all its models (such as downloading or saving, resizing the input embeddings,
|
||||
pruning heads etc.)
|
||||
|
||||
This model is also a PyTorch `torch.nn.Module <https://pytorch.org/docs/stable/nn.html#torch.nn.Module>`__ subclass.
|
||||
Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general
|
||||
usage and behavior.
|
||||
|
||||
@@ -225,27 +230,31 @@ XXX_START_DOCSTRING = r""" The XXX model was proposed in
|
||||
|
||||
XXX_INPUTS_DOCSTRING = r"""
|
||||
Inputs:
|
||||
input_ids (:obj:`torch.LongTensor` of shape :obj:`{0}`):
|
||||
input_ids (:obj:`torch.LongTensor` of shape :obj:`({0})`):
|
||||
Indices of input sequence tokens in the vocabulary.
|
||||
|
||||
Indices can be obtained using :class:`transformers.XxxTokenizer`.
|
||||
See :func:`transformers.PreTrainedTokenizer.encode` and
|
||||
:func:`transformers.PreTrainedTokenizer.__call__` for details.
|
||||
Indices can be obtained using :class:`~transformers.XxxTokenizer`.
|
||||
See :meth:`transformers.PreTrainedTokenizer.encode` and
|
||||
:meth:`transformers.PreTrainedTokenizer.__call__` for details.
|
||||
|
||||
`What are input IDs? <../glossary.html#input-ids>`__
|
||||
attention_mask (:obj:`torch.FloatTensor` of shape :obj:`{0}`, `optional`):
|
||||
attention_mask (:obj:`torch.FloatTensor` of shape :obj:`({0})`, `optional`):
|
||||
Mask to avoid performing attention on padding token indices.
|
||||
Mask values selected in ``[0, 1]``:
|
||||
``1`` for tokens that are NOT MASKED, ``0`` for MASKED tokens.
|
||||
|
||||
- 1 for tokens that are **not masked**,
|
||||
- 0 for tokens that are **masked**.
|
||||
|
||||
`What are attention masks? <../glossary.html#attention-mask>`__
|
||||
token_type_ids (:obj:`torch.LongTensor` of shape :obj:`{0}`, `optional`):
|
||||
token_type_ids (:obj:`torch.LongTensor` of shape :obj:`({0})`, `optional`):
|
||||
Segment token indices to indicate first and second portions of the inputs.
|
||||
Indices are selected in ``[0, 1]``: ``0`` corresponds to a `sentence A` token, ``1``
|
||||
corresponds to a `sentence B` token
|
||||
Indices are selected in ``[0, 1]``:
|
||||
|
||||
- 0 corresponds to a `sentence A` token,
|
||||
- 1 corresponds to a `sentence B` token.
|
||||
|
||||
`What are token type IDs? <../glossary.html#token-type-ids>`_
|
||||
position_ids (:obj:`torch.LongTensor` of shape :obj:`{0}`, `optional`):
|
||||
position_ids (:obj:`torch.LongTensor` of shape :obj:`({0})`, `optional`):
|
||||
Indices of positions of each input sequence token in the position embeddings.
|
||||
Selected in the range ``[0, config.max_position_embeddings - 1]``.
|
||||
|
||||
@@ -253,18 +262,22 @@ XXX_INPUTS_DOCSTRING = r"""
|
||||
head_mask (:obj:`torch.FloatTensor` of shape :obj:`(num_heads,)` or :obj:`(num_layers, num_heads)`, `optional`):
|
||||
Mask to nullify selected heads of the self-attention modules.
|
||||
Mask values selected in ``[0, 1]``:
|
||||
:obj:`1` indicates the head is **not masked**, :obj:`0` indicates the head is **masked**.
|
||||
inputs_embeds (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`, `optional`):
|
||||
|
||||
- 1 indicates the head is **not masked**,
|
||||
- 0 indicates the head is **masked**.
|
||||
|
||||
inputs_embeds (:obj:`torch.FloatTensor` of shape :obj:`({0}, hidden_size)`, `optional`):
|
||||
Optionally, instead of passing :obj:`input_ids` you can choose to directly pass an embedded representation.
|
||||
This is useful if you want more control over how to convert `input_ids` indices into associated vectors
|
||||
than the model's internal embedding lookup matrix.
|
||||
This is useful if you want more control over how to convert :obj:`input_ids` indices into associated
|
||||
vectors than the model's internal embedding lookup matrix.
|
||||
output_attentions (:obj:`bool`, `optional`):
|
||||
If set to ``True``, the attentions tensors of all attention layers are returned. See ``attentions`` under returned tensors for more detail.
|
||||
Whether or not to return the attentions tensors of all attention layers. See ``attentions`` under returned
|
||||
tensors for more detail.
|
||||
output_hidden_states (:obj:`bool`, `optional`):
|
||||
If set to ``True``, the hidden states of all layers are returned. See ``hidden_states`` under returned tensors for more detail.
|
||||
Whether or not to return the hidden states of all layers. See ``hidden_states`` under returned tensors for
|
||||
more detail.
|
||||
return_dict (:obj:`bool`, `optional`):
|
||||
If set to ``True``, the model will return a :class:`~transformers.file_utils.ModelOutput` instead of a
|
||||
plain tuple.
|
||||
Whether or not to return a :class:`~transformers.file_utils.ModelOutput` instead of a plain tuple.
|
||||
"""
|
||||
|
||||
|
||||
@@ -296,7 +309,7 @@ class XxxModel(XxxPreTrainedModel):
|
||||
for layer, heads in heads_to_prune.items():
|
||||
self.encoder.layer[layer].attention.prune_heads(heads)
|
||||
|
||||
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("(batch_size, sequence_length)"))
|
||||
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
|
||||
@add_code_sample_docstrings(
|
||||
tokenizer_class=_TOKENIZER_FOR_DOC,
|
||||
checkpoint="xxx-base-uncased",
|
||||
@@ -378,7 +391,7 @@ class XxxForMaskedLM(XxxPreTrainedModel):
|
||||
def get_output_embeddings(self):
|
||||
return self.lm_head
|
||||
|
||||
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("(batch_size, sequence_length)"))
|
||||
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
|
||||
@add_code_sample_docstrings(
|
||||
tokenizer_class=_TOKENIZER_FOR_DOC,
|
||||
checkpoint="xxx-base-uncased",
|
||||
@@ -455,7 +468,7 @@ class XxxForSequenceClassification(XxxPreTrainedModel):
|
||||
|
||||
self.init_weights()
|
||||
|
||||
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("(batch_size, sequence_length)"))
|
||||
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
|
||||
@add_code_sample_docstrings(
|
||||
tokenizer_class=_TOKENIZER_FOR_DOC,
|
||||
checkpoint="xxx-base-uncased",
|
||||
@@ -538,7 +551,7 @@ class XxxForMultipleChoice(XxxPreTrainedModel):
|
||||
|
||||
self.init_weights()
|
||||
|
||||
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("(batch_size, num_choices, sequence_length)"))
|
||||
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("batch_size, num_choices, sequence_length"))
|
||||
@add_code_sample_docstrings(
|
||||
tokenizer_class=_TOKENIZER_FOR_DOC,
|
||||
checkpoint="xxx-base-uncased",
|
||||
@@ -561,8 +574,8 @@ class XxxForMultipleChoice(XxxPreTrainedModel):
|
||||
r"""
|
||||
labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`):
|
||||
Labels for computing the multiple choice classification loss.
|
||||
Indices should be in ``[0, ..., num_choices-1]`` where `num_choices` is the size of the second dimension
|
||||
of the input tensors. (see `input_ids` above)
|
||||
Indices should be in ``[0, ..., num_choices-1]`` where :obj:`num_choices` is the size of the second dimension
|
||||
of the input tensors. (See :obj:`input_ids` above)
|
||||
"""
|
||||
return_dict = return_dict if return_dict is not None else self.config.use_return_dict
|
||||
num_choices = input_ids.shape[1] if input_ids is not None else inputs_embeds.shape[1]
|
||||
@@ -628,7 +641,7 @@ class XxxForTokenClassification(XxxPreTrainedModel):
|
||||
|
||||
self.init_weights()
|
||||
|
||||
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("(batch_size, sequence_length)"))
|
||||
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
|
||||
@add_code_sample_docstrings(
|
||||
tokenizer_class=_TOKENIZER_FOR_DOC,
|
||||
checkpoint="xxx-base-uncased",
|
||||
@@ -713,7 +726,7 @@ class XxxForQuestionAnswering(XxxPreTrainedModel):
|
||||
|
||||
self.init_weights()
|
||||
|
||||
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("(batch_size, sequence_length)"))
|
||||
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
|
||||
@add_code_sample_docstrings(
|
||||
tokenizer_class=_TOKENIZER_FOR_DOC,
|
||||
checkpoint="xxx-base-uncased",
|
||||
@@ -737,11 +750,11 @@ class XxxForQuestionAnswering(XxxPreTrainedModel):
|
||||
r"""
|
||||
start_positions (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`):
|
||||
Labels for position (index) of the start of the labelled span for computing the token classification loss.
|
||||
Positions are clamped to the length of the sequence (`sequence_length`).
|
||||
Positions are clamped to the length of the sequence (:obj:`sequence_length`).
|
||||
Positions outside of the sequence are not taken into account for computing the loss.
|
||||
end_positions (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`):
|
||||
Labels for position (index) of the end of the labelled span for computing the token classification loss.
|
||||
Positions are clamped to the length of the sequence (`sequence_length`).
|
||||
Positions are clamped to the length of the sequence (:obj:`sequence_length`).
|
||||
Positions outside of the sequence are not taken into account for computing the loss.
|
||||
"""
|
||||
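The clamping rule in the docstring above can be sketched in plain Python. This is an illustration under stated assumptions, not the model's actual code: it assumes out-of-range positions are clamped to `sequence_length`, and that this value (which is not a valid token index) then serves as the marker the loss skips. Both helper names are hypothetical.

```python
def clamp_span_positions(positions, sequence_length):
    """Clamp each labelled position into the range [0, sequence_length]."""
    return [min(max(position, 0), sequence_length) for position in positions]


def positions_used_in_loss(positions, sequence_length):
    """Keep only positions that still point at a real token after clamping.

    Assumption: a clamped value equal to sequence_length marks a label
    that fell outside the sequence and is therefore ignored by the loss.
    """
    clamped = clamp_span_positions(positions, sequence_length)
    return [position for position in clamped if position != sequence_length]


# With sequence_length = 8, valid token indices are 0..7.
start_positions = [3, 12, 7]
clamped_starts = clamp_span_positions(start_positions, sequence_length=8)
kept_starts = positions_used_in_loss(start_positions, sequence_length=8)
```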
return_dict = return_dict if return_dict is not None else self.config.use_return_dict
|
||||
|
||||
@@ -80,16 +80,16 @@ class XxxTokenizer(PreTrainedTokenizer):
|
||||
r"""
|
||||
Constructs a XXX tokenizer. Based on XXX.
|
||||
|
||||
This tokenizer inherits from :class:`~transformers.PreTrainedTokenizer` which contains most of the methods. Users
|
||||
should refer to the superclass for more information regarding methods.
|
||||
This tokenizer inherits from :class:`~transformers.PreTrainedTokenizer` which contains most of the main methods.
|
||||
Users should refer to this superclass for more information regarding those methods.
|
||||
|
||||
Args:
|
||||
vocab_file (:obj:`str`):
|
||||
File containing the vocabulary.
|
||||
do_lower_case (:obj:`bool`, `optional`, defaults to :obj:`True`):
|
||||
Whether to lowercase the input when tokenizing.
|
||||
Whether or not to lowercase the input when tokenizing.
|
||||
do_basic_tokenize (:obj:`bool`, `optional`, defaults to :obj:`True`):
|
||||
Whether to do basic tokenization before WordPiece.
|
||||
Whether or not to do basic tokenization before WordPiece.
|
||||
never_split (:obj:`Iterable`, `optional`):
|
||||
Collection of tokens which will never be split during tokenization. Only has an effect when
|
||||
:obj:`do_basic_tokenize=True`
|
||||
@@ -194,19 +194,19 @@ class XxxTokenizer(PreTrainedTokenizer):
|
||||
"""
|
||||
Build model inputs from a sequence or a pair of sequences for sequence classification tasks
|
||||
by concatenating and adding special tokens.
|
||||
A BERT sequence has the following format:
|
||||
A XXX sequence has the following format:
|
||||
|
||||
- single sequence: ``[CLS] X [SEP]``
|
||||
- pair of sequences: ``[CLS] A [SEP] B [SEP]``
|
||||
|
||||
Args:
|
||||
token_ids_0 (:obj:`List[int]`):
|
||||
List of IDs to which the special tokens will be added
|
||||
List of IDs to which the special tokens will be added.
|
||||
token_ids_1 (:obj:`List[int]`, `optional`):
|
||||
Optional second list of IDs for sequence pairs.
|
||||
|
||||
Returns:
|
||||
:obj:`List[int]`: list of `input IDs <../glossary.html#input-ids>`__ with the appropriate special tokens.
|
||||
:obj:`List[int]`: List of `input IDs <../glossary.html#input-ids>`__ with the appropriate special tokens.
|
||||
"""
|
||||
if token_ids_1 is None:
|
||||
return [self.cls_token_id] + token_ids_0 + [self.sep_token_id]
|
||||
@@ -218,16 +218,16 @@ class XxxTokenizer(PreTrainedTokenizer):
|
||||
self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None, already_has_special_tokens: bool = False
|
||||
) -> List[int]:
|
||||
"""
|
||||
Retrieves sequence ids from a token list that has no special tokens added. This method is called when adding
|
||||
Retrieve sequence ids from a token list that has no special tokens added. This method is called when adding
|
||||
special tokens using the tokenizer ``prepare_for_model`` method.
|
||||
|
||||
Args:
|
||||
token_ids_0 (:obj:`List[int]`):
|
||||
List of ids.
|
||||
List of IDs.
|
||||
token_ids_1 (:obj:`List[int]`, `optional`):
|
||||
Optional second list of IDs for sequence pairs.
|
||||
already_has_special_tokens (:obj:`bool`, `optional`, defaults to :obj:`False`):
|
||||
Set to True if the token list is already formatted with special tokens for the model
|
||||
Whether or not the token list is already formatted with special tokens for the model.
|
||||
|
||||
Returns:
|
||||
:obj:`List[int]`: A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.
|
||||
@@ -249,7 +249,7 @@ class XxxTokenizer(PreTrainedTokenizer):
|
||||
self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
|
||||
) -> List[int]:
|
||||
"""
|
||||
Creates a mask from the two sequences passed to be used in a sequence-pair classification task.
|
||||
Create a mask from the two sequences passed to be used in a sequence-pair classification task.
|
||||
A XXX sequence pair mask has the following format:
|
||||
|
||||
::
|
||||
@@ -257,11 +257,11 @@ class XxxTokenizer(PreTrainedTokenizer):
|
||||
0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
|
||||
| first sequence | second sequence |
|
||||
|
||||
if token_ids_1 is None, only returns the first portion of the mask (0's).
|
||||
If :obj:`token_ids_1` is :obj:`None`, this method only returns the first portion of the mask (0s).
|
||||
|
||||
Args:
|
||||
token_ids_0 (:obj:`List[int]`):
|
||||
List of ids.
|
||||
List of IDs.
|
||||
token_ids_1 (:obj:`List[int]`, `optional`):
|
||||
Optional second list of IDs for sequence pairs.
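The special-token layout and the sequence-pair mask documented in these tokenizer hunks can be tied together in a short standalone sketch. The module-level functions and the ids `CLS_ID` and `SEP_ID` are hypothetical stand-ins for the tokenizer methods and its `cls_token_id` / `sep_token_id` attributes; the pair branch follows the ``[CLS] A [SEP] B [SEP]`` format given in the docstring.

```python
from typing import List, Optional

# Hypothetical special token ids, chosen only for this illustration.
CLS_ID = 101
SEP_ID = 102


def build_inputs_with_special_tokens(
    token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
) -> List[int]:
    """Single sequence: [CLS] X [SEP]; pair: [CLS] A [SEP] B [SEP]."""
    if token_ids_1 is None:
        return [CLS_ID] + token_ids_0 + [SEP_ID]
    return [CLS_ID] + token_ids_0 + [SEP_ID] + token_ids_1 + [SEP_ID]


def create_token_type_ids(
    token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
) -> List[int]:
    """0 for [CLS] A [SEP]; 1 for B [SEP] when a second sequence is given."""
    first_portion = [0] * (len(token_ids_0) + 2)  # [CLS] + A + [SEP]
    if token_ids_1 is None:
        return first_portion
    return first_portion + [1] * (len(token_ids_1) + 1)  # B + [SEP]


pair_ids = build_inputs_with_special_tokens([7, 8], [9])
pair_types = create_token_type_ids([7, 8], [9])
single_types = create_token_type_ids([7, 8])
```

The two outputs always have the same length, so each input id lines up with exactly one token type id.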
|
||||
|
||||
@@ -277,7 +277,7 @@ class XxxTokenizer(PreTrainedTokenizer):
|
||||
|
||||
def save_vocabulary(self, vocab_path):
|
||||
"""
|
||||
Save the sentencepiece vocabulary (copy original file) and special tokens file to a directory.
|
||||
Save the vocabulary (copy original file) and special tokens file to a directory.
|
||||
|
||||
Args:
|
||||
vocab_path (:obj:`str`):
|
||||
|
||||