Clean RAG docs and template docs (#7348)
* Clean RAG docs and template docs
* Fix typo
* Better doc
@@ -4,11 +4,14 @@ RAG
|
|||||||
Overview
|
Overview
|
||||||
~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
Retrieval-augmented generation ("RAG") models combine the powers of pretrained dense retrieval (DPR) and Seq2Seq models.
|
Retrieval-augmented generation ("RAG") models combine the powers of pretrained dense retrieval (DPR) and
|
||||||
RAG models retrieve docs, pass them to a seq2seq model, then marginalize to generate outputs.
|
sequence-to-sequence models. RAG models retrieve documents, pass them to a seq2seq model, then marginalize to generate
|
||||||
The retriever and seq2seq modules are initialized from pretrained models, and fine-tuned jointly, allowing both retrieval and generation to adapt to downstream tasks.
|
outputs. The retriever and seq2seq modules are initialized from pretrained models, and fine-tuned jointly, allowing
|
||||||
|
both retrieval and generation to adapt to downstream tasks.
|
||||||
|
|
||||||
It is based on the paper `Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks <https://arxiv.org/abs/2005.11401>`__ by Patrick Lewis, Ethan Perez, Aleksandara Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela.
|
It is based on the paper `Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
|
||||||
|
<https://arxiv.org/abs/2005.11401>`__ by Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir
|
||||||
|
Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela.
|
||||||
|
|
||||||
The abstract from the paper is the following:
|
The abstract from the paper is the following:
|
||||||
|
|
||||||
@@ -47,7 +50,7 @@ RagTokenizer
|
|||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: transformers.RagTokenizer
|
.. autoclass:: transformers.RagTokenizer
|
||||||
:members:
|
:members: prepare_seq2seq_batch
|
||||||
|
|
||||||
|
|
||||||
Rag specific outputs
|
Rag specific outputs
|
||||||
|
|||||||
@@ -38,35 +38,39 @@ RAG_CONFIG_DOC = r"""
|
|||||||
retrieval_vector_size (:obj:`int`, `optional`, defaults to 768):
|
retrieval_vector_size (:obj:`int`, `optional`, defaults to 768):
|
||||||
Dimensionality of the document embeddings indexed by :class:`~transformers.RagRetriever`.
|
Dimensionality of the document embeddings indexed by :class:`~transformers.RagRetriever`.
|
||||||
retrieval_batch_size (:obj:`int`, `optional`, defaults to 8):
|
retrieval_batch_size (:obj:`int`, `optional`, defaults to 8):
|
||||||
Retrieval batch size, defined as the number of queries issues concurrently to the faiss index excapsulated :class:`~transformers.RagRetriever`.
|
Retrieval batch size, defined as the number of queries issued concurrently to the faiss index encapsulated by
|
||||||
|
:class:`~transformers.RagRetriever`.
|
||||||
dataset (:obj:`str`, `optional`, defaults to :obj:`"wiki_dpr"`):
|
dataset (:obj:`str`, `optional`, defaults to :obj:`"wiki_dpr"`):
|
||||||
A datatset identifier of the indexed dataset on HuggingFace AWS bucket (list all available datasets and ids using :obj:`datasets.list_datasets()`).
|
A dataset identifier of the indexed dataset on HuggingFace AWS bucket (list all available datasets and
|
||||||
dataset_split (:obj:`str`, `optional`, defaults to :obj:`train`)
|
ids using :obj:`datasets.list_datasets()`).
|
||||||
Which split of the ``dataset`` to load.
|
dataset_split (:obj:`str`, `optional`, defaults to :obj:`"train"`)
|
||||||
index_name (:obj:`str`, `optional`, defaults to :obj:`compressed`)
|
Which split of the :obj:`dataset` to load.
|
||||||
The index_name of the index associated with the :obj:`dataset`. One can choose between :obj:`legacy`, :obj:`exact` and :obj:`compressed`.
|
index_name (:obj:`str`, `optional`, defaults to :obj:`"compressed"`)
|
||||||
|
The index name of the index associated with the :obj:`dataset`. One can choose between :obj:`"legacy"`,
|
||||||
|
:obj:`"exact"` and :obj:`"compressed"`.
|
||||||
index_path (:obj:`str`, `optional`)
|
index_path (:obj:`str`, `optional`)
|
||||||
The path to the serialized faiss index on disk.
|
The path to the serialized faiss index on disk.
|
||||||
passages_path: (:obj:`str`, `optional`):
|
passages_path: (:obj:`str`, `optional`):
|
||||||
A path to text passages compatible with the faiss index. Required if using :class:`~transformers.retrieval_rag.LegacyIndex`
|
A path to text passages compatible with the faiss index. Required if using
|
||||||
|
:class:`~transformers.retrieval_rag.LegacyIndex`
|
||||||
use_dummy_dataset (:obj:`bool`, `optional`, defaults to ``False``)
|
use_dummy_dataset (:obj:`bool`, `optional`, defaults to ``False``)
|
||||||
Whether to load a "dummy" variant of the dataset specified by :obj:`dataset`.
|
Whether to load a "dummy" variant of the dataset specified by :obj:`dataset`.
|
||||||
label_smoothing (:obj:`float`, `optional`, defaults to 0.0):
|
label_smoothing (:obj:`float`, `optional`, defaults to 0.0):
|
||||||
Only relevant if ``return_loss`` is set to :obj:`True`. Controls the ``epsilon`` parameter value for label smoothing in the loss calculation.
|
Only relevant if ``return_loss`` is set to :obj:`True`. Controls the ``epsilon`` parameter value for label
|
||||||
If set to ``0.0``, no label smoothing is performed.
|
smoothing in the loss calculation. If set to 0, no label smoothing is performed.
|
||||||
do_marginalize (:obj:`bool`, `optional`, defaults to :obj:`False`):
|
do_marginalize (:obj:`bool`, `optional`, defaults to :obj:`False`):
|
||||||
If :obj:`True`, the logits are marginalized over all documents
|
If :obj:`True`, the logits are marginalized over all documents
|
||||||
by making use of ``torch.nn.functional.log_softmax``.
|
by making use of ``torch.nn.functional.log_softmax``.
|
||||||
reduce_loss (:obj:`bool`, `optional`, defaults to :obj:`False`):
|
reduce_loss (:obj:`bool`, `optional`, defaults to :obj:`False`):
|
||||||
If :obj:`True`, the NLL loss is reduced using the ``torch.Tensor.sum`` operation.
|
Whether or not to reduce the NLL loss using the ``torch.Tensor.sum`` operation.
|
||||||
do_deduplication (:obj:`bool`, `optional`, defaults to :obj:`True`):
|
do_deduplication (:obj:`bool`, `optional`, defaults to :obj:`True`):
|
||||||
Controls whether we want to deduplicate the generations from different context documents for a given input.
|
Whether or not to deduplicate the generations from different context documents for a given input.
|
||||||
Has to be set to :obj:`False` if used while training with distributed backend.
|
Has to be set to :obj:`False` if used while training with distributed backend.
|
||||||
exclude_bos_score (:obj:`bool`, `optional`, defaults to :obj:`False`):
|
exclude_bos_score (:obj:`bool`, `optional`, defaults to :obj:`False`):
|
||||||
If :obj:`True`, the score of the BOS token is disregarded when computing
|
Whether or not to disregard the BOS token when computing the loss.
|
||||||
the loss.
|
|
||||||
output_retrieved(:obj:`bool`, `optional`, defaults to :obj:`False`):
|
output_retrieved (:obj:`bool`, `optional`, defaults to :obj:`False`):
|
||||||
If set to ``True``, :obj:`retrieved_doc_embeds`, :obj:`retrieved_doc_ids`, :obj:`context_input_ids` and :obj:`context_attention_mask` are returned. See returned tensors for more detail.
|
If set to ``True``, :obj:`retrieved_doc_embeds`, :obj:`retrieved_doc_ids`, :obj:`context_input_ids` and
|
||||||
|
:obj:`context_attention_mask` are returned. See returned tensors for more detail.
|
||||||
"""
|
"""
|
||||||
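The ``do_marginalize`` entry above says that logits are marginalized over all documents with ``log_softmax``. The following is a minimal sketch of that idea, assuming the per-token formulation described in the RAG paper. It uses plain Python lists instead of ``torch`` tensors so that it runs without dependencies, and the function names are hypothetical, not part of the library.

```python
import math


def log_softmax(scores):
    """Return log-probabilities for a list of raw scores (numerically stable)."""
    peak = max(scores)
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return [s - log_norm for s in scores]


def logsumexp(values):
    """Stable log(sum(exp(v))) over a list of floats."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def marginalize_over_docs(doc_scores, token_logits_per_doc):
    """Combine per-document next-token logits into one log-distribution.

    doc_scores: one retrieval score per document (length n_docs).
    token_logits_per_doc: n_docs lists, each holding one logit per vocabulary token.
    """
    doc_log_probs = log_softmax(doc_scores)
    token_log_probs = [log_softmax(logits) for logits in token_logits_per_doc]
    vocab_size = len(token_logits_per_doc[0])
    n_docs = len(doc_scores)
    # log p(token) = logsumexp over documents of log p(doc) + log p(token | doc)
    return [
        logsumexp([doc_log_probs[d] + token_log_probs[d][v] for d in range(n_docs)])
        for v in range(vocab_size)
    ]
```

The result is again a normalized log-distribution over the vocabulary, because each document contributes its own normalized distribution weighted by the probability of that document.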
|
|
||||||
|
|
||||||
|
|||||||
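The ``label_smoothing`` entry above describes an ``epsilon`` parameter that is disabled when set to 0. The sketch below shows one common formulation of a label-smoothed negative log-likelihood for a single target token. Whether the RAG loss uses exactly this formula is an assumption, and ``label_smoothed_nll`` is a hypothetical helper name.

```python
import math


def label_smoothed_nll(log_probs, target, epsilon=0.0):
    """Label-smoothed NLL for one token.

    log_probs: log-probabilities over the vocabulary (assumed normalized).
    target: index of the gold token.
    epsilon: smoothing weight; 0.0 reduces to the plain NLL.
    """
    nll_loss = -log_probs[target]
    # Penalty spread uniformly over every vocabulary entry.
    smooth_loss = -sum(log_probs)
    return (1.0 - epsilon) * nll_loss + (epsilon / len(log_probs)) * smooth_loss
```

With ``epsilon=0.0`` the second term vanishes, which matches the statement that no label smoothing is performed in that case.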
@@ -45,66 +45,63 @@ class RetrievAugLMMarginOutput(ModelOutput):
|
|||||||
Prediction scores of the language modeling head.
|
Prediction scores of the language modeling head.
|
||||||
The score is possibly marginalized over all documents for each vocabulary token.
|
The score is possibly marginalized over all documents for each vocabulary token.
|
||||||
doc_scores (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, config.n_docs)`):
|
doc_scores (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, config.n_docs)`):
|
||||||
Score between each retrieved document embeddigs
|
Score between each retrieved document embeddings (see :obj:`retrieved_doc_embeds`) and
|
||||||
(see :obj:`retrieved_doc_embeds`) and :obj:`question_encoder_last_hidden_state`.
|
:obj:`question_encoder_last_hidden_state`.
|
||||||
past_key_values (:obj:`List[torch.FloatTensor]`, `optional`, returned when ``use_cache=True`` is passed or when ``config.use_cache=True``):
|
past_key_values (:obj:`List[torch.FloatTensor]`, `optional`, returned when ``use_cache=True`` is passed or when ``config.use_cache=True``):
|
||||||
List of :obj:`torch.FloatTensor` of length :obj:`config.n_layers`, with each tensor of shape
|
List of :obj:`torch.FloatTensor` of length :obj:`config.n_layers`, with each tensor of shape
|
||||||
:obj:`(2, batch_size, num_heads, sequence_length, embed_size_per_head)`).
|
:obj:`(2, batch_size, num_heads, sequence_length, embed_size_per_head)`).
|
||||||
|
|
||||||
Contains pre-computed hidden-states (key and values in the attention blocks)
|
Contains precomputed hidden-states (key and values in the attention blocks) of the decoder that can be used
|
||||||
of the decoder that can be used (see ``past_key_values`` input) to
|
(see ``past_key_values`` input) to speed up sequential decoding.
|
||||||
speed up sequential decoding.
|
|
||||||
retrieved_doc_embeds (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, config.n_docs, hidden_size)`, `optional`, returned when `output_retrieved=True`):
|
retrieved_doc_embeds (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, config.n_docs, hidden_size)`, `optional`, returned when `output_retrieved=True`):
|
||||||
Embedded documents retrieved by the retriever.
|
Embedded documents retrieved by the retriever.
|
||||||
Is used with ``question_encoder_last_hidden_state`` to compute
|
Is used with ``question_encoder_last_hidden_state`` to compute the ``doc_scores``.
|
||||||
the ``doc_scores``.
|
|
||||||
retrieved_doc_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size, config.n_docs)`, `optional`, returned when `output_retrieved=True`):
|
retrieved_doc_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size, config.n_docs)`, `optional`, returned when `output_retrieved=True`):
|
||||||
The indexes of the embedded documents retrieved by the retriever.
|
The indexes of the embedded documents retrieved by the retriever.
|
||||||
context_input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size * config.n_docs, config.max_combined_length)`, `optional`, returned when `output_retrieved=True`):
|
context_input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size * config.n_docs, config.max_combined_length)`, `optional`, returned when `output_retrieved=True`):
|
||||||
Input ids post-processed from the retrieved documents
|
Input ids post-processed from the retrieved documents
|
||||||
and the question encoder input_ids by the retriever.
|
and the question encoder input_ids by the retriever.
|
||||||
context_attention_mask (:obj:`torch.LongTensor` of shape :obj:`(batch_size * config.n_docs, config.max_combined_length)`, `optional`, returned when `output_retrieved=True`):
|
context_attention_mask (:obj:`torch.LongTensor` of shape :obj:`(batch_size * config.n_docs, config.max_combined_length)`, `optional`, returned when `output_retrieved=True`):
|
||||||
Attention mask post-processed from the retrieved documents
|
Attention mask post-processed from the retrieved documents and the question encoder :obj:`input_ids` by the
|
||||||
and the question encoder input_ids by the retriever.
|
retriever.
|
||||||
question_encoder_last_hidden_state (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`, `optional`):
|
question_encoder_last_hidden_state (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`, `optional`):
|
||||||
Sequence of hidden-states at the output of the last layer
|
Sequence of hidden states at the output of the last layer of the question encoder pooled output of the
|
||||||
of the question encoder pooled output of the model.
|
model.
|
||||||
question_enc_hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_hidden_states=True`` is passed or when ``config.output_hidden_states=True``):
|
question_enc_hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_hidden_states=True`` is passed or when ``config.output_hidden_states=True``):
|
||||||
Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings
|
Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings and one for the output of each
|
||||||
+ one for the output of each layer)
|
layer) of shape :obj:`(batch_size, sequence_length, hidden_size)`.
|
||||||
of shape :obj:`(batch_size, sequence_length, hidden_size)`.
|
|
||||||
|
|
||||||
Hidden-states of the question encoder at the output of each layer plus the initial embedding outputs.
|
Hidden states of the question encoder at the output of each layer plus the initial embedding outputs.
|
||||||
question_enc_attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_attentions=True`` is passed or when ``config.output_attentions=True``):
|
question_enc_attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_attentions=True`` is passed or when ``config.output_attentions=True``):
|
||||||
Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
|
Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
|
||||||
:obj:`(batch_size, num_heads, sequence_length, sequence_length)`.
|
:obj:`(batch_size, num_heads, sequence_length, sequence_length)`.
|
||||||
|
|
||||||
Attentions weights of the question encoder, after the attention softmax, used to compute the weighted average in the
|
Attention weights of the question encoder, after the attention softmax, used to compute the weighted
|
||||||
self-attention heads.
|
average in the self-attention heads.
|
||||||
generator_enc_last_hidden_state (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`, `optional`):
|
generator_enc_last_hidden_state (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`, `optional`):
|
||||||
Sequence of hidden-states at the output of the last layer of the generator encoder of the model.
|
Sequence of hidden-states at the output of the last layer of the generator encoder of the model.
|
||||||
generator_enc_hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_hidden_states=True`` is passed or when ``config.output_hidden_states=True``):
|
generator_enc_hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_hidden_states=True`` is passed or when ``config.output_hidden_states=True``):
|
||||||
Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings + one for the output of each layer)
|
Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings and one for the output of each
|
||||||
of shape :obj:`(batch_size, sequence_length, hidden_size)`.
|
layer) of shape :obj:`(batch_size, sequence_length, hidden_size)`.
|
||||||
|
|
||||||
Hidden-states of the generator encoder at the output of each layer plus the initial embedding outputs.
|
Hidden states of the generator encoder at the output of each layer plus the initial embedding outputs.
|
||||||
generator_enc_attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_attentions=True`` is passed or when ``config.output_attentions=True``):
|
generator_enc_attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_attentions=True`` is passed or when ``config.output_attentions=True``):
|
||||||
Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
|
Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
|
||||||
:obj:`(batch_size, num_heads, sequence_length, sequence_length)`.
|
:obj:`(batch_size, num_heads, sequence_length, sequence_length)`.
|
||||||
|
|
||||||
Attentions weights of the generator encoder, after the attention softmax, used to compute the weighted average in the
|
Attention weights of the generator encoder, after the attention softmax, used to compute the weighted
|
||||||
self-attention heads.
|
average in the self-attention heads.
|
||||||
generator_dec_hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_hidden_states=True`` is passed or when ``config.output_hidden_states=True``):
|
generator_dec_hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_hidden_states=True`` is passed or when ``config.output_hidden_states=True``):
|
||||||
Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings + one for the output of each layer)
|
Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings and one for the output of each
|
||||||
of shape :obj:`(batch_size, sequence_length, hidden_size)`.
|
layer) of shape :obj:`(batch_size, sequence_length, hidden_size)`.
|
||||||
|
|
||||||
Hidden-states of the generator decoder at the output of each layer plus the initial embedding outputs.
|
Hidden states of the generator decoder at the output of each layer plus the initial embedding outputs.
|
||||||
generator_dec_attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_attentions=True`` is passed or when ``config.output_attentions=True``):
|
generator_dec_attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_attentions=True`` is passed or when ``config.output_attentions=True``):
|
||||||
Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
|
Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
|
||||||
:obj:`(batch_size, num_heads, sequence_length, sequence_length)`.
|
:obj:`(batch_size, num_heads, sequence_length, sequence_length)`.
|
||||||
|
|
||||||
Attentions weights of the generator decoder, after the attention softmax, used to compute the weighted average in the
|
Attention weights of the generator decoder, after the attention softmax, used to compute the weighted
|
||||||
self-attention heads.
|
average in the self-attention heads.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
loss: Optional[torch.FloatTensor] = None
|
loss: Optional[torch.FloatTensor] = None
|
||||||
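The ``doc_scores`` field documented above has shape ``(batch_size, config.n_docs)`` and relates ``retrieved_doc_embeds`` to ``question_encoder_last_hidden_state``. The sketch below assumes the score is an inner product, as in DPR-style dense retrieval. It uses nested Python lists in place of tensors, and ``inner_product_doc_scores`` is a hypothetical name.

```python
def inner_product_doc_scores(question_states, retrieved_doc_embeds):
    """Score every retrieved document against its question.

    question_states: batch_size vectors of length hidden_size.
    retrieved_doc_embeds: batch_size lists of n_docs vectors of length hidden_size.
    Returns batch_size lists of n_docs scores.
    """
    return [
        [sum(q * e for q, e in zip(question, doc)) for doc in docs]
        for question, docs in zip(question_states, retrieved_doc_embeds)
    ]
```

A higher score means the document embedding is better aligned with the question representation.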
@@ -133,14 +130,14 @@ class RetrievAugLMOutput(ModelOutput):
|
|||||||
Prediction scores of the language modeling head.
|
Prediction scores of the language modeling head.
|
||||||
The score is possibly marginalized over all documents for each vocabulary token.
|
The score is possibly marginalized over all documents for each vocabulary token.
|
||||||
doc_scores (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, config.n_docs)`):
|
doc_scores (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, config.n_docs)`):
|
||||||
Score between each retrieved document embeddigs (see :obj:`retrieved_doc_embeds`) and :obj:`question_encoder_last_hidden_state`.
|
Score between each retrieved document embeddings (see :obj:`retrieved_doc_embeds`) and
|
||||||
|
:obj:`question_encoder_last_hidden_state`.
|
||||||
past_key_values (:obj:`List[torch.FloatTensor]`, `optional`, returned when ``use_cache=True`` is passed or when ``config.use_cache=True``):
|
past_key_values (:obj:`List[torch.FloatTensor]`, `optional`, returned when ``use_cache=True`` is passed or when ``config.use_cache=True``):
|
||||||
List of :obj:`torch.FloatTensor` of length :obj:`config.n_layers`,
|
List of :obj:`torch.FloatTensor` of length :obj:`config.n_layers`, with each tensor of shape
|
||||||
with each tensor of shape
|
|
||||||
:obj:`(2, batch_size, num_heads, sequence_length, embed_size_per_head)`).
|
:obj:`(2, batch_size, num_heads, sequence_length, embed_size_per_head)`).
|
||||||
Contains pre-computed hidden-states (key and values in the attention blocks)
|
|
||||||
of the decoder that can be used (see ``past_key_values`` input) to
|
Contains precomputed hidden-states (key and values in the attention blocks) of the decoder that can be used
|
||||||
speed up sequential decoding.
|
(see ``past_key_values`` input) to speed up sequential decoding.
|
||||||
retrieved_doc_embeds (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, config.n_docs, hidden_size)`, `optional`, returned when `output_retrieved=True`):
|
retrieved_doc_embeds (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, config.n_docs, hidden_size)`, `optional`, returned when `output_retrieved=True`):
|
||||||
Embedded documents retrieved by the retriever.
|
Embedded documents retrieved by the retriever.
|
||||||
Is used with ``question_encoder_last_hidden_state`` to compute the ``doc_scores``.
|
Is used with ``question_encoder_last_hidden_state`` to compute the ``doc_scores``.
|
||||||
@@ -150,48 +147,46 @@ class RetrievAugLMOutput(ModelOutput):
|
|||||||
Input ids post-processed from the retrieved documents
|
Input ids post-processed from the retrieved documents
|
||||||
and the question encoder input_ids by the retriever.
|
and the question encoder input_ids by the retriever.
|
||||||
context_attention_mask (:obj:`torch.LongTensor` of shape :obj:`(batch_size * config.n_docs, config.max_combined_length)`, `optional`, returned when `output_retrieved=True`):
|
context_attention_mask (:obj:`torch.LongTensor` of shape :obj:`(batch_size * config.n_docs, config.max_combined_length)`, `optional`, returned when `output_retrieved=True`):
|
||||||
Attention mask post-processed from the retrieved
|
Attention mask post-processed from the retrieved documents and the question encoder :obj:`input_ids` by the
|
||||||
documents and the question encoder input_ids by the retriever.
|
retriever.
|
||||||
question_encoder_last_hidden_state (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`, `optional`):
|
question_encoder_last_hidden_state (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`, `optional`):
|
||||||
Sequence of hidden-states at the output of the last layer
|
Sequence of hidden states at the output of the last layer of the question encoder pooled output of the
|
||||||
of the question encoder pooled output of the model.
|
model.
|
||||||
question_enc_hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_hidden_states=True`` is passed or when ``config.output_hidden_states=True``):
|
question_enc_hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_hidden_states=True`` is passed or when ``config.output_hidden_states=True``):
|
||||||
Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings + one for the output of each layer)
|
Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings and one for the output of each
|
||||||
of shape :obj:`(batch_size, sequence_length, hidden_size)`.
|
layer) of shape :obj:`(batch_size, sequence_length, hidden_size)`.
|
||||||
|
|
||||||
Hidden-states of the question encoder at the output of each
|
Hidden states of the question encoder at the output of each layer plus the initial embedding outputs.
|
||||||
layer plus the initial embedding outputs.
|
|
||||||
question_enc_attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_attentions=True`` is passed or when ``config.output_attentions=True``):
|
question_enc_attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_attentions=True`` is passed or when ``config.output_attentions=True``):
|
||||||
Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
|
Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
|
||||||
:obj:`(batch_size, num_heads, sequence_length, sequence_length)`.
|
:obj:`(batch_size, num_heads, sequence_length, sequence_length)`.
|
||||||
|
|
||||||
Attentions weights of the question encoder, after the attention softmax, used to compute the weighted average in the
|
Attention weights of the question encoder, after the attention softmax, used to compute the weighted
|
||||||
self-attention heads.
|
average in the self-attention heads.
|
||||||
generator_enc_last_hidden_state (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`, `optional`):
|
generator_enc_last_hidden_state (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`, `optional`):
|
||||||
Sequence of hidden-states at the output of the last layer of the generator encoder of the model.
|
Sequence of hidden-states at the output of the last layer of the generator encoder of the model.
|
||||||
generator_enc_hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_hidden_states=True`` is passed or when ``config.output_hidden_states=True``):
|
generator_enc_hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_hidden_states=True`` is passed or when ``config.output_hidden_states=True``):
|
||||||
Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings + one for the output of each layer)
|
Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings and one for the output of each
|
||||||
of shape :obj:`(batch_size, sequence_length, hidden_size)`.
|
layer) of shape :obj:`(batch_size, sequence_length, hidden_size)`.
|
||||||
|
|
||||||
Hidden-states of the generator encoder at the output
|
Hidden states of the generator encoder at the output of each layer plus the initial embedding outputs.
|
||||||
of each layer plus the initial embedding outputs.
|
|
||||||
generator_enc_attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_attentions=True`` is passed or when ``config.output_attentions=True``):
|
generator_enc_attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_attentions=True`` is passed or when ``config.output_attentions=True``):
|
||||||
Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
|
Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
|
||||||
:obj:`(batch_size, num_heads, sequence_length, sequence_length)`.
|
:obj:`(batch_size, num_heads, sequence_length, sequence_length)`.
|
||||||
|
|
||||||
Attentions weights of the generator encoder, after the attention softmax, used to compute the weighted average in the
|
Attention weights of the generator encoder, after the attention softmax, used to compute the weighted
|
||||||
self-attention heads.
|
average in the self-attention heads.
|
||||||
generator_dec_hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_hidden_states=True`` is passed or when ``config.output_hidden_states=True``):
|
generator_dec_hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_hidden_states=True`` is passed or when ``config.output_hidden_states=True``):
|
||||||
Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings + one for the output of each layer)
|
Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings and one for the output of each
|
||||||
of shape :obj:`(batch_size, sequence_length, hidden_size)`.
|
layer) of shape :obj:`(batch_size, sequence_length, hidden_size)`.
|
||||||
|
|
||||||
Hidden-states of the generator decoder at the output of each layer plus the initial embedding outputs.
|
Hidden states of the generator decoder at the output of each layer plus the initial embedding outputs.
|
||||||
generator_dec_attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_attentions=True`` is passed or when ``config.output_attentions=True``):
|
generator_dec_attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_attentions=True`` is passed or when ``config.output_attentions=True``):
|
||||||
Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
|
Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
|
||||||
:obj:`(batch_size, num_heads, sequence_length, sequence_length)`.
|
:obj:`(batch_size, num_heads, sequence_length, sequence_length)`.
|
||||||
|
|
||||||
Attentions weights of the generator decoder, after the attention softmax, used to compute the weighted average in the
|
Attention weights of the generator decoder, after the attention softmax, used to compute the weighted
|
||||||
self-attention heads.
|
average in the self-attention heads.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
logits: torch.FloatTensor = None
|
logits: torch.FloatTensor = None
|
||||||
@@ -213,10 +208,11 @@ class RetrievAugLMOutput(ModelOutput):
|
|||||||
|
|
||||||
class RagPreTrainedModel(PreTrainedModel):
|
class RagPreTrainedModel(PreTrainedModel):
|
||||||
r"""
|
r"""
|
||||||
RAG models were released with the paper `Retrieval-Augmented Generation for
|
RAG models were released with the paper `Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
|
||||||
Knowledge-Intensive NLP Tasks <https://arxiv.org/abs/2005.11401>`_ by Patrick Lewis, Ethan Perez, Aleksandra Piktus et al.
|
<https://arxiv.org/abs/2005.11401>`_ by Patrick Lewis, Ethan Perez, Aleksandra Piktus et al.
|
||||||
|
|
||||||
RAG is a retriever augmented model and encapsulate three components: a question encoder, a dataset retriever and a generator, the encoder and generator are trainable while the retriever is just an indexed dataset.
|
RAG is a retriever-augmented model that encapsulates three components: a question encoder, a dataset retriever and a
|
||||||
|
generator. The encoder and generator are trainable, while the retriever is just an indexed dataset.
|
||||||
|
|
||||||
"""
|
"""
|
||||||
config_class = RagConfig
|
config_class = RagConfig
|
||||||
@@ -232,40 +228,56 @@ class RagPreTrainedModel(PreTrainedModel):
|
|||||||
*model_args,
|
*model_args,
|
||||||
**kwargs
|
**kwargs
|
||||||
) -> PreTrainedModel:
|
) -> PreTrainedModel:
|
||||||
r"""Instantiates an question_encoder and a generator from one or two base classes of the library from pre-trained model checkpoints.
|
r"""
|
||||||
|
Instantiates a question encoder and a generator from one or two base classes of the library from pretrained
|
||||||
|
model checkpoints.
|
||||||
|
|
||||||
|
The model is set in evaluation mode by default using :obj:`model.eval()` (Dropout modules are deactivated).
|
||||||
The model is set in evaluation mode by default using `model.eval()` (Dropout modules are deactivated).
|
To train the model, you need to first set it back in training mode with :obj:`model.train()`.
|
||||||
To train the model, you need to first set it back in training mode with `model.train()`.
|
|
||||||
|
|
||||||
Params:
|
Params:
|
||||||
question_encoder_pretrained_model_name_or_path (:obj: `str`, `optional`, defaults to `None`):
|
question_encoder_pretrained_model_name_or_path (:obj:`str`, `optional`, defaults to `None`):
|
||||||
information necessary to initiate the question_encoder. Either:
|
Information necessary to initiate the question encoder. Can be either:
|
||||||
|
|
||||||
- a string with the `shortcut name` of a pre-trained model to load from cache or download, e.g.: ``bert-base-uncased``.
|
- A string with the `shortcut name` of a pretrained model to load from cache or download, e.g.,
|
||||||
- a string with the `identifier name` of a pre-trained model that was user-uploaded to our S3, e.g.: ``dbmdz/bert-base-german-cased``.
|
``bert-base-uncased``.
|
||||||
- a path to a `directory` containing model weights saved using :func:`~transformers.PreTrainedModel.save_pretrained`, e.g.: ``./my_model_directory/question_encoder``.
|
- A string with the `identifier name` of a pretrained model that was user-uploaded to our S3, e.g.,
|
||||||
- a path or url to a `tensorflow index checkpoint file` (e.g. `./tf_model/model.ckpt.index`). In this case, ``from_tf`` should be set to True and a configuration object should be provided as ``config`` argument. This loading path is slower than converting the TensorFlow checkpoint in a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
|
``dbmdz/bert-base-german-cased``.
|
||||||
|
- A path to a `directory` containing model weights saved using
|
||||||
|
:func:`~transformers.PreTrainedModel.save_pretrained`, e.g., ``./my_model_directory/``.
|
||||||
|
- A path or url to a `tensorflow index checkpoint file` (e.g., ``./tf_model/model.ckpt.index``). In
|
||||||
|
this case, ``from_tf`` should be set to :obj:`True` and a configuration object should be provided
|
||||||
|
as ``config`` argument. This loading path is slower than converting the TensorFlow checkpoint in
|
||||||
|
a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
|
||||||
|
|
||||||
generator_pretrained_model_name_or_path (:obj:`str`, `optional`, defaults to `None`):
|
generator_pretrained_model_name_or_path (:obj:`str`, `optional`, defaults to `None`):
|
||||||
information necessary to initiate the generator. Either:
|
Information necessary to initiate the generator. Can be either:
|
||||||
|
|
||||||
- a string with the `shortcut name` of a pre-trained model to load from cache or download, e.g.: ``bert-base-uncased``.
|
- A string with the `shortcut name` of a pretrained model to load from cache or download, e.g.,
|
||||||
- a string with the `identifier name` of a pre-trained model that was user-uploaded to our S3, e.g.: ``dbmdz/bert-base-german-cased``.
|
``bert-base-uncased``.
|
||||||
- a path to a `directory` containing model weights saved using :func:`~transformers.PreTrainedModel.save_pretrained`, e.g.: ``./my_model_directory/generator``.
|
- A string with the `identifier name` of a pretrained model that was user-uploaded to our S3, e.g.,
|
||||||
- a path or url to a `tensorflow index checkpoint file` (e.g. `./tf_model/model.ckpt.index`). In this case, ``from_tf`` should be set to True and a configuration object should be provided as ``config`` argument. This loading path is slower than converting the TensorFlow checkpoint in a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
|
``dbmdz/bert-base-german-cased``.
|
||||||
|
- A path to a `directory` containing model weights saved using
|
||||||
|
:func:`~transformers.PreTrainedModel.save_pretrained`, e.g., ``./my_model_directory/``.
|
||||||
|
- A path or url to a `tensorflow index checkpoint file` (e.g., ``./tf_model/model.ckpt.index``). In
|
||||||
|
this case, ``from_tf`` should be set to :obj:`True` and a configuration object should be provided
|
||||||
|
as ``config`` argument. This loading path is slower than converting the TensorFlow checkpoint in
|
||||||
|
a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
|
||||||
|
|
||||||
model_args: (`optional`) Sequence of positional arguments:
|
model_args (remaining positional arguments, `optional`):
|
||||||
All remaning positional arguments will be passed to the underlying model's ``__init__`` method
|
All remaining positional arguments will be passed to the underlying model's ``__init__`` method.
|
||||||
|
retriever (:class:`~transformers.RagRetriever`, `optional`):
|
||||||
|
The retriever to use.
|
||||||
|
kwargs (remaining dictionary of keyword arguments, `optional`):
|
||||||
|
Can be used to update the configuration object (after it has been loaded) and initiate the model
|
||||||
|
(e.g., ``output_attentions=True``).
|
||||||
|
|
||||||
retriever: (`optional`, ``RagRetriever``) An instance of a :class:`~transformers.RagRetriever` to use as a retriever.
|
- To update the question_encoder configuration, use the prefix `question_encoder_` for each
|
||||||
|
configuration parameter.
|
||||||
|
- To update the generator configuration, use the prefix `generator_` for each configuration parameter.
|
||||||
|
- To update the parent model configuration, do not use a prefix for each configuration parameter.
|
||||||
|
|
||||||
kwargs: (`optional`) Remaining dictionary of keyword arguments.
|
Behaves differently depending on whether a :obj:`config` is provided or automatically loaded.
|
||||||
Can be used to update the configuration object (after it being loaded) and initiate the model. (e.g. ``output_attentions=True``).
|
|
||||||
- To update the question_encoder configuration, use the prefix `question_encoder_` for each configuration parameter
|
|
||||||
- To update the generator configuration, use the prefix `generator_` for each configuration parameter
|
|
||||||
- To update the parent model configuration, do not use a prefix for each configuration parameter
|
|
||||||
Behave differently depending on whether a :obj:`config` is provided or automatically loaded.
|
|
||||||
|
|
||||||
Example::
|
Example::
|
||||||
|
|
||||||
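The prefix convention described for ``kwargs`` above can be illustrated with a small sketch. This is not the library's implementation: ``split_prefixed_kwargs`` is a hypothetical helper, and the example keys (``dropout``, ``n_docs``) are placeholder values chosen only to show how prefixed and unprefixed parameters would be routed.

```python
from typing import Any, Dict, Tuple


def split_prefixed_kwargs(
    kwargs: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any], Dict[str, Any]]:
    """Hypothetical helper: route kwargs by prefix as the docstring describes.

    Returns (question_encoder_kwargs, generator_kwargs, parent_kwargs),
    with the prefix stripped from the first two.
    """
    q_prefix, g_prefix = "question_encoder_", "generator_"
    question_encoder_kwargs: Dict[str, Any] = {}
    generator_kwargs: Dict[str, Any] = {}
    parent_kwargs: Dict[str, Any] = {}
    for key, value in kwargs.items():
        if key.startswith(q_prefix):
            # Strip the prefix so the sub-model sees its own parameter name.
            question_encoder_kwargs[key[len(q_prefix):]] = value
        elif key.startswith(g_prefix):
            generator_kwargs[key[len(g_prefix):]] = value
        else:
            # No prefix: the parameter targets the parent (RAG) configuration.
            parent_kwargs[key] = value
    return question_encoder_kwargs, generator_kwargs, parent_kwargs


# Illustrative values only (assumptions, not taken from the diff).
q_kwargs, g_kwargs, rag_kwargs = split_prefixed_kwargs(
    {"question_encoder_output_attentions": True, "generator_dropout": 0.1, "n_docs": 5}
)
```

Under this reading, an unprefixed key never reaches either sub-model, which is why the docstring asks for a prefix on every sub-model parameter.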
@@ -345,23 +357,33 @@ class RagPreTrainedModel(PreTrainedModel):
|
|||||||
|
|
||||||
|
|
||||||
RAG_START_DOCSTRING = r"""
|
RAG_START_DOCSTRING = r"""
|
||||||
|
|
||||||
RAG is a seq2seq model which encapsulates two core components: a question encoder and a generator.
|
RAG is a seq2seq model which encapsulates two core components: a question encoder and a generator.
|
||||||
During a forward pass, we encode the input with the question encoder and pass it
|
During a forward pass, we encode the input with the question encoder and pass it
|
||||||
to the retriever to extract relevant context documents. The documents are then prepended to the input.
|
to the retriever to extract relevant context documents. The documents are then prepended to the input.
|
||||||
Such contextualized inputs are passed to the generator.
|
Such contextualized inputs are passed to the generator.
|
||||||
|
|
||||||
The question encoder can be any `autoencoding` model, preferably :obj:`~transformers.DPRQuestionEncoder`, and the generator can be any `seq2seq` model, preferably :obj:`~transformers.BartForConditionalGeneration`.
|
The question encoder can be any `autoencoding` model, preferably :class:`~transformers.DPRQuestionEncoder`, and the
|
||||||
|
generator can be any `seq2seq` model, preferably :class:`~transformers.BartForConditionalGeneration`.
|
||||||
|
|
||||||
The model can be initialized with a :obj:`~transformers.RagRetriever` for end-to-end generation or used in combination with the outputs of a retriever in multiple steps - see examples for more details.
|
The model can be initialized with a :class:`~transformers.RagRetriever` for end-to-end generation or used in
|
||||||
The model is compatible any `autoencoding` model as the ``question_encoder`` and any `seq2seq` model with language model head as the ``generator``.
|
combination with the outputs of a retriever in multiple steps---see examples for more details.
|
||||||
The model has been tested with :class:`~transformers.DPRQuestionEncoder` as the ``question_encoder`` and :class:`~transformers.BartForConditionalGeneration` or :class:`~transformers.T5ForConditionalGeneration` as the ``generator``.
|
The model is compatible with any `autoencoding` model as the ``question_encoder`` and any `seq2seq` model with language
|
||||||
|
model head as the ``generator``. It has been tested with :class:`~transformers.DPRQuestionEncoder` as the
|
||||||
|
``question_encoder`` and :class:`~transformers.BartForConditionalGeneration` or
|
||||||
|
:class:`~transformers.T5ForConditionalGeneration` as the ``generator``.
|
||||||
|
|
||||||
This model is a PyTorch `torch.nn.Module <https://pytorch.org/docs/stable/nn.html#torch.nn.Module>`_ sub-class.
|
This model inherits from :class:`~transformers.PreTrainedModel`. Check the superclass documentation for the generic
|
||||||
|
methods the library implements for all its models (such as downloading or saving, resizing the input embeddings,
|
||||||
|
pruning heads etc.)
|
||||||
|
|
||||||
|
This model is also a PyTorch `torch.nn.Module <https://pytorch.org/docs/stable/nn.html#torch.nn.Module>`__ subclass.
|
||||||
Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general
|
Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general
|
||||||
usage and behavior.
|
usage and behavior.
|
||||||
|
|
||||||
Args:
|
Args:
|
||||||
config (:class:`~transformers.RagConfig`): Model configuration class with all the parameters of the model.
|
config (:class:`~transformers.RagConfig`):
|
||||||
|
Model configuration class with all the parameters of the model.
|
||||||
Initializing with a config file does not load the weights associated with the model, only the configuration.
|
Initializing with a config file does not load the weights associated with the model, only the configuration.
|
||||||
Check out the :meth:`~transformers.PreTrainedModel.from_pretrained` method to load the model weights.
|
Check out the :meth:`~transformers.PreTrainedModel.from_pretrained` method to load the model weights.
|
||||||
question_encoder (:class:`transformers.PreTrainedModel`):
|
question_encoder (:class:`transformers.PreTrainedModel`):
|
||||||
@@ -377,44 +399,65 @@ RAG_FORWARD_INPUTS_DOCSTRING = r"""
|
|||||||
Args:
|
Args:
|
||||||
input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`):
|
input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`):
|
||||||
Indices of input sequence tokens in the vocabulary.
|
Indices of input sequence tokens in the vocabulary.
|
||||||
:class:`~transformers.RagConfig`, used to initialize the model, specifies which generator to use, it also specifies a compatible
|
:class:`~transformers.RagConfig`, used to initialize the model, specifies which generator to use; it also
|
||||||
generator tokenizer. Use that tokenizer class to obtain the indices.
|
specifies a compatible generator tokenizer. Use that tokenizer class to obtain the indices.
|
||||||
attention_mask (:obj:`torch.Tensor` of shape :obj:`(batch_size, sequence_length)`, `optional`, defaults to :obj:`None`):
|
attention_mask (:obj:`torch.Tensor` of shape :obj:`(batch_size, sequence_length)`, `optional`):
|
||||||
Mask to avoid performing attention on padding token indices in input_ids.
|
Mask to avoid performing attention on padding token indices.
|
||||||
Mask values selected in ``[0, 1]``:
|
Mask values selected in ``[0, 1]``:
|
||||||
``1`` for tokens that are NOT MASKED, ``0`` for MASKED tokens.
|
|
||||||
|
- 1 for tokens that are **not masked**,
|
||||||
|
- 0 for tokens that are **masked**.
|
||||||
|
|
||||||
|
`What are attention masks? <../glossary.html#attention-mask>`__
|
||||||
encoder_outputs (:obj:`tuple(tuple(torch.FloatTensor))`, `optional`):
|
encoder_outputs (:obj:`tuple(tuple(torch.FloatTensor))`, `optional`):
|
||||||
Tuple consists of (:obj:`last_hidden_state`, `optional`: :obj:`hidden_states`, `optional`: :obj:`attentions`)
|
Tuple consists of (:obj:`generator_enc_last_hidden_state`, `optional`: :obj:`generator_enc_hidden_states`,
|
||||||
`last_hidden_state` of shape :obj:`(batch_size, n_docs * sequence_length, hidden_size)` is a sequence of hidden-states at the output of the last layer of the encoder.
|
`optional`: :obj:`generator_enc_attentions`). :obj:`generator_enc_last_hidden_state` of shape
|
||||||
`doc_scores` of shape :obj:`(batch_size, n_docs)` store retrieval scores of documents retrieved for each input in the batch.
|
:obj:`(batch_size, n_docs * sequence_length, hidden_size)` is a sequence of hidden-states at the output of
|
||||||
Used by the (:class:`~transformers.RagTokenForGeneration`) model during decoding.
|
the last layer of the generator's encoder.
|
||||||
decoder_input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size, target_sequence_length)`, `optional`, defaults to :obj:`None`):
|
|
||||||
Provide for generation tasks. `None` by default, constuct as per instructions for the generator model you're using with your RAG instance.
|
Used by the (:class:`~transformers.RagModel`) model during decoding.
|
||||||
Provide for generation tasks. `None` by default, constuct as per instructions for the generator model you're using with your RAG instance.
|
decoder_input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size, target_sequence_length)`, `optional`):
|
||||||
|
Provide for generation tasks. `None` by default, construct as per instructions for the generator model
|
||||||
|
you're using with your RAG instance.
|
||||||
decoder_attention_mask (:obj:`torch.BoolTensor` of shape :obj:`(batch_size, target_sequence_length)`, `optional`):
|
decoder_attention_mask (:obj:`torch.BoolTensor` of shape :obj:`(batch_size, target_sequence_length)`, `optional`):
|
||||||
Default behavior: generate a tensor that ignores pad tokens in decoder_input_ids. Causal mask will also be used by default.
|
Default behavior: generate a tensor that ignores pad tokens in :obj:`decoder_input_ids`. Causal mask will
|
||||||
|
also be used by default.
|
||||||
past_key_values (:obj:`tuple(tuple(torch.FloatTensor))`):
|
past_key_values (:obj:`tuple(tuple(torch.FloatTensor))`):
|
||||||
Tuple consists of two elements: :obj:`encoder_outputs` of the RAG model (see :obj:`encoder_outputs`) and :obj:`past_key_values` of the underlying generator.
|
Tuple consists of two elements: :obj:`encoder_outputs` of the RAG model (see :obj:`encoder_outputs`) and
|
||||||
Can be used to speed up decoding. :obj:`past_key_values` are used in the (:class:`~transformers.RagTokenForGeneration`)
|
:obj:`past_key_values` of the underlying generator.
|
||||||
model during decoding.
|
Can be used to speed up decoding. :obj:`past_key_values` are used in the
|
||||||
|
(:class:`~transformers.RagTokenForGeneration`) model during decoding.
|
||||||
doc_scores (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, config.n_docs)`):
|
doc_scores (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, config.n_docs)`):
|
||||||
Score between each retrieved document embeddigs (see :obj:`retrieved_doc_embeds`) and :obj:`question_encoder_last_hidden_state`.
|
Score between each retrieved document embedding (see :obj:`retrieved_doc_embeds`) and
|
||||||
If the model has is not initialized with a ``retriever`` :obj:`doc_scores` has to be provided to the forward pass. :obj:`doc_scores` can be computed via :obj:`question_encoder_last_hidden_state` and :obj:`retrieved_doc_embeds`, see examples for more information.
|
:obj:`question_encoder_last_hidden_state`.
|
||||||
|
If the model is not initialized with a ``retriever``, :obj:`doc_scores` has to be provided to the
|
||||||
|
forward pass. :obj:`doc_scores` can be computed via :obj:`question_encoder_last_hidden_state` and
|
||||||
|
:obj:`retrieved_doc_embeds`, see examples for more information.
|
||||||
context_input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size * config.n_docs, config.max_combined_length)`, `optional`, returned when `output_retrieved=True`):
|
context_input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size * config.n_docs, config.max_combined_length)`, `optional`, returned when `output_retrieved=True`):
|
||||||
Input ids post-processed from the retrieved documents and the question encoder input_ids by the retriever.
|
Input IDs post-processed from the retrieved documents and the question encoder :obj:`input_ids` by the
|
||||||
If the model has is not initialized with a ``retriever`` :obj:`context_input_ids` has to be provided to the forward pass. :obj:`context_input_ids` are returned by :meth:`~transformers.RagRetriever.__call__`
|
retriever.
|
||||||
|
|
||||||
|
If the model is not initialized with a ``retriever``, :obj:`context_input_ids` has to be provided to the
|
||||||
|
forward pass. :obj:`context_input_ids` are returned by :meth:`~transformers.RagRetriever.__call__`.
|
||||||
context_attention_mask (:obj:`torch.LongTensor` of shape :obj:`(batch_size * config.n_docs, config.max_combined_length)`, `optional`, returned when `output_retrieved=True`):
|
context_attention_mask (:obj:`torch.LongTensor` of shape :obj:`(batch_size * config.n_docs, config.max_combined_length)`, `optional`, returned when `output_retrieved=True`):
|
||||||
Attention mask post-processed from the retrieved documents and the question encoder input_ids by the retriever.
|
Attention mask post-processed from the retrieved documents and the question encoder :obj:`input_ids` by the
|
||||||
If the model has is not initialized with a ``retriever`` :obj:`context_attention_mask` has to be provided to the forward pass. :obj:`context_attention_mask` are returned by :meth:`~transformers.RagRetriever.__call__`
|
retriever.
|
||||||
|
|
||||||
|
If the model is not initialized with a ``retriever``, :obj:`context_attention_mask` has to be provided
|
||||||
|
to the forward pass. :obj:`context_attention_mask` are returned by
|
||||||
|
:meth:`~transformers.RagRetriever.__call__`.
|
||||||
use_cache (:obj:`bool`, `optional`, defaults to :obj:`True`):
|
use_cache (:obj:`bool`, `optional`, defaults to :obj:`True`):
|
||||||
If `use_cache` is True, ``past_key_values`` are returned and can be used to speed up decoding (see
|
If set to :obj:`True`, ``past_key_values`` key value states are returned and can be used to speed up
|
||||||
``past_key_values``).
|
decoding (see ``past_key_values``).
|
||||||
output_attentions (:obj:`bool`, `optional`):
|
output_attentions (:obj:`bool`, `optional`):
|
||||||
If set to ``True``, the attentions tensors of all attention layers are returned. See ``attentions`` under returned tensors for more detail.
|
Whether or not to return the attentions tensors of all attention layers. See ``attentions`` under returned
|
||||||
|
tensors for more detail.
|
||||||
output_hidden_states (:obj:`bool`, `optional`):
|
output_hidden_states (:obj:`bool`, `optional`):
|
||||||
If set to ``True``, the hidden states of all layers are returned. See ``hidden_states`` under returned tensors for more detail.
|
Whether or not to return the hidden states of all layers. See ``hidden_states`` under returned tensors for
|
||||||
|
more detail.
|
||||||
output_retrieved (:obj:`bool`, `optional`):
|
output_retrieved (:obj:`bool`, `optional`):
|
||||||
If set to ``True``, :obj:`retrieved_doc_embeds`, :obj:`retrieved_doc_ids`, :obj:`context_input_ids` and :obj:`context_attention_mask` are returned. See returned tensors for more detail.
|
Whether or not to return the :obj:`retrieved_doc_embeds`, :obj:`retrieved_doc_ids`,
|
||||||
|
:obj:`context_input_ids` and :obj:`context_attention_mask`. See returned tensors for more detail.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
|
|
||||||
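The ``doc_scores`` entry above says the scores can be computed from :obj:`question_encoder_last_hidden_state` and :obj:`retrieved_doc_embeds` but does not spell out the operation. The sketch below assumes an inner product between the question vector and each retrieved document embedding (an assumption based on how DPR-style retrieval typically scores documents, not something stated in this diff). It uses plain nested lists so it runs without dependencies; on tensors the same computation would be a batched matrix product. ``compute_doc_scores`` is a hypothetical name.

```python
from typing import List


def compute_doc_scores(
    question_hidden_states: List[List[float]],
    retrieved_doc_embeds: List[List[List[float]]],
) -> List[List[float]]:
    """Hypothetical helper: inner-product score per (question, document) pair.

    question_hidden_states: shape (batch_size, dim)
    retrieved_doc_embeds:   shape (batch_size, n_docs, dim)
    returns:                shape (batch_size, n_docs)
    """
    scores: List[List[float]] = []
    for question_vec, doc_vecs in zip(question_hidden_states, retrieved_doc_embeds):
        # One dot product per retrieved document for this question.
        scores.append(
            [sum(q * d for q, d in zip(question_vec, doc_vec)) for doc_vec in doc_vecs]
        )
    return scores


# Toy input: one question of dimension 2, three retrieved documents.
doc_scores = compute_doc_scores(
    [[1.0, 2.0]],
    [[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]],
)
# doc_scores == [[1.0, 2.0, 3.0]], i.e. shape (batch_size, n_docs)
```

The result has shape ``(batch_size, n_docs)``, matching the shape documented for ``doc_scores``.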
@@ -662,15 +705,15 @@ class RagSequenceForGeneration(RagPreTrainedModel):
|
|||||||
**kwargs # needs kwargs for generation
|
**kwargs # needs kwargs for generation
|
||||||
):
|
):
|
||||||
r"""
|
r"""
|
||||||
exclude_bos_score (:obj:`bool`, `optional`):
|
exclude_bos_score (:obj:`bool`, `optional`):
|
||||||
Only relevant if ``labels`` is passed.
|
Only relevant if ``labels`` is passed.
|
||||||
If :obj:`True`, the score of the BOS token is disregarded when computing
|
If :obj:`True`, the score of the BOS token is disregarded when computing
|
||||||
the loss.
|
the loss.
|
||||||
reduce_loss (:obj:`bool`, `optional`):
|
reduce_loss (:obj:`bool`, `optional`):
|
||||||
Only relevant if ``labels`` is passed.
|
Only relevant if ``labels`` is passed.
|
||||||
If :obj:`True`, the NLL loss is reduced using the ``torch.Tensor.sum`` operation.
|
If :obj:`True`, the NLL loss is reduced using the ``torch.Tensor.sum`` operation.
|
||||||
kwargs (:obj:`Dict[str, any]`, optional, defaults to `{}`):
|
kwargs (:obj:`Dict[str, any]`, optional, defaults to `{}`):
|
||||||
Legacy dictionary, which is required so that model can use `generate()` function.
|
Legacy dictionary, which is required so that model can use `generate()` function.
|
||||||
|
|
||||||
Returns:
|
Returns:
|
||||||
|
|
||||||
@@ -780,28 +823,31 @@ class RagSequenceForGeneration(RagPreTrainedModel):
|
|||||||
):
|
):
|
||||||
"""
|
"""
|
||||||
Implements RAG sequence "thorough" decoding.
|
Implements RAG sequence "thorough" decoding.
|
||||||
Read the :meth:`~transformers.PreTrainedModel.generate`` documentation for more information on how to set other generate input parameters.
|
Read the :meth:`~transformers.PreTrainedModel.generate` documentation for more information on how to set other
|
||||||
|
generate input parameters.
|
||||||
|
|
||||||
Args:
|
Args:
|
||||||
input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`):
|
input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`):
|
||||||
The sequence used as a prompt for the generation. If :obj:`input_ids` is not passed, then :obj:`context_input_ids` has to be provided.
|
The sequence used as a prompt for the generation. If :obj:`input_ids` is not passed, then
|
||||||
|
:obj:`context_input_ids` has to be provided.
|
||||||
context_input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size * config.n_docs, config.max_combined_length)`, `optional`, returned when `output_retrieved=True`):
|
context_input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size * config.n_docs, config.max_combined_length)`, `optional`, returned when `output_retrieved=True`):
|
||||||
Input ids post-processed from the retrieved documents and the question encoder input_ids by the retriever.
|
Input IDs post-processed from the retrieved documents and the question encoder input_ids by the
|
||||||
|
retriever.
|
||||||
do_deduplication (:obj:`bool`, `optional`):
|
do_deduplication (:obj:`bool`, `optional`):
|
||||||
Controls whether we want to deduplicate the generations from different context documents for a given input.
|
Whether or not to deduplicate the generations from different context documents for a given input.
|
||||||
Has to be set to :obj:`False` if used while training with distributed backend.
|
Has to be set to :obj:`False` if used while training with distributed backend.
|
||||||
num_return_sequences (:obj:`int`, `optional`, defaults to 1):
|
num_return_sequences (:obj:`int`, `optional`, defaults to 1):
|
||||||
The number of independently computed returned sequences for each element in the batch. Note that this is not the value
|
The number of independently computed returned sequences for each element in the batch. Note that this
|
||||||
we pass to the ``generator``'s `:func:`~transformers.PreTrainedModel.generate`` function, where we set ``num_return_sequences``
|
is not the value we pass to the ``generator``'s :func:`~transformers.PreTrainedModel.generate`
|
||||||
to `num_beams`.
|
function, where we set ``num_return_sequences`` to :obj:`num_beams`.
|
||||||
num_beams (:obj:`int`, `optional`, defaults to 1):
|
num_beams (:obj:`int`, `optional`, defaults to 1):
|
||||||
Number of beams for beam search. 1 means no beam search.
|
Number of beams for beam search. 1 means no beam search.
|
||||||
kwargs:
|
kwargs:
|
||||||
Additional kwargs will be passed to :meth:`~transformers.PreTrainedModel.generate``.
|
Additional kwargs will be passed to :meth:`~transformers.PreTrainedModel.generate`.
|
||||||
Return:
|
|
||||||
|
|
||||||
|
Return:
|
||||||
:obj:`torch.LongTensor` of shape :obj:`(batch_size * num_return_sequences, sequence_length)`:
|
:obj:`torch.LongTensor` of shape :obj:`(batch_size * num_return_sequences, sequence_length)`:
|
||||||
The generated sequences. The second dimension (sequence_length) is either equal to :obj:`max_length` or
|
The generated sequences. The second dimension (sequence length) is either equal to :obj:`max_length` or
|
||||||
shorter if all batches finished early due to the :obj:`eos_token_id`.
|
shorter if all batches finished early due to the :obj:`eos_token_id`.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
@@ -1033,14 +1079,15 @@ class RagTokenForGeneration(RagPreTrainedModel):
|
|||||||
**kwargs # needs kwargs for generation
|
**kwargs # needs kwargs for generation
|
||||||
):
|
):
|
||||||
r"""
|
r"""
|
||||||
do_marginalize (:obj:`bool`, `optional`):
|
do_marginalize (:obj:`bool`, `optional`):
|
||||||
If :obj:`True`, the logits are marginalized over all documents
|
If :obj:`True`, the logits are marginalized over all documents
|
||||||
by making use of ``torch.nn.functional.log_softmax``.
|
by making use of ``torch.nn.functional.log_softmax``.
|
||||||
reduce_loss (:obj:`bool`, `optional`):
|
reduce_loss (:obj:`bool`, `optional`):
|
||||||
Only relevant if ``labels`` is passed.
|
Only relevant if ``labels`` is passed.
|
||||||
If :obj:`True`, the NLL loss is reduced using the ``torch.Tensor.sum`` operation.
|
If :obj:`True`, the NLL loss is reduced using the ``torch.Tensor.sum`` operation.
|
||||||
kwargs (:obj:`Dict[str, any]`, optional, defaults to `{}`):
|
kwargs (:obj:`Dict[str, any]`, optional, defaults to `{}`):
|
||||||
Legacy dictionary, which is required so that model can use `generate()` function.
|
Legacy dictionary, which is required so that model can use `generate()` function.
|
||||||
|
|
||||||
Returns:
|
Returns:
|
||||||
|
|
||||||
Example::
|
Example::
|
||||||
@@ -1156,23 +1203,35 @@ class RagTokenForGeneration(RagPreTrainedModel):
|
|||||||
|
|
||||||
Args:
|
Args:
|
||||||
input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`):
|
input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`):
|
||||||
The sequence used as a prompt for the generation. If :obj:`input_ids` is not passed, then :obj:`context_input_ids` has to be provided.
|
The sequence used as a prompt for the generation. If :obj:`input_ids` is not passed, then
|
||||||
|
:obj:`context_input_ids` has to be provided.
|
||||||
context_input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size * config.n_docs, config.max_combined_length)`, `optional`, returned when `output_retrieved=True`):
|
context_input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size * config.n_docs, config.max_combined_length)`, `optional`, returned when `output_retrieved=True`):
|
||||||
Input ids post-processed from the retrieved documents and the question encoder input_ids by the retriever.
|
Input IDs post-processed from the retrieved documents and the question encoder :obj:`input_ids` by the
|
||||||
If the model has is not initialized with a ``retriever`` :obj:`context_input_ids` has to be provided to the forward pass. :obj:`context_input_ids` are returned by :meth:`~transformers.RagRetriever.__call__`
|
retriever.
|
||||||
context_attention_mask (:obj:`torch.LongTensor` of shape :obj:`(batch_size * config.n_docs, config.max_combined_length)`, `optional`, returned when `output_retrieved=True`):
|
|
||||||
Attention mask post-processed from the retrieved documents and the question encoder input_ids by the retriever.
|
|
||||||
If the model has is not initialized with a ``retriever`` :obj:`context_attention_mask` has to be provided to the forward pass. :obj:`context_attention_mask` are returned by :meth:`~transformers.RagRetriever.__call__`
|
|
||||||
doc_scores (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, config.n_docs)`):
|
|
||||||
Score between each retrieved document embeddigs (see :obj:`retrieved_doc_embeds`) and :obj:`question_encoder_last_hidden_state`.
|
|
||||||
If the model has is not initialized with a ``retriever`` :obj:`doc_scores` has to be provided to the forward pass. :obj:`doc_scores` can be computed via :obj:`question_encoder_last_hidden_state` and :obj:`retrieved_doc_embeds`, see examples for more information.
|
|
||||||
|
|
||||||
|
If the model is not initialized with a ``retriever``, :obj:`context_input_ids` has to be provided
|
||||||
|
to the forward pass. :obj:`context_input_ids` are returned by
|
||||||
|
:meth:`~transformers.RagRetriever.__call__`.
|
||||||
|
context_attention_mask (:obj:`torch.LongTensor` of shape :obj:`(batch_size * config.n_docs, config.max_combined_length)`, `optional`, returned when `output_retrieved=True`):
|
||||||
|
Attention mask post-processed from the retrieved documents and the question encoder :obj:`input_ids` by
|
||||||
|
the retriever.
|
||||||
|
|
||||||
|
If the model is not initialized with a ``retriever``, :obj:`context_attention_mask` has to be provided
|
||||||
|
to the forward pass. :obj:`context_attention_mask` is returned by
|
||||||
|
:meth:`~transformers.RagRetriever.__call__`.
|
||||||
|
doc_scores (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, config.n_docs)`):
|
||||||
|
Score between each retrieved document embedding (see :obj:`retrieved_doc_embeds`) and
|
||||||
|
:obj:`question_encoder_last_hidden_state`.
|
||||||
|
|
||||||
|
If the model is not initialized with a ``retriever``, :obj:`doc_scores` has to be provided to the
|
||||||
|
forward pass. :obj:`doc_scores` can be computed via :obj:`question_encoder_last_hidden_state` and
|
||||||
|
:obj:`retrieved_doc_embeds`, see examples for more information.
|
||||||
max_length (:obj:`int`, `optional`, defaults to 20):
|
max_length (:obj:`int`, `optional`, defaults to 20):
|
||||||
The maximum length of the sequence to be generated.
|
The maximum length of the sequence to be generated.
|
||||||
min_length (:obj:`int`, `optional`, defaults to 10):
|
min_length (:obj:`int`, `optional`, defaults to 10):
|
||||||
The minimum length of the sequence to be generated.
|
The minimum length of the sequence to be generated.
|
||||||
early_stopping (:obj:`bool`, `optional`, defaults to :obj:`False`):
|
early_stopping (:obj:`bool`, `optional`, defaults to :obj:`False`):
|
||||||
Whether to stop the beam search when at least ``num_beams`` sentences are finished per batch or not.
|
Whether or not to stop the beam search when at least ``num_beams`` sentences are finished per batch.
|
||||||
use_cache (:obj:`bool`, `optional`, defaults to :obj:`True`):
|
use_cache (:obj:`bool`, `optional`, defaults to :obj:`True`):
|
||||||
Whether or not the model should use the past last key/values attentions (if applicable to the model) to
|
Whether or not the model should use the past last key/values attentions (if applicable to the model) to
|
||||||
speed up decoding.
|
speed up decoding.
|
||||||
@@ -1195,14 +1254,13 @@ class RagTokenForGeneration(RagPreTrainedModel):
|
|||||||
num_beams (:obj:`int`, `optional`, defaults to 1):
|
num_beams (:obj:`int`, `optional`, defaults to 1):
|
||||||
Number of beams for beam search. 1 means no beam search.
|
Number of beams for beam search. 1 means no beam search.
|
||||||
num_return_sequences (:obj:`int`, `optional`, defaults to 1):
|
num_return_sequences (:obj:`int`, `optional`, defaults to 1):
|
||||||
The number of independently computed returned sequences for each element in the batch. Note that this is not the value
|
The number of independently computed returned sequences for each element in the batch. Note that this
|
||||||
we pass to the ``generator``'s `:func:`~transformers.PreTrainedModel.generate`` function, where we set ``num_return_sequences``
|
is not the value we pass to the ``generator``'s :func:`~transformers.PreTrainedModel.generate`
|
||||||
to `num_beams`.
|
function, where we set ``num_return_sequences`` to :obj:`num_beams`.
|
||||||
decoder_start_token_id (:obj:`int`, `optional`):
|
decoder_start_token_id (:obj:`int`, `optional`):
|
||||||
If an encoder-decoder model starts decoding with a different token than `bos`, the id of that token.
|
If an encoder-decoder model starts decoding with a different token than `bos`, the id of that token.
|
||||||
|
|
||||||
Return:
|
Return:
|
||||||
|
|
||||||
:obj:`torch.LongTensor` of shape :obj:`(batch_size * num_return_sequences, sequence_length)`:
|
:obj:`torch.LongTensor` of shape :obj:`(batch_size * num_return_sequences, sequence_length)`:
|
||||||
The generated sequences. The second dimension (sequence_length) is either equal to :obj:`max_length` or
|
The generated sequences. The second dimension (sequence_length) is either equal to :obj:`max_length` or
|
||||||
shorter if all batches finished early due to the :obj:`eos_token_id`.
|
shorter if all batches finished early due to the :obj:`eos_token_id`.
|
||||||
|
|||||||
@@ -399,12 +399,14 @@ class RagRetriever:
|
|||||||
The number of docs retrieved per query.
|
The number of docs retrieved per query.
|
||||||
|
|
||||||
Return:
|
Return:
|
||||||
retrieved_doc_embeds (:obj:`np.ndarray` of shape :obj:`(batch_size, n_docs, dim)`
|
:obj:`Tuple[np.ndarray, np.ndarray, List[dict]]`:
|
||||||
The retrieval embeddings of the retrieved docs per query.
|
A tuple with the following objects:
|
||||||
doc_ids (:obj:`np.ndarray` of shape :obj:`batch_size, n_docs`)
|
|
||||||
The ids of the documents in the index
|
- **retrieved_doc_embeds** (:obj:`np.ndarray` of shape :obj:`(batch_size, n_docs, dim)`) -- The
|
||||||
doc_dicts (:obj:`List[dict]`):
|
retrieval embeddings of the retrieved docs per query.
|
||||||
The retrieved_doc_embeds examples per query.
|
- **doc_ids** (:obj:`np.ndarray` of shape :obj:`(batch_size, n_docs)`) -- The ids of the documents in the
|
||||||
|
index
|
||||||
|
- **doc_dicts** (:obj:`List[dict]`) -- The :obj:`retrieved_doc_embeds` examples per query.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
doc_ids, retrieved_doc_embeds = self._main_retrieve(question_hidden_states, n_docs)
|
doc_ids, retrieved_doc_embeds = self._main_retrieve(question_hidden_states, n_docs)
|
||||||
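The documented return value of ``retrieve`` can be pictured with a small stand-in. This is only a sketch under stated assumptions: ``toy_retrieve``, ``DOC_EMBEDS`` and ``DOC_TEXTS`` are hypothetical names, the index is a four-document in-memory list, scoring is a plain inner product, nested lists stand in for the :obj:`np.ndarray` objects, and the ``"text"`` key inside each dict is an assumption. It is not the library's implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical toy index: 4 documents with 3-dimensional embeddings.
DOC_EMBEDS = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
DOC_TEXTS = ["doc zero", "doc one", "doc two", "doc three"]


def toy_retrieve(
    question_hidden_states: List[List[float]], n_docs: int
) -> Tuple[List[List[List[float]]], List[List[int]], List[Dict[str, List[str]]]]:
    """Return (retrieved_doc_embeds, doc_ids, doc_dicts) for a batch of queries."""
    retrieved_doc_embeds, doc_ids, doc_dicts = [], [], []
    for query in question_hidden_states:
        # Score every document by inner product with the query vector.
        scores = [sum(q * d for q, d in zip(query, doc)) for doc in DOC_EMBEDS]
        # Keep the ids of the n_docs highest-scoring documents.
        top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n_docs]
        doc_ids.append(top)
        retrieved_doc_embeds.append([DOC_EMBEDS[i] for i in top])
        # One dict per query, holding the fields of its retrieved documents.
        doc_dicts.append({"text": [DOC_TEXTS[i] for i in top]})
    return retrieved_doc_embeds, doc_ids, doc_dicts


embeds, ids, dicts = toy_retrieve([[1.0, 0.5, 0.0], [0.0, 0.0, 2.0]], n_docs=2)
```

With a batch of two queries and ``n_docs=2``, ``embeds`` is nested as ``(batch_size, n_docs, dim)``, ``ids`` as ``(batch_size, n_docs)``, and ``dicts`` has one entry per query.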
|
|||||||
@@ -17,7 +17,8 @@ import os
|
|||||||
from typing import List, Optional
|
from typing import List, Optional
|
||||||
|
|
||||||
from .configuration_rag import RagConfig
|
from .configuration_rag import RagConfig
|
||||||
from .tokenization_utils_base import BatchEncoding
|
from .file_utils import add_start_docstrings
|
||||||
|
from .tokenization_utils_base import PREPARE_SEQ2SEQ_BATCH_DOCSTRING, BatchEncoding
|
||||||
from .utils import logging
|
from .utils import logging
|
||||||
|
|
||||||
|
|
||||||
@@ -60,6 +61,7 @@ class RagTokenizer:
|
|||||||
def batch_decode(self, *args, **kwargs):
|
def batch_decode(self, *args, **kwargs):
|
||||||
return self.generator.batch_decode(*args, **kwargs)
|
return self.generator.batch_decode(*args, **kwargs)
|
||||||
|
|
||||||
|
@add_start_docstrings(PREPARE_SEQ2SEQ_BATCH_DOCSTRING)
|
||||||
def prepare_seq2seq_batch(
|
def prepare_seq2seq_batch(
|
||||||
self,
|
self,
|
||||||
src_texts: List[str],
|
src_texts: List[str],
|
||||||
@@ -71,66 +73,6 @@ class RagTokenizer:
|
|||||||
truncation=True,
|
truncation=True,
|
||||||
**kwargs,
|
**kwargs,
|
||||||
) -> BatchEncoding:
|
) -> BatchEncoding:
|
||||||
r"""
|
|
||||||
|
|
||||||
Prepare a batch that can be passed directly to an instance of :class:`~transformers.RagModel`.
|
|
||||||
|
|
||||||
Args:
|
|
||||||
src_texts: (:obj:`List[str]`):
|
|
||||||
List of documents to summarize or source language texts.
|
|
||||||
tgt_texts: (:obj:`List[str]`, `optional`):
|
|
||||||
List of summaries or target language texts.
|
|
||||||
max_length (:obj:`int`, `optional`):
|
|
||||||
Controls the maximum length for encoder inputs (documents to summarize or source language texts).
|
|
||||||
If left unset or set to :obj:`None`, this will use the predefined model maximum length if a maximum
|
|
||||||
length is required by one of the truncation/padding parameters. If the model has no specific maximum
|
|
||||||
input length (like XLNet) truncation/padding to a maximum length will be deactivated.
|
|
||||||
max_target_length (:obj:`int`, `optional`):
|
|
||||||
Controls the maximum length of decoder inputs (target language texts or summaries).
|
|
||||||
If left unset or set to :obj:`None`, this will use the max_length value.
|
|
||||||
padding (:obj:`bool`, :obj:`str` or :class:`~transformers.tokenization_utils_base.PaddingStrategy`, `optional`, defaults to :obj:`False`):
|
|
||||||
Activates and controls padding. Accepts the following values:
|
|
||||||
|
|
||||||
* :obj:`True` or :obj:`'longest'`: Pad to the longest sequence in the batch (or no padding if only a
|
|
||||||
single sequence if provided).
|
|
||||||
* :obj:`'max_length'`: Pad to a maximum length specified with the argument :obj:`max_length` or to the
|
|
||||||
maximum acceptable input length for the model if that argument is not provided.
|
|
||||||
* :obj:`False` or :obj:`'do_not_pad'` (default): No padding (i.e., can output a batch with sequences of
|
|
||||||
different lengths).
|
|
||||||
return_tensors (:obj:`str` or :class:`~transformers.tokenization_utils_base.TensorType`, `optional`, defaults to "pt"):
|
|
||||||
If set, will return tensors instead of list of python integers. Acceptable values are:
|
|
||||||
|
|
||||||
* :obj:`'tf'`: Return TensorFlow :obj:`tf.constant` objects.
|
|
||||||
* :obj:`'pt'`: Return PyTorch :obj:`torch.Tensor` objects.
|
|
||||||
* :obj:`'np'`: Return Numpy :obj:`np.ndarray` objects.
|
|
||||||
truncation (:obj:`bool`, :obj:`str` or :class:`~transformers.tokenization_utils_base.TruncationStrategy`, `optional`, defaults to :obj:`True`):
|
|
||||||
Activates and controls truncation. Accepts the following values:
|
|
||||||
|
|
||||||
* :obj:`True` or :obj:`'longest_first'`: Truncate to a maximum length specified with the argument
|
|
||||||
:obj:`max_length` or to the maximum acceptable input length for the model if that argument is not
|
|
||||||
provided. This will truncate token by token, removing a token from the longest sequence in the pair
|
|
||||||
if a pair of sequences (or a batch of pairs) is provided.
|
|
||||||
* :obj:`'only_first'`: Truncate to a maximum length specified with the argument :obj:`max_length` or to
|
|
||||||
the maximum acceptable input length for the model if that argument is not provided. This will only
|
|
||||||
truncate the first sequence of a pair if a pair of sequences (or a batch of pairs) is provided.
|
|
||||||
* :obj:`'only_second'`: Truncate to a maximum length specified with the argument :obj:`max_length` or
|
|
||||||
to the maximum acceptable input length for the model if that argument is not provided. This will only
|
|
||||||
truncate the second sequence of a pair if a pair of sequences (or a batch of pairs) is provided.
|
|
||||||
* :obj:`False` or :obj:`'do_not_truncate'` (default): No truncation (i.e., can output batch with
|
|
||||||
sequence lengths greater than the model maximum admissible input size).
|
|
||||||
**kwargs:
|
|
||||||
Additional keyword arguments passed along to :obj:`self.__call__`.
|
|
||||||
|
|
||||||
Returns:
|
|
||||||
:class:`~transformers.BatchEncoding`: A :class:`~transformers.BatchEncoding` with the following fields:
|
|
||||||
|
|
||||||
- **input_ids** -- List of token ids to be fed to the encoder.
|
|
||||||
- **attention_mask** -- List of indices specifying which tokens should be attended to by the model.
|
|
||||||
- **labels** -- List of token ids for tgt_texts
|
|
||||||
|
|
||||||
The full set of keys ``[input_ids, attention_mask, labels]``,
|
|
||||||
will only be returned if tgt_texts is passed. Otherwise, input_ids, attention_mask will be the only keys.
|
|
||||||
"""
|
|
||||||
if max_length is None:
|
if max_length is None:
|
||||||
max_length = self.question_encoder.model_max_length
|
max_length = self.question_encoder.model_max_length
|
||||||
model_inputs: BatchEncoding = self.question_encoder(
|
model_inputs: BatchEncoding = self.question_encoder(
|
||||||
|
|||||||
@@ -31,10 +31,10 @@ XXX_PRETRAINED_CONFIG_ARCHIVE_MAP = {
|
|||||||
|
|
||||||
class XxxConfig(PretrainedConfig):
|
class XxxConfig(PretrainedConfig):
|
||||||
r"""
|
r"""
|
||||||
This is the configuration class to store the configuration of a :class:`~transformers.XXXModel`.
|
This is the configuration class to store the configuration of a :class:`~transformers.XxxModel` or a
|
||||||
It is used to instantiate a XXX model according to the specified arguments, defining the model
|
:class:`~transformers.TFXxxModel`. It is used to instantiate a XXX model according to the specified
|
||||||
architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of
|
arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar
|
||||||
the XXX `xxx-base-uncased <https://huggingface.co/xxx/xxx-base-uncased>`__ architecture.
|
configuration to that of the XXX `xxx-base-uncased <https://huggingface.co/xxx/xxx-base-uncased>`__ architecture.
|
||||||
|
|
||||||
Configuration objects inherit from :class:`~transformers.PretrainedConfig` and can be used
|
Configuration objects inherit from :class:`~transformers.PretrainedConfig` and can be used
|
||||||
to control the model outputs. Read the documentation from :class:`~transformers.PretrainedConfig`
|
to control the model outputs. Read the documentation from :class:`~transformers.PretrainedConfig`
|
||||||
@@ -42,33 +42,35 @@ class XxxConfig(PretrainedConfig):
|
|||||||
|
|
||||||
|
|
||||||
Args:
|
Args:
|
||||||
vocab_size (:obj:`int`, optional, defaults to 30522):
|
vocab_size (:obj:`int`, `optional`, defaults to 30522):
|
||||||
Vocabulary size of the XXX model. Defines the different tokens that
|
Vocabulary size of the XXX model. Defines the number of different tokens that can be represented by the
|
||||||
can be represented by the `input_ids` passed to the forward method of :class:`~transformers.XXXModel`.
|
:obj:`input_ids` passed when calling :class:`~transformers.XxxModel` or
|
||||||
hidden_size (:obj:`int`, optional, defaults to 768):
|
:class:`~transformers.TFXxxModel`.
|
||||||
|
hidden_size (:obj:`int`, `optional`, defaults to 768):
|
||||||
Dimensionality of the encoder layers and the pooler layer.
|
Dimensionality of the encoder layers and the pooler layer.
|
||||||
num_hidden_layers (:obj:`int`, optional, defaults to 12):
|
num_hidden_layers (:obj:`int`, `optional`, defaults to 12):
|
||||||
Number of hidden layers in the Transformer encoder.
|
Number of hidden layers in the Transformer encoder.
|
||||||
num_attention_heads (:obj:`int`, optional, defaults to 12):
|
num_attention_heads (:obj:`int`, `optional`, defaults to 12):
|
||||||
Number of attention heads for each attention layer in the Transformer encoder.
|
Number of attention heads for each attention layer in the Transformer encoder.
|
||||||
hidden_act (:obj:`str` or :obj:`function`, optional, defaults to :obj:`"gelu"`):
|
hidden_act (:obj:`str` or :obj:`Callable`, `optional`, defaults to :obj:`"gelu"`):
|
||||||
The non-linear activation function (function or string) in the encoder and pooler.
|
The non-linear activation function (function or string) in the encoder and pooler.
|
||||||
|
|
||||||
If string, :obj:`"gelu"`, :obj:`"relu"`, :obj:`"swish"` and :obj:`"gelu_new"` are supported.
|
If string, :obj:`"gelu"`, :obj:`"relu"`, :obj:`"swish"` and :obj:`"gelu_new"` are supported.
|
||||||
hidden_dropout_prob (:obj:`float`, optional, defaults to 0.1):
|
hidden_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
|
||||||
The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
|
The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
|
||||||
attention_probs_dropout_prob (:obj:`float`, optional, defaults to 0.1):
|
attention_probs_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
|
||||||
The dropout ratio for the attention probabilities.
|
The dropout ratio for the attention probabilities.
|
||||||
max_position_embeddings (:obj:`int`, optional, defaults to 512):
|
max_position_embeddings (:obj:`int`, `optional`, defaults to 512):
|
||||||
The maximum sequence length that this model might ever be used with.
|
The maximum sequence length that this model might ever be used with.
|
||||||
Typically set this to something large just in case (e.g., 512 or 1024 or 2048).
|
Typically set this to something large just in case (e.g., 512 or 1024 or 2048).
|
||||||
type_vocab_size (:obj:`int`, optional, defaults to 2):
|
type_vocab_size (:obj:`int`, `optional`, defaults to 2):
|
||||||
The vocabulary size of the `token_type_ids` passed into :class:`~transformers.BertModel`.
|
The vocabulary size of the :obj:`token_type_ids` passed when calling :class:`~transformers.XxxModel` or
|
||||||
initializer_range (:obj:`float`, optional, defaults to 0.02):
|
:class:`~transformers.TFXxxModel`.
|
||||||
|
initializer_range (:obj:`float`, `optional`, defaults to 0.02):
|
||||||
The standard deviation of the :obj:`truncated_normal_initializer` for initializing all weight matrices.
|
The standard deviation of the :obj:`truncated_normal_initializer` for initializing all weight matrices.
|
||||||
layer_norm_eps (:obj:`float`, optional, defaults to 1e-5):
|
layer_norm_eps (:obj:`float`, `optional`, defaults to 1e-5):
|
||||||
The epsilon used by the layer normalization layers.
|
The epsilon used by the layer normalization layers.
|
||||||
gradient_checkpointing (:obj:`bool`, optional, defaults to :obj:`False`):
|
gradient_checkpointing (:obj:`bool`, `optional`, defaults to :obj:`False`):
|
||||||
If :obj:`True`, use gradient checkpointing to save memory at the expense of slower backward pass.
|
If :obj:`True`, use gradient checkpointing to save memory at the expense of slower backward pass.
|
||||||
kwargs:
|
kwargs:
|
||||||
Additional arguments for common configurations, passed to :class:`~transformers.PretrainedConfig`.
|
Additional arguments for common configurations, passed to :class:`~transformers.PretrainedConfig`.
|
||||||
|
|||||||
@@ -257,32 +257,37 @@ class TFXxxPreTrainedModel(TFPreTrainedModel):
|
|||||||
|
|
||||||
|
|
||||||
XXX_START_DOCSTRING = r"""
|
XXX_START_DOCSTRING = r"""
|
||||||
|
|
||||||
The XXX model was proposed in
|
The XXX model was proposed in
|
||||||
`XXX: Pre-training of Deep Bidirectional Transformers for Language Understanding
|
`XXX: Pre-training of Deep Bidirectional Transformers for Language Understanding
|
||||||
<https://arxiv.org/abs/1810.04805>`__ by....
|
<https://arxiv.org/abs/1810.04805>`__ by....
|
||||||
|
|
||||||
This model is a `tf.keras.Model <https://www.tensorflow.org/api_docs/python/tf/keras/Model>`__ sub-class.
|
This model inherits from :class:`~transformers.TFPreTrainedModel`. Check the superclass documentation for the
|
||||||
Use it as a regular TF 2.0 Keras Model and
|
generic methods the library implements for all its models (such as downloading or saving, resizing the input
|
||||||
refer to the TF 2.0 documentation for all matter related to general usage and behavior.
|
embeddings, pruning heads etc.)
|
||||||
|
|
||||||
|
This model is also a `tf.keras.Model <https://www.tensorflow.org/api_docs/python/tf/keras/Model>`__ subclass.
|
||||||
|
Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matters related to general
|
||||||
|
usage and behavior.
|
||||||
|
|
||||||
.. note::
|
.. note::
|
||||||
|
|
||||||
TF 2.0 models accept two formats as inputs:
|
TF 2.0 models accept two formats as inputs:
|
||||||
|
|
||||||
- having all inputs as keyword arguments (like PyTorch models), or
|
- having all inputs as keyword arguments (like PyTorch models), or
|
||||||
- having all inputs as a list, tuple or dict in the first positional arguments.
|
- having all inputs as a list, tuple or dict in the first positional arguments.
|
||||||
|
|
||||||
This second option is useful when using :obj:`tf.keras.Model.fit()` method which currently requires having
|
This second option is useful when using :meth:`tf.keras.Model.fit` method which currently requires having
|
||||||
all the tensors in the first argument of the model call function: :obj:`model(inputs)`.
|
all the tensors in the first argument of the model call function: :obj:`model(inputs)`.
|
||||||
|
|
||||||
If you choose this second option, there are three possibilities you can use to gather all the input Tensors
|
If you choose this second option, there are three possibilities you can use to gather all the input Tensors
|
||||||
in the first positional argument:
|
in the first positional argument:
|
||||||
|
|
||||||
- a single Tensor with input_ids only and nothing else: :obj:`model(input_ids)`
|
- a single Tensor with :obj:`input_ids` only and nothing else: :obj:`model(input_ids)`
|
||||||
- a list of varying length with one or several input Tensors IN THE ORDER given in the docstring:
|
- a list of varying length with one or several input Tensors IN THE ORDER given in the docstring:
|
||||||
:obj:`model([input_ids, attention_mask])` or :obj:`model([input_ids, attention_mask, token_type_ids])`
|
:obj:`model([input_ids, attention_mask])` or :obj:`model([input_ids, attention_mask, token_type_ids])`
|
||||||
- a dictionary with one or several input Tensors associated to the input names given in the docstring:
|
- a dictionary with one or several input Tensors associated to the input names given in the docstring:
|
||||||
:obj:`model({'input_ids': input_ids, 'token_type_ids': token_type_ids})`
|
:obj:`model({"input_ids": input_ids, "token_type_ids": token_type_ids})`
|
||||||
|
|
||||||
Parameters:
|
Parameters:
|
||||||
config (:class:`~transformers.XxxConfig`): Model configuration class with all the parameters of the model.
|
config (:class:`~transformers.XxxConfig`): Model configuration class with all the parameters of the model.
|
||||||
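The three ways of packing inputs into the first positional argument, described in the note above, can be sketched in plain Python. This is a hedged illustration rather than the library's code: ``gather_inputs`` and ``INPUT_NAMES`` are hypothetical names, and the order in ``INPUT_NAMES`` is assumed to match the order given in the docstring.

```python
from typing import Any, Dict

# Order in which the inputs are listed in the docstring (assumed for this sketch).
INPUT_NAMES = ("input_ids", "attention_mask", "token_type_ids")


def gather_inputs(inputs: Any, **kwargs: Any) -> Dict[str, Any]:
    """Map the first positional argument (plus keyword arguments) to named inputs."""
    if isinstance(inputs, dict):
        # Dictionary: keys are already the input names.
        named = dict(inputs)
    elif isinstance(inputs, (list, tuple)):
        # List or tuple: values are matched to names by position.
        named = dict(zip(INPUT_NAMES, inputs))
    else:
        # Anything else is treated as a single tensor holding input_ids.
        named = {"input_ids": inputs}
    named.update(kwargs)
    return named


single = gather_inputs("ids")
as_list = gather_inputs(["ids", "mask"])
as_dict = gather_inputs({"input_ids": "ids", "token_type_ids": "types"})
```

Strings stand in for tensors here so that the mapping is easy to inspect. The positional form depends on the documented order, which is why the note stresses it.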
@@ -292,27 +297,31 @@ XXX_START_DOCSTRING = r"""
|
|||||||
|
|
||||||
XXX_INPUTS_DOCSTRING = r"""
|
XXX_INPUTS_DOCSTRING = r"""
|
||||||
Args:
|
Args:
|
||||||
input_ids (:obj:`Numpy array` or :obj:`tf.Tensor` of shape :obj:`{0}`):
|
input_ids (:obj:`Numpy array` or :obj:`tf.Tensor` of shape :obj:`({0})`):
|
||||||
Indices of input sequence tokens in the vocabulary.
|
Indices of input sequence tokens in the vocabulary.
|
||||||
|
|
||||||
Indices can be obtained using :class:`transformers.XxxTokenizer`.
|
Indices can be obtained using :class:`~transformers.BertTokenizer`.
|
||||||
See :func:`transformers.PreTrainedTokenizer.encode` and
|
See :func:`transformers.PreTrainedTokenizer.__call__` and
|
||||||
:func:`transformers.PreTrainedTokenizer.__call__` for details.
|
:func:`transformers.PreTrainedTokenizer.encode` for details.
|
||||||
|
|
||||||
`What are input IDs? <../glossary.html#input-ids>`__
|
`What are input IDs? <../glossary.html#input-ids>`__
|
||||||
attention_mask (:obj:`Numpy array` or :obj:`tf.Tensor` of shape :obj:`{0}`, `optional`):
|
attention_mask (:obj:`Numpy array` or :obj:`tf.Tensor` of shape :obj:`({0})`, `optional`):
|
||||||
Mask to avoid performing attention on padding token indices.
|
Mask to avoid performing attention on padding token indices.
|
||||||
Mask values selected in ``[0, 1]``:
|
Mask values selected in ``[0, 1]``:
|
||||||
``1`` for tokens that are NOT MASKED, ``0`` for MASKED tokens.
|
|
||||||
|
- 1 for tokens that are **not masked**,
|
||||||
|
- 0 for tokens that are **masked**.
|
||||||
|
|
||||||
`What are attention masks? <../glossary.html#attention-mask>`__
|
`What are attention masks? <../glossary.html#attention-mask>`__
|
||||||
token_type_ids (:obj:`Numpy array` or :obj:`tf.Tensor` of shape :obj:`{0}`, `optional`):
|
token_type_ids (:obj:`Numpy array` or :obj:`tf.Tensor` of shape :obj:`({0})`, `optional`):
|
||||||
Segment token indices to indicate first and second portions of the inputs.
|
Segment token indices to indicate first and second portions of the inputs.
|
||||||
Indices are selected in ``[0, 1]``: ``0`` corresponds to a `sentence A` token, ``1``
|
Indices are selected in ``[0, 1]``:
|
||||||
corresponds to a `sentence B` token
|
|
||||||
|
- 0 corresponds to a `sentence A` token,
|
||||||
|
- 1 corresponds to a `sentence B` token.
|
||||||
|
|
||||||
`What are token type IDs? <../glossary.html#token-type-ids>`__
|
`What are token type IDs? <../glossary.html#token-type-ids>`__
|
||||||
position_ids (:obj:`Numpy array` or :obj:`tf.Tensor` of shape :obj:`{0}`, `optional`):
|
position_ids (:obj:`Numpy array` or :obj:`tf.Tensor` of shape :obj:`({0})`, `optional`):
|
||||||
Indices of positions of each input sequence tokens in the position embeddings.
|
Indices of positions of each input sequence tokens in the position embeddings.
|
||||||
Selected in the range ``[0, config.max_position_embeddings - 1]``.
|
Selected in the range ``[0, config.max_position_embeddings - 1]``.
|
||||||
|
|
||||||
@@ -320,21 +329,25 @@ XXX_INPUTS_DOCSTRING = r"""
|
|||||||
head_mask (:obj:`Numpy array` or :obj:`tf.Tensor` of shape :obj:`(num_heads,)` or :obj:`(num_layers, num_heads)`, `optional`):
|
head_mask (:obj:`Numpy array` or :obj:`tf.Tensor` of shape :obj:`(num_heads,)` or :obj:`(num_layers, num_heads)`, `optional`):
|
||||||
Mask to nullify selected heads of the self-attention modules.
|
Mask to nullify selected heads of the self-attention modules.
|
||||||
Mask values selected in ``[0, 1]``:
|
Mask values selected in ``[0, 1]``:
|
||||||
:obj:`1` indicates the head is **not masked**, :obj:`0` indicates the head is **masked**.
|
|
||||||
inputs_embeds (:obj:`Numpy array` or :obj:`tf.Tensor` of shape :obj:`(batch_size, sequence_length, embedding_dim)`, `optional`):
|
- 1 indicates the head is **not masked**,
|
||||||
|
- 0 indicates the head is **masked**.
|
||||||
|
|
||||||
|
inputs_embeds (:obj:`tf.Tensor` of shape :obj:`({0}, hidden_size)`, `optional`):
|
||||||
Optionally, instead of passing :obj:`input_ids` you can choose to directly pass an embedded representation.
|
Optionally, instead of passing :obj:`input_ids` you can choose to directly pass an embedded representation.
|
||||||
This is useful if you want more control over how to convert `input_ids` indices into associated vectors
|
This is useful if you want more control over how to convert :obj:`input_ids` indices into associated
|
||||||
than the model's internal embedding lookup matrix.
|
vectors than the model's internal embedding lookup matrix.
|
||||||
training (:obj:`boolean`, `optional`, defaults to :obj:`False`):
|
|
||||||
Whether to activate dropout modules (if set to :obj:`True`) during training or to de-activate them
|
|
||||||
(if set to :obj:`False`) for evaluation.
|
|
||||||
output_attentions (:obj:`bool`, `optional`):
|
output_attentions (:obj:`bool`, `optional`):
|
||||||
If set to ``True``, the attentions tensors of all attention layers are returned. See ``attentions`` under returned tensors for more detail.
|
Whether or not to return the attentions tensors of all attention layers. See ``attentions`` under returned
|
||||||
|
tensors for more detail.
|
||||||
output_hidden_states (:obj:`bool`, `optional`):
|
output_hidden_states (:obj:`bool`, `optional`):
|
||||||
If set to ``True``, the hidden states of all layers are returned. See ``hidden_states`` under returned tensors for more detail.
|
Whether or not to return the hidden states of all layers. See ``hidden_states`` under returned tensors for
|
||||||
|
more detail.
|
||||||
return_dict (:obj:`bool`, `optional`):
|
return_dict (:obj:`bool`, `optional`):
|
||||||
If set to ``True``, the model will return a :class:`~transformers.file_utils.ModelOutput` instead of a
|
Whether or not to return a :class:`~transformers.file_utils.ModelOutput` instead of a plain tuple.
|
||||||
plain tuple.
|
training (:obj:`bool`, `optional`, defaults to :obj:`False`):
|
||||||
|
Whether or not to use the model in training mode (some modules like dropout modules have different
|
||||||
|
behaviors between training and evaluation).
|
||||||
"""
|
"""
|
||||||
|
|
||||||
|
|
||||||
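Moving the parentheses from the ``.format(...)`` argument into the template lets one placeholder serve both the plain shape and the shape extended with ``hidden_size``. The following sketch uses a trimmed, hypothetical stand-in for ``XXX_INPUTS_DOCSTRING`` (the name ``INPUTS_DOCSTRING`` and its two lines are illustrative only) to show the effect:

```python
# Trimmed, hypothetical stand-in for the template's XXX_INPUTS_DOCSTRING.
INPUTS_DOCSTRING = (
    "input_ids (tensor of shape ({0})): token indices.\n"
    "inputs_embeds (tensor of shape ({0}, hidden_size), optional): embedded inputs.\n"
)

# The caller passes the bare dimension names; the template adds the parentheses.
single = INPUTS_DOCSTRING.format("batch_size, sequence_length")
multi = INPUTS_DOCSTRING.format("batch_size, num_choices, sequence_length")

print(single)
print(multi)
```

Had the parentheses stayed in the argument, the ``inputs_embeds`` line would render as ``((batch_size, sequence_length), hidden_size)``.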
@@ -347,7 +360,7 @@ class TFXxxModel(TFXxxPreTrainedModel):
|
|||||||
super().__init__(config, *inputs, **kwargs)
|
super().__init__(config, *inputs, **kwargs)
|
||||||
self.transformer = TFXxxMainLayer(config, name="transformer")
|
self.transformer = TFXxxMainLayer(config, name="transformer")
|
||||||
|
|
||||||
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("(batch_size, sequence_length)"))
|
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
|
||||||
@add_code_sample_docstrings(
|
@add_code_sample_docstrings(
|
||||||
tokenizer_class=_TOKENIZER_FOR_DOC,
|
tokenizer_class=_TOKENIZER_FOR_DOC,
|
||||||
checkpoint="xxx-base-cased",
|
checkpoint="xxx-base-cased",
|
||||||
@@ -370,7 +383,7 @@ class TFXxxForMaskedLM(TFXxxPreTrainedModel, TFMaskedLanguageModelingLoss):
|
|||||||
self.transformer = TFXxxMainLayer(config, name="transformer")
|
self.transformer = TFXxxMainLayer(config, name="transformer")
|
||||||
self.mlm = TFXxxMLMHead(config, self.transformer.embeddings, name="mlm")
|
self.mlm = TFXxxMLMHead(config, self.transformer.embeddings, name="mlm")
|
||||||
|
|
||||||
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("(batch_size, sequence_length)"))
|
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
|
||||||
@add_code_sample_docstrings(
|
@add_code_sample_docstrings(
|
||||||
tokenizer_class=_TOKENIZER_FOR_DOC,
|
tokenizer_class=_TOKENIZER_FOR_DOC,
|
||||||
checkpoint="xxx-base-cased",
|
checkpoint="xxx-base-cased",
|
||||||
@@ -452,7 +465,7 @@ class TFXxxForSequenceClassification(TFXxxPreTrainedModel, TFSequenceClassificat
|
|||||||
config.num_labels, kernel_initializer=get_initializer(config.initializer_range), name="classifier"
|
config.num_labels, kernel_initializer=get_initializer(config.initializer_range), name="classifier"
|
||||||
)
|
)
|
||||||
|
|
||||||
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING)
|
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
|
||||||
@add_code_sample_docstrings(
|
@add_code_sample_docstrings(
|
||||||
tokenizer_class=_TOKENIZER_FOR_DOC,
|
tokenizer_class=_TOKENIZER_FOR_DOC,
|
||||||
checkpoint="xxx-base-cased",
|
checkpoint="xxx-base-cased",
|
||||||
@@ -544,7 +557,7 @@ class TFXxxForMultipleChoice(TFXxxPreTrainedModel, TFMultipleChoiceLoss):
|
|||||||
"""
|
"""
|
||||||
return {"input_ids": tf.constant(MULTIPLE_CHOICE_DUMMY_INPUTS)}
|
return {"input_ids": tf.constant(MULTIPLE_CHOICE_DUMMY_INPUTS)}
|
||||||
|
|
||||||
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("(batch_size, num_choices, sequence_length)"))
|
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("batch_size, num_choices, sequence_length"))
|
||||||
@add_code_sample_docstrings(
|
@add_code_sample_docstrings(
|
||||||
tokenizer_class=_TOKENIZER_FOR_DOC,
|
tokenizer_class=_TOKENIZER_FOR_DOC,
|
||||||
checkpoint="xxx-base-cased",
|
checkpoint="xxx-base-cased",
|
||||||
@@ -568,8 +581,8 @@ class TFXxxForMultipleChoice(TFXxxPreTrainedModel, TFMultipleChoiceLoss):
|
|||||||
r"""
|
r"""
|
||||||
labels (:obj:`tf.Tensor` of shape :obj:`(batch_size,)`, `optional`):
|
labels (:obj:`tf.Tensor` of shape :obj:`(batch_size,)`, `optional`):
|
||||||
Labels for computing the multiple choice classification loss.
|
Labels for computing the multiple choice classification loss.
|
||||||
Indices should be in ``[0, ..., num_choices]`` where `num_choices` is the size of the second dimension
|
Indices should be in ``[0, ..., num_choices]`` where :obj:`num_choices` is the size of the second dimension
|
||||||
of the input tensors. (see `input_ids` above)s after the attention softmax, used to compute the weighted average in the self-attention
|
of the input tensors. (See :obj:`input_ids` above)
|
||||||
heads.
|
heads.
|
||||||
"""
|
"""
|
||||||
if isinstance(inputs, (tuple, list)):
|
if isinstance(inputs, (tuple, list)):
|
||||||
@@ -667,7 +680,7 @@ class TFXxxForTokenClassification(TFXxxPreTrainedModel, TFTokenClassificationLos
|
|||||||
config.num_labels, kernel_initializer=get_initializer(config.initializer_range), name="classifier"
|
config.num_labels, kernel_initializer=get_initializer(config.initializer_range), name="classifier"
|
||||||
)
|
)
|
||||||
|
|
||||||
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING)
|
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
|
||||||
@add_code_sample_docstrings(
|
@add_code_sample_docstrings(
|
||||||
tokenizer_class=_TOKENIZER_FOR_DOC,
|
tokenizer_class=_TOKENIZER_FOR_DOC,
|
||||||
checkpoint="xxx-base-cased",
|
checkpoint="xxx-base-cased",
|
||||||
@@ -734,8 +747,8 @@ class TFXxxForTokenClassification(TFXxxPreTrainedModel, TFTokenClassificationLos
|
|||||||
|
|
||||||
|
|
||||||
@add_start_docstrings(
|
@add_start_docstrings(
|
||||||
"""XXX Model with a span classification head on top for extractive question-answering tasks like SQuAD (a linear layers on top of
|
"""XXX Model with a span classification head on top for extractive question-answering tasks like SQuAD (a linear
|
||||||
the hidden-states output to compute `span start logits` and `span end logits`). """,
|
layer on top of the hidden-states output to compute `span start logits` and `span end logits`). """,
|
||||||
XXX_START_DOCSTRING,
|
XXX_START_DOCSTRING,
|
||||||
)
|
)
|
||||||
class TFXxxForQuestionAnswering(TFXxxPreTrainedModel, TFQuestionAnsweringLoss):
|
class TFXxxForQuestionAnswering(TFXxxPreTrainedModel, TFQuestionAnsweringLoss):
|
||||||
@@ -748,7 +761,7 @@ class TFXxxForQuestionAnswering(TFXxxPreTrainedModel, TFQuestionAnsweringLoss):
|
|||||||
config.num_labels, kernel_initializer=get_initializer(config.initializer_range), name="qa_outputs"
|
config.num_labels, kernel_initializer=get_initializer(config.initializer_range), name="qa_outputs"
|
||||||
)
|
)
|
||||||
|
|
||||||
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING)
|
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
|
||||||
@add_code_sample_docstrings(
|
@add_code_sample_docstrings(
|
||||||
tokenizer_class=_TOKENIZER_FOR_DOC,
|
tokenizer_class=_TOKENIZER_FOR_DOC,
|
||||||
checkpoint="xxx-base-cased",
|
checkpoint="xxx-base-cased",
|
||||||
@@ -773,11 +786,11 @@ class TFXxxForQuestionAnswering(TFXxxPreTrainedModel, TFQuestionAnsweringLoss):
|
|||||||
r"""
|
r"""
|
||||||
start_positions (:obj:`tf.Tensor` of shape :obj:`(batch_size,)`, `optional`):
|
start_positions (:obj:`tf.Tensor` of shape :obj:`(batch_size,)`, `optional`):
|
||||||
Labels for position (index) of the start of the labelled span for computing the token classification loss.
|
Labels for position (index) of the start of the labelled span for computing the token classification loss.
|
||||||
Positions are clamped to the length of the sequence (`sequence_length`).
|
Positions are clamped to the length of the sequence (:obj:`sequence_length`).
|
||||||
Positions outside of the sequence are not taken into account for computing the loss.
|
Positions outside of the sequence are not taken into account for computing the loss.
|
||||||
end_positions (:obj:`tf.Tensor` of shape :obj:`(batch_size,)`, `optional`):
|
end_positions (:obj:`tf.Tensor` of shape :obj:`(batch_size,)`, `optional`):
|
||||||
Labels for position (index) of the end of the labelled span for computing the token classification loss.
|
Labels for position (index) of the end of the labelled span for computing the token classification loss.
|
||||||
Positions are clamped to the length of the sequence (`sequence_length`).
|
Positions are clamped to the length of the sequence (:obj:`sequence_length`).
|
||||||
Positions outside of the sequence are not taken into account for computing the loss.
|
Positions outside of the sequence are not taken into account for computing the loss.
|
||||||
"""
|
"""
|
||||||
return_dict = return_dict if return_dict is not None else self.transformer.return_dict
|
return_dict = return_dict if return_dict is not None else self.transformer.return_dict
|
||||||
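One way to read "clamped to the length of the sequence" and "not taken into account for computing the loss" is sketched below. It assumes that ``sequence_length`` itself acts as the "outside" marker and that marked positions are dropped from the loss. ``clamp_positions`` is a hypothetical helper, not a function from the library, and the actual loss code may differ.

```python
from typing import List, Optional


def clamp_positions(positions: List[int], sequence_length: int) -> List[Optional[int]]:
    """Clamp span positions; return None where the position should be ignored."""
    cleaned: List[Optional[int]] = []
    for pos in positions:
        # Clamp into [0, sequence_length]; sequence_length itself marks "outside".
        clamped = max(0, min(pos, sequence_length))
        cleaned.append(None if clamped == sequence_length else clamped)
    return cleaned


result = clamp_positions([3, 12, 40], sequence_length=12)
print(result)
```

Under these assumptions only the first position, which lies inside the sequence, would contribute to the loss.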
|
|||||||
@@ -209,11 +209,16 @@ class XxxPreTrainedModel(PreTrainedModel):
|
|||||||
module.bias.data.zero_()
|
module.bias.data.zero_()
|
||||||
|
|
||||||
|
|
||||||
XXX_START_DOCSTRING = r""" The XXX model was proposed in
|
XXX_START_DOCSTRING = r"""
|
||||||
`XXX: Pre-training of Deep Bidirectional Transformers for Language Understanding
|
|
||||||
|
The XXX model was proposed in `XXX: Pre-training of Deep Bidirectional Transformers for Language Understanding
|
||||||
<https://arxiv.org/abs/1810.04805>`__ by....
|
<https://arxiv.org/abs/1810.04805>`__ by....
|
||||||
|
|
||||||
This model is a PyTorch `torch.nn.Module <https://pytorch.org/docs/stable/nn.html#torch.nn.Module>`_ sub-class.
|
This model inherits from :class:`~transformers.PreTrainedModel`. Check the superclass documentation for the generic
|
||||||
|
methods the library implements for all its models (such as downloading or saving, resizing the input embeddings,
|
||||||
|
pruning heads etc.)
|
||||||
|
|
||||||
|
This model is also a PyTorch `torch.nn.Module <https://pytorch.org/docs/stable/nn.html#torch.nn.Module>`__ subclass.
|
||||||
Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general
|
Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general
|
||||||
usage and behavior.
|
usage and behavior.
|
||||||
|
|
||||||
@@ -225,27 +230,31 @@ XXX_START_DOCSTRING = r""" The XXX model was proposed in
|
|||||||
|
|
||||||
XXX_INPUTS_DOCSTRING = r"""
|
XXX_INPUTS_DOCSTRING = r"""
|
||||||
Inputs:
|
Inputs:
|
||||||
input_ids (:obj:`torch.LongTensor` of shape :obj:`{0}`):
|
input_ids (:obj:`torch.LongTensor` of shape :obj:`({0})`):
|
||||||
Indices of input sequence tokens in the vocabulary.
|
Indices of input sequence tokens in the vocabulary.
|
||||||
|
|
||||||
Indices can be obtained using :class:`transformers.XxxTokenizer`.
|
Indices can be obtained using :class:`~transformers.XxxTokenizer`.
|
||||||
See :func:`transformers.PreTrainedTokenizer.encode` and
|
See :meth:`transformers.PreTrainedTokenizer.encode` and
|
||||||
:func:`transformers.PreTrainedTokenizer.__call__` for details.
|
:meth:`transformers.PreTrainedTokenizer.__call__` for details.
|
||||||
|
|
||||||
`What are input IDs? <../glossary.html#input-ids>`__
|
`What are input IDs? <../glossary.html#input-ids>`__
|
||||||
attention_mask (:obj:`torch.FloatTensor` of shape :obj:`{0}`, `optional`):
|
attention_mask (:obj:`torch.FloatTensor` of shape :obj:`({0})`, `optional`):
|
||||||
Mask to avoid performing attention on padding token indices.
|
Mask to avoid performing attention on padding token indices.
|
||||||
Mask values selected in ``[0, 1]``:
|
Mask values selected in ``[0, 1]``:
|
||||||
``1`` for tokens that are NOT MASKED, ``0`` for MASKED tokens.
|
|
||||||
|
- 1 for tokens that are **not masked**,
|
||||||
|
- 0 for tokens that are **masked**.
|
||||||
|
|
||||||
`What are attention masks? <../glossary.html#attention-mask>`__
|
`What are attention masks? <../glossary.html#attention-mask>`__
|
||||||
token_type_ids (:obj:`torch.LongTensor` of shape :obj:`{0}`, `optional`):
|
token_type_ids (:obj:`torch.LongTensor` of shape :obj:`({0})`, `optional`):
|
||||||
Segment token indices to indicate first and second portions of the inputs.
|
Segment token indices to indicate first and second portions of the inputs.
|
||||||
Indices are selected in ``[0, 1]``: ``0`` corresponds to a `sentence A` token, ``1``
|
Indices are selected in ``[0, 1]``:
|
||||||
corresponds to a `sentence B` token
|
|
||||||
|
- 0 corresponds to a `sentence A` token,
|
||||||
|
- 1 corresponds to a `sentence B` token.
|
||||||
|
|
||||||
`What are token type IDs? <../glossary.html#token-type-ids>`_
|
`What are token type IDs? <../glossary.html#token-type-ids>`_
|
||||||
position_ids (:obj:`torch.LongTensor` of shape :obj:`{0}`, `optional`):
|
position_ids (:obj:`torch.LongTensor` of shape :obj:`({0})`, `optional`):
|
||||||
Indices of positions of each input sequence tokens in the position embeddings.
|
Indices of positions of each input sequence tokens in the position embeddings.
|
||||||
Selected in the range ``[0, config.max_position_embeddings - 1]``.
|
Selected in the range ``[0, config.max_position_embeddings - 1]``.
|
||||||
|
|
||||||
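Note on the ``attention_mask`` convention in this hunk: the docstring says 1 marks tokens that are not masked and 0 marks masked (padding) tokens. A minimal sketch of how such a mask lines up with padded input IDs follows. The helper name, the padding ID and the token IDs are all hypothetical and are not part of the commit.

```python
from typing import List, Tuple

PAD_ID = 0  # hypothetical padding token ID, chosen only for this sketch


def pad_with_attention_mask(input_ids: List[int], max_length: int) -> Tuple[List[int], List[int]]:
    """Pad `input_ids` to `max_length` and build the matching attention mask."""
    n_pad = max_length - len(input_ids)
    if n_pad < 0:
        raise ValueError("sequence is longer than max_length")
    padded = input_ids + [PAD_ID] * n_pad
    # 1 = real token (not masked), 0 = padding token (masked)
    mask = [1] * len(input_ids) + [0] * n_pad
    return padded, mask


# Made-up token IDs, padded from length 4 to length 6.
padded_ids, attention_mask = pad_with_attention_mask([101, 7592, 2088, 102], max_length=6)
print(padded_ids)      # [101, 7592, 2088, 102, 0, 0]
print(attention_mask)  # [1, 1, 1, 1, 0, 0]
```

In practice the tokenizer normally produces this mask itself; the sketch only illustrates the 1/0 meaning documented above.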
@@ -253,18 +262,22 @@ XXX_INPUTS_DOCSTRING = r"""
|
|||||||
head_mask (:obj:`torch.FloatTensor` of shape :obj:`(num_heads,)` or :obj:`(num_layers, num_heads)`, `optional`):
|
head_mask (:obj:`torch.FloatTensor` of shape :obj:`(num_heads,)` or :obj:`(num_layers, num_heads)`, `optional`):
|
||||||
Mask to nullify selected heads of the self-attention modules.
|
Mask to nullify selected heads of the self-attention modules.
|
||||||
Mask values selected in ``[0, 1]``:
|
Mask values selected in ``[0, 1]``:
|
||||||
:obj:`1` indicates the head is **not masked**, :obj:`0` indicates the head is **masked**.
|
|
||||||
inputs_embeds (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`, `optional`):
|
- 1 indicates the head is **not masked**,
|
||||||
|
- 0 indicates the head is **masked**.
|
||||||
|
|
||||||
|
inputs_embeds (:obj:`torch.FloatTensor` of shape :obj:`({0}, hidden_size)`, `optional`):
|
||||||
Optionally, instead of passing :obj:`input_ids` you can choose to directly pass an embedded representation.
|
Optionally, instead of passing :obj:`input_ids` you can choose to directly pass an embedded representation.
|
||||||
This is useful if you want more control over how to convert `input_ids` indices into associated vectors
|
This is useful if you want more control over how to convert :obj:`input_ids` indices into associated
|
||||||
than the model's internal embedding lookup matrix.
|
vectors than the model's internal embedding lookup matrix.
|
||||||
output_attentions (:obj:`bool`, `optional`):
|
output_attentions (:obj:`bool`, `optional`):
|
||||||
If set to ``True``, the attentions tensors of all attention layers are returned. See ``attentions`` under returned tensors for more detail.
|
Whether or not to return the attentions tensors of all attention layers. See ``attentions`` under returned
|
||||||
|
tensors for more detail.
|
||||||
output_hidden_states (:obj:`bool`, `optional`):
|
output_hidden_states (:obj:`bool`, `optional`):
|
||||||
If set to ``True``, the hidden states of all layers are returned. See ``hidden_states`` under returned tensors for more detail.
|
Whether or not to return the hidden states of all layers. See ``hidden_states`` under returned tensors for
|
||||||
|
more detail.
|
||||||
return_dict (:obj:`bool`, `optional`):
|
return_dict (:obj:`bool`, `optional`):
|
||||||
If set to ``True``, the model will return a :class:`~transformers.file_utils.ModelOutput` instead of a
|
Whether or not to return a :class:`~transformers.file_utils.ModelOutput` instead of a plain tuple.
|
||||||
plain tuple.
|
|
||||||
"""
|
"""
|
||||||
|
|
||||||
|
|
||||||
@@ -296,7 +309,7 @@ class XxxModel(XxxPreTrainedModel):
|
|||||||
for layer, heads in heads_to_prune.items():
|
for layer, heads in heads_to_prune.items():
|
||||||
self.encoder.layer[layer].attention.prune_heads(heads)
|
self.encoder.layer[layer].attention.prune_heads(heads)
|
||||||
|
|
||||||
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("(batch_size, sequence_length)"))
|
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
|
||||||
@add_code_sample_docstrings(
|
@add_code_sample_docstrings(
|
||||||
tokenizer_class=_TOKENIZER_FOR_DOC,
|
tokenizer_class=_TOKENIZER_FOR_DOC,
|
||||||
checkpoint="xxx-base-uncased",
|
checkpoint="xxx-base-uncased",
|
||||||
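Note on the ``{0}`` to ``({0})`` change: moving the parentheses out of the ``.format(...)`` argument and into the template leaves the rendered ``input_ids`` line unchanged, and it lets the same placeholder be reused inside a longer shape such as ``({0}, hidden_size)``. The sketch below checks this with plain ``str.format``. The constant names are hypothetical and the templates are shortened to one line each.

```python
# Hypothetical one-line stand-ins for the old and new docstring templates.
OLD_TEMPLATE = "input_ids (:obj:`torch.LongTensor` of shape :obj:`{0}`):"
NEW_TEMPLATE = "input_ids (:obj:`torch.LongTensor` of shape :obj:`({0})`):"
NEW_EMBEDS_TEMPLATE = "inputs_embeds (:obj:`torch.FloatTensor` of shape :obj:`({0}, hidden_size)`, `optional`):"

# Old call style: the parentheses are part of the argument.
old_line = OLD_TEMPLATE.format("(batch_size, sequence_length)")
# New call style: the parentheses live in the template.
new_line = NEW_TEMPLATE.format("batch_size, sequence_length")
# The same bare argument now also works inside a longer shape.
embeds_line = NEW_EMBEDS_TEMPLATE.format("batch_size, num_choices, sequence_length")

print(old_line == new_line)  # True
print(embeds_line)
```

With the old template, the multiple-choice model could not have produced a correct ``inputs_embeds`` shape from the same placeholder, because the argument already carried its own closing parenthesis.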
@@ -378,7 +391,7 @@ class XxxForMaskedLM(XxxPreTrainedModel):
|
|||||||
def get_output_embeddings(self):
|
def get_output_embeddings(self):
|
||||||
return self.lm_head
|
return self.lm_head
|
||||||
|
|
||||||
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("(batch_size, sequence_length)"))
|
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
|
||||||
@add_code_sample_docstrings(
|
@add_code_sample_docstrings(
|
||||||
tokenizer_class=_TOKENIZER_FOR_DOC,
|
tokenizer_class=_TOKENIZER_FOR_DOC,
|
||||||
checkpoint="xxx-base-uncased",
|
checkpoint="xxx-base-uncased",
|
||||||
@@ -455,7 +468,7 @@ class XxxForSequenceClassification(XxxPreTrainedModel):
|
|||||||
|
|
||||||
self.init_weights()
|
self.init_weights()
|
||||||
|
|
||||||
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("(batch_size, sequence_length)"))
|
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
|
||||||
@add_code_sample_docstrings(
|
@add_code_sample_docstrings(
|
||||||
tokenizer_class=_TOKENIZER_FOR_DOC,
|
tokenizer_class=_TOKENIZER_FOR_DOC,
|
||||||
checkpoint="xxx-base-uncased",
|
checkpoint="xxx-base-uncased",
|
||||||
@@ -538,7 +551,7 @@ class XxxForMultipleChoice(XxxPreTrainedModel):
|
|||||||
|
|
||||||
self.init_weights()
|
self.init_weights()
|
||||||
|
|
||||||
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("(batch_size, num_choices, sequence_length)"))
|
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("batch_size, num_choices, sequence_length"))
|
||||||
@add_code_sample_docstrings(
|
@add_code_sample_docstrings(
|
||||||
tokenizer_class=_TOKENIZER_FOR_DOC,
|
tokenizer_class=_TOKENIZER_FOR_DOC,
|
||||||
checkpoint="xxx-base-uncased",
|
checkpoint="xxx-base-uncased",
|
||||||
@@ -561,8 +574,8 @@ class XxxForMultipleChoice(XxxPreTrainedModel):
|
|||||||
r"""
|
r"""
|
||||||
labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`):
|
labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`):
|
||||||
Labels for computing the multiple choice classification loss.
|
Labels for computing the multiple choice classification loss.
|
||||||
Indices should be in ``[0, ..., num_choices-1]`` where `num_choices` is the size of the second dimension
|
Indices should be in ``[0, ..., num_choices-1]`` where :obj:`num_choices` is the size of the second dimension
|
||||||
of the input tensors. (see `input_ids` above)
|
of the input tensors. (See :obj:`input_ids` above)
|
||||||
"""
|
"""
|
||||||
return_dict = return_dict if return_dict is not None else self.config.use_return_dict
|
return_dict = return_dict if return_dict is not None else self.config.use_return_dict
|
||||||
num_choices = input_ids.shape[1] if input_ids is not None else inputs_embeds.shape[1]
|
num_choices = input_ids.shape[1] if input_ids is not None else inputs_embeds.shape[1]
|
||||||
@@ -628,7 +641,7 @@ class XxxForTokenClassification(XxxPreTrainedModel):
|
|||||||
|
|
||||||
self.init_weights()
|
self.init_weights()
|
||||||
|
|
||||||
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("(batch_size, sequence_length)"))
|
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
|
||||||
@add_code_sample_docstrings(
|
@add_code_sample_docstrings(
|
||||||
tokenizer_class=_TOKENIZER_FOR_DOC,
|
tokenizer_class=_TOKENIZER_FOR_DOC,
|
||||||
checkpoint="xxx-base-uncased",
|
checkpoint="xxx-base-uncased",
|
||||||
@@ -713,7 +726,7 @@ class XxxForQuestionAnswering(XxxPreTrainedModel):
|
|||||||
|
|
||||||
self.init_weights()
|
self.init_weights()
|
||||||
|
|
||||||
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("(batch_size, sequence_length)"))
|
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
|
||||||
@add_code_sample_docstrings(
|
@add_code_sample_docstrings(
|
||||||
tokenizer_class=_TOKENIZER_FOR_DOC,
|
tokenizer_class=_TOKENIZER_FOR_DOC,
|
||||||
checkpoint="xxx-base-uncased",
|
checkpoint="xxx-base-uncased",
|
||||||
@@ -737,11 +750,11 @@ class XxxForQuestionAnswering(XxxPreTrainedModel):
|
|||||||
r"""
|
r"""
|
||||||
start_positions (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`):
|
start_positions (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`):
|
||||||
Labels for position (index) of the start of the labelled span for computing the token classification loss.
|
Labels for position (index) of the start of the labelled span for computing the token classification loss.
|
||||||
Positions are clamped to the length of the sequence (`sequence_length`).
|
Positions are clamped to the length of the sequence (:obj:`sequence_length`).
|
||||||
Position outside of the sequence are not taken into account for computing the loss.
|
Position outside of the sequence are not taken into account for computing the loss.
|
||||||
end_positions (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`):
|
end_positions (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`):
|
||||||
Labels for position (index) of the end of the labelled span for computing the token classification loss.
|
Labels for position (index) of the end of the labelled span for computing the token classification loss.
|
||||||
Positions are clamped to the length of the sequence (`sequence_length`).
|
Positions are clamped to the length of the sequence (:obj:`sequence_length`).
|
||||||
Position outside of the sequence are not taken into account for computing the loss.
|
Position outside of the sequence are not taken into account for computing the loss.
|
||||||
"""
|
"""
|
||||||
return_dict = return_dict if return_dict is not None else self.config.use_return_dict
|
return_dict = return_dict if return_dict is not None else self.config.use_return_dict
|
||||||
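Note on "positions are clamped" in the ``start_positions`` / ``end_positions`` docstrings: the diff does not show how the clamping is implemented. One common way to get the documented behaviour, assumed here, is to clamp every position into ``[0, sequence_length]`` and treat the value ``sequence_length`` as an "ignore" index that the loss skips. The function names below are hypothetical and the sketch uses plain lists rather than tensors.

```python
from typing import List


def clamp_positions(positions: List[int], sequence_length: int) -> List[int]:
    """Clamp each position into [0, sequence_length] (assumed convention)."""
    return [min(max(p, 0), sequence_length) for p in positions]


def positions_used_in_loss(positions: List[int], sequence_length: int) -> List[int]:
    """Keep only positions that still point inside the sequence after clamping."""
    clamped = clamp_positions(positions, sequence_length)
    # `sequence_length` itself is out of range (valid indices are 0..sequence_length-1),
    # so it serves as the ignored index in this sketch.
    return [p for p in clamped if p != sequence_length]


clamped = clamp_positions([3, 12, 7], sequence_length=8)
kept = positions_used_in_loss([3, 12, 7], sequence_length=8)
print(clamped)  # [3, 8, 7]
print(kept)     # [3, 7]
```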
|
|||||||
@@ -80,16 +80,16 @@ class XxxTokenizer(PreTrainedTokenizer):
|
|||||||
r"""
|
r"""
|
||||||
Constructs a XXX tokenizer. Based on XXX.
|
Constructs a XXX tokenizer. Based on XXX.
|
||||||
|
|
||||||
This tokenizer inherits from :class:`~transformers.PreTrainedTokenizer` which contains most of the methods. Users
|
This tokenizer inherits from :class:`~transformers.PreTrainedTokenizer` which contains most of the main methods.
|
||||||
should refer to the superclass for more information regarding methods.
|
Users should refer to this superclass for more information regarding those methods.
|
||||||
|
|
||||||
Args:
|
Args:
|
||||||
vocab_file (:obj:`str`):
|
vocab_file (:obj:`str`):
|
||||||
File containing the vocabulary.
|
File containing the vocabulary.
|
||||||
do_lower_case (:obj:`bool`, `optional`, defaults to :obj:`True`):
|
do_lower_case (:obj:`bool`, `optional`, defaults to :obj:`True`):
|
||||||
Whether to lowercase the input when tokenizing.
|
Whether or not to lowercase the input when tokenizing.
|
||||||
do_basic_tokenize (:obj:`bool`, `optional`, defaults to :obj:`True`):
|
do_basic_tokenize (:obj:`bool`, `optional`, defaults to :obj:`True`):
|
||||||
Whether to do basic tokenization before WordPiece.
|
Whether or not to do basic tokenization before WordPiece.
|
||||||
never_split (:obj:`Iterable`, `optional`):
|
never_split (:obj:`Iterable`, `optional`):
|
||||||
Collection of tokens which will never be split during tokenization. Only has an effect when
|
Collection of tokens which will never be split during tokenization. Only has an effect when
|
||||||
:obj:`do_basic_tokenize=True`
|
:obj:`do_basic_tokenize=True`
|
||||||
@@ -194,19 +194,19 @@ class XxxTokenizer(PreTrainedTokenizer):
|
|||||||
"""
|
"""
|
||||||
Build model inputs from a sequence or a pair of sequence for sequence classification tasks
|
Build model inputs from a sequence or a pair of sequence for sequence classification tasks
|
||||||
by concatenating and adding special tokens.
|
by concatenating and adding special tokens.
|
||||||
A BERT sequence has the following format:
|
A XXX sequence has the following format:
|
||||||
|
|
||||||
- single sequence: ``[CLS] X [SEP]``
|
- single sequence: ``[CLS] X [SEP]``
|
||||||
- pair of sequences: ``[CLS] A [SEP] B [SEP]``
|
- pair of sequences: ``[CLS] A [SEP] B [SEP]``
|
||||||
|
|
||||||
Args:
|
Args:
|
||||||
token_ids_0 (:obj:`List[int]`):
|
token_ids_0 (:obj:`List[int]`):
|
||||||
List of IDs to which the special tokens will be added
|
List of IDs to which the special tokens will be added.
|
||||||
token_ids_1 (:obj:`List[int]`, `optional`):
|
token_ids_1 (:obj:`List[int]`, `optional`):
|
||||||
Optional second list of IDs for sequence pairs.
|
Optional second list of IDs for sequence pairs.
|
||||||
|
|
||||||
Returns:
|
Returns:
|
||||||
:obj:`List[int]`: list of `input IDs <../glossary.html#input-ids>`__ with the appropriate special tokens.
|
:obj:`List[int]`: List of `input IDs <../glossary.html#input-ids>`__ with the appropriate special tokens.
|
||||||
"""
|
"""
|
||||||
if token_ids_1 is None:
|
if token_ids_1 is None:
|
||||||
return [self.cls_token_id] + token_ids_0 + [self.sep_token_id]
|
return [self.cls_token_id] + token_ids_0 + [self.sep_token_id]
|
||||||
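Note on the sequence format documented in this hunk: a single sequence becomes ``[CLS] X [SEP]`` and a pair becomes ``[CLS] A [SEP] B [SEP]``. A standalone sketch of that layout follows. The single-sequence branch mirrors the line visible in the diff; the pair branch is inferred from the docstring, and the special-token IDs are hypothetical.

```python
from typing import List, Optional

CLS_ID = 101  # hypothetical ID for [CLS]
SEP_ID = 102  # hypothetical ID for [SEP]


def build_inputs_with_special_tokens(
    token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
) -> List[int]:
    """Wrap one sequence, or a pair of sequences, in [CLS]/[SEP] tokens."""
    if token_ids_1 is None:
        # single sequence: [CLS] X [SEP]
        return [CLS_ID] + token_ids_0 + [SEP_ID]
    # pair of sequences: [CLS] A [SEP] B [SEP]
    return [CLS_ID] + token_ids_0 + [SEP_ID] + token_ids_1 + [SEP_ID]


single = build_inputs_with_special_tokens([7, 8, 9])
pair = build_inputs_with_special_tokens([7, 8], [5, 6])
print(single)  # [101, 7, 8, 9, 102]
print(pair)    # [101, 7, 8, 102, 5, 6, 102]
```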
@@ -218,16 +218,16 @@ class XxxTokenizer(PreTrainedTokenizer):
|
|||||||
self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None, already_has_special_tokens: bool = False
|
self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None, already_has_special_tokens: bool = False
|
||||||
) -> List[int]:
|
) -> List[int]:
|
||||||
"""
|
"""
|
||||||
Retrieves sequence ids from a token list that has no special tokens added. This method is called when adding
|
Retrieve sequence ids from a token list that has no special tokens added. This method is called when adding
|
||||||
special tokens using the tokenizer ``prepare_for_model`` method.
|
special tokens using the tokenizer ``prepare_for_model`` method.
|
||||||
|
|
||||||
Args:
|
Args:
|
||||||
token_ids_0 (:obj:`List[int]`):
|
token_ids_0 (:obj:`List[int]`):
|
||||||
List of ids.
|
List of IDs.
|
||||||
token_ids_1 (:obj:`List[int]`, `optional`):
|
token_ids_1 (:obj:`List[int]`, `optional`):
|
||||||
Optional second list of IDs for sequence pairs.
|
Optional second list of IDs for sequence pairs.
|
||||||
already_has_special_tokens (:obj:`bool`, `optional`, defaults to :obj:`False`):
|
already_has_special_tokens (:obj:`bool`, `optional`, defaults to :obj:`False`):
|
||||||
Set to True if the token list is already formatted with special tokens for the model
|
Whether or not the token list is already formatted with special tokens for the model.
|
||||||
|
|
||||||
Returns:
|
Returns:
|
||||||
:obj:`List[int]`: A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.
|
:obj:`List[int]`: A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.
|
||||||
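Note on the return value documented here: the mask holds 1 for a special token and 0 for a sequence token. The sketch below assumes the ``[CLS] A [SEP] B [SEP]`` layout from the previous method and hypothetical special-token IDs. It is an illustration of the documented contract, not the template's actual implementation.

```python
from typing import List, Optional

CLS_ID = 101  # hypothetical ID for [CLS]
SEP_ID = 102  # hypothetical ID for [SEP]


def get_special_tokens_mask(
    token_ids_0: List[int],
    token_ids_1: Optional[List[int]] = None,
    already_has_special_tokens: bool = False,
) -> List[int]:
    """Return 1 for each special-token position and 0 for each sequence token."""
    if already_has_special_tokens:
        # The input already contains [CLS]/[SEP]; flag them in place.
        return [1 if t in (CLS_ID, SEP_ID) else 0 for t in token_ids_0]
    mask = [1] + [0] * len(token_ids_0) + [1]  # [CLS] A [SEP]
    if token_ids_1 is not None:
        mask += [0] * len(token_ids_1) + [1]   # B [SEP]
    return mask


mask_single = get_special_tokens_mask([7, 8, 9])
mask_pair = get_special_tokens_mask([7, 8], [5, 6])
mask_existing = get_special_tokens_mask([101, 7, 8, 102], already_has_special_tokens=True)
print(mask_single)    # [1, 0, 0, 0, 1]
print(mask_pair)      # [1, 0, 0, 1, 0, 0, 1]
print(mask_existing)  # [1, 0, 0, 1]
```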
@@ -249,7 +249,7 @@ class XxxTokenizer(PreTrainedTokenizer):
|
|||||||
self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
|
self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
|
||||||
) -> List[int]:
|
) -> List[int]:
|
||||||
"""
|
"""
|
||||||
Creates a mask from the two sequences passed to be used in a sequence-pair classification task.
|
Create a mask from the two sequences passed to be used in a sequence-pair classification task.
|
||||||
A BERT sequence pair mask has the following format:
|
A BERT sequence pair mask has the following format:
|
||||||
|
|
||||||
::
|
::
|
||||||
@@ -257,11 +257,11 @@ class XxxTokenizer(PreTrainedTokenizer):
|
|||||||
0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
|
0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
|
||||||
| first sequence | second sequence |
|
| first sequence | second sequence |
|
||||||
|
|
||||||
if token_ids_1 is None, only returns the first portion of the mask (0's).
|
If :obj:`token_ids_1` is :obj:`None`, this method only returns the first portion of the mask (0s).
|
||||||
|
|
||||||
Args:
|
Args:
|
||||||
token_ids_0 (:obj:`List[int]`):
|
token_ids_0 (:obj:`List[int]`):
|
||||||
List of ids.
|
List of IDs.
|
||||||
token_ids_1 (:obj:`List[int]`, `optional`):
|
token_ids_1 (:obj:`List[int]`, `optional`):
|
||||||
Optional second list of IDs for sequence pairs.
|
Optional second list of IDs for sequence pairs.
|
||||||
|
|
||||||
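Note on the sequence-pair mask: the docstring describes 0s over the first sequence and 1s over the second, and only the 0s when ``token_ids_1`` is ``None``. The sketch below assumes the BERT-style layout ``[CLS] A [SEP] B [SEP]``, where ``[CLS]`` and the first ``[SEP]`` count toward the first segment and the final ``[SEP]`` toward the second. That split is an assumption; the diff hunk does not show the method body.

```python
from typing import List, Optional


def create_token_type_ids_from_sequences(
    token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
) -> List[int]:
    """Build segment IDs: 0 for the first sequence, 1 for the second."""
    # [CLS] + A + [SEP] all belong to segment 0.
    first = [0] * (len(token_ids_0) + 2)
    if token_ids_1 is None:
        return first
    # B + [SEP] belong to segment 1.
    return first + [1] * (len(token_ids_1) + 1)


type_ids_single = create_token_type_ids_from_sequences([7, 8, 9])
type_ids_pair = create_token_type_ids_from_sequences([7, 8, 9], [5, 6])
print(type_ids_single)  # [0, 0, 0, 0, 0]
print(type_ids_pair)    # [0, 0, 0, 0, 0, 1, 1, 1]
```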
@@ -277,7 +277,7 @@ class XxxTokenizer(PreTrainedTokenizer):
|
|||||||
|
|
||||||
def save_vocabulary(self, vocab_path):
|
def save_vocabulary(self, vocab_path):
|
||||||
"""
|
"""
|
||||||
Save the sentencepiece vocabulary (copy original file) and special tokens file to a directory.
|
Save the vocabulary (copy original file) and special tokens file to a directory.
|
||||||
|
|
||||||
Args:
|
Args:
|
||||||
vocab_path (:obj:`str`):
|
vocab_path (:obj:`str`):
|
||||||
|
|||||||