Add DPR model (#5279)

* beginning of dpr modeling * wip * implement forward * remove biencoder + better init weights * export dpr model to embed model for nlp lib * add new api * remove old code * make style * fix dumb typo * don't load bert weights * docs * docs * style * move the `k` parameter * fix init_weights * add pretrained configs * minor * update config names * style * better config * style * clean code based on PR comments * change Dpr to DPR * fix config * switch encoder config to a dict * style * inheritance -> composition * add messages in assert startements * add dpr reader tokenizer * one tokenizer per model * fix base_model_prefix * fix imports * typo * add convert script * docs * change tokenizers conf names * style * change tokenizers conf names * minor * minor * fix wrong names * minor * remove unused convert functions * rename convert script * use return_tensors in tokenizers * remove n_questions dim * move generate logic to tokenizer * style * add docs * docs * quality * docs * add tests * style * add tokenization tests * DPR full tests * Stay true to the attention mask building * update docs * missing param in bert input docs * docs * style Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-07-07 14:56:12 +02:00
parent d2a9399115
commit fbd8792195
11 changed files with 1531 additions and 3 deletions
--- a/src/transformers/tokenization_utils_base.py
+++ b/src/transformers/tokenization_utils_base.py
@@ -965,7 +965,7 @@ ENCODE_KWARGS_DOCSTRING = r"""
                >= 7.5 (Volta).
            return_tensors (:obj:`str`, `optional`, defaults to :obj:`None`):
                Can be set to 'tf', 'pt' or 'np' to return respectively TensorFlow :obj:`tf.constant`,
-                PyTorch :obj:`torch.Tensor` or Numpy :oj: `np.ndarray` instead of a list of python integers.
+                PyTorch :obj:`torch.Tensor` or Numpy :obj: `np.ndarray` instead of a list of python integers.
 """

 ENCODE_PLUS_ADDITIONAL_KWARGS_DOCSTRING = r"""
@@ -1900,7 +1900,7 @@ class PreTrainedTokenizerBase(SpecialTokensMixin):
            return_attention_mask: (optional) Set to False to avoid returning attention mask (default: set to model specifics)
            return_tensors (:obj:`str`, `optional`, defaults to :obj:`None`):
                Can be set to 'tf', 'pt' or 'np' to return respectively TensorFlow :obj:`tf.constant`,
-                PyTorch :obj:`torch.Tensor` or Numpy :oj: `np.ndarray` instead of a list of python integers.
+                PyTorch :obj:`torch.Tensor` or Numpy :obj: `np.ndarray` instead of a list of python integers.
            verbose (:obj:`bool`, `optional`, defaults to :obj:`True`):
                Set to ``False`` to avoid printing infos and warnings.
        """