From dd288303273a96ebe38346d048b2900bbd747989 Mon Sep 17 00:00:00 2001
From: Lysandre
Date: Fri, 7 Feb 2020 12:17:51 -0500
Subject: [PATCH] Update RoBERTa tips

---
 docs/source/model_doc/roberta.rst | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/docs/source/model_doc/roberta.rst b/docs/source/model_doc/roberta.rst
index d3276d55e0..62138bb72e 100644
--- a/docs/source/model_doc/roberta.rst
+++ b/docs/source/model_doc/roberta.rst
@@ -23,6 +23,9 @@ Tips:
 
 - This implementation is the same as :class:`~transformers.BertModel` with a tiny embeddings tweak as well as a setup
   for Roberta pretrained models.
+- RoBERTa has the same architecture as BERT, but uses a byte-level BPE as a tokenizer (same as GPT-2) and uses a
+  different pre-training scheme.
+- RoBERTa doesn't have `token_type_ids`, you don't need to indicate which token belongs to which segment. Just separate your segments with the separation token `tokenizer.sep_token` (or `</s>`)
 - `Camembert <./camembert.html>`__ is a wrapper around RoBERTa. Refer to this page for usage examples.
 
 RobertaConfig
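
The tip added by this patch (no `token_type_ids`; segments are separated by the separator token) can be illustrated with a minimal, library-free sketch. The helper `join_segments` and the constant `DEFAULT_SEP_TOKEN` are hypothetical names introduced only for illustration and are not part of `transformers`. The default value `</s>` is assumed to match `tokenizer.sep_token` for RoBERTa, and a single separator is used, following the wording of the tip; the tokenizer's own pair encoding may place special tokens differently.

```python
from typing import Sequence

# Assumption: "</s>" stands in for `tokenizer.sep_token`; in real code,
# pass the attribute from the loaded tokenizer instead of hard-coding it.
DEFAULT_SEP_TOKEN = "</s>"


def join_segments(segments: Sequence[str], sep_token: str = DEFAULT_SEP_TOKEN) -> str:
    """Join text segments into one string, separated by the separator token.

    Hypothetical helper: no segment ids are produced, because the tip says
    RoBERTa does not need `token_type_ids`.
    """
    # Drop empty segments and trim surrounding whitespace.
    cleaned = [segment.strip() for segment in segments if segment.strip()]
    return f" {sep_token} ".join(cleaned)


if __name__ == "__main__":
    print(join_segments(["Where is the Eiffel Tower?", "It is in Paris."]))
```

With the library installed, the joined string could then be encoded by a tokenizer loaded via `RobertaTokenizer.from_pretrained("roberta-base")`, passing `tokenizer.sep_token` as the `sep_token` argument rather than relying on the hard-coded default.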