GPT-2 PyTorch models + better tips for BERT

Lysandre
2020-01-16 17:16:26 -05:00
committed by Lysandre Debut
parent dbeb7fb4e6
commit bd0d3fd76e
3 changed files with 196 additions and 146 deletions

@@ -27,7 +27,13 @@ Tips:
- BERT is a model with absolute position embeddings, so it is usually advised to pad the inputs on
the right rather than on the left.
- BERT was trained with a masked language modeling (MLM) objective. It is therefore effective at predicting masked
tokens and at NLU in general, but it is not optimal for text generation. Models trained with a causal language
modeling (CLM) objective are better suited to generation.
- Alongside MLM, BERT was trained with a next sentence prediction (NSP) objective, using the [CLS] token as a
representation of the whole sequence. This token (the first token in a sequence built with special tokens) can be
used to get a sequence-level prediction rather than a token-level prediction. However, averaging the token
representations over the sequence may yield better results than using the [CLS] token alone.
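
The padding and pooling tips above can be illustrated with a minimal, library-free sketch. The helper names (``pad_right``, ``cls_pool``, ``mean_pool``), the padding id ``0`` and the toy token ids and hidden states are assumptions made for illustration only; they are not part of the library's API, and a real model would supply the hidden states.

```python
from typing import List, Tuple

PAD_ID = 0  # assumed padding token id for this sketch


def pad_right(batch: List[List[int]], pad_id: int = PAD_ID) -> Tuple[List[List[int]], List[List[int]]]:
    """Pad every sequence on the right to the longest length in the batch.

    Returns the padded ids and an attention mask (1 = real token, 0 = padding).
    Padding on the right keeps real tokens at the same absolute positions
    they would occupy without padding.
    """
    max_len = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask


def cls_pool(hidden: List[List[float]]) -> List[float]:
    """Use the first position (the [CLS] token) as the sequence representation."""
    return list(hidden[0])


def mean_pool(hidden: List[List[float]], mask: List[int]) -> List[float]:
    """Average the hidden states of the non-padding positions only."""
    n_real = sum(mask)
    if n_real == 0:
        raise ValueError("attention mask selects no tokens")
    dim = len(hidden[0])
    return [
        sum(vec[d] for vec, m in zip(hidden, mask) if m) / n_real
        for d in range(dim)
    ]


# Toy batch: token ids are illustrative placeholders.
ids, mask = pad_right([[101, 7592, 102], [101, 102]])

# Toy hidden states for the second sequence (2 real tokens + 1 padding position).
hidden = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
cls_vector = cls_pool(hidden)
mean_vector = mean_pool(hidden, mask[1])
```

The attention mask matters for the averaging variant: without it, the hidden state at the padding position would be included in the mean and distort the sequence representation.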
BertConfig
~~~~~~~~~~~~~~~~~~~~~