GPT-2 PyTorch models + better tips for BERT
@@ -27,7 +27,13 @@ Tips:
- BERT is a model with absolute position embeddings so it's usually advised to pad the inputs on
  the right rather than the left.
- BERT was trained with a masked language modeling (MLM) objective. It is therefore effective at predicting masked
  tokens and at NLU in general, but is not optimal for text generation. Models trained with a causal language
  modeling (CLM) objective are better in that regard.
- Alongside MLM, BERT was trained with a next sentence prediction (NSP) objective, using the [CLS] token as an
  approximate representation of the whole sequence. The user may use this token (the first token in a sequence built
  with special tokens) to get a sequence-level prediction rather than a token-level prediction. However, averaging
  the token representations over the sequence may yield better results than using the [CLS] token.
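
The padding and pooling tips above can be sketched in a few lines. This is a minimal, dependency-free illustration and not the library's implementation: the helper names ``pad_right``, ``cls_pool`` and ``mean_pool`` are hypothetical, the padding id ``0`` is an assumption, and the hidden states are toy nested lists standing in for the model's last-layer output of shape (batch, sequence, hidden).

```python
from typing import List, Sequence, Tuple

PAD_ID = 0  # assumption: id of the padding token in the vocabulary


def pad_right(batch: Sequence[Sequence[int]], pad_id: int = PAD_ID) -> Tuple[List[List[int]], List[List[int]]]:
    """Pad every sequence on the right so real tokens keep positions 0..n-1."""
    max_len = max(len(ids) for ids in batch)
    input_ids = [list(ids) + [pad_id] * (max_len - len(ids)) for ids in batch]
    # 1 marks a real token, 0 marks padding
    attention_mask = [[1] * len(ids) + [0] * (max_len - len(ids)) for ids in batch]
    return input_ids, attention_mask


def cls_pool(hidden: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """Sequence vector taken from the first position, where [CLS] sits."""
    return [list(seq[0]) for seq in hidden]


def mean_pool(hidden: Sequence[Sequence[Sequence[float]]],
              attention_mask: Sequence[Sequence[int]]) -> List[List[float]]:
    """Sequence vector obtained by averaging over non-padding positions only."""
    pooled = []
    for seq, mask in zip(hidden, attention_mask):
        n_real = sum(mask)
        if n_real == 0:
            raise ValueError("cannot mean-pool a sequence with no real tokens")
        dim = len(seq[0])
        pooled.append([
            sum(vec[d] for vec, m in zip(seq, mask) if m) / n_real
            for d in range(dim)
        ])
    return pooled


# Toy usage: two sequences of different lengths, hidden size 2
input_ids, attention_mask = pad_right([[101, 7592, 102], [101, 102]])
toy_hidden = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[1.0, 1.0], [3.0, 3.0], [9.0, 9.0]],  # last position is padding
]
cls_vectors = cls_pool(toy_hidden)
mean_vectors = mean_pool(toy_hidden, attention_mask)
```

With a real model the same masking idea applies to tensors: multiply the hidden states by the attention mask before summing, so that padded positions do not dilute the average.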

BertConfig
~~~~~~~~~~~~~~~~~~~~~