Remove outdated BERT tips (#6217)
* Remove out-dated BERT tips
* Update modeling_outputs.py
* Update bert.rst
* Update bert.rst
@@ -27,13 +27,8 @@ Tips:
 
 - BERT is a model with absolute position embeddings so it's usually advised to pad the inputs on
 the right rather than the left.
-- BERT was trained with a masked language modeling (MLM) objective. It is therefore efficient at predicting masked
-tokens and at NLU in general, but is not optimal for text generation. Models trained with a causal language
-modeling (CLM) objective are better in that regard.
-- Alongside MLM, BERT was trained using a next sentence prediction (NSP) objective using the [CLS] token as a sequence
-approximate. The user may use this token (the first token in a sequence built with special tokens) to get a sequence
-prediction rather than a token prediction. However, averaging over the sequence may yield better results than using
-the [CLS] token.
+- BERT was trained with the masked language modeling (MLM) and next sentence prediction (NSP) objectives. It is efficient at predicting masked
+tokens and at NLU in general, but is not optimal for text generation.
 
 The original code can be found `here <https://github.com/google-research/bert>`_.
 
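The padding tip kept in this hunk depends on how absolute position embeddings are indexed. Below is a minimal plain-Python sketch. It assumes, for illustration, that every row of a padded batch receives position ids 0 to L-1 regardless of padding. `pad_batch` and `real_token_positions` are hypothetical helpers, not part of the transformers API, and the token ids are illustrative only. The sketch shows that right padding keeps the real tokens at positions 0..n-1, while left padding moves the shorter sequence to later positions.

```python
from typing import List, Tuple


def pad_batch(
    seqs: List[List[int]], pad_id: int, side: str = "right"
) -> Tuple[List[List[int]], List[List[int]]]:
    """Hypothetical helper: pad sequences to a common length, return (input_ids, attention_mask)."""
    max_len = max(len(s) for s in seqs)
    input_ids, attention_mask = [], []
    for s in seqs:
        n_pad = max_len - len(s)
        if side == "right":
            input_ids.append(s + [pad_id] * n_pad)
            attention_mask.append([1] * len(s) + [0] * n_pad)
        elif side == "left":
            input_ids.append([pad_id] * n_pad + s)
            attention_mask.append([0] * n_pad + [1] * len(s))
        else:
            raise ValueError("side must be 'right' or 'left'")
    return input_ids, attention_mask


def real_token_positions(attention_mask: List[List[int]]) -> List[List[int]]:
    """Positions seen by non-padding tokens, assuming position ids are 0..L-1 in every row."""
    return [[pos for pos, m in enumerate(row) if m == 1] for row in attention_mask]


batch = [[101, 7592, 102], [101, 7592, 2088, 999, 102]]  # illustrative token ids only
_, right_mask = pad_batch(batch, pad_id=0, side="right")
_, left_mask = pad_batch(batch, pad_id=0, side="left")
print(real_token_positions(right_mask))  # [[0, 1, 2], [0, 1, 2, 3, 4]]
print(real_token_positions(left_mask))   # [[2, 3, 4], [0, 1, 2, 3, 4]]
```

Under this assumption, a left-padded short sequence is embedded at positions it would not occupy when unpadded. Right padding avoids that without any extra bookkeeping.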
@@ -45,10 +45,6 @@ class BaseModelOutputWithPooling(ModelOutput):
 further processed by a Linear layer and a Tanh activation function. The Linear
 layer weights are trained from the next sentence prediction (classification)
 objective during pretraining.
-
-This output is usually *not* a good summary
-of the semantic content of the input, you're often better with averaging or pooling
-the sequence of hidden-states for the whole input sequence.
 hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_hidden_states=True`` is passed or when ``config.output_hidden_states=True``):
 Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings + one for the output of each layer)
 of shape :obj:`(batch_size, sequence_length, hidden_size)`.
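The docstring context in this hunk describes the pooled output as the first token's hidden state passed through a Linear layer and a Tanh activation. The removed lines mention averaging the hidden states instead. Below is a minimal plain-Python sketch of both operations for a single sequence. It assumes lists of floats stand in for a tensor of shape `(sequence_length, hidden_size)`. `pooler` and `masked_mean_pool` are hypothetical helpers, not transformers functions, and the weights are toy values rather than trained ones.

```python
import math
from typing import List

Vector = List[float]


def pooler(cls_hidden: Vector, weight: List[Vector], bias: Vector) -> Vector:
    """Hypothetical helper: Linear layer (weight rows x input) + bias, then tanh, on the first token."""
    return [
        math.tanh(sum(w * h for w, h in zip(row, cls_hidden)) + b)
        for row, b in zip(weight, bias)
    ]


def masked_mean_pool(hidden_states: List[Vector], attention_mask: List[int]) -> Vector:
    """Hypothetical helper: average the hidden states of non-padding tokens only."""
    kept = [h for h, m in zip(hidden_states, attention_mask) if m == 1]
    if not kept:
        raise ValueError("attention_mask has no real tokens")
    dim = len(kept[0])
    return [sum(h[d] for h in kept) / len(kept) for d in range(dim)]


# One right-padded sequence: 3 real tokens + 1 padding token, hidden_size = 2.
hidden = [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]
print(masked_mean_pool(hidden, mask))  # [2.0, 2.0]

# Toy identity weights and zero bias (not trained pooler weights).
print(pooler(hidden[0], weight=[[1.0, 0.0], [0.0, 1.0]], bias=[0.0, 0.0]))
```

The attention mask keeps the padding row `[9.0, 9.0]` out of the average. The pooler, by contrast, only ever reads the first token's vector.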