Doc styling (#8067)
* Important files * Styling them all * Revert "Styling them all" This reverts commit 7d029395fdae8513b8281cbc2a6c239f8093503e. * Syling them for realsies * Fix syntax error * Fix benchmark_utils * More fixes * Fix modeling auto and script * Remove new line * Fixes * More fixes * Fix more files * Style * Add FSMT * More fixes * More fixes * More fixes * More fixes * Fixes * More fixes * More fixes * Last fixes * Make sphinx happy
This commit is contained in:
@@ -16,11 +16,11 @@ The abstract from the paper is the following:
|
||||
better performance than pretraining approaches based on autoregressive language modeling. However, relying on
|
||||
corrupting the input with masks, BERT neglects dependency between the masked positions and suffers from a
|
||||
pretrain-finetune discrepancy. In light of these pros and cons, we propose XLNet, a generalized autoregressive
|
||||
pretraining method that (1) enables learning bidirectional contexts by maximizing the expected likelihood over
|
||||
all permutations of the factorization order and (2) overcomes the limitations of BERT thanks to its autoregressive
|
||||
formulation. Furthermore, XLNet integrates ideas from Transformer-XL, the state-of-the-art autoregressive model,
|
||||
into pretraining. Empirically, under comparable experiment settings, XLNet outperforms BERT on 20 tasks, often by
|
||||
a large margin, including question answering, natural language inference, sentiment analysis, and document ranking.*
|
||||
pretraining method that (1) enables learning bidirectional contexts by maximizing the expected likelihood over all
|
||||
permutations of the factorization order and (2) overcomes the limitations of BERT thanks to its autoregressive
|
||||
formulation. Furthermore, XLNet integrates ideas from Transformer-XL, the state-of-the-art autoregressive model, into
|
||||
pretraining. Empirically, under comparable experiment settings, XLNet outperforms BERT on 20 tasks, often by a large
|
||||
margin, including question answering, natural language inference, sentiment analysis, and document ranking.*
|
||||
|
||||
Tips:
|
||||
|
||||
|
||||
Reference in New Issue
Block a user