PyTorch XLNet

This commit is contained in:
Lysandre
2020-01-17 14:59:31 -05:00
committed by Lysandre Debut
parent 83fa8d9fb5
commit cd656fb21a
2 changed files with 355 additions and 388 deletions

View File

@@ -1,6 +1,22 @@
XLNet
----------------------------------------------------
The XLNet model was proposed in `XLNet: Generalized Autoregressive Pretraining for Language Understanding`_
by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
XLnet is an extension of the Transformer-XL model pre-trained using an autoregressive method
to learn bidirectional contexts by maximizing the expected likelihood over all permutations
of the input sequence factorization order.
The specific attention pattern can be controlled at training and test time using the `perm_mask` input.
Due to the difficulty of training a fully auto-regressive model over various factorization order,
XLNet is pretrained using only a sub-set of the output tokens as target which are selected
with the `target_mapping` input.
To use XLNet for sequential decoding (i.e. not in fully bi-directional setting), use the `perm_mask` and
`target_mapping` inputs to control the attention span and outputs (see examples in `examples/run_generation.py`)
``XLNetConfig``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~