Use HF papers (#38184)
* Use hf papers
* Hugging Face papers
* doi to hf papers
* style
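For illustration only, the link substitution visible in the hunks below can be sketched in Python. This is not the script used for the commit (the commit does not show one); the names `ARXIV_URL` and `to_hf_papers` are hypothetical, and the pattern only covers the `arxiv.org/abs/<id>` and `arxiv.org/pdf/<id>.pdf` forms that appear in this diff, not DOI links.

```python
import re

# Hypothetical helper, assumed for illustration: matches arXiv abstract or PDF
# links such as https://arxiv.org/abs/1810.04805 or
# https://arxiv.org/pdf/1910.13461.pdf and captures the paper identifier.
ARXIV_URL = re.compile(
    r"https?://arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5})(?:v\d+)?(?:\.pdf)?"
)


def to_hf_papers(text: str) -> str:
    """Rewrite arXiv links found in `text` to huggingface.co/papers links."""
    return ARXIV_URL.sub(r"https://huggingface.co/papers/\1", text)


if __name__ == "__main__":
    old_line = "at the `official paper <https://arxiv.org/pdf/1910.13461.pdf>`__"
    print(to_hf_papers(old_line))
```

Text without an arXiv link passes through unchanged, so the helper could be applied line by line to a whole file.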
committed by GitHub
parent 1031ed5166
commit de24fb63ed
@@ -26,7 +26,7 @@ way which enables simple and efficient model parallelism.
 ## Masked language modeling
 
 In the following, we demonstrate how to train a bi-directional transformer model
-using masked language modeling objective as introduced in [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805).
+using masked language modeling objective as introduced in [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://huggingface.co/papers/1810.04805).
 More specifically, we demonstrate how JAX/Flax can be leveraged
 to pre-train [**`FacebookAI/roberta-base`**](https://huggingface.co/FacebookAI/roberta-base)
 in Norwegian on a single TPUv3-8 pod.
@@ -229,7 +229,7 @@ look at [this](https://colab.research.google.com/github/huggingface/notebooks/bl
 ## T5-like span-masked language modeling
 
 In the following, we demonstrate how to train a T5 model using the span-masked language model
-objective as proposed in the [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683).
+objective as proposed in the [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://huggingface.co/papers/1910.10683).
 More specifically, we demonstrate how JAX/Flax can be leveraged
 to pre-train [**`google/t5-v1_1-base`**](https://huggingface.co/google/t5-v1_1-base)
 in Norwegian on a single TPUv3-8 pod.
@@ -341,7 +341,7 @@ Training statistics can be accessed on directly on the 🤗 [hub](https://huggin
 ## BART: Denoising language modeling
 
 In the following, we demonstrate how to train a BART model
-using denoising language modeling objective as introduced in [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/abs/1910.13461).
+using denoising language modeling objective as introduced in [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://huggingface.co/papers/1910.13461).
 More specifically, we demonstrate how JAX/Flax can be leveraged
 to pre-train [**`bart-base`**](https://huggingface.co/facebook/bart-base)
 in Norwegian on a single TPUv3-8 pod.
@@ -265,7 +265,7 @@ class FlaxDataCollatorForBartDenoisingLM:
 Data collator used for BART denoising language modeling. The code is largely copied from
 `<https://github.com/morganmcg1/rotobart/blob/main/data_collator.py#L223>`__.
 For more information on how BART denoising language modeling works, one can take a look
-at the `official paper <https://arxiv.org/pdf/1910.13461.pdf>`__
+at the `official paper <https://huggingface.co/papers/1910.13461>`__
 or the `official code for preprocessing <https://github.com/facebookresearch/fairseq/blob/main/fairseq/data/denoising_dataset.py>`__ .
 Args:
 tokenizer (:class:`~transformers.PreTrainedTokenizer` or :class:`~transformers.PreTrainedTokenizerFast`):
@@ -309,7 +309,7 @@ class FlaxDataCollatorForT5MLM:
 Data collator used for T5 span-masked language modeling.
 It is made sure that after masking the inputs are of length `data_args.max_seq_length` and targets are also of fixed length.
 For more information on how T5 span-masked language modeling works, one can take a look
-at the `official paper <https://arxiv.org/pdf/1910.10683.pdf>`__
+at the `official paper <https://huggingface.co/papers/1910.10683>`__
 or the `official code for preprocessing <https://github.com/google-research/text-to-text-transfer-transformer/blob/master/t5/data/preprocessors.py>`__ .
 
 Args: