Updates the default branch from master to main (#16326)
* Updates the default branch from master to main
* Links from `master` to `main`
* Typo
* Update examples/flax/README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
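
A change like "Links from `master` to `main`" can be sketched as a scoped search-and-replace. The snippet below is only an illustration, not necessarily how this commit was produced: the function name `update_default_branch` and the regular expression are assumptions made for the example. It rewrites `tree/master` and `blob/master` links only when they point into `huggingface/transformers`, since links to other repositories may still use `master` as their default branch.

```python
import re

# Hypothetical helper, not part of the commit: match only links into the
# huggingface/transformers repository that reference the `master` branch.
_TRANSFORMERS_LINK = re.compile(
    r"(https://github\.com/huggingface/transformers/(?:tree|blob)/)master(?=/)"
)


def update_default_branch(text: str) -> str:
    """Point transformers GitHub links at `main` instead of `master`."""
    return _TRANSFORMERS_LINK.sub(r"\g<1>main", text)


old_link = (
    "https://github.com/huggingface/transformers/tree/master/"
    "examples/pytorch/summarization/README.md"
)
other_repo_link = (
    "https://github.com/NielsRogge/Transformers-Tutorials/blob/master/"
    "ViTMAE/ViT_MAE_visualization_demo.ipynb"
)

new_link = update_default_branch(old_link)
untouched_link = update_default_branch(other_repo_link)
```

Restricting the pattern to one repository keeps third-party links, such as the Transformers-Tutorials notebook in the last hunk, unchanged.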
@@ -38,7 +38,7 @@ This model was contributed by [sshleifer](https://huggingface.co/sshleifer). The
 ### Examples

 - Examples and scripts for fine-tuning BART and other models for sequence to sequence tasks can be found in
-[examples/pytorch/summarization/](https://github.com/huggingface/transformers/tree/master/examples/pytorch/summarization/README.md).
+[examples/pytorch/summarization/](https://github.com/huggingface/transformers/tree/main/examples/pytorch/summarization/README.md).
 - An example of how to train [`BartForConditionalGeneration`] with a Hugging Face `datasets`
 object can be found in this [forum discussion](https://discuss.huggingface.co/t/train-bart-for-conditional-generation-e-g-summarization/1904).
 - [Distilled checkpoints](https://huggingface.co/models?search=distilbart) are described in this [paper](https://arxiv.org/abs/2010.13002).

@@ -46,7 +46,7 @@ Tips:
 - Sequence length must be divisible by block size.
 - Current implementation supports only **ITC**.
 - Current implementation doesn't support **num_random_blocks = 0**.
-- BigBirdPegasus uses the [PegasusTokenizer](https://github.com/huggingface/transformers/blob/master/src/transformers/models/pegasus/tokenization_pegasus.py).
+- BigBirdPegasus uses the [PegasusTokenizer](https://github.com/huggingface/transformers/blob/main/src/transformers/models/pegasus/tokenization_pegasus.py).

 The original code can be found [here](https://github.com/google-research/bigbird).

@@ -43,7 +43,7 @@ Tips:
 necessary though, just let us know if you need this option.

 This model was contributed by [victorsanh](https://huggingface.co/victorsanh). This model jax version was
-contributed by [kamalkraj](https://huggingface.co/kamalkraj). The original code can be found [here](https://github.com/huggingface/transformers/tree/master/examples/research_projects/distillation).
+contributed by [kamalkraj](https://huggingface.co/kamalkraj). The original code can be found [here](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation).


 ## DistilBertConfig

@@ -48,8 +48,8 @@ Translations should be similar, but not identical to output in the test set link

 - Since Marian models are smaller than many other translation models available in the library, they can be useful for
 fine-tuning experiments and integration tests.
-- [Fine-tune on GPU](https://github.com/huggingface/transformers/blob/master/examples/research_projects/seq2seq-distillation/train_distil_marian_enro_teacher.sh)
-- [Fine-tune on GPU with pytorch-lightning](https://github.com/huggingface/transformers/blob/master/examples/research_projects/seq2seq-distillation/train_distil_marian_no_teacher.sh)
+- [Fine-tune on GPU](https://github.com/huggingface/transformers/blob/main/examples/research_projects/seq2seq-distillation/train_distil_marian_enro_teacher.sh)
+- [Fine-tune on GPU with pytorch-lightning](https://github.com/huggingface/transformers/blob/main/examples/research_projects/seq2seq-distillation/train_distil_marian_no_teacher.sh)

 ## Multilingual Models


@@ -43,8 +43,8 @@ All the [checkpoints](https://huggingface.co/models?search=pegasus) are fine-tun

 ### Examples

-- [Script](https://github.com/huggingface/transformers/tree/master/examples/research_projects/seq2seq-distillation/finetune_pegasus_xsum.sh) to fine-tune pegasus
-on the XSUM dataset. Data download instructions at [examples/pytorch/summarization/](https://github.com/huggingface/transformers/tree/master/examples/pytorch/summarization/README.md).
+- [Script](https://github.com/huggingface/transformers/tree/main/examples/research_projects/seq2seq-distillation/finetune_pegasus_xsum.sh) to fine-tune pegasus
+on the XSUM dataset. Data download instructions at [examples/pytorch/summarization/](https://github.com/huggingface/transformers/tree/main/examples/pytorch/summarization/README.md).
 - FP16 is not supported (help/ideas on this appreciated!).
 - The adafactor optimizer is recommended for pegasus fine-tuning.


@@ -19,7 +19,7 @@ Question Answering](https://yjernite.github.io/lfqa.html). RetriBERT is a small
 pair of BERT encoders with lower-dimension projection for dense semantic indexing of text.

 This model was contributed by [yjernite](https://huggingface.co/yjernite). Code to train and use the model can be
-found [here](https://github.com/huggingface/transformers/tree/master/examples/research-projects/distillation).
+found [here](https://github.com/huggingface/transformers/tree/main/examples/research-projects/distillation).


 ## RetriBertConfig

@@ -104,7 +104,7 @@ language modeling head on top of the decoder.
 loss = model(input_ids=input_ids, labels=labels).loss
 ```

-If you're interested in pre-training T5 on a new corpus, check out the [run_t5_mlm_flax.py](https://github.com/huggingface/transformers/tree/master/examples/flax/language-modeling) script in the Examples
+If you're interested in pre-training T5 on a new corpus, check out the [run_t5_mlm_flax.py](https://github.com/huggingface/transformers/tree/main/examples/flax/language-modeling) script in the Examples
 directory.

 - Supervised training
@@ -143,7 +143,7 @@ language modeling head on top of the decoder.
 In addition, we must make sure that padding token id's of the `labels` are not taken into account by the loss
 function. In PyTorch and Tensorflow, this can be done by replacing them with -100, which is the `ignore_index`
 of the `CrossEntropyLoss`. In Flax, one can use the `decoder_attention_mask` to ignore padded tokens from
-the loss (see the [Flax summarization script](https://github.com/huggingface/transformers/tree/master/examples/flax/summarization) for details). We also pass
+the loss (see the [Flax summarization script](https://github.com/huggingface/transformers/tree/main/examples/flax/summarization) for details). We also pass
 `attention_mask` as additional input to the model, which makes sure that padding tokens of the inputs are
 ignored. The code example below illustrates all of this.

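
The example that the documentation refers to lies outside this hunk. As a framework-free sketch of the label-masking idea only, the snippet below replaces padding ids in `labels` with -100 and computes a cross-entropy that skips those positions. The helper names `mask_pad_labels` and `cross_entropy` and the toy values are assumptions for illustration. Real training code would use tensors and the framework's own loss rather than these list-based helpers.

```python
import math

IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss


def mask_pad_labels(labels, pad_token_id):
    """Replace padding ids in each label sequence with IGNORE_INDEX."""
    return [
        [tok if tok != pad_token_id else IGNORE_INDEX for tok in seq]
        for seq in labels
    ]


def cross_entropy(logits, labels):
    """Mean negative log-likelihood over positions whose label is not ignored."""
    total, count = 0.0, 0
    for row, label in zip(logits, labels):
        if label == IGNORE_INDEX:
            continue  # padded position: contributes nothing to the loss
        top = max(row)
        log_norm = top + math.log(sum(math.exp(x - top) for x in row))
        total += log_norm - row[label]
        count += 1
    return total / count if count else 0.0


# Hypothetical toy batch: 0 is T5's pad token id, 1 its end-of-sequence id.
labels = [[37, 423, 1, 0, 0], [5, 1, 0, 0, 0]]
masked = mask_pad_labels(labels, pad_token_id=0)

# Toy logits over a 3-token vocabulary for one sequence of length 3.
toy_logits = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
loss_with_pad = cross_entropy(toy_logits, [0, 1, IGNORE_INDEX])
loss_without_pad = cross_entropy(toy_logits[:2], [0, 1])
```

The two losses agree because the ignored position is excluded from both the sum and the count.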
@@ -272,13 +272,13 @@ If you'd like a faster training and inference performance, install [apex](https:

 T5 is supported by several example scripts, both for pre-training and fine-tuning.

-- pre-training: the [run_t5_mlm_flax.py](https://github.com/huggingface/transformers/blob/master/examples/flax/language-modeling/run_t5_mlm_flax.py)
-script allows you to further pre-train T5 or pre-train T5 from scratch on your own data. The [t5_tokenizer_model.py](https://github.com/huggingface/transformers/blob/master/examples/flax/language-modeling/t5_tokenizer_model.py)
+- pre-training: the [run_t5_mlm_flax.py](https://github.com/huggingface/transformers/blob/main/examples/flax/language-modeling/run_t5_mlm_flax.py)
+script allows you to further pre-train T5 or pre-train T5 from scratch on your own data. The [t5_tokenizer_model.py](https://github.com/huggingface/transformers/blob/main/examples/flax/language-modeling/t5_tokenizer_model.py)
 script allows you to further train a T5 tokenizer or train a T5 Tokenizer from scratch on your own data. Note that
 Flax (a neural network library on top of JAX) is particularly useful to train on TPU hardware.

-- fine-tuning: T5 is supported by the official summarization scripts ([PyTorch](https://github.com/huggingface/transformers/tree/master/examples/pytorch/summarization), [Tensorflow](https://github.com/huggingface/transformers/tree/master/examples/tensorflow/summarization), and [Flax](https://github.com/huggingface/transformers/tree/master/examples/flax/summarization)) and translation scripts
-([PyTorch](https://github.com/huggingface/transformers/tree/master/examples/pytorch/translation) and [Tensorflow](https://github.com/huggingface/transformers/tree/master/examples/tensorflow/translation)). These scripts allow
+- fine-tuning: T5 is supported by the official summarization scripts ([PyTorch](https://github.com/huggingface/transformers/tree/main/examples/pytorch/summarization), [Tensorflow](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/summarization), and [Flax](https://github.com/huggingface/transformers/tree/main/examples/flax/summarization)) and translation scripts
+([PyTorch](https://github.com/huggingface/transformers/tree/main/examples/pytorch/translation) and [Tensorflow](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/translation)). These scripts allow
 you to easily fine-tune T5 on custom data for summarization/translation.

 ## T5Config

@@ -56,7 +56,7 @@ appropriately for the textual and visual parts.
 The [`BertTokenizer`] is used to encode the text. A custom detector/feature extractor must be used
 to get the visual embeddings. The following example notebooks show how to use VisualBERT with Detectron-like models:

-- [VisualBERT VQA demo notebook](https://github.com/huggingface/transformers/tree/master/examples/research_projects/visual_bert) : This notebook
+- [VisualBERT VQA demo notebook](https://github.com/huggingface/transformers/tree/main/examples/research_projects/visual_bert) : This notebook
 contains an example on VisualBERT VQA.

 - [Generate Embeddings for VisualBERT (Colab Notebook)](https://colab.research.google.com/drive/1bLGxKdldwqnMVA5x4neY7-l_8fKGWQYI?usp=sharing) : This notebook contains

@@ -32,7 +32,7 @@ Tips:

 - MAE (masked auto encoding) is a method for self-supervised pre-training of Vision Transformers (ViTs). The pre-training objective is relatively simple:
 by masking a large portion (75%) of the image patches, the model must reconstruct raw pixel values. One can use [`ViTMAEForPreTraining`] for this purpose.
-- An example Python script that illustrates how to pre-train [`ViTMAEForPreTraining`] from scratch can be found [here](https://github.com/huggingface/transformers/tree/master/examples/pytorch/image-pretraining).
+- An example Python script that illustrates how to pre-train [`ViTMAEForPreTraining`] from scratch can be found [here](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-pretraining).
 One can easily tweak it for their own use case.
 - A notebook that illustrates how to visualize reconstructed pixel values with [`ViTMAEForPreTraining`] can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/ViTMAE/ViT_MAE_visualization_demo.ipynb).
 - After pre-training, one "throws away" the decoder used to reconstruct pixels, and one uses the encoder for fine-tuning/linear probing. This means that after
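
The masking step mentioned in the MAE tip of the hunk above (hide 75% of the image patches, keep the rest visible) can be sketched without any framework. This is a simplified illustration, not the `ViTMAEForPreTraining` implementation: the function name `random_patch_mask` is hypothetical, and the patch count of 196 assumes a 224x224 image split into 16x16 patches.

```python
import random


def random_patch_mask(num_patches, mask_ratio=0.75, seed=None):
    """Return (kept, masked) patch indices, each sorted in ascending order."""
    rng = random.Random(seed)
    num_keep = int(num_patches * (1 - mask_ratio))
    order = list(range(num_patches))
    rng.shuffle(order)  # random permutation of patch indices
    kept = sorted(order[:num_keep])      # patches the encoder would see
    masked = sorted(order[num_keep:])    # patches to be reconstructed
    return kept, masked


# Assumed setup: 14 x 14 = 196 patches per image.
kept, masked = random_patch_mask(196, mask_ratio=0.75, seed=0)
```

With these assumed numbers, a quarter of the patches stay visible and the remaining three quarters are the reconstruction targets.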