Updates the default branch from master to main (#16326)

* Updates the default branch from master to main * Links from `master` to `main` * Typo * Update examples/flax/README.md Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-03-23 08:46:59 +01:00
parent 7732148124
commit eca77f4719
101 changed files with 401 additions and 402 deletions
--- a/docs/source/model_doc/bart.mdx
+++ b/docs/source/model_doc/bart.mdx
@@ -38,7 +38,7 @@ This model was contributed by [sshleifer](https://huggingface.co/sshleifer). The
 ### Examples

 - Examples and scripts for fine-tuning BART and other models for sequence to sequence tasks can be found in
-  [examples/pytorch/summarization/](https://github.com/huggingface/transformers/tree/master/examples/pytorch/summarization/README.md).
+  [examples/pytorch/summarization/](https://github.com/huggingface/transformers/tree/main/examples/pytorch/summarization/README.md).
 - An example of how to train [`BartForConditionalGeneration`] with a Hugging Face `datasets`
  object can be found in this [forum discussion](https://discuss.huggingface.co/t/train-bart-for-conditional-generation-e-g-summarization/1904).
 - [Distilled checkpoints](https://huggingface.co/models?search=distilbart) are described in this [paper](https://arxiv.org/abs/2010.13002).
--- a/docs/source/model_doc/bigbird_pegasus.mdx
+++ b/docs/source/model_doc/bigbird_pegasus.mdx
@@ -46,7 +46,7 @@ Tips:
 - Sequence length must be divisible by block size.
 - Current implementation supports only **ITC**.
 - Current implementation doesn't support **num_random_blocks = 0**.
- BigBirdPegasus uses the [PegasusTokenizer](https://github.com/huggingface/transformers/blob/master/src/transformers/models/pegasus/tokenization_pegasus.py).
+- BigBirdPegasus uses the [PegasusTokenizer](https://github.com/huggingface/transformers/blob/main/src/transformers/models/pegasus/tokenization_pegasus.py).

 The original code can be found [here](https://github.com/google-research/bigbird).

--- a/docs/source/model_doc/distilbert.mdx
+++ b/docs/source/model_doc/distilbert.mdx
@@ -43,7 +43,7 @@ Tips:
  necessary though, just let us know if you need this option.

 This model was contributed by [victorsanh](https://huggingface.co/victorsanh). This model jax version was
-contributed by [kamalkraj](https://huggingface.co/kamalkraj). The original code can be found [here](https://github.com/huggingface/transformers/tree/master/examples/research_projects/distillation).
+contributed by [kamalkraj](https://huggingface.co/kamalkraj). The original code can be found [here](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation).


 ## DistilBertConfig
--- a/docs/source/model_doc/marian.mdx
+++ b/docs/source/model_doc/marian.mdx
@@ -48,8 +48,8 @@ Translations should be similar, but not identical to output in the test set link

 - Since Marian models are smaller than many other translation models available in the library, they can be useful for
  fine-tuning experiments and integration tests.
- [Fine-tune on GPU](https://github.com/huggingface/transformers/blob/master/examples/research_projects/seq2seq-distillation/train_distil_marian_enro_teacher.sh)
- [Fine-tune on GPU with pytorch-lightning](https://github.com/huggingface/transformers/blob/master/examples/research_projects/seq2seq-distillation/train_distil_marian_no_teacher.sh)
+- [Fine-tune on GPU](https://github.com/huggingface/transformers/blob/main/examples/research_projects/seq2seq-distillation/train_distil_marian_enro_teacher.sh)
+- [Fine-tune on GPU with pytorch-lightning](https://github.com/huggingface/transformers/blob/main/examples/research_projects/seq2seq-distillation/train_distil_marian_no_teacher.sh)

 ## Multilingual Models

--- a/docs/source/model_doc/pegasus.mdx
+++ b/docs/source/model_doc/pegasus.mdx
@@ -43,8 +43,8 @@ All the [checkpoints](https://huggingface.co/models?search=pegasus) are fine-tun

 ### Examples

- [Script](https://github.com/huggingface/transformers/tree/master/examples/research_projects/seq2seq-distillation/finetune_pegasus_xsum.sh) to fine-tune pegasus
-  on the XSUM dataset. Data download instructions at [examples/pytorch/summarization/](https://github.com/huggingface/transformers/tree/master/examples/pytorch/summarization/README.md).
+- [Script](https://github.com/huggingface/transformers/tree/main/examples/research_projects/seq2seq-distillation/finetune_pegasus_xsum.sh) to fine-tune pegasus
+  on the XSUM dataset. Data download instructions at [examples/pytorch/summarization/](https://github.com/huggingface/transformers/tree/main/examples/pytorch/summarization/README.md).
 - FP16 is not supported (help/ideas on this appreciated!).
 - The adafactor optimizer is recommended for pegasus fine-tuning.

--- a/docs/source/model_doc/retribert.mdx
+++ b/docs/source/model_doc/retribert.mdx
@@ -19,7 +19,7 @@ Question Answering](https://yjernite.github.io/lfqa.html). RetriBERT is a small
 pair of BERT encoders with lower-dimension projection for dense semantic indexing of text.

 This model was contributed by [yjernite](https://huggingface.co/yjernite). Code to train and use the model can be
-found [here](https://github.com/huggingface/transformers/tree/master/examples/research-projects/distillation).
+found [here](https://github.com/huggingface/transformers/tree/main/examples/research-projects/distillation).


 ## RetriBertConfig
--- a/docs/source/model_doc/t5.mdx
+++ b/docs/source/model_doc/t5.mdx
@@ -104,7 +104,7 @@ language modeling head on top of the decoder.
  loss = model(input_ids=input_ids, labels=labels).loss
  ```

-  If you're interested in pre-training T5 on a new corpus, check out the [run_t5_mlm_flax.py](https://github.com/huggingface/transformers/tree/master/examples/flax/language-modeling) script in the Examples
+  If you're interested in pre-training T5 on a new corpus, check out the [run_t5_mlm_flax.py](https://github.com/huggingface/transformers/tree/main/examples/flax/language-modeling) script in the Examples
  directory.

 - Supervised training
@@ -143,7 +143,7 @@ language modeling head on top of the decoder.
  In addition, we must make sure that padding token id's of the `labels` are not taken into account by the loss
  function. In PyTorch and Tensorflow, this can be done by replacing them with -100, which is the `ignore_index`
  of the `CrossEntropyLoss`. In Flax, one can use the `decoder_attention_mask` to ignore padded tokens from
-  the loss (see the [Flax summarization script](https://github.com/huggingface/transformers/tree/master/examples/flax/summarization) for details). We also pass
+  the loss (see the [Flax summarization script](https://github.com/huggingface/transformers/tree/main/examples/flax/summarization) for details). We also pass
  `attention_mask` as additional input to the model, which makes sure that padding tokens of the inputs are
  ignored. The code example below illustrates all of this.

@@ -272,13 +272,13 @@ If you'd like a faster training and inference performance, install [apex](https:

 T5 is supported by several example scripts, both for pre-training and fine-tuning.

- pre-training: the [run_t5_mlm_flax.py](https://github.com/huggingface/transformers/blob/master/examples/flax/language-modeling/run_t5_mlm_flax.py)
-  script allows you to further pre-train T5 or pre-train T5 from scratch on your own data. The [t5_tokenizer_model.py](https://github.com/huggingface/transformers/blob/master/examples/flax/language-modeling/t5_tokenizer_model.py)
+- pre-training: the [run_t5_mlm_flax.py](https://github.com/huggingface/transformers/blob/main/examples/flax/language-modeling/run_t5_mlm_flax.py)
+  script allows you to further pre-train T5 or pre-train T5 from scratch on your own data. The [t5_tokenizer_model.py](https://github.com/huggingface/transformers/blob/main/examples/flax/language-modeling/t5_tokenizer_model.py)
  script allows you to further train a T5 tokenizer or train a T5 Tokenizer from scratch on your own data. Note that
  Flax (a neural network library on top of JAX) is particularly useful to train on TPU hardware.

- fine-tuning: T5 is supported by the official summarization scripts ([PyTorch](https://github.com/huggingface/transformers/tree/master/examples/pytorch/summarization), [Tensorflow](https://github.com/huggingface/transformers/tree/master/examples/tensorflow/summarization), and [Flax](https://github.com/huggingface/transformers/tree/master/examples/flax/summarization)) and translation scripts
-  ([PyTorch](https://github.com/huggingface/transformers/tree/master/examples/pytorch/translation) and [Tensorflow](https://github.com/huggingface/transformers/tree/master/examples/tensorflow/translation)). These scripts allow
+- fine-tuning: T5 is supported by the official summarization scripts ([PyTorch](https://github.com/huggingface/transformers/tree/main/examples/pytorch/summarization), [Tensorflow](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/summarization), and [Flax](https://github.com/huggingface/transformers/tree/main/examples/flax/summarization)) and translation scripts
+  ([PyTorch](https://github.com/huggingface/transformers/tree/main/examples/pytorch/translation) and [Tensorflow](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/translation)). These scripts allow
  you to easily fine-tune T5 on custom data for summarization/translation.

 ## T5Config
--- a/docs/source/model_doc/visual_bert.mdx
+++ b/docs/source/model_doc/visual_bert.mdx
@@ -56,7 +56,7 @@ appropriately for the textual and visual parts.
 The [`BertTokenizer`] is used to encode the text. A custom detector/feature extractor must be used
 to get the visual embeddings. The following example notebooks show how to use VisualBERT with Detectron-like models:

- [VisualBERT VQA demo notebook](https://github.com/huggingface/transformers/tree/master/examples/research_projects/visual_bert) : This notebook
+- [VisualBERT VQA demo notebook](https://github.com/huggingface/transformers/tree/main/examples/research_projects/visual_bert) : This notebook
  contains an example on VisualBERT VQA.

 - [Generate Embeddings for VisualBERT (Colab Notebook)](https://colab.research.google.com/drive/1bLGxKdldwqnMVA5x4neY7-l_8fKGWQYI?usp=sharing) : This notebook contains
--- a/docs/source/model_doc/vit_mae.mdx
+++ b/docs/source/model_doc/vit_mae.mdx
@@ -32,7 +32,7 @@ Tips:

 - MAE (masked auto encoding) is a method for self-supervised pre-training of Vision Transformers (ViTs). The pre-training objective is relatively simple:
 by masking a large portion (75%) of the image patches, the model must reconstruct raw pixel values. One can use [`ViTMAEForPreTraining`] for this purpose.
- An example Python script that illustrates how to pre-train [`ViTMAEForPreTraining`] from scratch can be found [here](https://github.com/huggingface/transformers/tree/master/examples/pytorch/image-pretraining).
+- An example Python script that illustrates how to pre-train [`ViTMAEForPreTraining`] from scratch can be found [here](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-pretraining).
 One can easily tweak it for their own use case.
 - A notebook that illustrates how to visualize reconstructed pixel values with [`ViTMAEForPreTraining`] can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/ViTMAE/ViT_MAE_visualization_demo.ipynb).
 - After pre-training, one "throws away" the decoder used to reconstruct pixels, and one uses the encoder for fine-tuning/linear probing. This means that after