Update all references to canonical models (#29001)

* Script & Manual edition
* Update
2024-02-16 08:16:58 +01:00
parent 1e402b957d
commit f497f564bb
561 changed files with 2682 additions and 2687 deletions
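The script mentioned in the commit message is not part of this diff. As a rough illustration only, a rename pass of this kind could be sketched as follows. The `CANONICAL` table and the `canonicalize` helper are hypothetical names; the mapping contains only renames visible in the hunks below, and the guards in the pattern are an assumption about how to avoid touching identifiers that already carry a namespace (for example `zhiheng-huang/bert-base-uncased-embedding-relative-key-query`).

```python
import re

# Hypothetical mapping, built only from renames that appear in this diff.
CANONICAL = {
    "bert-base-cased": "google-bert/bert-base-cased",
    "bert-base-uncased": "google-bert/bert-base-uncased",
    "bert-large-uncased-whole-word-masking": "google-bert/bert-large-uncased-whole-word-masking",
    "camembert-base": "almanach/camembert-base",
    "openai-gpt": "openai-community/openai-gpt",
    "transfo-xl-wt103": "transfo-xl/transfo-xl-wt103",
    "t5-base": "google-t5/t5-base",
}

# Longest names first so the alternation prefers the most specific match.
# The lookbehind/lookahead refuse matches that are part of a longer
# identifier or that already sit behind a "namespace/" prefix.
_NAMES = sorted(CANONICAL, key=len, reverse=True)
_PATTERN = re.compile(
    r"(?<![\w/-])(" + "|".join(re.escape(name) for name in _NAMES) + r")(?![\w/-])"
)


def canonicalize(text: str) -> str:
    """Replace bare legacy model ids in `text` with their namespaced form."""
    return _PATTERN.sub(lambda match: CANONICAL[match.group(1)], text)


if __name__ == "__main__":
    print(canonicalize('CamembertTokenizer.from_pretrained("camembert-base")'))
```

Because already-namespaced ids are skipped, running the helper twice gives the same result as running it once. The same guard means it would not rewrite ids embedded in URLs such as `https://huggingface.co/...`, which this commit did update, so a sketch like this would still need manual follow-up.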
--- a/examples/legacy/benchmarking/README.md
+++ b/examples/legacy/benchmarking/README.md
@@ -22,5 +22,5 @@ If you would like to list benchmark results on your favorite models of the [mode

 | Benchmark description | Results | Environment info |      Author      |
 |:----------|:-------------|:-------------|------:|
-| PyTorch Benchmark on inference for `bert-base-cased` |[memory](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/inference_memory.csv) | [env](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/env.csv) | [Partick von Platen](https://github.com/patrickvonplaten) | 
-| PyTorch Benchmark on inference for `bert-base-cased` |[time](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/inference_time.csv) | [env](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/env.csv) | [Partick von Platen](https://github.com/patrickvonplaten) | 
+| PyTorch Benchmark on inference for `google-bert/bert-base-cased` |[memory](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/inference_memory.csv) | [env](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/env.csv) | [Partick von Platen](https://github.com/patrickvonplaten) | 
+| PyTorch Benchmark on inference for `google-bert/bert-base-cased` |[time](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/inference_time.csv) | [env](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/env.csv) | [Partick von Platen](https://github.com/patrickvonplaten) | 
--- a/examples/legacy/question-answering/README.md
+++ b/examples/legacy/question-answering/README.md
@@ -1,7 +1,7 @@
 #### Fine-tuning BERT on SQuAD1.0 with relative position embeddings

 The following examples show how to fine-tune BERT models with different relative position embeddings. The BERT model 
-`bert-base-uncased` was pretrained with default absolute position embeddings. We provide the following pretrained 
+`google-bert/bert-base-uncased` was pretrained with default absolute position embeddings. We provide the following pretrained 
 models which were pre-trained on the same training data (BooksCorpus and English Wikipedia) as in the BERT model 
 training, but with different relative position embeddings. 

@@ -10,7 +10,7 @@ Shaw et al., [Self-Attention with Relative Position Representations](https://arx
 * `zhiheng-huang/bert-base-uncased-embedding-relative-key-query`, trained from scratch with relative embedding method 4 
 in Huang et al. [Improve Transformer Models with Better Relative Position Embeddings](https://arxiv.org/abs/2009.13658)
 * `zhiheng-huang/bert-large-uncased-whole-word-masking-embedding-relative-key-query`, fine-tuned from model 
-`bert-large-uncased-whole-word-masking` with 3 additional epochs with relative embedding method 4 in Huang et al. 
+`google-bert/bert-large-uncased-whole-word-masking` with 3 additional epochs with relative embedding method 4 in Huang et al. 
 [Improve Transformer Models with Better Relative Position Embeddings](https://arxiv.org/abs/2009.13658)


@@ -61,7 +61,7 @@ torchrun --nproc_per_node=8 ./examples/question-answering/run_squad.py \
    --gradient_accumulation_steps 3
 ```
 Training with the above command leads to the f1 score of 93.52, which is slightly better than the f1 score of 93.15 for 
-`bert-large-uncased-whole-word-masking`.
+`google-bert/bert-large-uncased-whole-word-masking`.

 #### Distributed training

@@ -69,7 +69,7 @@ Here is an example using distributed training on 8 V100 GPUs and Bert Whole Word

 ```bash
 torchrun --nproc_per_node=8 ./examples/question-answering/run_squad.py \
-    --model_name_or_path bert-large-uncased-whole-word-masking \
+    --model_name_or_path google-bert/bert-large-uncased-whole-word-masking \
    --dataset_name squad \
    --do_train \
    --do_eval \
@@ -90,7 +90,7 @@ exact_match = 86.91
 ```

 This fine-tuned model is available as a checkpoint under the reference
-[`bert-large-uncased-whole-word-masking-finetuned-squad`](https://huggingface.co/bert-large-uncased-whole-word-masking-finetuned-squad).
+[`google-bert/bert-large-uncased-whole-word-masking-finetuned-squad`](https://huggingface.co/google-bert/bert-large-uncased-whole-word-masking-finetuned-squad).

 ## Results

--- a/examples/legacy/run_camembert.py
+++ b/examples/legacy/run_camembert.py
@@ -39,8 +39,8 @@ def fill_mask(masked_input, model, tokenizer, topk=5):
    return topk_filled_outputs


-tokenizer = CamembertTokenizer.from_pretrained("camembert-base")
-model = CamembertForMaskedLM.from_pretrained("camembert-base")
+tokenizer = CamembertTokenizer.from_pretrained("almanach/camembert-base")
+model = CamembertForMaskedLM.from_pretrained("almanach/camembert-base")
 model.eval()

 masked_input = "Le camembert est <mask> :)"
--- a/examples/legacy/run_openai_gpt.py
+++ b/examples/legacy/run_openai_gpt.py
@@ -20,7 +20,7 @@

    This script with default values fine-tunes and evaluate a pretrained OpenAI GPT on the RocStories dataset:
        python run_openai_gpt.py \
-          --model_name openai-gpt \
+          --model_name openai-community/openai-gpt \
          --do_train \
          --do_eval \
          --train_dataset "$ROC_STORIES_DIR/cloze_test_val__spring2016 - cloze_test_ALL_val.csv" \
@@ -104,7 +104,7 @@ def pre_process_datasets(encoded_datasets, input_len, cap_length, start_token, d

 def main():
    parser = argparse.ArgumentParser()
-    parser.add_argument("--model_name", type=str, default="openai-gpt", help="pretrained model name")
+    parser.add_argument("--model_name", type=str, default="openai-community/openai-gpt", help="pretrained model name")
    parser.add_argument("--do_train", action="store_true", help="Whether to run training.")
    parser.add_argument("--do_eval", action="store_true", help="Whether to run eval on the dev set.")
    parser.add_argument(
--- a/examples/legacy/run_transfo_xl.py
+++ b/examples/legacy/run_transfo_xl.py
@@ -40,7 +40,7 @@ logger = logging.getLogger(__name__)

 def main():
    parser = argparse.ArgumentParser(description="PyTorch Transformer Language Model")
-    parser.add_argument("--model_name", type=str, default="transfo-xl-wt103", help="pretrained model name")
+    parser.add_argument("--model_name", type=str, default="transfo-xl/transfo-xl-wt103", help="pretrained model name")
    parser.add_argument(
        "--split", type=str, default="test", choices=["all", "valid", "test"], help="which split to evaluate"
    )
--- a/examples/legacy/seq2seq/README.md
+++ b/examples/legacy/seq2seq/README.md
@@ -170,7 +170,7 @@ If 'translation' is in your task name, the computed metric will be BLEU. Otherwi
 For t5, you need to specify --task translation_{src}_to_{tgt} as follows:
 ```bash
 export DATA_DIR=wmt_en_ro
-./run_eval.py t5-base \
+./run_eval.py google-t5/t5-base \
    $DATA_DIR/val.source t5_val_generations.txt \
    --reference_path $DATA_DIR/val.target \
    --score_path enro_bleu.json \
--- a/examples/legacy/seq2seq/old_test_datasets.py
+++ b/examples/legacy/seq2seq/old_test_datasets.py
@@ -28,7 +28,7 @@ from transformers.testing_utils import TestCasePlus, slow
 from utils import FAIRSEQ_AVAILABLE, DistributedSortishSampler, LegacySeq2SeqDataset, Seq2SeqDataset


-BERT_BASE_CASED = "bert-base-cased"
+BERT_BASE_CASED = "google-bert/bert-base-cased"
 PEGASUS_XSUM = "google/pegasus-xsum"
 ARTICLES = [" Sam ate lunch today.", "Sams lunch ingredients."]
 SUMMARIES = ["A very interesting story about what I ate for lunch.", "Avocado, celery, turkey, coffee"]
--- a/examples/legacy/seq2seq/pack_dataset.py
+++ b/examples/legacy/seq2seq/pack_dataset.py
@@ -74,7 +74,7 @@ def pack_data_dir(tok, data_dir: Path, max_tokens, save_path):

 def packer_cli():
    parser = argparse.ArgumentParser()
-    parser.add_argument("--tok_name", type=str, help="like facebook/bart-large-cnn,t5-base, etc.")
+    parser.add_argument("--tok_name", type=str, help="like facebook/bart-large-cnn,google-t5/t5-base, etc.")
    parser.add_argument("--max_seq_len", type=int, default=128)
    parser.add_argument("--data_dir", type=str)
    parser.add_argument("--save_path", type=str)
--- a/examples/legacy/seq2seq/run_distributed_eval.py
+++ b/examples/legacy/seq2seq/run_distributed_eval.py
@@ -124,7 +124,7 @@ def run_generate():
    parser.add_argument(
        "--model_name",
        type=str,
-        help="like facebook/bart-large-cnn,t5-base, etc.",
+        help="like facebook/bart-large-cnn,google-t5/t5-base, etc.",
        default="sshleifer/distilbart-xsum-12-3",
    )
    parser.add_argument("--save_dir", type=str, help="where to save", default="tmp_gen")
--- a/examples/legacy/seq2seq/run_eval.py
+++ b/examples/legacy/seq2seq/run_eval.py
@@ -100,7 +100,7 @@ def run_generate(verbose=True):
    """

    parser = argparse.ArgumentParser()
-    parser.add_argument("model_name", type=str, help="like facebook/bart-large-cnn,t5-base, etc.")
+    parser.add_argument("model_name", type=str, help="like facebook/bart-large-cnn,google-t5/t5-base, etc.")
    parser.add_argument("input_path", type=str, help="like cnn_dm/test.source")
    parser.add_argument("save_path", type=str, help="where to save summaries")
    parser.add_argument("--reference_path", type=str, required=False, help="like cnn_dm/test.target")
--- a/examples/legacy/token-classification/README.md
+++ b/examples/legacy/token-classification/README.md
@@ -34,7 +34,7 @@ Let's define some variables that we need for further pre-processing steps and tr

 ```bash
 export MAX_LENGTH=128
-export BERT_MODEL=bert-base-multilingual-cased
+export BERT_MODEL=google-bert/bert-base-multilingual-cased
 ```

 Run the pre-processing script on training, dev and test datasets:
@@ -92,7 +92,7 @@ Instead of passing all parameters via commandline arguments, the `run_ner.py` sc
 {
    "data_dir": ".",
    "labels": "./labels.txt",
-    "model_name_or_path": "bert-base-multilingual-cased",
+    "model_name_or_path": "google-bert/bert-base-multilingual-cased",
    "output_dir": "germeval-model",
    "max_seq_length": 128,
    "num_train_epochs": 3,
@@ -222,7 +222,7 @@ Let's define some variables that we need for further pre-processing steps:

 ```bash
 export MAX_LENGTH=128
-export BERT_MODEL=bert-large-cased
+export BERT_MODEL=google-bert/bert-large-cased
 ```

 Here we use the English BERT large model for fine-tuning.
@@ -250,7 +250,7 @@ This configuration file looks like:
 {
    "data_dir": "./data_wnut_17",
    "labels": "./data_wnut_17/labels.txt",
-    "model_name_or_path": "bert-large-cased",
+    "model_name_or_path": "google-bert/bert-large-cased",
    "output_dir": "wnut-17-model-1",
    "max_seq_length": 128,
    "num_train_epochs": 3,
--- a/examples/legacy/token-classification/utils_ner.py
+++ b/examples/legacy/token-classification/utils_ner.py
@@ -113,7 +113,7 @@ class TokenClassificationTask:
            for word, label in zip(example.words, example.labels):
                word_tokens = tokenizer.tokenize(word)

-                # bert-base-multilingual-cased sometimes output "nothing ([]) when calling tokenize with just a space.
+                # google-bert/bert-base-multilingual-cased sometimes output "nothing ([]) when calling tokenize with just a space.
                if len(word_tokens) > 0:
                    tokens.extend(word_tokens)
                    # Use the real label id for the first token of the word, and padding ids for the remaining tokens