Update all references to canonical models (#29001)

* Script & Manual edition

* Update
Lysandre Debut
2024-02-16 08:16:58 +01:00
committed by GitHub
parent 1e402b957d
commit f497f564bb
561 changed files with 2682 additions and 2687 deletions
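
The hunks below apply one mechanical substitution throughout: each legacy, un-namespaced checkpoint id is replaced with its organization-prefixed canonical id. The script mentioned in the commit message is not shown on this page, so the following is only a minimal sketch of how such a rewrite could look, not the script that was actually used. The mapping lists only the pairs visible in the hunks below; the names `CANONICAL_IDS` and `canonicalize` and both regular expressions are hypothetical.

```python
import re

# Legacy id -> canonical id. Only pairs that appear in the hunks of this diff.
CANONICAL_IDS = {
    "gpt2": "openai-community/gpt2",
    "distilgpt2": "distilbert/distilgpt2",
    "roberta-base": "FacebookAI/roberta-base",
    "distilroberta-base": "distilbert/distilroberta-base",
    "bert-base-uncased": "google-bert/bert-base-uncased",
    "bert-base-cased": "google-bert/bert-base-cased",
    "bert-large-uncased-whole-word-masking": "google-bert/bert-large-uncased-whole-word-masking",
    "distilbert-base-uncased": "distilbert/distilbert-base-uncased",
    "t5-small": "google-t5/t5-small",
}

# Longest ids first so that e.g. "distilgpt2" is tried before "gpt2".
_IDS = "|".join(re.escape(k) for k in sorted(CANONICAL_IDS, key=len, reverse=True))

# A bare id: not glued to a longer name or path on either side
# (skips "./norwegian-gpt2", "gpt2-medium", "openai-community/gpt2").
_BARE = re.compile(rf"(?<![\w./-])({_IDS})(?![\w/-])")

# An id directly after the Hub host in a URL (huggingface.co/<id>).
_URL = re.compile(rf"(?<=huggingface\.co/)({_IDS})(?![\w/-])")


def canonicalize(text: str) -> str:
    """Rewrite legacy checkpoint ids in `text`; already-canonical ids are left alone."""
    def repl(match: re.Match) -> str:
        return CANONICAL_IDS[match.group(1)]

    return _BARE.sub(repl, _URL.sub(repl, text))


if __name__ == "__main__":
    print(canonicalize("--model_name_or_path bert-base-uncased \\"))
    print(canonicalize('GPT2Config.from_pretrained("gpt2", vocab_size=50257)'))
```

A purely textual rewrite like this cannot tell a checkpoint id from an architecture name that happens to be spelled the same way (`gpt2` is both), so values passed to flags such as `--model_type` would still need a manual check after running it.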

@@ -34,7 +34,7 @@ Next, we create a [FlaxVisionEncoderDecoderModel](https://huggingface.co/docs/tr
python3 create_model_from_encoder_decoder_models.py \
--output_dir model \
--encoder_model_name_or_path google/vit-base-patch16-224-in21k \
--decoder_model_name_or_path gpt2
--decoder_model_name_or_path openai-community/gpt2
```
### Train the model

@@ -28,7 +28,7 @@ way which enables simple and efficient model parallelism.
In the following, we demonstrate how to train a bi-directional transformer model
using masked language modeling objective as introduced in [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805).
More specifically, we demonstrate how JAX/Flax can be leveraged
to pre-train [**`roberta-base`**](https://huggingface.co/roberta-base)
to pre-train [**`FacebookAI/roberta-base`**](https://huggingface.co/FacebookAI/roberta-base)
in Norwegian on a single TPUv3-8 pod.
The example script uses the 🤗 Datasets library. You can easily customize them to your needs if you need extra processing on your datasets.
@@ -76,13 +76,13 @@ tokenizer.save("./norwegian-roberta-base/tokenizer.json")
### Create configuration
Next, we create the model's configuration file. This is as simple
as loading and storing [`**roberta-base**`](https://huggingface.co/roberta-base)
as loading and storing [`**FacebookAI/roberta-base**`](https://huggingface.co/FacebookAI/roberta-base)
in the local model folder:
```python
from transformers import RobertaConfig
config = RobertaConfig.from_pretrained("roberta-base", vocab_size=50265)
config = RobertaConfig.from_pretrained("FacebookAI/roberta-base", vocab_size=50265)
config.save_pretrained("./norwegian-roberta-base")
```
@@ -129,8 +129,8 @@ look at [this](https://colab.research.google.com/github/huggingface/notebooks/bl
In the following, we demonstrate how to train an auto-regressive causal transformer model
in JAX/Flax.
More specifically, we pretrain a randomly initialized [**`gpt2`**](https://huggingface.co/gpt2) model in Norwegian on a single TPUv3-8.
to pre-train 124M [**`gpt2`**](https://huggingface.co/gpt2)
More specifically, we pretrain a randomly initialized [**`openai-community/gpt2`**](https://huggingface.co/openai-community/gpt2) model in Norwegian on a single TPUv3-8.
to pre-train 124M [**`openai-community/gpt2`**](https://huggingface.co/openai-community/gpt2)
in Norwegian on a single TPUv3-8 pod.
The example script uses the 🤗 Datasets library. You can easily customize them to your needs if you need extra processing on your datasets.
@@ -179,13 +179,13 @@ tokenizer.save("./norwegian-gpt2/tokenizer.json")
### Create configuration
Next, we create the model's configuration file. This is as simple
as loading and storing [`**gpt2**`](https://huggingface.co/gpt2)
as loading and storing [`**openai-community/gpt2**`](https://huggingface.co/openai-community/gpt2)
in the local model folder:
```python
from transformers import GPT2Config
config = GPT2Config.from_pretrained("gpt2", resid_pdrop=0.0, embd_pdrop=0.0, attn_pdrop=0.0, vocab_size=50257)
config = GPT2Config.from_pretrained("openai-community/gpt2", resid_pdrop=0.0, embd_pdrop=0.0, attn_pdrop=0.0, vocab_size=50257)
config.save_pretrained("./norwegian-gpt2")
```
@@ -199,7 +199,7 @@ Finally, we can run the example script to pretrain the model:
```bash
python run_clm_flax.py \
--output_dir="./norwegian-gpt2" \
--model_type="gpt2" \
--model_type="openai-community/gpt2" \
--config_name="./norwegian-gpt2" \
--tokenizer_name="./norwegian-gpt2" \
--dataset_name="oscar" \

@@ -29,7 +29,7 @@ The following example fine-tunes BERT on SQuAD:
```bash
python run_qa.py \
--model_name_or_path bert-base-uncased \
--model_name_or_path google-bert/bert-base-uncased \
--dataset_name squad \
--do_train \
--do_eval \
@@ -67,7 +67,7 @@ Here is an example training on 4 TITAN RTX GPUs and Bert Whole Word Masking unca
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3
python run_qa.py \
--model_name_or_path bert-large-uncased-whole-word-masking \
--model_name_or_path google-bert/bert-large-uncased-whole-word-masking \
--dataset_name squad \
--do_train \
--do_eval \

@@ -78,7 +78,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_glue.py
--model_name_or_path distilbert-base-uncased
--model_name_or_path distilbert/distilbert-base-uncased
--output_dir {tmp_dir}
--train_file ./tests/fixtures/tests_samples/MRPC/train.csv
--validation_file ./tests/fixtures/tests_samples/MRPC/dev.csv
@@ -101,7 +101,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_clm_flax.py
--model_name_or_path distilgpt2
--model_name_or_path distilbert/distilgpt2
--train_file ./tests/fixtures/sample_text.txt
--validation_file ./tests/fixtures/sample_text.txt
--do_train
@@ -125,7 +125,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_summarization.py
--model_name_or_path t5-small
--model_name_or_path google-t5/t5-small
--train_file tests/fixtures/tests_samples/xsum/sample.json
--validation_file tests/fixtures/tests_samples/xsum/sample.json
--test_file tests/fixtures/tests_samples/xsum/sample.json
@@ -155,7 +155,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_mlm.py
--model_name_or_path distilroberta-base
--model_name_or_path distilbert/distilroberta-base
--train_file ./tests/fixtures/sample_text.txt
--validation_file ./tests/fixtures/sample_text.txt
--output_dir {tmp_dir}
@@ -179,7 +179,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_t5_mlm_flax.py
--model_name_or_path t5-small
--model_name_or_path google-t5/t5-small
--train_file ./tests/fixtures/sample_text.txt
--validation_file ./tests/fixtures/sample_text.txt
--do_train
@@ -206,7 +206,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_flax_ner.py
--model_name_or_path bert-base-uncased
--model_name_or_path google-bert/bert-base-uncased
--train_file tests/fixtures/tests_samples/conll/sample.json
--validation_file tests/fixtures/tests_samples/conll/sample.json
--output_dir {tmp_dir}
@@ -233,7 +233,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_qa.py
--model_name_or_path bert-base-uncased
--model_name_or_path google-bert/bert-base-uncased
--version_2_with_negative
--train_file tests/fixtures/tests_samples/SQUAD/sample.json
--validation_file tests/fixtures/tests_samples/SQUAD/sample.json

@@ -31,7 +31,7 @@ GLUE is made up of a total of 9 different tasks. Here is how to run the script o
export TASK_NAME=mrpc
python run_flax_glue.py \
--model_name_or_path bert-base-cased \
--model_name_or_path google-bert/bert-base-cased \
--task_name ${TASK_NAME} \
--max_seq_length 128 \
--learning_rate 2e-5 \

@@ -25,7 +25,7 @@ The following example fine-tunes BERT on CoNLL-2003:
```bash
python run_flax_ner.py \
--model_name_or_path bert-base-cased \
--model_name_or_path google-bert/bert-base-cased \
--dataset_name conll2003 \
--max_seq_length 128 \
--learning_rate 2e-5 \