Update all references to canonical models (#29001)
* Script & Manual edition
* Update
@@ -34,7 +34,7 @@ Next, we create a [FlaxVisionEncoderDecoderModel](https://huggingface.co/docs/tr
  python3 create_model_from_encoder_decoder_models.py \
  --output_dir model \
  --encoder_model_name_or_path google/vit-base-patch16-224-in21k \
- --decoder_model_name_or_path gpt2
+ --decoder_model_name_or_path openai-community/gpt2
  ```

  ### Train the model
@@ -28,7 +28,7 @@ way which enables simple and efficient model parallelism.
  In the following, we demonstrate how to train a bi-directional transformer model
  using masked language modeling objective as introduced in [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805).
  More specifically, we demonstrate how JAX/Flax can be leveraged
- to pre-train [**`roberta-base`**](https://huggingface.co/roberta-base)
+ to pre-train [**`FacebookAI/roberta-base`**](https://huggingface.co/FacebookAI/roberta-base)
  in Norwegian on a single TPUv3-8 pod.

  The example script uses the 🤗 Datasets library. You can easily customize them to your needs if you need extra processing on your datasets.
@@ -76,13 +76,13 @@ tokenizer.save("./norwegian-roberta-base/tokenizer.json")
  ### Create configuration

  Next, we create the model's configuration file. This is as simple
- as loading and storing [`**roberta-base**`](https://huggingface.co/roberta-base)
+ as loading and storing [`**FacebookAI/roberta-base**`](https://huggingface.co/FacebookAI/roberta-base)
  in the local model folder:

  ```python
  from transformers import RobertaConfig

- config = RobertaConfig.from_pretrained("roberta-base", vocab_size=50265)
+ config = RobertaConfig.from_pretrained("FacebookAI/roberta-base", vocab_size=50265)
  config.save_pretrained("./norwegian-roberta-base")
  ```

@@ -129,8 +129,8 @@ look at [this](https://colab.research.google.com/github/huggingface/notebooks/bl

  In the following, we demonstrate how to train an auto-regressive causal transformer model
  in JAX/Flax.
- More specifically, we pretrain a randomly initialized [**`gpt2`**](https://huggingface.co/gpt2) model in Norwegian on a single TPUv3-8.
- to pre-train 124M [**`gpt2`**](https://huggingface.co/gpt2)
+ More specifically, we pretrain a randomly initialized [**`openai-community/gpt2`**](https://huggingface.co/openai-community/gpt2) model in Norwegian on a single TPUv3-8.
+ to pre-train 124M [**`openai-community/gpt2`**](https://huggingface.co/openai-community/gpt2)
  in Norwegian on a single TPUv3-8 pod.

  The example script uses the 🤗 Datasets library. You can easily customize them to your needs if you need extra processing on your datasets.
@@ -179,13 +179,13 @@ tokenizer.save("./norwegian-gpt2/tokenizer.json")
  ### Create configuration

  Next, we create the model's configuration file. This is as simple
- as loading and storing [`**gpt2**`](https://huggingface.co/gpt2)
+ as loading and storing [`**openai-community/gpt2**`](https://huggingface.co/openai-community/gpt2)
  in the local model folder:

  ```python
  from transformers import GPT2Config

- config = GPT2Config.from_pretrained("gpt2", resid_pdrop=0.0, embd_pdrop=0.0, attn_pdrop=0.0, vocab_size=50257)
+ config = GPT2Config.from_pretrained("openai-community/gpt2", resid_pdrop=0.0, embd_pdrop=0.0, attn_pdrop=0.0, vocab_size=50257)
  config.save_pretrained("./norwegian-gpt2")
  ```

@@ -199,7 +199,7 @@ Finally, we can run the example script to pretrain the model:
  ```bash
  python run_clm_flax.py \
  --output_dir="./norwegian-gpt2" \
- --model_type="gpt2" \
+ --model_type="openai-community/gpt2" \
  --config_name="./norwegian-gpt2" \
  --tokenizer_name="./norwegian-gpt2" \
  --dataset_name="oscar" \
@@ -29,7 +29,7 @@ The following example fine-tunes BERT on SQuAD:

  ```bash
  python run_qa.py \
- --model_name_or_path bert-base-uncased \
+ --model_name_or_path google-bert/bert-base-uncased \
  --dataset_name squad \
  --do_train \
  --do_eval \
@@ -67,7 +67,7 @@ Here is an example training on 4 TITAN RTX GPUs and Bert Whole Word Masking unca
  ```bash
  export CUDA_VISIBLE_DEVICES=0,1,2,3
  python run_qa.py \
- --model_name_or_path bert-large-uncased-whole-word-masking \
+ --model_name_or_path google-bert/bert-large-uncased-whole-word-masking \
  --dataset_name squad \
  --do_train \
  --do_eval \
@@ -78,7 +78,7 @@ class ExamplesTests(TestCasePlus):
  tmp_dir = self.get_auto_remove_tmp_dir()
  testargs = f"""
  run_glue.py
- --model_name_or_path distilbert-base-uncased
+ --model_name_or_path distilbert/distilbert-base-uncased
  --output_dir {tmp_dir}
  --train_file ./tests/fixtures/tests_samples/MRPC/train.csv
  --validation_file ./tests/fixtures/tests_samples/MRPC/dev.csv
@@ -101,7 +101,7 @@ class ExamplesTests(TestCasePlus):
  tmp_dir = self.get_auto_remove_tmp_dir()
  testargs = f"""
  run_clm_flax.py
- --model_name_or_path distilgpt2
+ --model_name_or_path distilbert/distilgpt2
  --train_file ./tests/fixtures/sample_text.txt
  --validation_file ./tests/fixtures/sample_text.txt
  --do_train
@@ -125,7 +125,7 @@ class ExamplesTests(TestCasePlus):
  tmp_dir = self.get_auto_remove_tmp_dir()
  testargs = f"""
  run_summarization.py
- --model_name_or_path t5-small
+ --model_name_or_path google-t5/t5-small
  --train_file tests/fixtures/tests_samples/xsum/sample.json
  --validation_file tests/fixtures/tests_samples/xsum/sample.json
  --test_file tests/fixtures/tests_samples/xsum/sample.json
@@ -155,7 +155,7 @@ class ExamplesTests(TestCasePlus):
  tmp_dir = self.get_auto_remove_tmp_dir()
  testargs = f"""
  run_mlm.py
- --model_name_or_path distilroberta-base
+ --model_name_or_path distilbert/distilroberta-base
  --train_file ./tests/fixtures/sample_text.txt
  --validation_file ./tests/fixtures/sample_text.txt
  --output_dir {tmp_dir}
@@ -179,7 +179,7 @@ class ExamplesTests(TestCasePlus):
  tmp_dir = self.get_auto_remove_tmp_dir()
  testargs = f"""
  run_t5_mlm_flax.py
- --model_name_or_path t5-small
+ --model_name_or_path google-t5/t5-small
  --train_file ./tests/fixtures/sample_text.txt
  --validation_file ./tests/fixtures/sample_text.txt
  --do_train
@@ -206,7 +206,7 @@ class ExamplesTests(TestCasePlus):
  tmp_dir = self.get_auto_remove_tmp_dir()
  testargs = f"""
  run_flax_ner.py
- --model_name_or_path bert-base-uncased
+ --model_name_or_path google-bert/bert-base-uncased
  --train_file tests/fixtures/tests_samples/conll/sample.json
  --validation_file tests/fixtures/tests_samples/conll/sample.json
  --output_dir {tmp_dir}
@@ -233,7 +233,7 @@ class ExamplesTests(TestCasePlus):
  tmp_dir = self.get_auto_remove_tmp_dir()
  testargs = f"""
  run_qa.py
- --model_name_or_path bert-base-uncased
+ --model_name_or_path google-bert/bert-base-uncased
  --version_2_with_negative
  --train_file tests/fixtures/tests_samples/SQUAD/sample.json
  --validation_file tests/fixtures/tests_samples/SQUAD/sample.json
@@ -31,7 +31,7 @@ GLUE is made up of a total of 9 different tasks. Here is how to run the script o
  export TASK_NAME=mrpc

  python run_flax_glue.py \
- --model_name_or_path bert-base-cased \
+ --model_name_or_path google-bert/bert-base-cased \
  --task_name ${TASK_NAME} \
  --max_seq_length 128 \
  --learning_rate 2e-5 \
@@ -25,7 +25,7 @@ The following example fine-tunes BERT on CoNLL-2003:

  ```bash
  python run_flax_ner.py \
- --model_name_or_path bert-base-cased \
+ --model_name_or_path google-bert/bert-base-cased \
  --dataset_name conll2003 \
  --max_seq_length 128 \
  --learning_rate 2e-5 \
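The renames in this commit follow one pattern: a bare legacy model id gains its organization prefix. Below is a minimal Python sketch of that kind of rewrite, assuming only the pairs visible in the hunks of this commit. The `CANONICAL` table and the `canonicalize` helper are hypothetical names for illustration and are not the script the commit used.

```python
import re

# Legacy id -> canonical id. Only pairs that appear in this commit's hunks
# are listed; the table name is an assumption made for this sketch.
CANONICAL = {
    "gpt2": "openai-community/gpt2",
    "roberta-base": "FacebookAI/roberta-base",
    "bert-base-uncased": "google-bert/bert-base-uncased",
    "bert-base-cased": "google-bert/bert-base-cased",
    "bert-large-uncased-whole-word-masking": "google-bert/bert-large-uncased-whole-word-masking",
    "distilbert-base-uncased": "distilbert/distilbert-base-uncased",
    "distilgpt2": "distilbert/distilgpt2",
    "distilroberta-base": "distilbert/distilroberta-base",
    "t5-small": "google-t5/t5-small",
}

# Longest ids first so that "distilgpt2" is tried before "gpt2".
_ALTERNATION = "|".join(
    re.escape(name) for name in sorted(CANONICAL, key=len, reverse=True)
)

# Match an id only as a whole token: it must not be preceded by a word
# character, '.', '/' or '-', nor followed by a word character, '/' or '-'.
# This skips local paths such as ./norwegian-gpt2 and ids that already
# carry an organization prefix. It also skips URLs, a limitation of the sketch.
_PATTERN = re.compile(r"(?<![\w./-])(" + _ALTERNATION + r")(?![\w/-])")


def canonicalize(text: str) -> str:
    """Return text with bare legacy model ids replaced by canonical ids."""
    return _PATTERN.sub(lambda m: CANONICAL[m.group(1)], text)


if __name__ == "__main__":
    print(canonicalize("--model_name_or_path gpt2"))
    print(canonicalize('--tokenizer_name="./norwegian-gpt2"'))
```

Because already-prefixed ids are skipped, running the helper twice gives the same result as running it once. A pattern like this cannot tell a Hub repository id from other uses of the same string, so its output would still need review by hand.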