Update all references to canonical models (#29001)

* Script & Manual edition

* Update
This commit is contained in:
Lysandre Debut
2024-02-16 08:16:58 +01:00
committed by GitHub
parent 1e402b957d
commit f497f564bb
561 changed files with 2682 additions and 2687 deletions

View File

@@ -118,8 +118,8 @@ pip install runhouse
# For an on-demand V100 with whichever cloud provider you have configured:
python run_on_remote.py \
--example pytorch/text-generation/run_generation.py \
--model_type=gpt2 \
--model_name_or_path=gpt2 \
--model_type=openai-community/gpt2 \
--model_name_or_path=openai-community/gpt2 \
--prompt "I am a language model and"
# For byo (bring your own) cluster:

View File

@@ -34,7 +34,7 @@ Next, we create a [FlaxVisionEncoderDecoderModel](https://huggingface.co/docs/tr
python3 create_model_from_encoder_decoder_models.py \
--output_dir model \
--encoder_model_name_or_path google/vit-base-patch16-224-in21k \
--decoder_model_name_or_path gpt2
--decoder_model_name_or_path openai-community/gpt2
```
### Train the model

View File

@@ -28,7 +28,7 @@ way which enables simple and efficient model parallelism.
In the following, we demonstrate how to train a bi-directional transformer model
using masked language modeling objective as introduced in [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805).
More specifically, we demonstrate how JAX/Flax can be leveraged
to pre-train [**`roberta-base`**](https://huggingface.co/roberta-base)
to pre-train [**`FacebookAI/roberta-base`**](https://huggingface.co/FacebookAI/roberta-base)
in Norwegian on a single TPUv3-8 pod.
The example script uses the 🤗 Datasets library. You can easily customize them to your needs if you need extra processing on your datasets.
@@ -76,13 +76,13 @@ tokenizer.save("./norwegian-roberta-base/tokenizer.json")
### Create configuration
Next, we create the model's configuration file. This is as simple
as loading and storing [`**roberta-base**`](https://huggingface.co/roberta-base)
as loading and storing [`**FacebookAI/roberta-base**`](https://huggingface.co/FacebookAI/roberta-base)
in the local model folder:
```python
from transformers import RobertaConfig
config = RobertaConfig.from_pretrained("roberta-base", vocab_size=50265)
config = RobertaConfig.from_pretrained("FacebookAI/roberta-base", vocab_size=50265)
config.save_pretrained("./norwegian-roberta-base")
```
@@ -129,8 +129,8 @@ look at [this](https://colab.research.google.com/github/huggingface/notebooks/bl
In the following, we demonstrate how to train an auto-regressive causal transformer model
in JAX/Flax.
More specifically, we pretrain a randomly initialized [**`gpt2`**](https://huggingface.co/gpt2) model in Norwegian on a single TPUv3-8.
to pre-train 124M [**`gpt2`**](https://huggingface.co/gpt2)
More specifically, we pretrain a randomly initialized [**`openai-community/gpt2`**](https://huggingface.co/openai-community/gpt2) model in Norwegian on a single TPUv3-8.
to pre-train 124M [**`openai-community/gpt2`**](https://huggingface.co/openai-community/gpt2)
in Norwegian on a single TPUv3-8 pod.
The example script uses the 🤗 Datasets library. You can easily customize them to your needs if you need extra processing on your datasets.
@@ -179,13 +179,13 @@ tokenizer.save("./norwegian-gpt2/tokenizer.json")
### Create configuration
Next, we create the model's configuration file. This is as simple
as loading and storing [`**gpt2**`](https://huggingface.co/gpt2)
as loading and storing [`**openai-community/gpt2**`](https://huggingface.co/openai-community/gpt2)
in the local model folder:
```python
from transformers import GPT2Config
config = GPT2Config.from_pretrained("gpt2", resid_pdrop=0.0, embd_pdrop=0.0, attn_pdrop=0.0, vocab_size=50257)
config = GPT2Config.from_pretrained("openai-community/gpt2", resid_pdrop=0.0, embd_pdrop=0.0, attn_pdrop=0.0, vocab_size=50257)
config.save_pretrained("./norwegian-gpt2")
```
@@ -199,7 +199,7 @@ Finally, we can run the example script to pretrain the model:
```bash
python run_clm_flax.py \
--output_dir="./norwegian-gpt2" \
--model_type="gpt2" \
--model_type="openai-community/gpt2" \
--config_name="./norwegian-gpt2" \
--tokenizer_name="./norwegian-gpt2" \
--dataset_name="oscar" \

View File

@@ -29,7 +29,7 @@ The following example fine-tunes BERT on SQuAD:
```bash
python run_qa.py \
--model_name_or_path bert-base-uncased \
--model_name_or_path google-bert/bert-base-uncased \
--dataset_name squad \
--do_train \
--do_eval \
@@ -67,7 +67,7 @@ Here is an example training on 4 TITAN RTX GPUs and Bert Whole Word Masking unca
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3
python run_qa.py \
--model_name_or_path bert-large-uncased-whole-word-masking \
--model_name_or_path google-bert/bert-large-uncased-whole-word-masking \
--dataset_name squad \
--do_train \
--do_eval \

View File

@@ -78,7 +78,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_glue.py
--model_name_or_path distilbert-base-uncased
--model_name_or_path distilbert/distilbert-base-uncased
--output_dir {tmp_dir}
--train_file ./tests/fixtures/tests_samples/MRPC/train.csv
--validation_file ./tests/fixtures/tests_samples/MRPC/dev.csv
@@ -101,7 +101,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_clm_flax.py
--model_name_or_path distilgpt2
--model_name_or_path distilbert/distilgpt2
--train_file ./tests/fixtures/sample_text.txt
--validation_file ./tests/fixtures/sample_text.txt
--do_train
@@ -125,7 +125,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_summarization.py
--model_name_or_path t5-small
--model_name_or_path google-t5/t5-small
--train_file tests/fixtures/tests_samples/xsum/sample.json
--validation_file tests/fixtures/tests_samples/xsum/sample.json
--test_file tests/fixtures/tests_samples/xsum/sample.json
@@ -155,7 +155,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_mlm.py
--model_name_or_path distilroberta-base
--model_name_or_path distilbert/distilroberta-base
--train_file ./tests/fixtures/sample_text.txt
--validation_file ./tests/fixtures/sample_text.txt
--output_dir {tmp_dir}
@@ -179,7 +179,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_t5_mlm_flax.py
--model_name_or_path t5-small
--model_name_or_path google-t5/t5-small
--train_file ./tests/fixtures/sample_text.txt
--validation_file ./tests/fixtures/sample_text.txt
--do_train
@@ -206,7 +206,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_flax_ner.py
--model_name_or_path bert-base-uncased
--model_name_or_path google-bert/bert-base-uncased
--train_file tests/fixtures/tests_samples/conll/sample.json
--validation_file tests/fixtures/tests_samples/conll/sample.json
--output_dir {tmp_dir}
@@ -233,7 +233,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_qa.py
--model_name_or_path bert-base-uncased
--model_name_or_path google-bert/bert-base-uncased
--version_2_with_negative
--train_file tests/fixtures/tests_samples/SQUAD/sample.json
--validation_file tests/fixtures/tests_samples/SQUAD/sample.json

View File

@@ -31,7 +31,7 @@ GLUE is made up of a total of 9 different tasks. Here is how to run the script o
export TASK_NAME=mrpc
python run_flax_glue.py \
--model_name_or_path bert-base-cased \
--model_name_or_path google-bert/bert-base-cased \
--task_name ${TASK_NAME} \
--max_seq_length 128 \
--learning_rate 2e-5 \

View File

@@ -25,7 +25,7 @@ The following example fine-tunes BERT on CoNLL-2003:
```bash
python run_flax_ner.py \
--model_name_or_path bert-base-cased \
--model_name_or_path google-bert/bert-base-cased \
--dataset_name conll2003 \
--max_seq_length 128 \
--learning_rate 2e-5 \

View File

@@ -22,5 +22,5 @@ If you would like to list benchmark results on your favorite models of the [mode
| Benchmark description | Results | Environment info | Author |
|:----------|:-------------|:-------------|------:|
| PyTorch Benchmark on inference for `bert-base-cased` |[memory](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/inference_memory.csv) | [env](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/env.csv) | [Partick von Platen](https://github.com/patrickvonplaten) |
| PyTorch Benchmark on inference for `bert-base-cased` |[time](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/inference_time.csv) | [env](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/env.csv) | [Partick von Platen](https://github.com/patrickvonplaten) |
| PyTorch Benchmark on inference for `google-bert/bert-base-cased` |[memory](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/inference_memory.csv) | [env](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/env.csv) | [Partick von Platen](https://github.com/patrickvonplaten) |
| PyTorch Benchmark on inference for `google-bert/bert-base-cased` |[time](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/inference_time.csv) | [env](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/env.csv) | [Partick von Platen](https://github.com/patrickvonplaten) |

View File

@@ -1,7 +1,7 @@
#### Fine-tuning BERT on SQuAD1.0 with relative position embeddings
The following examples show how to fine-tune BERT models with different relative position embeddings. The BERT model
`bert-base-uncased` was pretrained with default absolute position embeddings. We provide the following pretrained
`google-bert/bert-base-uncased` was pretrained with default absolute position embeddings. We provide the following pretrained
models which were pre-trained on the same training data (BooksCorpus and English Wikipedia) as in the BERT model
training, but with different relative position embeddings.
@@ -10,7 +10,7 @@ Shaw et al., [Self-Attention with Relative Position Representations](https://arx
* `zhiheng-huang/bert-base-uncased-embedding-relative-key-query`, trained from scratch with relative embedding method 4
in Huang et al. [Improve Transformer Models with Better Relative Position Embeddings](https://arxiv.org/abs/2009.13658)
* `zhiheng-huang/bert-large-uncased-whole-word-masking-embedding-relative-key-query`, fine-tuned from model
`bert-large-uncased-whole-word-masking` with 3 additional epochs with relative embedding method 4 in Huang et al.
`google-bert/bert-large-uncased-whole-word-masking` with 3 additional epochs with relative embedding method 4 in Huang et al.
[Improve Transformer Models with Better Relative Position Embeddings](https://arxiv.org/abs/2009.13658)
@@ -61,7 +61,7 @@ torchrun --nproc_per_node=8 ./examples/question-answering/run_squad.py \
--gradient_accumulation_steps 3
```
Training with the above command leads to the f1 score of 93.52, which is slightly better than the f1 score of 93.15 for
`bert-large-uncased-whole-word-masking`.
`google-bert/bert-large-uncased-whole-word-masking`.
#### Distributed training
@@ -69,7 +69,7 @@ Here is an example using distributed training on 8 V100 GPUs and Bert Whole Word
```bash
torchrun --nproc_per_node=8 ./examples/question-answering/run_squad.py \
--model_name_or_path bert-large-uncased-whole-word-masking \
--model_name_or_path google-bert/bert-large-uncased-whole-word-masking \
--dataset_name squad \
--do_train \
--do_eval \
@@ -90,7 +90,7 @@ exact_match = 86.91
```
This fine-tuned model is available as a checkpoint under the reference
[`bert-large-uncased-whole-word-masking-finetuned-squad`](https://huggingface.co/bert-large-uncased-whole-word-masking-finetuned-squad).
[`google-bert/bert-large-uncased-whole-word-masking-finetuned-squad`](https://huggingface.co/google-bert/bert-large-uncased-whole-word-masking-finetuned-squad).
## Results

View File

@@ -39,8 +39,8 @@ def fill_mask(masked_input, model, tokenizer, topk=5):
return topk_filled_outputs
tokenizer = CamembertTokenizer.from_pretrained("camembert-base")
model = CamembertForMaskedLM.from_pretrained("camembert-base")
tokenizer = CamembertTokenizer.from_pretrained("almanach/camembert-base")
model = CamembertForMaskedLM.from_pretrained("almanach/camembert-base")
model.eval()
masked_input = "Le camembert est <mask> :)"

View File

@@ -20,7 +20,7 @@
This script with default values fine-tunes and evaluate a pretrained OpenAI GPT on the RocStories dataset:
python run_openai_gpt.py \
--model_name openai-gpt \
--model_name openai-community/openai-gpt \
--do_train \
--do_eval \
--train_dataset "$ROC_STORIES_DIR/cloze_test_val__spring2016 - cloze_test_ALL_val.csv" \
@@ -104,7 +104,7 @@ def pre_process_datasets(encoded_datasets, input_len, cap_length, start_token, d
def main():
parser = argparse.ArgumentParser()
parser.add_argument("--model_name", type=str, default="openai-gpt", help="pretrained model name")
parser.add_argument("--model_name", type=str, default="openai-community/openai-gpt", help="pretrained model name")
parser.add_argument("--do_train", action="store_true", help="Whether to run training.")
parser.add_argument("--do_eval", action="store_true", help="Whether to run eval on the dev set.")
parser.add_argument(

View File

@@ -40,7 +40,7 @@ logger = logging.getLogger(__name__)
def main():
parser = argparse.ArgumentParser(description="PyTorch Transformer Language Model")
parser.add_argument("--model_name", type=str, default="transfo-xl-wt103", help="pretrained model name")
parser.add_argument("--model_name", type=str, default="transfo-xl/transfo-xl-wt103", help="pretrained model name")
parser.add_argument(
"--split", type=str, default="test", choices=["all", "valid", "test"], help="which split to evaluate"
)

View File

@@ -170,7 +170,7 @@ If 'translation' is in your task name, the computed metric will be BLEU. Otherwi
For t5, you need to specify --task translation_{src}_to_{tgt} as follows:
```bash
export DATA_DIR=wmt_en_ro
./run_eval.py t5-base \
./run_eval.py google-t5/t5-base \
$DATA_DIR/val.source t5_val_generations.txt \
--reference_path $DATA_DIR/val.target \
--score_path enro_bleu.json \

View File

@@ -28,7 +28,7 @@ from transformers.testing_utils import TestCasePlus, slow
from utils import FAIRSEQ_AVAILABLE, DistributedSortishSampler, LegacySeq2SeqDataset, Seq2SeqDataset
BERT_BASE_CASED = "bert-base-cased"
BERT_BASE_CASED = "google-bert/bert-base-cased"
PEGASUS_XSUM = "google/pegasus-xsum"
ARTICLES = [" Sam ate lunch today.", "Sams lunch ingredients."]
SUMMARIES = ["A very interesting story about what I ate for lunch.", "Avocado, celery, turkey, coffee"]

View File

@@ -74,7 +74,7 @@ def pack_data_dir(tok, data_dir: Path, max_tokens, save_path):
def packer_cli():
parser = argparse.ArgumentParser()
parser.add_argument("--tok_name", type=str, help="like facebook/bart-large-cnn,t5-base, etc.")
parser.add_argument("--tok_name", type=str, help="like facebook/bart-large-cnn,google-t5/t5-base, etc.")
parser.add_argument("--max_seq_len", type=int, default=128)
parser.add_argument("--data_dir", type=str)
parser.add_argument("--save_path", type=str)

View File

@@ -124,7 +124,7 @@ def run_generate():
parser.add_argument(
"--model_name",
type=str,
help="like facebook/bart-large-cnn,t5-base, etc.",
help="like facebook/bart-large-cnn,google-t5/t5-base, etc.",
default="sshleifer/distilbart-xsum-12-3",
)
parser.add_argument("--save_dir", type=str, help="where to save", default="tmp_gen")

View File

@@ -100,7 +100,7 @@ def run_generate(verbose=True):
"""
parser = argparse.ArgumentParser()
parser.add_argument("model_name", type=str, help="like facebook/bart-large-cnn,t5-base, etc.")
parser.add_argument("model_name", type=str, help="like facebook/bart-large-cnn,google-t5/t5-base, etc.")
parser.add_argument("input_path", type=str, help="like cnn_dm/test.source")
parser.add_argument("save_path", type=str, help="where to save summaries")
parser.add_argument("--reference_path", type=str, required=False, help="like cnn_dm/test.target")

View File

@@ -34,7 +34,7 @@ Let's define some variables that we need for further pre-processing steps and tr
```bash
export MAX_LENGTH=128
export BERT_MODEL=bert-base-multilingual-cased
export BERT_MODEL=google-bert/bert-base-multilingual-cased
```
Run the pre-processing script on training, dev and test datasets:
@@ -92,7 +92,7 @@ Instead of passing all parameters via commandline arguments, the `run_ner.py` sc
{
"data_dir": ".",
"labels": "./labels.txt",
"model_name_or_path": "bert-base-multilingual-cased",
"model_name_or_path": "google-bert/bert-base-multilingual-cased",
"output_dir": "germeval-model",
"max_seq_length": 128,
"num_train_epochs": 3,
@@ -222,7 +222,7 @@ Let's define some variables that we need for further pre-processing steps:
```bash
export MAX_LENGTH=128
export BERT_MODEL=bert-large-cased
export BERT_MODEL=google-bert/bert-large-cased
```
Here we use the English BERT large model for fine-tuning.
@@ -250,7 +250,7 @@ This configuration file looks like:
{
"data_dir": "./data_wnut_17",
"labels": "./data_wnut_17/labels.txt",
"model_name_or_path": "bert-large-cased",
"model_name_or_path": "google-bert/bert-large-cased",
"output_dir": "wnut-17-model-1",
"max_seq_length": 128,
"num_train_epochs": 3,

View File

@@ -113,7 +113,7 @@ class TokenClassificationTask:
for word, label in zip(example.words, example.labels):
word_tokens = tokenizer.tokenize(word)
# bert-base-multilingual-cased sometimes output "nothing ([]) when calling tokenize with just a space.
# google-bert/bert-base-multilingual-cased sometimes output "nothing ([]) when calling tokenize with just a space.
if len(word_tokens) > 0:
tokens.extend(word_tokens)
# Use the real label id for the first token of the word, and padding ids for the remaining tokens

View File

@@ -109,7 +109,7 @@ classification MNLI task using the `run_glue` script, with 8 GPUs:
```bash
torchrun \
--nproc_per_node 8 pytorch/text-classification/run_glue.py \
--model_name_or_path bert-large-uncased-whole-word-masking \
--model_name_or_path google-bert/bert-large-uncased-whole-word-masking \
--task_name mnli \
--do_train \
--do_eval \
@@ -153,7 +153,7 @@ classification MNLI task using the `run_glue` script, with 8 TPUs (from this fol
```bash
python xla_spawn.py --num_cores 8 \
text-classification/run_glue.py \
--model_name_or_path bert-large-uncased-whole-word-masking \
--model_name_or_path google-bert/bert-large-uncased-whole-word-masking \
--task_name mnli \
--do_train \
--do_eval \

View File

@@ -64,10 +64,10 @@ from transformers import (
)
model = VisionTextDualEncoderModel.from_vision_text_pretrained(
"openai/clip-vit-base-patch32", "roberta-base"
"openai/clip-vit-base-patch32", "FacebookAI/roberta-base"
)
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
tokenizer = AutoTokenizer.from_pretrained("FacebookAI/roberta-base")
image_processor = AutoImageProcessor.from_pretrained("openai/clip-vit-base-patch32")
processor = VisionTextDualEncoderProcessor(image_processor, tokenizer)

View File

@@ -36,7 +36,7 @@ the tokenization). The loss here is that of causal language modeling.
```bash
python run_clm.py \
--model_name_or_path gpt2 \
--model_name_or_path openai-community/gpt2 \
--dataset_name wikitext \
--dataset_config_name wikitext-2-raw-v1 \
--per_device_train_batch_size 8 \
@@ -53,7 +53,7 @@ To run on your own training and validation files, use the following command:
```bash
python run_clm.py \
--model_name_or_path gpt2 \
--model_name_or_path openai-community/gpt2 \
--train_file path_to_train_file \
--validation_file path_to_validation_file \
--per_device_train_batch_size 8 \
@@ -69,7 +69,7 @@ This uses the built in HuggingFace `Trainer` for training. If you want to use a
python run_clm_no_trainer.py \
--dataset_name wikitext \
--dataset_config_name wikitext-2-raw-v1 \
--model_name_or_path gpt2 \
--model_name_or_path openai-community/gpt2 \
--output_dir /tmp/test-clm
```
@@ -84,7 +84,7 @@ converge slightly slower (over-fitting takes more epochs).
```bash
python run_mlm.py \
--model_name_or_path roberta-base \
--model_name_or_path FacebookAI/roberta-base \
--dataset_name wikitext \
--dataset_config_name wikitext-2-raw-v1 \
--per_device_train_batch_size 8 \
@@ -98,7 +98,7 @@ To run on your own training and validation files, use the following command:
```bash
python run_mlm.py \
--model_name_or_path roberta-base \
--model_name_or_path FacebookAI/roberta-base \
--train_file path_to_train_file \
--validation_file path_to_validation_file \
--per_device_train_batch_size 8 \
@@ -117,7 +117,7 @@ This uses the built in HuggingFace `Trainer` for training. If you want to use a
python run_mlm_no_trainer.py \
--dataset_name wikitext \
--dataset_config_name wikitext-2-raw-v1 \
--model_name_or_path roberta-base \
--model_name_or_path FacebookAI/roberta-base \
--output_dir /tmp/test-mlm
```
@@ -144,7 +144,7 @@ Here is how to fine-tune XLNet on wikitext-2:
```bash
python run_plm.py \
--model_name_or_path=xlnet-base-cased \
--model_name_or_path=xlnet/xlnet-base-cased \
--dataset_name wikitext \
--dataset_config_name wikitext-2-raw-v1 \
--per_device_train_batch_size 8 \
@@ -158,7 +158,7 @@ To fine-tune it on your own training and validation file, run:
```bash
python run_plm.py \
--model_name_or_path=xlnet-base-cased \
--model_name_or_path=xlnet/xlnet-base-cased \
--train_file path_to_train_file \
--validation_file path_to_validation_file \
--per_device_train_batch_size 8 \
@@ -188,7 +188,7 @@ When training a model from scratch, configuration values may be overridden with
```bash
python run_clm.py --model_type gpt2 --tokenizer_name gpt2 \ --config_overrides="n_embd=1024,n_head=16,n_layer=48,n_positions=102" \
python run_clm.py --model_type openai-community/gpt2 --tokenizer_name openai-community/gpt2 \ --config_overrides="n_embd=1024,n_head=16,n_layer=48,n_positions=102" \
[...]
```

View File

@@ -22,7 +22,7 @@ limitations under the License.
```bash
python examples/multiple-choice/run_swag.py \
--model_name_or_path roberta-base \
--model_name_or_path FacebookAI/roberta-base \
--do_train \
--do_eval \
--learning_rate 5e-5 \
@@ -62,7 +62,7 @@ then
export DATASET_NAME=swag
python run_swag_no_trainer.py \
--model_name_or_path bert-base-cased \
--model_name_or_path google-bert/bert-base-cased \
--dataset_name $DATASET_NAME \
--max_seq_length 128 \
--per_device_train_batch_size 32 \
@@ -89,7 +89,7 @@ that will check everything is ready for training. Finally, you can launch traini
export DATASET_NAME=swag
accelerate launch run_swag_no_trainer.py \
--model_name_or_path bert-base-cased \
--model_name_or_path google-bert/bert-base-cased \
--dataset_name $DATASET_NAME \
--max_seq_length 128 \
--per_device_train_batch_size 32 \

View File

@@ -54,7 +54,7 @@ class TorchXLAExamplesTests(TestCasePlus):
./examples/pytorch/text-classification/run_glue.py
--num_cores=8
./examples/pytorch/text-classification/run_glue.py
--model_name_or_path distilbert-base-uncased
--model_name_or_path distilbert/distilbert-base-uncased
--output_dir {tmp_dir}
--overwrite_output_dir
--train_file ./tests/fixtures/tests_samples/MRPC/train.csv

View File

@@ -40,7 +40,7 @@ on a single tesla V100 16GB.
```bash
python run_qa.py \
--model_name_or_path bert-base-uncased \
--model_name_or_path google-bert/bert-base-uncased \
--dataset_name squad \
--do_train \
--do_eval \
@@ -67,7 +67,7 @@ The [`run_qa_beam_search.py`](https://github.com/huggingface/transformers/blob/m
```bash
python run_qa_beam_search.py \
--model_name_or_path xlnet-large-cased \
--model_name_or_path xlnet/xlnet-large-cased \
--dataset_name squad \
--do_train \
--do_eval \
@@ -87,7 +87,7 @@ python run_qa_beam_search.py \
export SQUAD_DIR=/path/to/SQUAD
python run_qa_beam_search.py \
--model_name_or_path xlnet-large-cased \
--model_name_or_path xlnet/xlnet-large-cased \
--dataset_name squad_v2 \
--do_train \
--do_eval \
@@ -111,7 +111,7 @@ This example code fine-tunes T5 on the SQuAD2.0 dataset.
```bash
python run_seq2seq_qa.py \
--model_name_or_path t5-small \
--model_name_or_path google-t5/t5-small \
--dataset_name squad_v2 \
--context_column context \
--question_column question \
@@ -143,7 +143,7 @@ then
```bash
python run_qa_no_trainer.py \
--model_name_or_path bert-base-uncased \
--model_name_or_path google-bert/bert-base-uncased \
--dataset_name squad \
--max_seq_length 384 \
--doc_stride 128 \
@@ -166,7 +166,7 @@ that will check everything is ready for training. Finally, you can launch traini
```bash
accelerate launch run_qa_no_trainer.py \
--model_name_or_path bert-base-uncased \
--model_name_or_path google-bert/bert-base-uncased \
--dataset_name squad \
--max_seq_length 384 \
--doc_stride 128 \

View File

@@ -41,7 +41,7 @@ and you also will find examples of these below.
Here is an example on a summarization task:
```bash
python examples/pytorch/summarization/run_summarization.py \
--model_name_or_path t5-small \
--model_name_or_path google-t5/t5-small \
--do_train \
--do_eval \
--dataset_name cnn_dailymail \
@@ -54,9 +54,9 @@ python examples/pytorch/summarization/run_summarization.py \
--predict_with_generate
```
Only T5 models `t5-small`, `t5-base`, `t5-large`, `t5-3b` and `t5-11b` must use an additional argument: `--source_prefix "summarize: "`.
Only T5 models `google-t5/t5-small`, `google-t5/t5-base`, `google-t5/t5-large`, `google-t5/t5-3b` and `google-t5/t5-11b` must use an additional argument: `--source_prefix "summarize: "`.
We used CNN/DailyMail dataset in this example as `t5-small` was trained on it and one can get good scores even when pre-training with a very small sample.
We used CNN/DailyMail dataset in this example as `google-t5/t5-small` was trained on it and one can get good scores even when pre-training with a very small sample.
Extreme Summarization (XSum) Dataset is another commonly used dataset for the task of summarization. To use it replace `--dataset_name cnn_dailymail --dataset_config "3.0.0"` with `--dataset_name xsum`.
@@ -65,7 +65,7 @@ And here is how you would use it on your own files, after adjusting the values f
```bash
python examples/pytorch/summarization/run_summarization.py \
--model_name_or_path t5-small \
--model_name_or_path google-t5/t5-small \
--do_train \
--do_eval \
--train_file path_to_csv_or_jsonlines_file \
@@ -156,7 +156,7 @@ then
```bash
python run_summarization_no_trainer.py \
--model_name_or_path t5-small \
--model_name_or_path google-t5/t5-small \
--dataset_name cnn_dailymail \
--dataset_config "3.0.0" \
--source_prefix "summarize: " \
@@ -179,7 +179,7 @@ that will check everything is ready for training. Finally, you can launch traini
```bash
accelerate launch run_summarization_no_trainer.py \
--model_name_or_path t5-small \
--model_name_or_path google-t5/t5-small \
--dataset_name cnn_dailymail \
--dataset_config "3.0.0" \
--source_prefix "summarize: " \

View File

@@ -368,11 +368,11 @@ def main():
logger.info(f"Training/evaluation parameters {training_args}")
if data_args.source_prefix is None and model_args.model_name_or_path in [
"t5-small",
"t5-base",
"t5-large",
"t5-3b",
"t5-11b",
"google-t5/t5-small",
"google-t5/t5-base",
"google-t5/t5-large",
"google-t5/t5-3b",
"google-t5/t5-11b",
]:
logger.warning(
"You're running a t5 model but didn't provide a source prefix, which is the expected, e.g. with "

View File

@@ -339,11 +339,11 @@ def main():
accelerator = Accelerator(gradient_accumulation_steps=args.gradient_accumulation_steps, **accelerator_log_kwargs)
if args.source_prefix is None and args.model_name_or_path in [
"t5-small",
"t5-base",
"t5-large",
"t5-3b",
"t5-11b",
"google-t5/t5-small",
"google-t5/t5-base",
"google-t5/t5-large",
"google-t5/t5-3b",
"google-t5/t5-11b",
]:
logger.warning(
"You're running a t5 model but didn't provide a source prefix, which is the expected, e.g. with "

View File

@@ -80,7 +80,7 @@ class ExamplesTestsNoTrainer(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
{self.examples_dir}/pytorch/text-classification/run_glue_no_trainer.py
--model_name_or_path distilbert-base-uncased
--model_name_or_path distilbert/distilbert-base-uncased
--output_dir {tmp_dir}
--train_file ./tests/fixtures/tests_samples/MRPC/train.csv
--validation_file ./tests/fixtures/tests_samples/MRPC/dev.csv
@@ -105,7 +105,7 @@ class ExamplesTestsNoTrainer(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
{self.examples_dir}/pytorch/language-modeling/run_clm_no_trainer.py
--model_name_or_path distilgpt2
--model_name_or_path distilbert/distilgpt2
--train_file ./tests/fixtures/sample_text.txt
--validation_file ./tests/fixtures/sample_text.txt
--block_size 128
@@ -133,7 +133,7 @@ class ExamplesTestsNoTrainer(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
{self.examples_dir}/pytorch/language-modeling/run_mlm_no_trainer.py
--model_name_or_path distilroberta-base
--model_name_or_path distilbert/distilroberta-base
--train_file ./tests/fixtures/sample_text.txt
--validation_file ./tests/fixtures/sample_text.txt
--output_dir {tmp_dir}
@@ -156,7 +156,7 @@ class ExamplesTestsNoTrainer(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
{self.examples_dir}/pytorch/token-classification/run_ner_no_trainer.py
--model_name_or_path bert-base-uncased
--model_name_or_path google-bert/bert-base-uncased
--train_file tests/fixtures/tests_samples/conll/sample.json
--validation_file tests/fixtures/tests_samples/conll/sample.json
--output_dir {tmp_dir}
@@ -181,7 +181,7 @@ class ExamplesTestsNoTrainer(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
{self.examples_dir}/pytorch/question-answering/run_qa_no_trainer.py
--model_name_or_path bert-base-uncased
--model_name_or_path google-bert/bert-base-uncased
--version_2_with_negative
--train_file tests/fixtures/tests_samples/SQUAD/sample.json
--validation_file tests/fixtures/tests_samples/SQUAD/sample.json
@@ -209,7 +209,7 @@ class ExamplesTestsNoTrainer(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
{self.examples_dir}/pytorch/multiple-choice/run_swag_no_trainer.py
--model_name_or_path bert-base-uncased
--model_name_or_path google-bert/bert-base-uncased
--train_file tests/fixtures/tests_samples/swag/sample.json
--validation_file tests/fixtures/tests_samples/swag/sample.json
--output_dir {tmp_dir}
@@ -232,7 +232,7 @@ class ExamplesTestsNoTrainer(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
{self.examples_dir}/pytorch/summarization/run_summarization_no_trainer.py
--model_name_or_path t5-small
--model_name_or_path google-t5/t5-small
--train_file tests/fixtures/tests_samples/xsum/sample.json
--validation_file tests/fixtures/tests_samples/xsum/sample.json
--output_dir {tmp_dir}

View File

@@ -99,7 +99,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_glue.py
--model_name_or_path distilbert-base-uncased
--model_name_or_path distilbert/distilbert-base-uncased
--output_dir {tmp_dir}
--overwrite_output_dir
--train_file ./tests/fixtures/tests_samples/MRPC/train.csv
@@ -127,7 +127,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_clm.py
--model_name_or_path distilgpt2
--model_name_or_path distilbert/distilgpt2
--train_file ./tests/fixtures/sample_text.txt
--validation_file ./tests/fixtures/sample_text.txt
--do_train
@@ -160,7 +160,7 @@ class ExamplesTests(TestCasePlus):
testargs = f"""
run_clm.py
--model_type gpt2
--tokenizer_name gpt2
--tokenizer_name openai-community/gpt2
--train_file ./tests/fixtures/sample_text.txt
--output_dir {tmp_dir}
--config_overrides n_embd=10,n_head=2
@@ -181,7 +181,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_mlm.py
--model_name_or_path distilroberta-base
--model_name_or_path distilbert/distilroberta-base
--train_file ./tests/fixtures/sample_text.txt
--validation_file ./tests/fixtures/sample_text.txt
--output_dir {tmp_dir}
@@ -207,7 +207,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_ner.py
--model_name_or_path bert-base-uncased
--model_name_or_path google-bert/bert-base-uncased
--train_file tests/fixtures/tests_samples/conll/sample.json
--validation_file tests/fixtures/tests_samples/conll/sample.json
--output_dir {tmp_dir}
@@ -235,7 +235,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_qa.py
--model_name_or_path bert-base-uncased
--model_name_or_path google-bert/bert-base-uncased
--version_2_with_negative
--train_file tests/fixtures/tests_samples/SQUAD/sample.json
--validation_file tests/fixtures/tests_samples/SQUAD/sample.json
@@ -260,7 +260,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_seq2seq_qa.py
--model_name_or_path t5-small
--model_name_or_path google-t5/t5-small
--context_column context
--question_column question
--answer_column answers
@@ -289,7 +289,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_swag.py
--model_name_or_path bert-base-uncased
--model_name_or_path google-bert/bert-base-uncased
--train_file tests/fixtures/tests_samples/swag/sample.json
--validation_file tests/fixtures/tests_samples/swag/sample.json
--output_dir {tmp_dir}
@@ -327,7 +327,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_summarization.py
--model_name_or_path t5-small
--model_name_or_path google-t5/t5-small
--train_file tests/fixtures/tests_samples/xsum/sample.json
--validation_file tests/fixtures/tests_samples/xsum/sample.json
--output_dir {tmp_dir}

View File

@@ -31,7 +31,7 @@ GLUE is made up of a total of 9 different tasks. Here is how to run the script o
export TASK_NAME=mrpc
python run_glue.py \
--model_name_or_path bert-base-cased \
--model_name_or_path google-bert/bert-base-cased \
--task_name $TASK_NAME \
--do_train \
--do_eval \
@@ -68,7 +68,7 @@ The following example fine-tunes BERT on the `imdb` dataset hosted on our [hub](
```bash
python run_glue.py \
--model_name_or_path bert-base-cased \
--model_name_or_path google-bert/bert-base-cased \
--dataset_name imdb \
--do_train \
--do_predict \
@@ -90,7 +90,7 @@ We can specify the metric, the label column and aso choose which text columns to
dataset="amazon_reviews_multi"
subset="en"
python run_classification.py \
--model_name_or_path bert-base-uncased \
--model_name_or_path google-bert/bert-base-uncased \
--dataset_name ${dataset} \
--dataset_config_name ${subset} \
--shuffle_train_dataset \
@@ -113,7 +113,7 @@ The following is a multi-label classification example. It fine-tunes BERT on the
dataset="reuters21578"
subset="ModApte"
python run_classification.py \
--model_name_or_path bert-base-uncased \
--model_name_or_path google-bert/bert-base-uncased \
--dataset_name ${dataset} \
--dataset_config_name ${subset} \
--shuffle_train_dataset \
@@ -175,7 +175,7 @@ then
export TASK_NAME=mrpc
python run_glue_no_trainer.py \
--model_name_or_path bert-base-cased \
--model_name_or_path google-bert/bert-base-cased \
--task_name $TASK_NAME \
--max_length 128 \
--per_device_train_batch_size 32 \
@@ -202,7 +202,7 @@ that will check everything is ready for training. Finally, you can launch traini
export TASK_NAME=mrpc
accelerate launch run_glue_no_trainer.py \
--model_name_or_path bert-base-cased \
--model_name_or_path google-bert/bert-base-cased \
--task_name $TASK_NAME \
--max_length 128 \
--per_device_train_batch_size 32 \
@@ -232,7 +232,7 @@ This example code fine-tunes mBERT (multi-lingual BERT) on the XNLI dataset. It
```bash
python run_xnli.py \
--model_name_or_path bert-base-multilingual-cased \
--model_name_or_path google-bert/bert-base-multilingual-cased \
--language de \
--train_language en \
--do_train \

View File

@@ -26,6 +26,6 @@ Example usage:
```bash
python run_generation.py \
--model_type=gpt2 \
--model_name_or_path=gpt2
--model_type=openai-community/gpt2 \
--model_name_or_path=openai-community/gpt2
```

View File

@@ -16,7 +16,7 @@
""" The examples of running contrastive search on the auto-APIs;
Running this example:
python run_generation_contrastive_search.py --model_name_or_path=gpt2-large --penalty_alpha=0.6 --k=4 --length=256
python run_generation_contrastive_search.py --model_name_or_path=openai-community/gpt2-large --penalty_alpha=0.6 --k=4 --length=256
"""

View File

@@ -29,7 +29,7 @@ The following example fine-tunes BERT on CoNLL-2003:
```bash
python run_ner.py \
--model_name_or_path bert-base-uncased \
--model_name_or_path google-bert/bert-base-uncased \
--dataset_name conll2003 \
--output_dir /tmp/test-ner \
--do_train \
@@ -42,7 +42,7 @@ To run on your own training and validation files, use the following command:
```bash
python run_ner.py \
--model_name_or_path bert-base-uncased \
--model_name_or_path google-bert/bert-base-uncased \
--train_file path_to_train_file \
--validation_file path_to_validation_file \
--output_dir /tmp/test-ner \
@@ -84,7 +84,7 @@ then
export TASK_NAME=ner
python run_ner_no_trainer.py \
--model_name_or_path bert-base-cased \
--model_name_or_path google-bert/bert-base-cased \
--dataset_name conll2003 \
--task_name $TASK_NAME \
--max_length 128 \
@@ -112,7 +112,7 @@ that will check everything is ready for training. Finally, you can launch traini
export TASK_NAME=ner
accelerate launch run_ner_no_trainer.py \
--model_name_or_path bert-base-cased \
--model_name_or_path google-bert/bert-base-cased \
--dataset_name conll2003 \
--task_name $TASK_NAME \
--max_length 128 \

View File

@@ -59,11 +59,11 @@ python examples/pytorch/translation/run_translation.py \
MBart and some T5 models require special handling.
T5 models `t5-small`, `t5-base`, `t5-large`, `t5-3b` and `t5-11b` must use an additional argument: `--source_prefix "translate {source_lang} to {target_lang}"`. For example:
T5 models `google-t5/t5-small`, `google-t5/t5-base`, `google-t5/t5-large`, `google-t5/t5-3b` and `google-t5/t5-11b` must use an additional argument: `--source_prefix "translate {source_lang} to {target_lang}"`. For example:
```bash
python examples/pytorch/translation/run_translation.py \
--model_name_or_path t5-small \
--model_name_or_path google-t5/t5-small \
--do_train \
--do_eval \
--source_lang en \
@@ -105,7 +105,7 @@ values for the arguments `--train_file`, `--validation_file` to match your setup
```bash
python examples/pytorch/translation/run_translation.py \
--model_name_or_path t5-small \
--model_name_or_path google-t5/t5-small \
--do_train \
--do_eval \
--source_lang en \
@@ -134,7 +134,7 @@ If you want to use a pre-processed dataset that leads to high BLEU scores, but f
```bash
python examples/pytorch/translation/run_translation.py \
--model_name_or_path t5-small \
--model_name_or_path google-t5/t5-small \
--do_train \
--do_eval \
--source_lang en \

View File

@@ -317,11 +317,11 @@ def main():
logger.info(f"Training/evaluation parameters {training_args}")
if data_args.source_prefix is None and model_args.model_name_or_path in [
"t5-small",
"t5-base",
"t5-large",
"t5-3b",
"t5-11b",
"google-t5/t5-small",
"google-t5/t5-base",
"google-t5/t5-large",
"google-t5/t5-3b",
"google-t5/t5-11b",
]:
logger.warning(
"You're running a t5 model but didn't provide a source prefix, which is expected, e.g. with "

View File

@@ -15,7 +15,7 @@ export TASK_NAME=MRPC
python ./run_glue_with_pabee.py \
--model_type albert \
--model_name_or_path bert-base-uncased/albert-base-v2 \
--model_name_or_path google-bert/bert-base-uncased/albert/albert-base-v2 \
--task_name $TASK_NAME \
--do_train \
--do_eval \

View File

@@ -276,8 +276,8 @@ class AlbertForSequenceClassificationWithPabee(AlbertPreTrainedModel):
from torch import nn
import torch
tokenizer = AlbertTokenizer.from_pretrained('albert-base-v2')
model = AlbertForSequenceClassificationWithPabee.from_pretrained('albert-base-v2')
tokenizer = AlbertTokenizer.from_pretrained('albert/albert-base-v2')
model = AlbertForSequenceClassificationWithPabee.from_pretrained('albert/albert-base-v2')
input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute")).unsqueeze(0) # Batch size 1
labels = torch.tensor([1]).unsqueeze(0) # Batch size 1
outputs = model(input_ids, labels=labels)

View File

@@ -300,8 +300,8 @@ class BertForSequenceClassificationWithPabee(BertPreTrainedModel):
from torch import nn
import torch
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassificationWithPabee.from_pretrained('bert-base-uncased')
tokenizer = BertTokenizer.from_pretrained('google-bert/bert-base-uncased')
model = BertForSequenceClassificationWithPabee.from_pretrained('google-bert/bert-base-uncased')
input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0) # Batch size 1
labels = torch.tensor([1]).unsqueeze(0) # Batch size 1

View File

@@ -29,7 +29,7 @@ class PabeeTests(TestCasePlus):
testargs = f"""
run_glue_with_pabee.py
--model_type albert
--model_name_or_path albert-base-v2
--model_name_or_path albert/albert-base-v2
--data_dir ./tests/fixtures/tests_samples/MRPC/
--output_dir {tmp_dir}
--overwrite_output_dir

View File

@@ -107,7 +107,7 @@ def convert_bertabs_checkpoints(path_to_checkpoints, dump_path):
# ----------------------------------
logging.info("Make sure that the models' outputs are identical")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-uncased")
# prepare the model inputs
encoder_input_ids = tokenizer.encode("This is sample éàalj'-.")

View File

@@ -128,7 +128,7 @@ class Bert(nn.Module):
def __init__(self):
super().__init__()
config = BertConfig.from_pretrained("bert-base-uncased")
config = BertConfig.from_pretrained("google-bert/bert-base-uncased")
self.model = BertModel(config)
def forward(self, input_ids, attention_mask=None, token_type_ids=None, **kwargs):

View File

@@ -29,7 +29,7 @@ Batch = namedtuple("Batch", ["document_names", "batch_size", "src", "segs", "mas
def evaluate(args):
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased", do_lower_case=True)
tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-uncased", do_lower_case=True)
model = BertAbs.from_pretrained("remi/bertabs-finetuned-extractive-abstractive-summarization")
model.to(args.device)
model.eval()

View File

@@ -79,7 +79,7 @@ python scripts/pretokenizing.py \
Before training a new model for code we create a new tokenizer that is efficient at code tokenization. To train the tokenizer you can run the following command:
```bash
python scripts/bpe_training.py \
--base_tokenizer gpt2 \
--base_tokenizer openai-community/gpt2 \
--dataset_name codeparrot/codeparrot-clean-train
```
@@ -90,12 +90,12 @@ The models are randomly initialized and trained from scratch. To initialize a ne
```bash
python scripts/initialize_model.py \
--config_name gpt2-large \
--config_name openai-community/gpt2-large \
--tokenizer_name codeparrot/codeparrot \
--model_name codeparrot \
--push_to_hub True
```
This will initialize a new model with the architecture and configuration of `gpt2-large` and use the tokenizer to appropriately size the input embeddings. Finally, the initilaized model is pushed the hub.
This will initialize a new model with the architecture and configuration of `openai-community/gpt2-large` and use the tokenizer to appropriately size the input embeddings. Finally, the initilaized model is pushed the hub.
We can either pass the name of a text dataset or a pretokenized dataset which speeds up training a bit.
Now that the tokenizer and model are also ready we can start training the model. The main training script is built with `accelerate` to scale across a wide range of platforms and infrastructure scales. We train two models with [110M](https://huggingface.co/codeparrot/codeparrot-small/) and [1.5B](https://huggingface.co/codeparrot/codeparrot/) parameters for 25-30B tokens on a 16xA100 (40GB) machine which takes 1 day and 1 week, respectively.

View File

@@ -172,7 +172,7 @@ class TokenizerTrainingArguments:
"""
base_tokenizer: Optional[str] = field(
default="gpt2", metadata={"help": "Base tokenizer to build new tokenizer from."}
default="openai-community/gpt2", metadata={"help": "Base tokenizer to build new tokenizer from."}
)
dataset_name: Optional[str] = field(
default="transformersbook/codeparrot-train", metadata={"help": "Dataset to train tokenizer on."}
@@ -211,7 +211,7 @@ class InitializationArguments:
"""
config_name: Optional[str] = field(
default="gpt2-large", metadata={"help": "Configuration to use for model initialization."}
default="openai-community/gpt2-large", metadata={"help": "Configuration to use for model initialization."}
)
tokenizer_name: Optional[str] = field(
default="codeparrot/codeparrot", metadata={"help": "Tokenizer attached to model."}

View File

@@ -48,7 +48,7 @@ class DeeBertTests(TestCasePlus):
def test_glue_deebert_train(self):
train_args = """
--model_type roberta
--model_name_or_path roberta-base
--model_name_or_path FacebookAI/roberta-base
--task_name MRPC
--do_train
--do_eval
@@ -61,7 +61,7 @@ class DeeBertTests(TestCasePlus):
--num_train_epochs 3
--overwrite_output_dir
--seed 42
--output_dir ./examples/deebert/saved_models/roberta-base/MRPC/two_stage
--output_dir ./examples/deebert/saved_models/FacebookAI/roberta-base/MRPC/two_stage
--plot_data_dir ./examples/deebert/results/
--save_steps 0
--overwrite_cache
@@ -71,12 +71,12 @@ class DeeBertTests(TestCasePlus):
eval_args = """
--model_type roberta
--model_name_or_path ./examples/deebert/saved_models/roberta-base/MRPC/two_stage
--model_name_or_path ./examples/deebert/saved_models/FacebookAI/roberta-base/MRPC/two_stage
--task_name MRPC
--do_eval
--do_lower_case
--data_dir ./tests/fixtures/tests_samples/MRPC/
--output_dir ./examples/deebert/saved_models/roberta-base/MRPC/two_stage
--output_dir ./examples/deebert/saved_models/FacebookAI/roberta-base/MRPC/two_stage
--plot_data_dir ./examples/deebert/results/
--max_seq_length 128
--eval_each_highway
@@ -88,12 +88,12 @@ class DeeBertTests(TestCasePlus):
entropy_eval_args = """
--model_type roberta
--model_name_or_path ./examples/deebert/saved_models/roberta-base/MRPC/two_stage
--model_name_or_path ./examples/deebert/saved_models/FacebookAI/roberta-base/MRPC/two_stage
--task_name MRPC
--do_eval
--do_lower_case
--data_dir ./tests/fixtures/tests_samples/MRPC/
--output_dir ./examples/deebert/saved_models/roberta-base/MRPC/two_stage
--output_dir ./examples/deebert/saved_models/FacebookAI/roberta-base/MRPC/two_stage
--plot_data_dir ./examples/deebert/results/
--max_seq_length 128
--early_exit_entropy 0.1

View File

@@ -64,7 +64,7 @@ To fine-tune a transformer model with IGF on a language modeling task, use the f
```python
python run_clm_igf.py\
--model_name_or_path "gpt2" \
--model_name_or_path "openai-community/gpt2" \
--data_file="data/tokenized_stories_train_wikitext103" \
--igf_data_file="data/IGF_values" \
--context_len 32 \

View File

@@ -69,9 +69,9 @@ def compute_perplexity(model, test_data, context_len):
return perplexity
def load_gpt2(model_name="gpt2"):
def load_gpt2(model_name="openai-community/gpt2"):
"""
load original gpt2 and save off for quicker loading
load original openai-community/gpt2 and save off for quicker loading
Args:
model_name: GPT-2

View File

@@ -84,7 +84,7 @@ def generate_n_pairs(
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# load pretrained model
model = load_gpt2("gpt2").to(device)
model = load_gpt2("openai-community/gpt2").to(device)
print("computing perplexity on objective set")
orig_perp = compute_perplexity(model, objective_set, context_len).item()
print("perplexity on objective set:", orig_perp)
@@ -121,7 +121,7 @@ def training_secondary_learner(
set_seed(42)
# Load pre-trained model
model = GPT2LMHeadModel.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("openai-community/gpt2")
# Initialize secondary learner to use embedding weights of model
secondary_learner = SecondaryLearner(model)
@@ -153,7 +153,7 @@ def finetune(
recopy_model=recopy_gpt2,
secondary_learner=None,
eval_interval=10,
finetuned_model_name="gpt2_finetuned.pt",
finetuned_model_name="openai-community/gpt2_finetuned.pt",
):
"""
fine-tune with IGF if secondary_learner is not None, else standard fine-tuning
@@ -346,7 +346,10 @@ def main():
)
parser.add_argument(
"--batch_size", default=16, type=int, help="batch size of training data of language model(gpt2) "
"--batch_size",
default=16,
type=int,
help="batch size of training data of language model(openai-community/gpt2) ",
)
parser.add_argument(
@@ -383,7 +386,9 @@ def main():
),
)
parser.add_argument("--finetuned_model_name", default="gpt2_finetuned.pt", type=str, help="finetuned_model_name")
parser.add_argument(
"--finetuned_model_name", default="openai-community/gpt2_finetuned.pt", type=str, help="finetuned_model_name"
)
parser.add_argument(
"--recopy_model",
@@ -416,16 +421,16 @@ def main():
igf_model_path="igf_model.pt",
)
# load pretrained gpt2 model
model = GPT2LMHeadModel.from_pretrained("gpt2")
# load pretrained openai-community/gpt2 model
model = GPT2LMHeadModel.from_pretrained("openai-community/gpt2")
set_seed(42)
# Generate train and test data to train and evaluate gpt2 model
# Generate train and test data to train and evaluate openai-community/gpt2 model
train_dataset, test_dataset = generate_datasets(
context_len=32, file="data/tokenized_stories_train_wikitext103.jbl", number=100, min_len=1026, trim=True
)
# fine-tuning of the gpt2 model using igf (Information Gain Filtration)
# fine-tuning of the openai-community/gpt2 model using igf (Information Gain Filtration)
finetune(
model,
train_dataset,
@@ -437,7 +442,7 @@ def main():
recopy_model=recopy_gpt2,
secondary_learner=secondary_learner,
eval_interval=10,
finetuned_model_name="gpt2_finetuned.pt",
finetuned_model_name="openai-community/gpt2_finetuned.pt",
)

View File

@@ -159,13 +159,13 @@ to be used, but that everybody in team is on the same page on what type of model
To give an example, a well-defined project would be the following:
- task: summarization
- model: [t5-small](https://huggingface.co/t5-small)
- model: [google-t5/t5-small](https://huggingface.co/google-t5/t5-small)
- dataset: [CNN/Daily mail](https://huggingface.co/datasets/cnn_dailymail)
- training script: [run_summarization_flax.py](https://github.com/huggingface/transformers/blob/main/examples/flax/summarization/run_summarization_flax.py)
- outcome: t5 model that can summarize news
- work flow: adapt `run_summarization_flax.py` to work with `t5-small`.
- work flow: adapt `run_summarization_flax.py` to work with `google-t5/t5-small`.
This example is a very easy and not the most interesting project since a `t5-small`
This example is a very easy and not the most interesting project since a `google-t5/t5-small`
summarization model exists already for CNN/Daily mail and pretty much no code has to be
written.
A well-defined project does not need to have the dataset be part of
@@ -335,7 +335,7 @@ dataset = load_dataset('oscar', "unshuffled_deduplicated_en", split='train', str
dummy_input = next(iter(dataset))["text"]
tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
tokenizer = RobertaTokenizerFast.from_pretrained("FacebookAI/roberta-base")
input_ids = tokenizer(dummy_input, return_tensors="np").input_ids[:, :10]
model = FlaxRobertaModel.from_pretrained("julien-c/dummy-unknown")
@@ -492,7 +492,7 @@ dataset = load_dataset('oscar', "unshuffled_deduplicated_en", split='train', str
dummy_input = next(iter(dataset))["text"]
tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
tokenizer = RobertaTokenizerFast.from_pretrained("FacebookAI/roberta-base")
input_ids = tokenizer(dummy_input, return_tensors="np").input_ids[:, :10]
model = FlaxRobertaModel.from_pretrained("julien-c/dummy-unknown")
@@ -518,7 +518,7 @@ be available in a couple of days.
- [BigBird](https://github.com/huggingface/transformers/blob/main/src/transformers/models/big_bird/modeling_flax_big_bird.py)
- [CLIP](https://github.com/huggingface/transformers/blob/main/src/transformers/models/clip/modeling_flax_clip.py)
- [ELECTRA](https://github.com/huggingface/transformers/blob/main/src/transformers/models/electra/modeling_flax_electra.py)
- [GPT2](https://github.com/huggingface/transformers/blob/main/src/transformers/models/gpt2/modeling_flax_gpt2.py)
- [GPT2](https://github.com/huggingface/transformers/blob/main/src/transformers/models/openai-community/gpt2/modeling_flax_gpt2.py)
- [(TODO) MBART](https://github.com/huggingface/transformers/blob/main/src/transformers/models/mbart/modeling_flax_mbart.py)
- [RoBERTa](https://github.com/huggingface/transformers/blob/main/src/transformers/models/roberta/modeling_flax_roberta.py)
- [T5](https://github.com/huggingface/transformers/blob/main/src/transformers/models/t5/modeling_flax_t5.py)
@@ -729,7 +729,7 @@ Let's use the base `FlaxRobertaModel` without any heads as an example.
from transformers import FlaxRobertaModel, RobertaTokenizerFast
import jax
tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
tokenizer = RobertaTokenizerFast.from_pretrained("FacebookAI/roberta-base")
inputs = tokenizer("JAX/Flax is amazing ", padding="max_length", max_length=128, return_tensors="np")
model = FlaxRobertaModel.from_pretrained("julien-c/dummy-unknown")
@@ -1011,7 +1011,7 @@ and run the following commands in a Python shell to save a config.
```python
from transformers import RobertaConfig
config = RobertaConfig.from_pretrained("roberta-base")
config = RobertaConfig.from_pretrained("FacebookAI/roberta-base")
config.save_pretrained("./")
```
@@ -1193,12 +1193,12 @@ All the widgets are open sourced in the `huggingface_hub` [repo](https://github.
**NLP**
* **Conversational:** To have the best conversations!. [Example](https://huggingface.co/microsoft/DialoGPT-large?).
* **Feature Extraction:** Retrieve the input embeddings. [Example](https://huggingface.co/sentence-transformers/distilbert-base-nli-mean-tokens?text=test).
* **Fill Mask:** Predict potential words for a mask token. [Example](https://huggingface.co/bert-base-uncased?).
* **Question Answering:** Given a context and a question, predict the answer. [Example](https://huggingface.co/bert-large-uncased-whole-word-masking-finetuned-squad).
* **Fill Mask:** Predict potential words for a mask token. [Example](https://huggingface.co/google-bert/bert-base-uncased?).
* **Question Answering:** Given a context and a question, predict the answer. [Example](https://huggingface.co/google-bert/bert-large-uncased-whole-word-masking-finetuned-squad).
* **Sentence Simmilarity:** Predict how similar a set of sentences are. Useful for Sentence Transformers.
* **Summarization:** Given a text, output a summary of it. [Example](https://huggingface.co/sshleifer/distilbart-cnn-12-6).
* **Table Question Answering:** Given a table and a question, predict the answer. [Example](https://huggingface.co/google/tapas-base-finetuned-wtq).
* **Text Generation:** Generate text based on a prompt. [Example](https://huggingface.co/gpt2)
* **Text Generation:** Generate text based on a prompt. [Example](https://huggingface.co/openai-community/gpt2)
* **Token Classification:** Useful for tasks such as Named Entity Recognition and Part of Speech. [Example](https://huggingface.co/dslim/bert-base-NER).
* **Zero-Shot Classification:** Too cool to explain with words. Here is an [example](https://huggingface.co/typeform/distilbert-base-uncased-mnli)
* ([WIP](https://github.com/huggingface/huggingface_hub/issues/99)) **Table to Text Generation**.

View File

@@ -31,7 +31,7 @@ without ever having to download the full dataset.
In the following, we demonstrate how to train a bi-directional transformer model
using masked language modeling objective as introduced in [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805).
More specifically, we demonstrate how JAX/Flax and dataset streaming can be leveraged
to pre-train [**`roberta-base`**](https://huggingface.co/roberta-base)
to pre-train [**`FacebookAI/roberta-base`**](https://huggingface.co/FacebookAI/roberta-base)
in English on a single TPUv3-8 pod for 10000 update steps.
The example script uses the 🤗 Datasets library. You can easily customize them to your needs if you need extra processing on your datasets.
@@ -80,8 +80,8 @@ from transformers import RobertaTokenizerFast, RobertaConfig
model_dir = "./english-roberta-base-dummy"
tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
config = RobertaConfig.from_pretrained("roberta-base")
tokenizer = RobertaTokenizerFast.from_pretrained("FacebookAI/roberta-base")
config = RobertaConfig.from_pretrained("FacebookAI/roberta-base")
tokenizer.save_pretrained(model_dir)
config.save_pretrained(model_dir)

View File

@@ -32,7 +32,7 @@ Models written in JAX/Flax are **immutable** and updated in a purely functional
way which enables simple and efficient model parallelism.
In this example we will use the vision model from [CLIP](https://huggingface.co/models?filter=clip)
as the image encoder and [`roberta-base`](https://huggingface.co/roberta-base) as the text encoder.
as the image encoder and [`FacebookAI/roberta-base`](https://huggingface.co/FacebookAI/roberta-base) as the text encoder.
Note that one can also use the [ViT](https://huggingface.co/models?filter=vit) model as image encoder and any other BERT or ROBERTa model as text encoder.
To train the model on languages other than English one should choose a text encoder trained on the desired
language and a image-text dataset in that language. One such dataset is [WIT](https://github.com/google-research-datasets/wit).
@@ -76,7 +76,7 @@ Here is an example of how to load the model using pre-trained text and vision mo
```python
from modeling_hybrid_clip import FlaxHybridCLIP
model = FlaxHybridCLIP.from_text_vision_pretrained("bert-base-uncased", "openai/clip-vit-base-patch32")
model = FlaxHybridCLIP.from_text_vision_pretrained("google-bert/bert-base-uncased", "openai/clip-vit-base-patch32")
# save the model
model.save_pretrained("bert-clip")
@@ -89,7 +89,7 @@ If the checkpoints are in PyTorch then one could pass `text_from_pt=True` and `v
PyTorch checkpoints convert them to flax and load the model.
```python
model = FlaxHybridCLIP.from_text_vision_pretrained("bert-base-uncased", "openai/clip-vit-base-patch32", text_from_pt=True, vision_from_pt=True)
model = FlaxHybridCLIP.from_text_vision_pretrained("google-bert/bert-base-uncased", "openai/clip-vit-base-patch32", text_from_pt=True, vision_from_pt=True)
```
This loads both the text and vision encoders using pre-trained weights, the projection layers are randomly
@@ -154,9 +154,9 @@ Next we can run the example script to train the model:
```bash
python run_hybrid_clip.py \
--output_dir ${MODEL_DIR} \
--text_model_name_or_path="roberta-base" \
--text_model_name_or_path="FacebookAI/roberta-base" \
--vision_model_name_or_path="openai/clip-vit-base-patch32" \
--tokenizer_name="roberta-base" \
--tokenizer_name="FacebookAI/roberta-base" \
--train_file="coco_dataset/train_dataset.json" \
--validation_file="coco_dataset/validation_dataset.json" \
--do_train --do_eval \

View File

@@ -314,8 +314,6 @@ class FlaxHybridCLIP(FlaxPreTrainedModel):
Information necessary to initiate the text model. Can be either:
- A string, the `model id` of a pretrained model hosted inside a model repo on huggingface.co.
Valid model ids can be located at the root-level, like ``bert-base-uncased``, or namespaced under
a user or organization name, like ``dbmdz/bert-base-german-cased``.
- A path to a `directory` containing model weights saved using
:func:`~transformers.FlaxPreTrainedModel.save_pretrained`, e.g., ``./my_model_directory/``.
- A path or url to a `PyTorch checkpoint folder` (e.g, ``./pt_model``). In
@@ -327,8 +325,6 @@ class FlaxHybridCLIP(FlaxPreTrainedModel):
Information necessary to initiate the vision model. Can be either:
- A string, the `model id` of a pretrained model hosted inside a model repo on huggingface.co.
Valid model ids can be located at the root-level, like ``bert-base-uncased``, or namespaced under
a user or organization name, like ``dbmdz/bert-base-german-cased``.
- A path to a `directory` containing model weights saved using
:func:`~transformers.FlaxPreTrainedModel.save_pretrained`, e.g., ``./my_model_directory/``.
- A path or url to a `PyTorch checkpoint folder` (e.g, ``./pt_model``). In
@@ -354,7 +350,7 @@ class FlaxHybridCLIP(FlaxPreTrainedModel):
>>> from transformers import FlaxHybridCLIP
>>> # initialize a model from pretrained BERT and CLIP models. Note that the projection layers will be randomly initialized.
>>> # If using CLIP's vision model the vision projection layer will be initialized using pre-trained weights
>>> model = FlaxHybridCLIP.from_text_vision_pretrained('bert-base-uncased', 'openai/clip-vit-base-patch32')
>>> model = FlaxHybridCLIP.from_text_vision_pretrained('google-bert/bert-base-uncased', 'openai/clip-vit-base-patch32')
>>> # saving model after fine-tuning
>>> model.save_pretrained("./bert-clip")
>>> # load fine-tuned model

View File

@@ -54,7 +54,7 @@ model.save_pretrained("gpt-neo-1.3B")
```bash
python run_clm_mp.py \
--model_name_or_path gpt-neo-1.3B \
--tokenizer_name gpt2 \
--tokenizer_name openai-community/gpt2 \
--dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 \
--do_train --do_eval \
--block_size 1024 \

View File

@@ -36,7 +36,7 @@ def load_models():
_ = s2s_model.eval()
else:
s2s_tokenizer, s2s_model = make_qa_s2s_model(
model_name="t5-small", from_file="seq2seq_models/eli5_t5_model_1024_4.pth", device="cuda:0"
model_name="google-t5/t5-small", from_file="seq2seq_models/eli5_t5_model_1024_4.pth", device="cuda:0"
)
return (qar_tokenizer, qar_model, s2s_tokenizer, s2s_model)

View File

@@ -32,7 +32,7 @@ to that word). This technique has been refined for Chinese in [this paper](https
To fine-tune a model using whole word masking, use the following script:
```bash
python run_mlm_wwm.py \
--model_name_or_path roberta-base \
--model_name_or_path FacebookAI/roberta-base \
--dataset_name wikitext \
--dataset_config_name wikitext-2-raw-v1 \
--do_train \
@@ -83,7 +83,7 @@ export VALIDATION_REF_FILE=/path/to/validation/chinese_ref/file
export OUTPUT_DIR=/tmp/test-mlm-wwm
python run_mlm_wwm.py \
--model_name_or_path roberta-base \
--model_name_or_path FacebookAI/roberta-base \
--train_file $TRAIN_FILE \
--validation_file $VALIDATION_FILE \
--train_ref_file $TRAIN_REF_FILE \

View File

@@ -10,7 +10,7 @@ Based on the script [`run_mmimdb.py`](https://github.com/huggingface/transformer
python run_mmimdb.py \
--data_dir /path/to/mmimdb/dataset/ \
--model_type bert \
--model_name_or_path bert-base-uncased \
--model_name_or_path google-bert/bert-base-uncased \
--output_dir /path/to/save/dir/ \
--do_train \
--do_eval \

View File

@@ -61,7 +61,7 @@ python examples/movement-pruning/masked_run_squad.py \
--predict_file dev-v1.1.json \
--do_train --do_eval --do_lower_case \
--model_type masked_bert \
--model_name_or_path bert-base-uncased \
--model_name_or_path google-bert/bert-base-uncased \
--per_gpu_train_batch_size 16 \
--warmup_steps 5400 \
--num_train_epochs 10 \
@@ -84,7 +84,7 @@ python examples/movement-pruning/masked_run_squad.py \
--predict_file dev-v1.1.json \
--do_train --do_eval --do_lower_case \
--model_type masked_bert \
--model_name_or_path bert-base-uncased \
--model_name_or_path google-bert/bert-base-uncased \
--per_gpu_train_batch_size 16 \
--warmup_steps 5400 \
--num_train_epochs 10 \
@@ -104,7 +104,7 @@ python examples/movement-pruning/masked_run_squad.py \
--predict_file dev-v1.1.json \
--do_train --do_eval --do_lower_case \
--model_type masked_bert \
--model_name_or_path bert-base-uncased \
--model_name_or_path google-bert/bert-base-uncased \
--per_gpu_train_batch_size 16 \
--warmup_steps 5400 \
--num_train_epochs 10 \
@@ -124,7 +124,7 @@ python examples/movement-pruning/masked_run_squad.py \
--predict_file dev-v1.1.json \
--do_train --do_eval --do_lower_case \
--model_type masked_bert \
--model_name_or_path bert-base-uncased \
--model_name_or_path google-bert/bert-base-uncased \
--per_gpu_train_batch_size 16 \
--warmup_steps 5400 \
--num_train_epochs 10 \

View File

@@ -10,8 +10,8 @@ Paper authors: Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyo
## Examples
`sanity_script.sh` will launch performer fine-tuning from the bert-base-cased checkpoint on the Simple Wikipedia dataset (a small, easy-language English Wikipedia) from `datasets`.
`full_script.sh` will launch performer fine-tuning from the bert-large-cased checkpoint on the English Wikipedia dataset from `datasets`.
`sanity_script.sh` will launch performer fine-tuning from the google-bert/bert-base-cased checkpoint on the Simple Wikipedia dataset (a small, easy-language English Wikipedia) from `datasets`.
`full_script.sh` will launch performer fine-tuning from the google-bert/bert-large-cased checkpoint on the English Wikipedia dataset from `datasets`.
Here are a few key arguments:
- Remove the `--performer` argument to use a standard Bert model.

View File

@@ -61,7 +61,7 @@ DISCRIMINATOR_MODELS_PARAMS = {
"embed_size": 1024,
"class_vocab": {"non_clickbait": 0, "clickbait": 1},
"default_class": 1,
"pretrained_model": "gpt2-medium",
"pretrained_model": "openai-community/gpt2-medium",
},
"sentiment": {
"url": "https://s3.amazonaws.com/models.huggingface.co/bert/pplm/discriminators/SST_classifier_head.pt",
@@ -69,7 +69,7 @@ DISCRIMINATOR_MODELS_PARAMS = {
"embed_size": 1024,
"class_vocab": {"very_positive": 2, "very_negative": 3},
"default_class": 3,
"pretrained_model": "gpt2-medium",
"pretrained_model": "openai-community/gpt2-medium",
},
}
@@ -585,7 +585,7 @@ def set_generic_model_params(discrim_weights, discrim_meta):
def run_pplm_example(
pretrained_model="gpt2-medium",
pretrained_model="openai-community/gpt2-medium",
cond_text="",
uncond=False,
num_samples=1,
@@ -738,7 +738,7 @@ if __name__ == "__main__":
"--pretrained_model",
"-M",
type=str,
default="gpt2-medium",
default="openai-community/gpt2-medium",
help="pretrained model name or path to local checkpoint",
)
parser.add_argument("--cond_text", type=str, default="The lake", help="Prefix texts to condition on")

View File

@@ -45,7 +45,7 @@ max_length_seq = 100
class Discriminator(nn.Module):
"""Transformer encoder followed by a Classification Head"""
def __init__(self, class_size, pretrained_model="gpt2-medium", cached_mode=False, device="cpu"):
def __init__(self, class_size, pretrained_model="openai-community/gpt2-medium", cached_mode=False, device="cpu"):
super().__init__()
self.tokenizer = GPT2Tokenizer.from_pretrained(pretrained_model)
self.encoder = GPT2LMHeadModel.from_pretrained(pretrained_model)
@@ -218,7 +218,7 @@ def get_cached_data_loader(dataset, batch_size, discriminator, shuffle=False, de
def train_discriminator(
dataset,
dataset_fp=None,
pretrained_model="gpt2-medium",
pretrained_model="openai-community/gpt2-medium",
epochs=10,
batch_size=64,
log_interval=10,
@@ -502,7 +502,10 @@ if __name__ == "__main__":
help="File path of the dataset to use. Needed only in case of generic datadset",
)
parser.add_argument(
"--pretrained_model", type=str, default="gpt2-medium", help="Pretrained model to use as encoder"
"--pretrained_model",
type=str,
default="openai-community/gpt2-medium",
help="Pretrained model to use as encoder",
)
parser.add_argument("--epochs", type=int, default=10, metavar="N", help="Number of training epochs")
parser.add_argument(

View File

@@ -50,11 +50,11 @@ Calibrate the pretrained model and finetune with quantization awared:
```bash
python3 run_quant_qa.py \
--model_name_or_path bert-base-uncased \
--model_name_or_path google-bert/bert-base-uncased \
--dataset_name squad \
--max_seq_length 128 \
--doc_stride 32 \
--output_dir calib/bert-base-uncased \
--output_dir calib/google-bert/bert-base-uncased \
--do_calib \
--calibrator percentile \
--percentile 99.99
@@ -62,7 +62,7 @@ python3 run_quant_qa.py \
```bash
python3 run_quant_qa.py \
--model_name_or_path calib/bert-base-uncased \
--model_name_or_path calib/google-bert/bert-base-uncased \
--dataset_name squad \
--do_train \
--do_eval \
@@ -71,8 +71,8 @@ python3 run_quant_qa.py \
--num_train_epochs 2 \
--max_seq_length 128 \
--doc_stride 32 \
--output_dir finetuned_int8/bert-base-uncased \
--tokenizer_name bert-base-uncased \
--output_dir finetuned_int8/google-bert/bert-base-uncased \
--tokenizer_name google-bert/bert-base-uncased \
--save_steps 0
```
@@ -82,14 +82,14 @@ To export the QAT model finetuned above:
```bash
python3 run_quant_qa.py \
--model_name_or_path finetuned_int8/bert-base-uncased \
--model_name_or_path finetuned_int8/google-bert/bert-base-uncased \
--output_dir ./ \
--save_onnx \
--per_device_eval_batch_size 1 \
--max_seq_length 128 \
--doc_stride 32 \
--dataset_name squad \
--tokenizer_name bert-base-uncased
--tokenizer_name google-bert/bert-base-uncased
```
Use `--recalibrate-weights` to calibrate the weight ranges according to the quantizer axis. Use `--quant-per-tensor` for per tensor quantization (default is per channel).
@@ -117,7 +117,7 @@ python3 evaluate-hf-trt-qa.py \
--max_seq_length 128 \
--doc_stride 32 \
--dataset_name squad \
--tokenizer_name bert-base-uncased \
--tokenizer_name google-bert/bert-base-uncased \
--int8 \
--seed 42
```
@@ -128,14 +128,14 @@ Finetune a fp32 precision model with [transformers/examples/pytorch/question-ans
```bash
python3 ../../pytorch/question-answering/run_qa.py \
--model_name_or_path bert-base-uncased \
--model_name_or_path google-bert/bert-base-uncased \
--dataset_name squad \
--per_device_train_batch_size 12 \
--learning_rate 3e-5 \
--num_train_epochs 2 \
--max_seq_length 128 \
--doc_stride 32 \
--output_dir ./finetuned_fp32/bert-base-uncased \
--output_dir ./finetuned_fp32/google-bert/bert-base-uncased \
--save_steps 0 \
--do_train \
--do_eval
@@ -147,13 +147,13 @@ python3 ../../pytorch/question-answering/run_qa.py \
```bash
python3 run_quant_qa.py \
--model_name_or_path ./finetuned_fp32/bert-base-uncased \
--model_name_or_path ./finetuned_fp32/google-bert/bert-base-uncased \
--dataset_name squad \
--calibrator percentile \
--percentile 99.99 \
--max_seq_length 128 \
--doc_stride 32 \
--output_dir ./calib/bert-base-uncased \
--output_dir ./calib/google-bert/bert-base-uncased \
--save_steps 0 \
--do_calib \
--do_eval
@@ -163,14 +163,14 @@ python3 run_quant_qa.py \
```bash
python3 run_quant_qa.py \
--model_name_or_path ./calib/bert-base-uncased \
--model_name_or_path ./calib/google-bert/bert-base-uncased \
--output_dir ./ \
--save_onnx \
--per_device_eval_batch_size 1 \
--max_seq_length 128 \
--doc_stride 32 \
--dataset_name squad \
--tokenizer_name bert-base-uncased
--tokenizer_name google-bert/bert-base-uncased
```
### Evaluate the INT8 PTQ ONNX model inference with TensorRT
@@ -183,7 +183,7 @@ python3 evaluate-hf-trt-qa.py \
--max_seq_length 128 \
--doc_stride 32 \
--dataset_name squad \
--tokenizer_name bert-base-uncased \
--tokenizer_name google-bert/bert-base-uncased \
--int8 \
--seed 42
```

View File

@@ -22,5 +22,5 @@ If you would like to list benchmark results on your favorite models of the [mode
| Benchmark description | Results | Environment info | Author |
|:----------|:-------------|:-------------|------:|
| PyTorch Benchmark on inference for `bert-base-cased` |[memory](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/inference_memory.csv) | [env](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/env.csv) | [Partick von Platen](https://github.com/patrickvonplaten) |
| PyTorch Benchmark on inference for `bert-base-cased` |[time](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/inference_time.csv) | [env](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/env.csv) | [Partick von Platen](https://github.com/patrickvonplaten) |
| PyTorch Benchmark on inference for `google-bert/bert-base-cased` |[memory](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/inference_memory.csv) | [env](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/env.csv) | [Partick von Platen](https://github.com/patrickvonplaten) |
| PyTorch Benchmark on inference for `google-bert/bert-base-cased` |[time](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/inference_time.csv) | [env](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/env.csv) | [Partick von Platen](https://github.com/patrickvonplaten) |

View File

@@ -65,7 +65,7 @@ Finally, we can run the example script to train the model:
python examples/tensorflow/contrastive-image-text/run_clip.py \
--output_dir ./clip-roberta-finetuned \
--vision_model_name_or_path openai/clip-vit-base-patch32 \
--text_model_name_or_path roberta-base \
--text_model_name_or_path FacebookAI/roberta-base \
--data_dir $PWD/data \
--dataset_name ydshieh/coco_dataset_script \
--dataset_config_name=2017 \

View File

@@ -57,7 +57,7 @@ def parse_args():
parser.add_argument(
"--pretrained_model_config",
type=str,
default="roberta-base",
default="FacebookAI/roberta-base",
help="The model config to use. Note that we don't copy the model's weights, only the config!",
)
parser.add_argument(

View File

@@ -43,7 +43,7 @@ This script trains a masked language model.
### Example command
```bash
python run_mlm.py \
--model_name_or_path distilbert-base-cased \
--model_name_or_path distilbert/distilbert-base-cased \
--output_dir output \
--dataset_name wikitext \
--dataset_config_name wikitext-103-raw-v1
@@ -52,7 +52,7 @@ python run_mlm.py \
When using a custom dataset, the validation file can be separately passed as an input argument. Otherwise some split (customizable) of training data is used as validation.
```bash
python run_mlm.py \
--model_name_or_path distilbert-base-cased \
--model_name_or_path distilbert/distilbert-base-cased \
--output_dir output \
--train_file train_file_path
```
@@ -64,7 +64,7 @@ This script trains a causal language model.
### Example command
```bash
python run_clm.py \
--model_name_or_path distilgpt2 \
--model_name_or_path distilbert/distilgpt2 \
--output_dir output \
--dataset_name wikitext \
--dataset_config_name wikitext-103-raw-v1
@@ -74,7 +74,7 @@ When using a custom dataset, the validation file can be separately passed as an
```bash
python run_clm.py \
--model_name_or_path distilgpt2 \
--model_name_or_path distilbert/distilgpt2 \
--output_dir output \
--train_file train_file_path
```

View File

@@ -36,7 +36,7 @@ README, but for more information you can see the 'Input Datasets' section of
### Example command
```bash
python run_swag.py \
--model_name_or_path distilbert-base-cased \
--model_name_or_path distilbert/distilbert-base-cased \
--output_dir output \
--do_eval \
--do_train

View File

@@ -47,7 +47,7 @@ README, but for more information you can see the 'Input Datasets' section of
### Example command
```bash
python run_qa.py \
--model_name_or_path distilbert-base-cased \
--model_name_or_path distilbert/distilbert-base-cased \
--output_dir output \
--dataset_name squad \
--do_train \

View File

@@ -334,11 +334,11 @@ def main():
# region T5 special-casing
if data_args.source_prefix is None and model_args.model_name_or_path in [
"t5-small",
"t5-base",
"t5-large",
"t5-3b",
"t5-11b",
"google-t5/t5-small",
"google-t5/t5-base",
"google-t5/t5-large",
"google-t5/t5-3b",
"google-t5/t5-11b",
]:
logger.warning(
"You're running a t5 model but didn't provide a source prefix, which is the expected, e.g. with "

View File

@@ -107,7 +107,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_text_classification.py
--model_name_or_path distilbert-base-uncased
--model_name_or_path distilbert/distilbert-base-uncased
--output_dir {tmp_dir}
--overwrite_output_dir
--train_file ./tests/fixtures/tests_samples/MRPC/train.csv
@@ -137,7 +137,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_clm.py
--model_name_or_path distilgpt2
--model_name_or_path distilbert/distilgpt2
--train_file ./tests/fixtures/sample_text.txt
--validation_file ./tests/fixtures/sample_text.txt
--do_train
@@ -163,7 +163,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_mlm.py
--model_name_or_path distilroberta-base
--model_name_or_path distilbert/distilroberta-base
--train_file ./tests/fixtures/sample_text.txt
--validation_file ./tests/fixtures/sample_text.txt
--max_seq_length 64
@@ -188,7 +188,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_ner.py
--model_name_or_path bert-base-uncased
--model_name_or_path google-bert/bert-base-uncased
--train_file tests/fixtures/tests_samples/conll/sample.json
--validation_file tests/fixtures/tests_samples/conll/sample.json
--output_dir {tmp_dir}
@@ -212,7 +212,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_qa.py
--model_name_or_path bert-base-uncased
--model_name_or_path google-bert/bert-base-uncased
--version_2_with_negative
--train_file tests/fixtures/tests_samples/SQUAD/sample.json
--validation_file tests/fixtures/tests_samples/SQUAD/sample.json
@@ -237,7 +237,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_swag.py
--model_name_or_path bert-base-uncased
--model_name_or_path google-bert/bert-base-uncased
--train_file tests/fixtures/tests_samples/swag/sample.json
--validation_file tests/fixtures/tests_samples/swag/sample.json
--output_dir {tmp_dir}
@@ -261,7 +261,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_summarization.py
--model_name_or_path t5-small
--model_name_or_path google-t5/t5-small
--train_file tests/fixtures/tests_samples/xsum/sample.json
--validation_file tests/fixtures/tests_samples/xsum/sample.json
--output_dir {tmp_dir}

View File

@@ -71,7 +71,7 @@ README, but for more information you can see the 'Input Datasets' section of
### Example command
```bash
python run_text_classification.py \
--model_name_or_path distilbert-base-cased \
--model_name_or_path distilbert/distilbert-base-cased \
--train_file training_data.json \
--validation_file validation_data.json \
--output_dir output/ \
@@ -103,7 +103,7 @@ README, but for more information you can see the 'Input Datasets' section of
### Example command
```bash
python run_glue.py \
--model_name_or_path distilbert-base-cased \
--model_name_or_path distilbert/distilbert-base-cased \
--task_name mnli \
--do_train \
--do_eval \

View File

@@ -27,7 +27,7 @@ The following example fine-tunes BERT on CoNLL-2003:
```bash
python run_ner.py \
--model_name_or_path bert-base-uncased \
--model_name_or_path google-bert/bert-base-uncased \
--dataset_name conll2003 \
--output_dir /tmp/test-ner
```
@@ -36,7 +36,7 @@ To run on your own training and validation files, use the following command:
```bash
python run_ner.py \
--model_name_or_path bert-base-uncased \
--model_name_or_path google-bert/bert-base-uncased \
--train_file path_to_train_file \
--validation_file path_to_validation_file \
--output_dir /tmp/test-ner

View File

@@ -29,11 +29,11 @@ can also be used by passing the name of the TPU resource with the `--tpu` argume
MBart and some T5 models require special handling.
T5 models `t5-small`, `t5-base`, `t5-large`, `t5-3b` and `t5-11b` must use an additional argument: `--source_prefix "translate {source_lang} to {target_lang}"`. For example:
T5 models `google-t5/t5-small`, `google-t5/t5-base`, `google-t5/t5-large`, `google-t5/t5-3b` and `google-t5/t5-11b` must use an additional argument: `--source_prefix "translate {source_lang} to {target_lang}"`. For example:
```bash
python run_translation.py \
--model_name_or_path t5-small \
--model_name_or_path google-t5/t5-small \
--do_train \
--do_eval \
--source_lang en \