Update all references to canonical models (#29001)
* Script & Manual edition
* Update
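The renames in this commit follow one pattern: a bare legacy checkpoint name becomes an `org/name` Hub id. The sketch below is a hypothetical illustration of such a scripted pass, not the script actually used for this commit. The mapping is copied from the old/new pairs visible in this diff; the names `CANONICAL_NAMES` and `canonicalize` are assumptions made for the example.

```python
import re

# Legacy checkpoint name -> canonical "org/name" id.
# Every pair below appears as an old/new line pair in this diff.
CANONICAL_NAMES = {
    "gpt2": "openai-community/gpt2",
    "openai-gpt": "openai-community/openai-gpt",
    "distilgpt2": "distilbert/distilgpt2",
    "distilbert-base-uncased": "distilbert/distilbert-base-uncased",
    "distilroberta-base": "distilbert/distilroberta-base",
    "roberta-base": "FacebookAI/roberta-base",
    "camembert-base": "almanach/camembert-base",
    "transfo-xl-wt103": "transfo-xl/transfo-xl-wt103",
    "xlnet-base-cased": "xlnet/xlnet-base-cased",
    "xlnet-large-cased": "xlnet/xlnet-large-cased",
    "bert-base-uncased": "google-bert/bert-base-uncased",
    "bert-base-cased": "google-bert/bert-base-cased",
    "bert-large-cased": "google-bert/bert-large-cased",
    "bert-base-multilingual-cased": "google-bert/bert-base-multilingual-cased",
    "bert-large-uncased-whole-word-masking": "google-bert/bert-large-uncased-whole-word-masking",
    "bert-large-uncased-whole-word-masking-finetuned-squad": (
        "google-bert/bert-large-uncased-whole-word-masking-finetuned-squad"
    ),
    "t5-small": "google-t5/t5-small",
    "t5-base": "google-t5/t5-base",
    "t5-large": "google-t5/t5-large",
    "t5-3b": "google-t5/t5-3b",
    "t5-11b": "google-t5/t5-11b",
}

# Match a legacy name only when it stands alone:
# - not preceded by a word character, "/", "." or "-" (so ids that already carry
#   an org prefix and local paths such as ./norwegian-gpt2 are left untouched);
# - not followed by a word character or "-" (so longer names are not clipped).
_PATTERN = re.compile(
    r"(?<![\w/.-])("
    + "|".join(re.escape(name) for name in sorted(CANONICAL_NAMES, key=len, reverse=True))
    + r")(?![\w-])"
)


def canonicalize(text: str) -> str:
    """Replace stand-alone legacy checkpoint names in `text` with canonical ids."""
    return _PATTERN.sub(lambda match: CANONICAL_NAMES[match.group(1)], text)


if __name__ == "__main__":
    print(canonicalize("--model_name_or_path bert-base-uncased"))
    print(canonicalize('--config_name="./norwegian-gpt2"'))  # local path, unchanged
```

Because names preceded by `/` are skipped, this sketch is idempotent but would not update URL paths such as `https://huggingface.co/gpt2`. A purely textual pass also cannot tell a checkpoint id from an architecture name, so arguments such as `--model_type` probably deserve a manual check afterwards.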
@@ -118,8 +118,8 @@ pip install runhouse
# For an on-demand V100 with whichever cloud provider you have configured:
python run_on_remote.py \
--example pytorch/text-generation/run_generation.py \
- --model_type=gpt2 \
- --model_name_or_path=gpt2 \
+ --model_type=openai-community/gpt2 \
+ --model_name_or_path=openai-community/gpt2 \
--prompt "I am a language model and"

# For byo (bring your own) cluster:

@@ -34,7 +34,7 @@ Next, we create a [FlaxVisionEncoderDecoderModel](https://huggingface.co/docs/tr
python3 create_model_from_encoder_decoder_models.py \
--output_dir model \
--encoder_model_name_or_path google/vit-base-patch16-224-in21k \
- --decoder_model_name_or_path gpt2
+ --decoder_model_name_or_path openai-community/gpt2
```

### Train the model

@@ -28,7 +28,7 @@ way which enables simple and efficient model parallelism.
In the following, we demonstrate how to train a bi-directional transformer model
using masked language modeling objective as introduced in [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805).
More specifically, we demonstrate how JAX/Flax can be leveraged
- to pre-train [**`roberta-base`**](https://huggingface.co/roberta-base)
+ to pre-train [**`FacebookAI/roberta-base`**](https://huggingface.co/FacebookAI/roberta-base)
in Norwegian on a single TPUv3-8 pod.

The example script uses the 🤗 Datasets library. You can easily customize them to your needs if you need extra processing on your datasets.
@@ -76,13 +76,13 @@ tokenizer.save("./norwegian-roberta-base/tokenizer.json")
### Create configuration

Next, we create the model's configuration file. This is as simple
- as loading and storing [`**roberta-base**`](https://huggingface.co/roberta-base)
+ as loading and storing [`**FacebookAI/roberta-base**`](https://huggingface.co/FacebookAI/roberta-base)
in the local model folder:

```python
from transformers import RobertaConfig

- config = RobertaConfig.from_pretrained("roberta-base", vocab_size=50265)
+ config = RobertaConfig.from_pretrained("FacebookAI/roberta-base", vocab_size=50265)
config.save_pretrained("./norwegian-roberta-base")
```

@@ -129,8 +129,8 @@ look at [this](https://colab.research.google.com/github/huggingface/notebooks/bl

In the following, we demonstrate how to train an auto-regressive causal transformer model
in JAX/Flax.
- More specifically, we pretrain a randomly initialized [**`gpt2`**](https://huggingface.co/gpt2) model in Norwegian on a single TPUv3-8.
- to pre-train 124M [**`gpt2`**](https://huggingface.co/gpt2)
+ More specifically, we pretrain a randomly initialized [**`openai-community/gpt2`**](https://huggingface.co/openai-community/gpt2) model in Norwegian on a single TPUv3-8.
+ to pre-train 124M [**`openai-community/gpt2`**](https://huggingface.co/openai-community/gpt2)
in Norwegian on a single TPUv3-8 pod.

The example script uses the 🤗 Datasets library. You can easily customize them to your needs if you need extra processing on your datasets.
@@ -179,13 +179,13 @@ tokenizer.save("./norwegian-gpt2/tokenizer.json")
### Create configuration

Next, we create the model's configuration file. This is as simple
- as loading and storing [`**gpt2**`](https://huggingface.co/gpt2)
+ as loading and storing [`**openai-community/gpt2**`](https://huggingface.co/openai-community/gpt2)
in the local model folder:

```python
from transformers import GPT2Config

- config = GPT2Config.from_pretrained("gpt2", resid_pdrop=0.0, embd_pdrop=0.0, attn_pdrop=0.0, vocab_size=50257)
+ config = GPT2Config.from_pretrained("openai-community/gpt2", resid_pdrop=0.0, embd_pdrop=0.0, attn_pdrop=0.0, vocab_size=50257)
config.save_pretrained("./norwegian-gpt2")
```

@@ -199,7 +199,7 @@ Finally, we can run the example script to pretrain the model:
```bash
python run_clm_flax.py \
--output_dir="./norwegian-gpt2" \
- --model_type="gpt2" \
+ --model_type="openai-community/gpt2" \
--config_name="./norwegian-gpt2" \
--tokenizer_name="./norwegian-gpt2" \
--dataset_name="oscar" \

@@ -29,7 +29,7 @@ The following example fine-tunes BERT on SQuAD:

```bash
python run_qa.py \
- --model_name_or_path bert-base-uncased \
+ --model_name_or_path google-bert/bert-base-uncased \
--dataset_name squad \
--do_train \
--do_eval \
@@ -67,7 +67,7 @@ Here is an example training on 4 TITAN RTX GPUs and Bert Whole Word Masking unca
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3
python run_qa.py \
- --model_name_or_path bert-large-uncased-whole-word-masking \
+ --model_name_or_path google-bert/bert-large-uncased-whole-word-masking \
--dataset_name squad \
--do_train \
--do_eval \

@@ -78,7 +78,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_glue.py
- --model_name_or_path distilbert-base-uncased
+ --model_name_or_path distilbert/distilbert-base-uncased
--output_dir {tmp_dir}
--train_file ./tests/fixtures/tests_samples/MRPC/train.csv
--validation_file ./tests/fixtures/tests_samples/MRPC/dev.csv
@@ -101,7 +101,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_clm_flax.py
- --model_name_or_path distilgpt2
+ --model_name_or_path distilbert/distilgpt2
--train_file ./tests/fixtures/sample_text.txt
--validation_file ./tests/fixtures/sample_text.txt
--do_train
@@ -125,7 +125,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_summarization.py
- --model_name_or_path t5-small
+ --model_name_or_path google-t5/t5-small
--train_file tests/fixtures/tests_samples/xsum/sample.json
--validation_file tests/fixtures/tests_samples/xsum/sample.json
--test_file tests/fixtures/tests_samples/xsum/sample.json
@@ -155,7 +155,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_mlm.py
- --model_name_or_path distilroberta-base
+ --model_name_or_path distilbert/distilroberta-base
--train_file ./tests/fixtures/sample_text.txt
--validation_file ./tests/fixtures/sample_text.txt
--output_dir {tmp_dir}
@@ -179,7 +179,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_t5_mlm_flax.py
- --model_name_or_path t5-small
+ --model_name_or_path google-t5/t5-small
--train_file ./tests/fixtures/sample_text.txt
--validation_file ./tests/fixtures/sample_text.txt
--do_train
@@ -206,7 +206,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_flax_ner.py
- --model_name_or_path bert-base-uncased
+ --model_name_or_path google-bert/bert-base-uncased
--train_file tests/fixtures/tests_samples/conll/sample.json
--validation_file tests/fixtures/tests_samples/conll/sample.json
--output_dir {tmp_dir}
@@ -233,7 +233,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_qa.py
- --model_name_or_path bert-base-uncased
+ --model_name_or_path google-bert/bert-base-uncased
--version_2_with_negative
--train_file tests/fixtures/tests_samples/SQUAD/sample.json
--validation_file tests/fixtures/tests_samples/SQUAD/sample.json

@@ -31,7 +31,7 @@ GLUE is made up of a total of 9 different tasks. Here is how to run the script o
export TASK_NAME=mrpc

python run_flax_glue.py \
- --model_name_or_path bert-base-cased \
+ --model_name_or_path google-bert/bert-base-cased \
--task_name ${TASK_NAME} \
--max_seq_length 128 \
--learning_rate 2e-5 \

@@ -25,7 +25,7 @@ The following example fine-tunes BERT on CoNLL-2003:

```bash
python run_flax_ner.py \
- --model_name_or_path bert-base-cased \
+ --model_name_or_path google-bert/bert-base-cased \
--dataset_name conll2003 \
--max_seq_length 128 \
--learning_rate 2e-5 \

@@ -22,5 +22,5 @@ If you would like to list benchmark results on your favorite models of the [mode

| Benchmark description | Results | Environment info | Author |
|:----------|:-------------|:-------------|------:|
- | PyTorch Benchmark on inference for `bert-base-cased` |[memory](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/inference_memory.csv) | [env](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/env.csv) | [Partick von Platen](https://github.com/patrickvonplaten) |
- | PyTorch Benchmark on inference for `bert-base-cased` |[time](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/inference_time.csv) | [env](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/env.csv) | [Partick von Platen](https://github.com/patrickvonplaten) |
+ | PyTorch Benchmark on inference for `google-bert/bert-base-cased` |[memory](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/inference_memory.csv) | [env](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/env.csv) | [Partick von Platen](https://github.com/patrickvonplaten) |
+ | PyTorch Benchmark on inference for `google-bert/bert-base-cased` |[time](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/inference_time.csv) | [env](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/env.csv) | [Partick von Platen](https://github.com/patrickvonplaten) |

@@ -1,7 +1,7 @@
#### Fine-tuning BERT on SQuAD1.0 with relative position embeddings

The following examples show how to fine-tune BERT models with different relative position embeddings. The BERT model
- `bert-base-uncased` was pretrained with default absolute position embeddings. We provide the following pretrained
+ `google-bert/bert-base-uncased` was pretrained with default absolute position embeddings. We provide the following pretrained
models which were pre-trained on the same training data (BooksCorpus and English Wikipedia) as in the BERT model
training, but with different relative position embeddings.

@@ -10,7 +10,7 @@ Shaw et al., [Self-Attention with Relative Position Representations](https://arx
* `zhiheng-huang/bert-base-uncased-embedding-relative-key-query`, trained from scratch with relative embedding method 4
in Huang et al. [Improve Transformer Models with Better Relative Position Embeddings](https://arxiv.org/abs/2009.13658)
* `zhiheng-huang/bert-large-uncased-whole-word-masking-embedding-relative-key-query`, fine-tuned from model
- `bert-large-uncased-whole-word-masking` with 3 additional epochs with relative embedding method 4 in Huang et al.
+ `google-bert/bert-large-uncased-whole-word-masking` with 3 additional epochs with relative embedding method 4 in Huang et al.
[Improve Transformer Models with Better Relative Position Embeddings](https://arxiv.org/abs/2009.13658)


@@ -61,7 +61,7 @@ torchrun --nproc_per_node=8 ./examples/question-answering/run_squad.py \
--gradient_accumulation_steps 3
```
Training with the above command leads to the f1 score of 93.52, which is slightly better than the f1 score of 93.15 for
- `bert-large-uncased-whole-word-masking`.
+ `google-bert/bert-large-uncased-whole-word-masking`.

#### Distributed training

@@ -69,7 +69,7 @@ Here is an example using distributed training on 8 V100 GPUs and Bert Whole Word

```bash
torchrun --nproc_per_node=8 ./examples/question-answering/run_squad.py \
- --model_name_or_path bert-large-uncased-whole-word-masking \
+ --model_name_or_path google-bert/bert-large-uncased-whole-word-masking \
--dataset_name squad \
--do_train \
--do_eval \
@@ -90,7 +90,7 @@ exact_match = 86.91
```

This fine-tuned model is available as a checkpoint under the reference
- [`bert-large-uncased-whole-word-masking-finetuned-squad`](https://huggingface.co/bert-large-uncased-whole-word-masking-finetuned-squad).
+ [`google-bert/bert-large-uncased-whole-word-masking-finetuned-squad`](https://huggingface.co/google-bert/bert-large-uncased-whole-word-masking-finetuned-squad).

## Results


@@ -39,8 +39,8 @@ def fill_mask(masked_input, model, tokenizer, topk=5):
return topk_filled_outputs


- tokenizer = CamembertTokenizer.from_pretrained("camembert-base")
- model = CamembertForMaskedLM.from_pretrained("camembert-base")
+ tokenizer = CamembertTokenizer.from_pretrained("almanach/camembert-base")
+ model = CamembertForMaskedLM.from_pretrained("almanach/camembert-base")
model.eval()

masked_input = "Le camembert est <mask> :)"

@@ -20,7 +20,7 @@

This script with default values fine-tunes and evaluate a pretrained OpenAI GPT on the RocStories dataset:
python run_openai_gpt.py \
- --model_name openai-gpt \
+ --model_name openai-community/openai-gpt \
--do_train \
--do_eval \
--train_dataset "$ROC_STORIES_DIR/cloze_test_val__spring2016 - cloze_test_ALL_val.csv" \
@@ -104,7 +104,7 @@ def pre_process_datasets(encoded_datasets, input_len, cap_length, start_token, d

def main():
parser = argparse.ArgumentParser()
- parser.add_argument("--model_name", type=str, default="openai-gpt", help="pretrained model name")
+ parser.add_argument("--model_name", type=str, default="openai-community/openai-gpt", help="pretrained model name")
parser.add_argument("--do_train", action="store_true", help="Whether to run training.")
parser.add_argument("--do_eval", action="store_true", help="Whether to run eval on the dev set.")
parser.add_argument(

@@ -40,7 +40,7 @@ logger = logging.getLogger(__name__)

def main():
parser = argparse.ArgumentParser(description="PyTorch Transformer Language Model")
- parser.add_argument("--model_name", type=str, default="transfo-xl-wt103", help="pretrained model name")
+ parser.add_argument("--model_name", type=str, default="transfo-xl/transfo-xl-wt103", help="pretrained model name")
parser.add_argument(
"--split", type=str, default="test", choices=["all", "valid", "test"], help="which split to evaluate"
)

@@ -170,7 +170,7 @@ If 'translation' is in your task name, the computed metric will be BLEU. Otherwi
For t5, you need to specify --task translation_{src}_to_{tgt} as follows:
```bash
export DATA_DIR=wmt_en_ro
- ./run_eval.py t5-base \
+ ./run_eval.py google-t5/t5-base \
$DATA_DIR/val.source t5_val_generations.txt \
--reference_path $DATA_DIR/val.target \
--score_path enro_bleu.json \

@@ -28,7 +28,7 @@ from transformers.testing_utils import TestCasePlus, slow
from utils import FAIRSEQ_AVAILABLE, DistributedSortishSampler, LegacySeq2SeqDataset, Seq2SeqDataset


- BERT_BASE_CASED = "bert-base-cased"
+ BERT_BASE_CASED = "google-bert/bert-base-cased"
PEGASUS_XSUM = "google/pegasus-xsum"
ARTICLES = [" Sam ate lunch today.", "Sams lunch ingredients."]
SUMMARIES = ["A very interesting story about what I ate for lunch.", "Avocado, celery, turkey, coffee"]

@@ -74,7 +74,7 @@ def pack_data_dir(tok, data_dir: Path, max_tokens, save_path):

def packer_cli():
parser = argparse.ArgumentParser()
- parser.add_argument("--tok_name", type=str, help="like facebook/bart-large-cnn,t5-base, etc.")
+ parser.add_argument("--tok_name", type=str, help="like facebook/bart-large-cnn,google-t5/t5-base, etc.")
parser.add_argument("--max_seq_len", type=int, default=128)
parser.add_argument("--data_dir", type=str)
parser.add_argument("--save_path", type=str)

@@ -124,7 +124,7 @@ def run_generate():
parser.add_argument(
"--model_name",
type=str,
- help="like facebook/bart-large-cnn,t5-base, etc.",
+ help="like facebook/bart-large-cnn,google-t5/t5-base, etc.",
default="sshleifer/distilbart-xsum-12-3",
)
parser.add_argument("--save_dir", type=str, help="where to save", default="tmp_gen")

@@ -100,7 +100,7 @@ def run_generate(verbose=True):
"""

parser = argparse.ArgumentParser()
- parser.add_argument("model_name", type=str, help="like facebook/bart-large-cnn,t5-base, etc.")
+ parser.add_argument("model_name", type=str, help="like facebook/bart-large-cnn,google-t5/t5-base, etc.")
parser.add_argument("input_path", type=str, help="like cnn_dm/test.source")
parser.add_argument("save_path", type=str, help="where to save summaries")
parser.add_argument("--reference_path", type=str, required=False, help="like cnn_dm/test.target")

@@ -34,7 +34,7 @@ Let's define some variables that we need for further pre-processing steps and tr

```bash
export MAX_LENGTH=128
- export BERT_MODEL=bert-base-multilingual-cased
+ export BERT_MODEL=google-bert/bert-base-multilingual-cased
```

Run the pre-processing script on training, dev and test datasets:
@@ -92,7 +92,7 @@ Instead of passing all parameters via commandline arguments, the `run_ner.py` sc
{
"data_dir": ".",
"labels": "./labels.txt",
- "model_name_or_path": "bert-base-multilingual-cased",
+ "model_name_or_path": "google-bert/bert-base-multilingual-cased",
"output_dir": "germeval-model",
"max_seq_length": 128,
"num_train_epochs": 3,
@@ -222,7 +222,7 @@ Let's define some variables that we need for further pre-processing steps:

```bash
export MAX_LENGTH=128
- export BERT_MODEL=bert-large-cased
+ export BERT_MODEL=google-bert/bert-large-cased
```

Here we use the English BERT large model for fine-tuning.
@@ -250,7 +250,7 @@ This configuration file looks like:
{
"data_dir": "./data_wnut_17",
"labels": "./data_wnut_17/labels.txt",
- "model_name_or_path": "bert-large-cased",
+ "model_name_or_path": "google-bert/bert-large-cased",
"output_dir": "wnut-17-model-1",
"max_seq_length": 128,
"num_train_epochs": 3,

@@ -113,7 +113,7 @@ class TokenClassificationTask:
for word, label in zip(example.words, example.labels):
word_tokens = tokenizer.tokenize(word)

- # bert-base-multilingual-cased sometimes output "nothing ([]) when calling tokenize with just a space.
+ # google-bert/bert-base-multilingual-cased sometimes output "nothing ([]) when calling tokenize with just a space.
if len(word_tokens) > 0:
tokens.extend(word_tokens)
# Use the real label id for the first token of the word, and padding ids for the remaining tokens

@@ -109,7 +109,7 @@ classification MNLI task using the `run_glue` script, with 8 GPUs:
```bash
torchrun \
--nproc_per_node 8 pytorch/text-classification/run_glue.py \
- --model_name_or_path bert-large-uncased-whole-word-masking \
+ --model_name_or_path google-bert/bert-large-uncased-whole-word-masking \
--task_name mnli \
--do_train \
--do_eval \
@@ -153,7 +153,7 @@ classification MNLI task using the `run_glue` script, with 8 TPUs (from this fol
```bash
python xla_spawn.py --num_cores 8 \
text-classification/run_glue.py \
- --model_name_or_path bert-large-uncased-whole-word-masking \
+ --model_name_or_path google-bert/bert-large-uncased-whole-word-masking \
--task_name mnli \
--do_train \
--do_eval \

@@ -64,10 +64,10 @@ from transformers import (
)

model = VisionTextDualEncoderModel.from_vision_text_pretrained(
- "openai/clip-vit-base-patch32", "roberta-base"
+ "openai/clip-vit-base-patch32", "FacebookAI/roberta-base"
)

- tokenizer = AutoTokenizer.from_pretrained("roberta-base")
+ tokenizer = AutoTokenizer.from_pretrained("FacebookAI/roberta-base")
image_processor = AutoImageProcessor.from_pretrained("openai/clip-vit-base-patch32")
processor = VisionTextDualEncoderProcessor(image_processor, tokenizer)


@@ -36,7 +36,7 @@ the tokenization). The loss here is that of causal language modeling.

```bash
python run_clm.py \
- --model_name_or_path gpt2 \
+ --model_name_or_path openai-community/gpt2 \
--dataset_name wikitext \
--dataset_config_name wikitext-2-raw-v1 \
--per_device_train_batch_size 8 \
@@ -53,7 +53,7 @@ To run on your own training and validation files, use the following command:

```bash
python run_clm.py \
- --model_name_or_path gpt2 \
+ --model_name_or_path openai-community/gpt2 \
--train_file path_to_train_file \
--validation_file path_to_validation_file \
--per_device_train_batch_size 8 \
@@ -69,7 +69,7 @@ This uses the built in HuggingFace `Trainer` for training. If you want to use a
python run_clm_no_trainer.py \
--dataset_name wikitext \
--dataset_config_name wikitext-2-raw-v1 \
- --model_name_or_path gpt2 \
+ --model_name_or_path openai-community/gpt2 \
--output_dir /tmp/test-clm
```

@@ -84,7 +84,7 @@ converge slightly slower (over-fitting takes more epochs).

```bash
python run_mlm.py \
- --model_name_or_path roberta-base \
+ --model_name_or_path FacebookAI/roberta-base \
--dataset_name wikitext \
--dataset_config_name wikitext-2-raw-v1 \
--per_device_train_batch_size 8 \
@@ -98,7 +98,7 @@ To run on your own training and validation files, use the following command:

```bash
python run_mlm.py \
- --model_name_or_path roberta-base \
+ --model_name_or_path FacebookAI/roberta-base \
--train_file path_to_train_file \
--validation_file path_to_validation_file \
--per_device_train_batch_size 8 \
@@ -117,7 +117,7 @@ This uses the built in HuggingFace `Trainer` for training. If you want to use a
python run_mlm_no_trainer.py \
--dataset_name wikitext \
--dataset_config_name wikitext-2-raw-v1 \
- --model_name_or_path roberta-base \
+ --model_name_or_path FacebookAI/roberta-base \
--output_dir /tmp/test-mlm
```

@@ -144,7 +144,7 @@ Here is how to fine-tune XLNet on wikitext-2:

```bash
python run_plm.py \
- --model_name_or_path=xlnet-base-cased \
+ --model_name_or_path=xlnet/xlnet-base-cased \
--dataset_name wikitext \
--dataset_config_name wikitext-2-raw-v1 \
--per_device_train_batch_size 8 \
@@ -158,7 +158,7 @@ To fine-tune it on your own training and validation file, run:

```bash
python run_plm.py \
- --model_name_or_path=xlnet-base-cased \
+ --model_name_or_path=xlnet/xlnet-base-cased \
--train_file path_to_train_file \
--validation_file path_to_validation_file \
--per_device_train_batch_size 8 \
@@ -188,7 +188,7 @@ When training a model from scratch, configuration values may be overridden with


```bash
- python run_clm.py --model_type gpt2 --tokenizer_name gpt2 \ --config_overrides="n_embd=1024,n_head=16,n_layer=48,n_positions=102" \
+ python run_clm.py --model_type openai-community/gpt2 --tokenizer_name openai-community/gpt2 \ --config_overrides="n_embd=1024,n_head=16,n_layer=48,n_positions=102" \
[...]
```


@@ -22,7 +22,7 @@ limitations under the License.

```bash
python examples/multiple-choice/run_swag.py \
- --model_name_or_path roberta-base \
+ --model_name_or_path FacebookAI/roberta-base \
--do_train \
--do_eval \
--learning_rate 5e-5 \
@@ -62,7 +62,7 @@ then
export DATASET_NAME=swag

python run_swag_no_trainer.py \
- --model_name_or_path bert-base-cased \
+ --model_name_or_path google-bert/bert-base-cased \
--dataset_name $DATASET_NAME \
--max_seq_length 128 \
--per_device_train_batch_size 32 \
@@ -89,7 +89,7 @@ that will check everything is ready for training. Finally, you can launch traini
export DATASET_NAME=swag

accelerate launch run_swag_no_trainer.py \
- --model_name_or_path bert-base-cased \
+ --model_name_or_path google-bert/bert-base-cased \
--dataset_name $DATASET_NAME \
--max_seq_length 128 \
--per_device_train_batch_size 32 \

@@ -54,7 +54,7 @@ class TorchXLAExamplesTests(TestCasePlus):
./examples/pytorch/text-classification/run_glue.py
--num_cores=8
./examples/pytorch/text-classification/run_glue.py
- --model_name_or_path distilbert-base-uncased
+ --model_name_or_path distilbert/distilbert-base-uncased
--output_dir {tmp_dir}
--overwrite_output_dir
--train_file ./tests/fixtures/tests_samples/MRPC/train.csv

@@ -40,7 +40,7 @@ on a single tesla V100 16GB.

```bash
python run_qa.py \
- --model_name_or_path bert-base-uncased \
+ --model_name_or_path google-bert/bert-base-uncased \
--dataset_name squad \
--do_train \
--do_eval \
@@ -67,7 +67,7 @@ The [`run_qa_beam_search.py`](https://github.com/huggingface/transformers/blob/m

```bash
python run_qa_beam_search.py \
- --model_name_or_path xlnet-large-cased \
+ --model_name_or_path xlnet/xlnet-large-cased \
--dataset_name squad \
--do_train \
--do_eval \
@@ -87,7 +87,7 @@ python run_qa_beam_search.py \
export SQUAD_DIR=/path/to/SQUAD

python run_qa_beam_search.py \
- --model_name_or_path xlnet-large-cased \
+ --model_name_or_path xlnet/xlnet-large-cased \
--dataset_name squad_v2 \
--do_train \
--do_eval \
@@ -111,7 +111,7 @@ This example code fine-tunes T5 on the SQuAD2.0 dataset.

```bash
python run_seq2seq_qa.py \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--dataset_name squad_v2 \
--context_column context \
--question_column question \
@@ -143,7 +143,7 @@ then

```bash
python run_qa_no_trainer.py \
- --model_name_or_path bert-base-uncased \
+ --model_name_or_path google-bert/bert-base-uncased \
--dataset_name squad \
--max_seq_length 384 \
--doc_stride 128 \
@@ -166,7 +166,7 @@ that will check everything is ready for training. Finally, you can launch traini

```bash
accelerate launch run_qa_no_trainer.py \
- --model_name_or_path bert-base-uncased \
+ --model_name_or_path google-bert/bert-base-uncased \
--dataset_name squad \
--max_seq_length 384 \
--doc_stride 128 \

@@ -41,7 +41,7 @@ and you also will find examples of these below.
Here is an example on a summarization task:
```bash
python examples/pytorch/summarization/run_summarization.py \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--do_train \
--do_eval \
--dataset_name cnn_dailymail \
@@ -54,9 +54,9 @@ python examples/pytorch/summarization/run_summarization.py \
--predict_with_generate
```

- Only T5 models `t5-small`, `t5-base`, `t5-large`, `t5-3b` and `t5-11b` must use an additional argument: `--source_prefix "summarize: "`.
+ Only T5 models `google-t5/t5-small`, `google-t5/t5-base`, `google-t5/t5-large`, `google-t5/t5-3b` and `google-t5/t5-11b` must use an additional argument: `--source_prefix "summarize: "`.

- We used CNN/DailyMail dataset in this example as `t5-small` was trained on it and one can get good scores even when pre-training with a very small sample.
+ We used CNN/DailyMail dataset in this example as `google-t5/t5-small` was trained on it and one can get good scores even when pre-training with a very small sample.

Extreme Summarization (XSum) Dataset is another commonly used dataset for the task of summarization. To use it replace `--dataset_name cnn_dailymail --dataset_config "3.0.0"` with `--dataset_name xsum`.

@@ -65,7 +65,7 @@ And here is how you would use it on your own files, after adjusting the values f

```bash
python examples/pytorch/summarization/run_summarization.py \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--do_train \
--do_eval \
--train_file path_to_csv_or_jsonlines_file \
@@ -156,7 +156,7 @@ then

```bash
python run_summarization_no_trainer.py \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--dataset_name cnn_dailymail \
--dataset_config "3.0.0" \
--source_prefix "summarize: " \
@@ -179,7 +179,7 @@ that will check everything is ready for training. Finally, you can launch traini

```bash
accelerate launch run_summarization_no_trainer.py \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--dataset_name cnn_dailymail \
--dataset_config "3.0.0" \
--source_prefix "summarize: " \

@@ -368,11 +368,11 @@ def main():
logger.info(f"Training/evaluation parameters {training_args}")

if data_args.source_prefix is None and model_args.model_name_or_path in [
- "t5-small",
- "t5-base",
- "t5-large",
- "t5-3b",
- "t5-11b",
+ "google-t5/t5-small",
+ "google-t5/t5-base",
+ "google-t5/t5-large",
+ "google-t5/t5-3b",
+ "google-t5/t5-11b",
]:
logger.warning(
"You're running a t5 model but didn't provide a source prefix, which is the expected, e.g. with "

@@ -339,11 +339,11 @@ def main():
|
||||
|
||||
accelerator = Accelerator(gradient_accumulation_steps=args.gradient_accumulation_steps, **accelerator_log_kwargs)
|
||||
if args.source_prefix is None and args.model_name_or_path in [
|
||||
"t5-small",
|
||||
"t5-base",
|
||||
"t5-large",
|
||||
"t5-3b",
|
||||
"t5-11b",
|
||||
"google-t5/t5-small",
|
||||
"google-t5/t5-base",
|
||||
"google-t5/t5-large",
|
||||
"google-t5/t5-3b",
|
||||
"google-t5/t5-11b",
|
||||
]:
|
||||
logger.warning(
|
||||
"You're running a t5 model but didn't provide a source prefix, which is the expected, e.g. with "
|
||||
|
||||
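The membership test in the two hunks above lists every T5 checkpoint twice: once under its legacy short name and once under the `google-t5/` namespace. A minimal sketch of an equivalent check follows, assuming only that each canonical ID is the legacy name prefixed with `google-t5/`. The helper `needs_source_prefix` and the constant names are hypothetical and do not appear in the example scripts.

```python
# Hypothetical helper, not part of the example scripts.
LEGACY_T5_NAMES = ("t5-small", "t5-base", "t5-large", "t5-3b", "t5-11b")
T5_NAMESPACE = "google-t5"  # namespace as it appears in the diff above

# Build the lookup set once: every legacy name plus its namespaced form.
T5_CHECKPOINTS = frozenset(LEGACY_T5_NAMES) | frozenset(
    f"{T5_NAMESPACE}/{name}" for name in LEGACY_T5_NAMES
)


def needs_source_prefix(model_name_or_path, source_prefix):
    """Return True when a listed T5 checkpoint is used without a source prefix."""
    return source_prefix is None and model_name_or_path in T5_CHECKPOINTS


if __name__ == "__main__":
    print(needs_source_prefix("google-t5/t5-small", None))
    print(needs_source_prefix("t5-small", "summarize: "))
```

Deriving the namespaced names from the legacy tuple keeps the two spellings in sync if a checkpoint is added later. Like the original list, this is an exact-match test, so a local path such as `./models/t5-small` is not treated as a known checkpoint.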
@@ -80,7 +80,7 @@ class ExamplesTestsNoTrainer(TestCasePlus):
|
||||
tmp_dir = self.get_auto_remove_tmp_dir()
|
||||
testargs = f"""
|
||||
{self.examples_dir}/pytorch/text-classification/run_glue_no_trainer.py
|
||||
--model_name_or_path distilbert-base-uncased
|
||||
--model_name_or_path distilbert/distilbert-base-uncased
|
||||
--output_dir {tmp_dir}
|
||||
--train_file ./tests/fixtures/tests_samples/MRPC/train.csv
|
||||
--validation_file ./tests/fixtures/tests_samples/MRPC/dev.csv
|
||||
@@ -105,7 +105,7 @@ class ExamplesTestsNoTrainer(TestCasePlus):
|
||||
tmp_dir = self.get_auto_remove_tmp_dir()
|
||||
testargs = f"""
|
||||
{self.examples_dir}/pytorch/language-modeling/run_clm_no_trainer.py
|
||||
--model_name_or_path distilgpt2
|
||||
--model_name_or_path distilbert/distilgpt2
|
||||
--train_file ./tests/fixtures/sample_text.txt
|
||||
--validation_file ./tests/fixtures/sample_text.txt
|
||||
--block_size 128
|
||||
@@ -133,7 +133,7 @@ class ExamplesTestsNoTrainer(TestCasePlus):
|
||||
tmp_dir = self.get_auto_remove_tmp_dir()
|
||||
testargs = f"""
|
||||
{self.examples_dir}/pytorch/language-modeling/run_mlm_no_trainer.py
|
||||
--model_name_or_path distilroberta-base
|
||||
--model_name_or_path distilbert/distilroberta-base
|
||||
--train_file ./tests/fixtures/sample_text.txt
|
||||
--validation_file ./tests/fixtures/sample_text.txt
|
||||
--output_dir {tmp_dir}
|
||||
@@ -156,7 +156,7 @@ class ExamplesTestsNoTrainer(TestCasePlus):
|
||||
tmp_dir = self.get_auto_remove_tmp_dir()
|
||||
testargs = f"""
|
||||
{self.examples_dir}/pytorch/token-classification/run_ner_no_trainer.py
|
||||
--model_name_or_path bert-base-uncased
|
||||
--model_name_or_path google-bert/bert-base-uncased
|
||||
--train_file tests/fixtures/tests_samples/conll/sample.json
|
||||
--validation_file tests/fixtures/tests_samples/conll/sample.json
|
||||
--output_dir {tmp_dir}
|
||||
@@ -181,7 +181,7 @@ class ExamplesTestsNoTrainer(TestCasePlus):
|
||||
tmp_dir = self.get_auto_remove_tmp_dir()
|
||||
testargs = f"""
|
||||
{self.examples_dir}/pytorch/question-answering/run_qa_no_trainer.py
|
||||
--model_name_or_path bert-base-uncased
|
||||
--model_name_or_path google-bert/bert-base-uncased
|
||||
--version_2_with_negative
|
||||
--train_file tests/fixtures/tests_samples/SQUAD/sample.json
|
||||
--validation_file tests/fixtures/tests_samples/SQUAD/sample.json
|
||||
@@ -209,7 +209,7 @@ class ExamplesTestsNoTrainer(TestCasePlus):
|
||||
tmp_dir = self.get_auto_remove_tmp_dir()
|
||||
testargs = f"""
|
||||
{self.examples_dir}/pytorch/multiple-choice/run_swag_no_trainer.py
|
||||
--model_name_or_path bert-base-uncased
|
||||
--model_name_or_path google-bert/bert-base-uncased
|
||||
--train_file tests/fixtures/tests_samples/swag/sample.json
|
||||
--validation_file tests/fixtures/tests_samples/swag/sample.json
|
||||
--output_dir {tmp_dir}
|
||||
@@ -232,7 +232,7 @@ class ExamplesTestsNoTrainer(TestCasePlus):
|
||||
tmp_dir = self.get_auto_remove_tmp_dir()
|
||||
testargs = f"""
|
||||
{self.examples_dir}/pytorch/summarization/run_summarization_no_trainer.py
|
||||
--model_name_or_path t5-small
|
||||
--model_name_or_path google-t5/t5-small
|
||||
--train_file tests/fixtures/tests_samples/xsum/sample.json
|
||||
--validation_file tests/fixtures/tests_samples/xsum/sample.json
|
||||
--output_dir {tmp_dir}
|
||||
|
||||
@@ -99,7 +99,7 @@ class ExamplesTests(TestCasePlus):
|
||||
tmp_dir = self.get_auto_remove_tmp_dir()
|
||||
testargs = f"""
|
||||
run_glue.py
|
||||
--model_name_or_path distilbert-base-uncased
|
||||
--model_name_or_path distilbert/distilbert-base-uncased
|
||||
--output_dir {tmp_dir}
|
||||
--overwrite_output_dir
|
||||
--train_file ./tests/fixtures/tests_samples/MRPC/train.csv
|
||||
@@ -127,7 +127,7 @@ class ExamplesTests(TestCasePlus):
|
||||
tmp_dir = self.get_auto_remove_tmp_dir()
|
||||
testargs = f"""
|
||||
run_clm.py
|
||||
--model_name_or_path distilgpt2
|
||||
--model_name_or_path distilbert/distilgpt2
|
||||
--train_file ./tests/fixtures/sample_text.txt
|
||||
--validation_file ./tests/fixtures/sample_text.txt
|
||||
--do_train
|
||||
@@ -160,7 +160,7 @@ class ExamplesTests(TestCasePlus):
|
||||
testargs = f"""
|
||||
run_clm.py
|
||||
--model_type gpt2
|
||||
--tokenizer_name gpt2
|
||||
--tokenizer_name openai-community/gpt2
|
||||
--train_file ./tests/fixtures/sample_text.txt
|
||||
--output_dir {tmp_dir}
|
||||
--config_overrides n_embd=10,n_head=2
|
||||
@@ -181,7 +181,7 @@ class ExamplesTests(TestCasePlus):
|
||||
tmp_dir = self.get_auto_remove_tmp_dir()
|
||||
testargs = f"""
|
||||
run_mlm.py
|
||||
--model_name_or_path distilroberta-base
|
||||
--model_name_or_path distilbert/distilroberta-base
|
||||
--train_file ./tests/fixtures/sample_text.txt
|
||||
--validation_file ./tests/fixtures/sample_text.txt
|
||||
--output_dir {tmp_dir}
|
||||
@@ -207,7 +207,7 @@ class ExamplesTests(TestCasePlus):
|
||||
tmp_dir = self.get_auto_remove_tmp_dir()
|
||||
testargs = f"""
|
||||
run_ner.py
|
||||
--model_name_or_path bert-base-uncased
|
||||
--model_name_or_path google-bert/bert-base-uncased
|
||||
--train_file tests/fixtures/tests_samples/conll/sample.json
|
||||
--validation_file tests/fixtures/tests_samples/conll/sample.json
|
||||
--output_dir {tmp_dir}
|
||||
@@ -235,7 +235,7 @@ class ExamplesTests(TestCasePlus):
|
||||
tmp_dir = self.get_auto_remove_tmp_dir()
|
||||
testargs = f"""
|
||||
run_qa.py
|
||||
--model_name_or_path bert-base-uncased
|
||||
--model_name_or_path google-bert/bert-base-uncased
|
||||
--version_2_with_negative
|
||||
--train_file tests/fixtures/tests_samples/SQUAD/sample.json
|
||||
--validation_file tests/fixtures/tests_samples/SQUAD/sample.json
|
||||
@@ -260,7 +260,7 @@ class ExamplesTests(TestCasePlus):
|
||||
tmp_dir = self.get_auto_remove_tmp_dir()
|
||||
testargs = f"""
|
||||
run_seq2seq_qa.py
|
||||
--model_name_or_path t5-small
|
||||
--model_name_or_path google-t5/t5-small
|
||||
--context_column context
|
||||
--question_column question
|
||||
--answer_column answers
|
||||
@@ -289,7 +289,7 @@ class ExamplesTests(TestCasePlus):
|
||||
tmp_dir = self.get_auto_remove_tmp_dir()
|
||||
testargs = f"""
|
||||
run_swag.py
|
||||
--model_name_or_path bert-base-uncased
|
||||
--model_name_or_path google-bert/bert-base-uncased
|
||||
--train_file tests/fixtures/tests_samples/swag/sample.json
|
||||
--validation_file tests/fixtures/tests_samples/swag/sample.json
|
||||
--output_dir {tmp_dir}
|
||||
@@ -327,7 +327,7 @@ class ExamplesTests(TestCasePlus):
|
||||
tmp_dir = self.get_auto_remove_tmp_dir()
|
||||
testargs = f"""
|
||||
run_summarization.py
|
||||
--model_name_or_path t5-small
|
||||
--model_name_or_path google-t5/t5-small
|
||||
--train_file tests/fixtures/tests_samples/xsum/sample.json
|
||||
--validation_file tests/fixtures/tests_samples/xsum/sample.json
|
||||
--output_dir {tmp_dir}
|
||||
|
||||
@@ -31,7 +31,7 @@ GLUE is made up of a total of 9 different tasks. Here is how to run the script o
|
||||
export TASK_NAME=mrpc
|
||||
|
||||
python run_glue.py \
|
||||
--model_name_or_path bert-base-cased \
|
||||
--model_name_or_path google-bert/bert-base-cased \
|
||||
--task_name $TASK_NAME \
|
||||
--do_train \
|
||||
--do_eval \
|
||||
@@ -68,7 +68,7 @@ The following example fine-tunes BERT on the `imdb` dataset hosted on our [hub](
|
||||
|
||||
```bash
|
||||
python run_glue.py \
|
||||
--model_name_or_path bert-base-cased \
|
||||
--model_name_or_path google-bert/bert-base-cased \
|
||||
--dataset_name imdb \
|
||||
--do_train \
|
||||
--do_predict \
|
||||
@@ -90,7 +90,7 @@ We can specify the metric, the label column and aso choose which text columns to
|
||||
dataset="amazon_reviews_multi"
|
||||
subset="en"
|
||||
python run_classification.py \
|
||||
--model_name_or_path bert-base-uncased \
|
||||
--model_name_or_path google-bert/bert-base-uncased \
|
||||
--dataset_name ${dataset} \
|
||||
--dataset_config_name ${subset} \
|
||||
--shuffle_train_dataset \
|
||||
@@ -113,7 +113,7 @@ The following is a multi-label classification example. It fine-tunes BERT on the
|
||||
dataset="reuters21578"
|
||||
subset="ModApte"
|
||||
python run_classification.py \
|
||||
--model_name_or_path bert-base-uncased \
|
||||
--model_name_or_path google-bert/bert-base-uncased \
|
||||
--dataset_name ${dataset} \
|
||||
--dataset_config_name ${subset} \
|
||||
--shuffle_train_dataset \
|
||||
@@ -175,7 +175,7 @@ then
|
||||
export TASK_NAME=mrpc
|
||||
|
||||
python run_glue_no_trainer.py \
|
||||
--model_name_or_path bert-base-cased \
|
||||
--model_name_or_path google-bert/bert-base-cased \
|
||||
--task_name $TASK_NAME \
|
||||
--max_length 128 \
|
||||
--per_device_train_batch_size 32 \
|
||||
@@ -202,7 +202,7 @@ that will check everything is ready for training. Finally, you can launch traini
|
||||
export TASK_NAME=mrpc
|
||||
|
||||
accelerate launch run_glue_no_trainer.py \
|
||||
--model_name_or_path bert-base-cased \
|
||||
--model_name_or_path google-bert/bert-base-cased \
|
||||
--task_name $TASK_NAME \
|
||||
--max_length 128 \
|
||||
--per_device_train_batch_size 32 \
|
||||
@@ -232,7 +232,7 @@ This example code fine-tunes mBERT (multi-lingual BERT) on the XNLI dataset. It
|
||||
|
||||
```bash
|
||||
python run_xnli.py \
|
||||
--model_name_or_path bert-base-multilingual-cased \
|
||||
--model_name_or_path google-bert/bert-base-multilingual-cased \
|
||||
--language de \
|
||||
--train_language en \
|
||||
--do_train \
|
||||
|
||||
@@ -26,6 +26,6 @@ Example usage:
|
||||
|
||||
```bash
|
||||
python run_generation.py \
|
||||
--model_type=gpt2 \
|
||||
--model_name_or_path=gpt2
|
||||
--model_type=openai-community/gpt2 \
|
||||
--model_name_or_path=openai-community/gpt2
|
||||
```
|
||||
|
||||
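The updated command above passes the same namespaced string to both `--model_type` and `--model_name_or_path`. The sketch below illustrates why the two flags may need different values, under the assumption that `--model_type` is looked up in a fixed table of architecture keys while `--model_name_or_path` is a Hub repository ID or a local path. The table and the function `resolve_model_args` are hypothetical stand-ins, not the code in `run_generation.py`.

```python
# Hypothetical stand-in for a script's architecture table; real keys may differ.
SUPPORTED_MODEL_TYPES = ("gpt2", "ctrl", "xlnet")


def resolve_model_args(model_type, model_name_or_path):
    """Validate the architecture key and pass the checkpoint ID through unchanged."""
    key = model_type.lower()
    if key not in SUPPORTED_MODEL_TYPES:
        raise ValueError(
            f"Unknown model type {model_type!r}; expected one of {SUPPORTED_MODEL_TYPES}"
        )
    return key, model_name_or_path


if __name__ == "__main__":
    # Architecture key stays short; only the checkpoint ID carries the namespace.
    print(resolve_model_args("gpt2", "openai-community/gpt2"))
```

Under that assumption, only the checkpoint argument carries the `openai-community/` namespace, and a namespaced value passed as the model type is rejected by the lookup.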
@@ -16,7 +16,7 @@
|
||||
""" The examples of running contrastive search on the auto-APIs;
|
||||
|
||||
Running this example:
|
||||
python run_generation_contrastive_search.py --model_name_or_path=gpt2-large --penalty_alpha=0.6 --k=4 --length=256
|
||||
python run_generation_contrastive_search.py --model_name_or_path=openai-community/gpt2-large --penalty_alpha=0.6 --k=4 --length=256
|
||||
"""
|
||||
|
||||
|
||||
|
||||
@@ -29,7 +29,7 @@ The following example fine-tunes BERT on CoNLL-2003:
|
||||
|
||||
```bash
|
||||
python run_ner.py \
|
||||
--model_name_or_path bert-base-uncased \
|
||||
--model_name_or_path google-bert/bert-base-uncased \
|
||||
--dataset_name conll2003 \
|
||||
--output_dir /tmp/test-ner \
|
||||
--do_train \
|
||||
@@ -42,7 +42,7 @@ To run on your own training and validation files, use the following command:
|
||||
|
||||
```bash
|
||||
python run_ner.py \
|
||||
--model_name_or_path bert-base-uncased \
|
||||
--model_name_or_path google-bert/bert-base-uncased \
|
||||
--train_file path_to_train_file \
|
||||
--validation_file path_to_validation_file \
|
||||
--output_dir /tmp/test-ner \
|
||||
@@ -84,7 +84,7 @@ then
|
||||
export TASK_NAME=ner
|
||||
|
||||
python run_ner_no_trainer.py \
|
||||
--model_name_or_path bert-base-cased \
|
||||
--model_name_or_path google-bert/bert-base-cased \
|
||||
--dataset_name conll2003 \
|
||||
--task_name $TASK_NAME \
|
||||
--max_length 128 \
|
||||
@@ -112,7 +112,7 @@ that will check everything is ready for training. Finally, you can launch traini
|
||||
export TASK_NAME=ner
|
||||
|
||||
accelerate launch run_ner_no_trainer.py \
|
||||
--model_name_or_path bert-base-cased \
|
||||
--model_name_or_path google-bert/bert-base-cased \
|
||||
--dataset_name conll2003 \
|
||||
--task_name $TASK_NAME \
|
||||
--max_length 128 \
|
||||
|
||||
@@ -59,11 +59,11 @@ python examples/pytorch/translation/run_translation.py \
|
||||
|
||||
MBart and some T5 models require special handling.
|
||||
|
||||
T5 models `t5-small`, `t5-base`, `t5-large`, `t5-3b` and `t5-11b` must use an additional argument: `--source_prefix "translate {source_lang} to {target_lang}"`. For example:
|
||||
T5 models `google-t5/t5-small`, `google-t5/t5-base`, `google-t5/t5-large`, `google-t5/t5-3b` and `google-t5/t5-11b` must use an additional argument: `--source_prefix "translate {source_lang} to {target_lang}"`. For example:
|
||||
|
||||
```bash
|
||||
python examples/pytorch/translation/run_translation.py \
|
||||
--model_name_or_path t5-small \
|
||||
--model_name_or_path google-t5/t5-small \
|
||||
--do_train \
|
||||
--do_eval \
|
||||
--source_lang en \
|
||||
@@ -105,7 +105,7 @@ values for the arguments `--train_file`, `--validation_file` to match your setup
|
||||
|
||||
```bash
|
||||
python examples/pytorch/translation/run_translation.py \
|
||||
--model_name_or_path t5-small \
|
||||
--model_name_or_path google-t5/t5-small \
|
||||
--do_train \
|
||||
--do_eval \
|
||||
--source_lang en \
|
||||
@@ -134,7 +134,7 @@ If you want to use a pre-processed dataset that leads to high BLEU scores, but f
|
||||
|
||||
```bash
|
||||
python examples/pytorch/translation/run_translation.py \
|
||||
--model_name_or_path t5-small \
|
||||
--model_name_or_path google-t5/t5-small \
|
||||
--do_train \
|
||||
--do_eval \
|
||||
--source_lang en \
|
||||
|
||||
@@ -317,11 +317,11 @@ def main():
|
||||
logger.info(f"Training/evaluation parameters {training_args}")
|
||||
|
||||
if data_args.source_prefix is None and model_args.model_name_or_path in [
|
||||
"t5-small",
|
||||
"t5-base",
|
||||
"t5-large",
|
||||
"t5-3b",
|
||||
"t5-11b",
|
||||
"google-t5/t5-small",
|
||||
"google-t5/t5-base",
|
||||
"google-t5/t5-large",
|
||||
"google-t5/t5-3b",
|
||||
"google-t5/t5-11b",
|
||||
]:
|
||||
logger.warning(
|
||||
"You're running a t5 model but didn't provide a source prefix, which is expected, e.g. with "
|
||||
|
||||
@@ -15,7 +15,7 @@ export TASK_NAME=MRPC
|
||||
|
||||
python ./run_glue_with_pabee.py \
|
||||
--model_type albert \
|
||||
--model_name_or_path bert-base-uncased/albert-base-v2 \
|
||||
--model_name_or_path google-bert/bert-base-uncased/albert/albert-base-v2 \
|
||||
--task_name $TASK_NAME \
|
||||
--do_train \
|
||||
--do_eval \
|
||||
|
||||
@@ -276,8 +276,8 @@ class AlbertForSequenceClassificationWithPabee(AlbertPreTrainedModel):
|
||||
from torch import nn
|
||||
import torch
|
||||
|
||||
tokenizer = AlbertTokenizer.from_pretrained('albert-base-v2')
|
||||
model = AlbertForSequenceClassificationWithPabee.from_pretrained('albert-base-v2')
|
||||
tokenizer = AlbertTokenizer.from_pretrained('albert/albert-base-v2')
|
||||
model = AlbertForSequenceClassificationWithPabee.from_pretrained('albert/albert-base-v2')
|
||||
input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute")).unsqueeze(0) # Batch size 1
|
||||
labels = torch.tensor([1]).unsqueeze(0) # Batch size 1
|
||||
outputs = model(input_ids, labels=labels)
|
||||
|
||||
@@ -300,8 +300,8 @@ class BertForSequenceClassificationWithPabee(BertPreTrainedModel):
|
||||
from torch import nn
|
||||
import torch
|
||||
|
||||
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
|
||||
model = BertForSequenceClassificationWithPabee.from_pretrained('bert-base-uncased')
|
||||
tokenizer = BertTokenizer.from_pretrained('google-bert/bert-base-uncased')
|
||||
model = BertForSequenceClassificationWithPabee.from_pretrained('google-bert/bert-base-uncased')
|
||||
|
||||
input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0) # Batch size 1
|
||||
labels = torch.tensor([1]).unsqueeze(0) # Batch size 1
|
||||
|
||||
@@ -29,7 +29,7 @@ class PabeeTests(TestCasePlus):
|
||||
testargs = f"""
|
||||
run_glue_with_pabee.py
|
||||
--model_type albert
|
||||
--model_name_or_path albert-base-v2
|
||||
--model_name_or_path albert/albert-base-v2
|
||||
--data_dir ./tests/fixtures/tests_samples/MRPC/
|
||||
--output_dir {tmp_dir}
|
||||
--overwrite_output_dir
|
||||
|
||||
@@ -107,7 +107,7 @@ def convert_bertabs_checkpoints(path_to_checkpoints, dump_path):
|
||||
# ----------------------------------
|
||||
|
||||
logging.info("Make sure that the models' outputs are identical")
|
||||
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
|
||||
tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-uncased")
|
||||
|
||||
# prepare the model inputs
|
||||
encoder_input_ids = tokenizer.encode("This is sample éàalj'-.")
|
||||
|
||||
@@ -128,7 +128,7 @@ class Bert(nn.Module):
|
||||
|
||||
def __init__(self):
|
||||
super().__init__()
|
||||
config = BertConfig.from_pretrained("bert-base-uncased")
|
||||
config = BertConfig.from_pretrained("google-bert/bert-base-uncased")
|
||||
self.model = BertModel(config)
|
||||
|
||||
def forward(self, input_ids, attention_mask=None, token_type_ids=None, **kwargs):
|
||||
|
||||
@@ -29,7 +29,7 @@ Batch = namedtuple("Batch", ["document_names", "batch_size", "src", "segs", "mas
|
||||
|
||||
|
||||
def evaluate(args):
|
||||
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased", do_lower_case=True)
|
||||
tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-uncased", do_lower_case=True)
|
||||
model = BertAbs.from_pretrained("remi/bertabs-finetuned-extractive-abstractive-summarization")
|
||||
model.to(args.device)
|
||||
model.eval()
|
||||
|
||||
@@ -79,7 +79,7 @@ python scripts/pretokenizing.py \
|
||||
Before training a new model for code we create a new tokenizer that is efficient at code tokenization. To train the tokenizer you can run the following command:
|
||||
```bash
|
||||
python scripts/bpe_training.py \
|
||||
--base_tokenizer gpt2 \
|
||||
--base_tokenizer openai-community/gpt2 \
|
||||
--dataset_name codeparrot/codeparrot-clean-train
|
||||
```
|
||||
|
||||
@@ -90,12 +90,12 @@ The models are randomly initialized and trained from scratch. To initialize a ne
|
||||
|
||||
```bash
|
||||
python scripts/initialize_model.py \
|
||||
--config_name gpt2-large \
|
||||
--config_name openai-community/gpt2-large \
|
||||
--tokenizer_name codeparrot/codeparrot \
|
||||
--model_name codeparrot \
|
||||
--push_to_hub True
|
||||
```
|
||||
This will initialize a new model with the architecture and configuration of `gpt2-large` and use the tokenizer to appropriately size the input embeddings. Finally, the initilaized model is pushed the hub.
|
||||
This will initialize a new model with the architecture and configuration of `openai-community/gpt2-large` and use the tokenizer to appropriately size the input embeddings. Finally, the initilaized model is pushed the hub.
|
||||
|
||||
We can either pass the name of a text dataset or a pretokenized dataset which speeds up training a bit.
|
||||
Now that the tokenizer and model are also ready we can start training the model. The main training script is built with `accelerate` to scale across a wide range of platforms and infrastructure scales. We train two models with [110M](https://huggingface.co/codeparrot/codeparrot-small/) and [1.5B](https://huggingface.co/codeparrot/codeparrot/) parameters for 25-30B tokens on a 16xA100 (40GB) machine which takes 1 day and 1 week, respectively.
|
||||
|
||||
@@ -172,7 +172,7 @@ class TokenizerTrainingArguments:
|
||||
"""
|
||||
|
||||
base_tokenizer: Optional[str] = field(
|
||||
default="gpt2", metadata={"help": "Base tokenizer to build new tokenizer from."}
|
||||
default="openai-community/gpt2", metadata={"help": "Base tokenizer to build new tokenizer from."}
|
||||
)
|
||||
dataset_name: Optional[str] = field(
|
||||
default="transformersbook/codeparrot-train", metadata={"help": "Dataset to train tokenizer on."}
|
||||
@@ -211,7 +211,7 @@ class InitializationArguments:
|
||||
"""
|
||||
|
||||
config_name: Optional[str] = field(
|
||||
default="gpt2-large", metadata={"help": "Configuration to use for model initialization."}
|
||||
default="openai-community/gpt2-large", metadata={"help": "Configuration to use for model initialization."}
|
||||
)
|
||||
tokenizer_name: Optional[str] = field(
|
||||
default="codeparrot/codeparrot", metadata={"help": "Tokenizer attached to model."}
|
||||
|
||||
@@ -48,7 +48,7 @@ class DeeBertTests(TestCasePlus):
|
||||
def test_glue_deebert_train(self):
|
||||
train_args = """
|
||||
--model_type roberta
|
||||
--model_name_or_path roberta-base
|
||||
--model_name_or_path FacebookAI/roberta-base
|
||||
--task_name MRPC
|
||||
--do_train
|
||||
--do_eval
|
||||
@@ -61,7 +61,7 @@ class DeeBertTests(TestCasePlus):
|
||||
--num_train_epochs 3
|
||||
--overwrite_output_dir
|
||||
--seed 42
|
||||
--output_dir ./examples/deebert/saved_models/roberta-base/MRPC/two_stage
|
||||
--output_dir ./examples/deebert/saved_models/FacebookAI/roberta-base/MRPC/two_stage
|
||||
--plot_data_dir ./examples/deebert/results/
|
||||
--save_steps 0
|
||||
--overwrite_cache
|
||||
@@ -71,12 +71,12 @@ class DeeBertTests(TestCasePlus):
|
||||
|
||||
eval_args = """
|
||||
--model_type roberta
|
||||
--model_name_or_path ./examples/deebert/saved_models/roberta-base/MRPC/two_stage
|
||||
--model_name_or_path ./examples/deebert/saved_models/FacebookAI/roberta-base/MRPC/two_stage
|
||||
--task_name MRPC
|
||||
--do_eval
|
||||
--do_lower_case
|
||||
--data_dir ./tests/fixtures/tests_samples/MRPC/
|
||||
--output_dir ./examples/deebert/saved_models/roberta-base/MRPC/two_stage
|
||||
--output_dir ./examples/deebert/saved_models/FacebookAI/roberta-base/MRPC/two_stage
|
||||
--plot_data_dir ./examples/deebert/results/
|
||||
--max_seq_length 128
|
||||
--eval_each_highway
|
||||
@@ -88,12 +88,12 @@ class DeeBertTests(TestCasePlus):
|
||||
|
||||
entropy_eval_args = """
|
||||
--model_type roberta
|
||||
--model_name_or_path ./examples/deebert/saved_models/roberta-base/MRPC/two_stage
|
||||
--model_name_or_path ./examples/deebert/saved_models/FacebookAI/roberta-base/MRPC/two_stage
|
||||
--task_name MRPC
|
||||
--do_eval
|
||||
--do_lower_case
|
||||
--data_dir ./tests/fixtures/tests_samples/MRPC/
|
||||
--output_dir ./examples/deebert/saved_models/roberta-base/MRPC/two_stage
|
||||
--output_dir ./examples/deebert/saved_models/FacebookAI/roberta-base/MRPC/two_stage
|
||||
--plot_data_dir ./examples/deebert/results/
|
||||
--max_seq_length 128
|
||||
--early_exit_entropy 0.1
|
||||
|
||||
@@ -64,7 +64,7 @@ To fine-tune a transformer model with IGF on a language modeling task, use the f
|
||||
|
||||
```python
|
||||
python run_clm_igf.py\
|
||||
--model_name_or_path "gpt2" \
|
||||
--model_name_or_path "openai-community/gpt2" \
|
||||
--data_file="data/tokenized_stories_train_wikitext103" \
|
||||
--igf_data_file="data/IGF_values" \
|
||||
--context_len 32 \
|
||||
|
||||
@@ -69,9 +69,9 @@ def compute_perplexity(model, test_data, context_len):
|
||||
return perplexity
|
||||
|
||||
|
||||
def load_gpt2(model_name="gpt2"):
|
||||
def load_gpt2(model_name="openai-community/gpt2"):
|
||||
"""
|
||||
load original gpt2 and save off for quicker loading
|
||||
load original openai-community/gpt2 and save off for quicker loading
|
||||
|
||||
Args:
|
||||
model_name: GPT-2
|
||||
|
||||
@@ -84,7 +84,7 @@ def generate_n_pairs(
|
||||
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
|
||||
|
||||
# load pretrained model
|
||||
model = load_gpt2("gpt2").to(device)
|
||||
model = load_gpt2("openai-community/gpt2").to(device)
|
||||
print("computing perplexity on objective set")
|
||||
orig_perp = compute_perplexity(model, objective_set, context_len).item()
|
||||
print("perplexity on objective set:", orig_perp)
|
||||
@@ -121,7 +121,7 @@ def training_secondary_learner(
|
||||
set_seed(42)
|
||||
|
||||
# Load pre-trained model
|
||||
model = GPT2LMHeadModel.from_pretrained("gpt2")
|
||||
model = GPT2LMHeadModel.from_pretrained("openai-community/gpt2")
|
||||
|
||||
# Initialize secondary learner to use embedding weights of model
|
||||
secondary_learner = SecondaryLearner(model)
|
||||
@@ -153,7 +153,7 @@ def finetune(
|
||||
recopy_model=recopy_gpt2,
|
||||
secondary_learner=None,
|
||||
eval_interval=10,
|
||||
finetuned_model_name="gpt2_finetuned.pt",
|
||||
finetuned_model_name="openai-community/gpt2_finetuned.pt",
|
||||
):
|
||||
"""
|
||||
fine-tune with IGF if secondary_learner is not None, else standard fine-tuning
|
||||
@@ -346,7 +346,10 @@ def main():
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
"--batch_size", default=16, type=int, help="batch size of training data of language model(gpt2) "
|
||||
"--batch_size",
|
||||
default=16,
|
||||
type=int,
|
||||
help="batch size of training data of language model(openai-community/gpt2) ",
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
@@ -383,7 +386,9 @@ def main():
|
||||
),
|
||||
)
|
||||
|
||||
parser.add_argument("--finetuned_model_name", default="gpt2_finetuned.pt", type=str, help="finetuned_model_name")
|
||||
parser.add_argument(
|
||||
"--finetuned_model_name", default="openai-community/gpt2_finetuned.pt", type=str, help="finetuned_model_name"
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
"--recopy_model",
|
||||
@@ -416,16 +421,16 @@ def main():
|
||||
igf_model_path="igf_model.pt",
|
||||
)
|
||||
|
||||
# load pretrained gpt2 model
|
||||
model = GPT2LMHeadModel.from_pretrained("gpt2")
|
||||
# load pretrained openai-community/gpt2 model
|
||||
model = GPT2LMHeadModel.from_pretrained("openai-community/gpt2")
|
||||
set_seed(42)
|
||||
|
||||
# Generate train and test data to train and evaluate gpt2 model
|
||||
# Generate train and test data to train and evaluate openai-community/gpt2 model
|
||||
train_dataset, test_dataset = generate_datasets(
|
||||
context_len=32, file="data/tokenized_stories_train_wikitext103.jbl", number=100, min_len=1026, trim=True
|
||||
)
|
||||
|
||||
# fine-tuning of the gpt2 model using igf (Information Gain Filtration)
|
||||
# fine-tuning of the openai-community/gpt2 model using igf (Information Gain Filtration)
|
||||
finetune(
|
||||
model,
|
||||
train_dataset,
|
||||
@@ -437,7 +442,7 @@ def main():
|
||||
recopy_model=recopy_gpt2,
|
||||
secondary_learner=secondary_learner,
|
||||
eval_interval=10,
|
||||
finetuned_model_name="gpt2_finetuned.pt",
|
||||
finetuned_model_name="openai-community/gpt2_finetuned.pt",
|
||||
)
|
||||
|
||||
|
||||
|
||||
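In the hunks above, the default for `finetuned_model_name` changes from `gpt2_finetuned.pt` to `openai-community/gpt2_finetuned.pt`. When such a value is used as a file path, the `/` turns `openai-community` into a directory component that must exist before the file can be written. The sketch below shows one way to derive a flat file name from a namespaced model ID. The function `checkpoint_filename` and the `--` separator are illustrative assumptions, not something the IGF example defines.

```python
from pathlib import PurePosixPath


def checkpoint_filename(model_id, suffix="_finetuned.pt", separator="--"):
    """Build a single-component file name from a possibly namespaced model ID."""
    # Replace the namespace separator so the result contains no directory part.
    flat = model_id.replace("/", separator)
    return f"{flat}{suffix}"


if __name__ == "__main__":
    name = checkpoint_filename("openai-community/gpt2")
    print(name)
    # A flat name has exactly one path component.
    print(len(PurePosixPath(name).parts))
```

The same idea applies to any output location built from a model ID, such as a `saved_models/<model>/...` directory: flattening the ID keeps the directory depth the same as it was with the legacy short name.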
@@ -159,13 +159,13 @@ to be used, but that everybody in team is on the same page on what type of model
|
||||
To give an example, a well-defined project would be the following:
|
||||
|
||||
- task: summarization
|
||||
- model: [t5-small](https://huggingface.co/t5-small)
|
||||
- model: [google-t5/t5-small](https://huggingface.co/google-t5/t5-small)
|
||||
- dataset: [CNN/Daily mail](https://huggingface.co/datasets/cnn_dailymail)
|
||||
- training script: [run_summarization_flax.py](https://github.com/huggingface/transformers/blob/main/examples/flax/summarization/run_summarization_flax.py)
|
||||
- outcome: t5 model that can summarize news
|
||||
- work flow: adapt `run_summarization_flax.py` to work with `t5-small`.
|
||||
- work flow: adapt `run_summarization_flax.py` to work with `google-t5/t5-small`.
|
||||
|
||||
This example is a very easy and not the most interesting project since a `t5-small`
|
||||
This example is a very easy and not the most interesting project since a `google-t5/t5-small`
|
||||
summarization model exists already for CNN/Daily mail and pretty much no code has to be
|
||||
written.
|
||||
A well-defined project does not need to have the dataset be part of
|
||||
@@ -335,7 +335,7 @@ dataset = load_dataset('oscar', "unshuffled_deduplicated_en", split='train', str
|
||||
|
||||
dummy_input = next(iter(dataset))["text"]
|
||||
|
||||
tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
|
||||
tokenizer = RobertaTokenizerFast.from_pretrained("FacebookAI/roberta-base")
|
||||
input_ids = tokenizer(dummy_input, return_tensors="np").input_ids[:, :10]
|
||||
|
||||
model = FlaxRobertaModel.from_pretrained("julien-c/dummy-unknown")
|
||||
@@ -492,7 +492,7 @@ dataset = load_dataset('oscar', "unshuffled_deduplicated_en", split='train', str
|
||||
|
||||
dummy_input = next(iter(dataset))["text"]
|
||||
|
||||
tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
|
||||
tokenizer = RobertaTokenizerFast.from_pretrained("FacebookAI/roberta-base")
|
||||
input_ids = tokenizer(dummy_input, return_tensors="np").input_ids[:, :10]
|
||||
|
||||
model = FlaxRobertaModel.from_pretrained("julien-c/dummy-unknown")
|
||||
@@ -518,7 +518,7 @@ be available in a couple of days.
|
||||
- [BigBird](https://github.com/huggingface/transformers/blob/main/src/transformers/models/big_bird/modeling_flax_big_bird.py)
|
||||
- [CLIP](https://github.com/huggingface/transformers/blob/main/src/transformers/models/clip/modeling_flax_clip.py)
|
||||
- [ELECTRA](https://github.com/huggingface/transformers/blob/main/src/transformers/models/electra/modeling_flax_electra.py)
|
||||
- [GPT2](https://github.com/huggingface/transformers/blob/main/src/transformers/models/gpt2/modeling_flax_gpt2.py)
|
||||
- [GPT2](https://github.com/huggingface/transformers/blob/main/src/transformers/models/openai-community/gpt2/modeling_flax_gpt2.py)
|
||||
- [(TODO) MBART](https://github.com/huggingface/transformers/blob/main/src/transformers/models/mbart/modeling_flax_mbart.py)
|
||||
- [RoBERTa](https://github.com/huggingface/transformers/blob/main/src/transformers/models/roberta/modeling_flax_roberta.py)
|
||||
- [T5](https://github.com/huggingface/transformers/blob/main/src/transformers/models/t5/modeling_flax_t5.py)
|
||||
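The GPT2 entry in the list above shows the namespace being inserted into a repository file path (`models/gpt2/...` became `models/openai-community/gpt2/...`), where `gpt2` is a source directory name rather than a Hub model ID. The sketch below shows one way a rewriting script could restrict replacements to standalone tokens. It is not the script used for this commit; the mapping, the pattern, and the function `canonicalize` are assumptions made for illustration.

```python
import re

# Hypothetical mapping; only names that appear in the diff above are listed.
CANONICAL = {
    "gpt2": "openai-community/gpt2",
    "roberta-base": "FacebookAI/roberta-base",
}

# Match a legacy name only as a standalone token: not preceded or followed by
# a word character, "/", "." or "-". This skips path segments, file names,
# longer names such as "gpt2-large", and IDs that are already namespaced.
_ALTERNATION = "|".join(
    re.escape(name) for name in sorted(CANONICAL, key=len, reverse=True)
)
_PATTERN = re.compile(r"(?<![\w/.-])(" + _ALTERNATION + r")(?![\w/.-])")


def canonicalize(text):
    """Replace standalone legacy model names with their namespaced IDs."""
    return _PATTERN.sub(lambda match: CANONICAL[match.group(1)], text)


if __name__ == "__main__":
    print(canonicalize('RobertaTokenizerFast.from_pretrained("roberta-base")'))
    print(canonicalize("src/transformers/models/gpt2/modeling_flax_gpt2.py"))
```

The rule is deliberately conservative. Because a preceding `/` blocks a match, Hub URLs such as `https://huggingface.co/gpt2` are left untouched and would need a separate, URL-specific rule, and a name directly followed by a period at the end of a sentence is also skipped. In exchange, running the function twice gives the same result as running it once.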
@@ -729,7 +729,7 @@ Let's use the base `FlaxRobertaModel` without any heads as an example.
|
||||
from transformers import FlaxRobertaModel, RobertaTokenizerFast
|
||||
import jax
|
||||
|
||||
tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
|
||||
tokenizer = RobertaTokenizerFast.from_pretrained("FacebookAI/roberta-base")
|
||||
inputs = tokenizer("JAX/Flax is amazing ", padding="max_length", max_length=128, return_tensors="np")
|
||||
|
||||
model = FlaxRobertaModel.from_pretrained("julien-c/dummy-unknown")
|
||||
@@ -1011,7 +1011,7 @@ and run the following commands in a Python shell to save a config.
|
||||
```python
|
||||
from transformers import RobertaConfig
|
||||
|
||||
config = RobertaConfig.from_pretrained("roberta-base")
|
||||
config = RobertaConfig.from_pretrained("FacebookAI/roberta-base")
|
||||
config.save_pretrained("./")
|
||||
```
|
||||
|
||||
@@ -1193,12 +1193,12 @@ All the widgets are open sourced in the `huggingface_hub` [repo](https://github.
|
||||
**NLP**
|
||||
* **Conversational:** To have the best conversations!. [Example](https://huggingface.co/microsoft/DialoGPT-large?).
|
||||
* **Feature Extraction:** Retrieve the input embeddings. [Example](https://huggingface.co/sentence-transformers/distilbert-base-nli-mean-tokens?text=test).
|
||||
* **Fill Mask:** Predict potential words for a mask token. [Example](https://huggingface.co/bert-base-uncased?).
|
||||
* **Question Answering:** Given a context and a question, predict the answer. [Example](https://huggingface.co/bert-large-uncased-whole-word-masking-finetuned-squad).
|
||||
* **Fill Mask:** Predict potential words for a mask token. [Example](https://huggingface.co/google-bert/bert-base-uncased?).
|
||||
* **Question Answering:** Given a context and a question, predict the answer. [Example](https://huggingface.co/google-bert/bert-large-uncased-whole-word-masking-finetuned-squad).
|
||||
* **Sentence Simmilarity:** Predict how similar a set of sentences are. Useful for Sentence Transformers.
|
||||
* **Summarization:** Given a text, output a summary of it. [Example](https://huggingface.co/sshleifer/distilbart-cnn-12-6).
|
||||
* **Table Question Answering:** Given a table and a question, predict the answer. [Example](https://huggingface.co/google/tapas-base-finetuned-wtq).
|
||||
* **Text Generation:** Generate text based on a prompt. [Example](https://huggingface.co/gpt2)
|
||||
* **Text Generation:** Generate text based on a prompt. [Example](https://huggingface.co/openai-community/gpt2)
|
||||
* **Token Classification:** Useful for tasks such as Named Entity Recognition and Part of Speech. [Example](https://huggingface.co/dslim/bert-base-NER).
|
||||
* **Zero-Shot Classification:** Too cool to explain with words. Here is an [example](https://huggingface.co/typeform/distilbert-base-uncased-mnli)
|
||||
* ([WIP](https://github.com/huggingface/huggingface_hub/issues/99)) **Table to Text Generation**.
|
||||
|
||||
@@ -31,7 +31,7 @@ without ever having to download the full dataset.
|
||||
In the following, we demonstrate how to train a bi-directional transformer model
|
||||
using masked language modeling objective as introduced in [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805).
|
||||
More specifically, we demonstrate how JAX/Flax and dataset streaming can be leveraged
|
||||
to pre-train [**`roberta-base`**](https://huggingface.co/roberta-base)
|
||||
to pre-train [**`FacebookAI/roberta-base`**](https://huggingface.co/FacebookAI/roberta-base)
|
||||
in English on a single TPUv3-8 pod for 10000 update steps.
|
||||
|
||||
The example script uses the 🤗 Datasets library. You can easily customize them to your needs if you need extra processing on your datasets.
|
||||
@@ -80,8 +80,8 @@ from transformers import RobertaTokenizerFast, RobertaConfig
|
||||
|
||||
model_dir = "./english-roberta-base-dummy"
|
||||
|
||||
tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
|
||||
config = RobertaConfig.from_pretrained("roberta-base")
|
||||
tokenizer = RobertaTokenizerFast.from_pretrained("FacebookAI/roberta-base")
|
||||
config = RobertaConfig.from_pretrained("FacebookAI/roberta-base")
|
||||
|
||||
tokenizer.save_pretrained(model_dir)
|
||||
config.save_pretrained(model_dir)
|
||||
|
||||
@@ -32,7 +32,7 @@ Models written in JAX/Flax are **immutable** and updated in a purely functional
|
||||
way which enables simple and efficient model parallelism.
|
||||
|
||||
In this example we will use the vision model from [CLIP](https://huggingface.co/models?filter=clip)
|
||||
as the image encoder and [`roberta-base`](https://huggingface.co/roberta-base) as the text encoder.
|
||||
as the image encoder and [`FacebookAI/roberta-base`](https://huggingface.co/FacebookAI/roberta-base) as the text encoder.
|
||||
Note that one can also use the [ViT](https://huggingface.co/models?filter=vit) model as image encoder and any other BERT or ROBERTa model as text encoder.
|
||||
To train the model on languages other than English one should choose a text encoder trained on the desired
|
||||
language and a image-text dataset in that language. One such dataset is [WIT](https://github.com/google-research-datasets/wit).
|
||||
@@ -76,7 +76,7 @@ Here is an example of how to load the model using pre-trained text and vision mo
|
||||
```python
|
||||
from modeling_hybrid_clip import FlaxHybridCLIP
|
||||
|
||||
model = FlaxHybridCLIP.from_text_vision_pretrained("bert-base-uncased", "openai/clip-vit-base-patch32")
|
||||
model = FlaxHybridCLIP.from_text_vision_pretrained("google-bert/bert-base-uncased", "openai/clip-vit-base-patch32")
|
||||
|
||||
# save the model
|
||||
model.save_pretrained("bert-clip")
|
||||
@@ -89,7 +89,7 @@ If the checkpoints are in PyTorch then one could pass `text_from_pt=True` and `v
|
||||
PyTorch checkpoints convert them to flax and load the model.
|
||||
|
||||
```python
|
||||
model = FlaxHybridCLIP.from_text_vision_pretrained("bert-base-uncased", "openai/clip-vit-base-patch32", text_from_pt=True, vision_from_pt=True)
|
||||
model = FlaxHybridCLIP.from_text_vision_pretrained("google-bert/bert-base-uncased", "openai/clip-vit-base-patch32", text_from_pt=True, vision_from_pt=True)
|
||||
```
|
||||
|
||||
This loads both the text and vision encoders using pre-trained weights, the projection layers are randomly
|
||||
@@ -154,9 +154,9 @@ Next we can run the example script to train the model:
|
||||
```bash
|
||||
python run_hybrid_clip.py \
|
||||
--output_dir ${MODEL_DIR} \
|
||||
--text_model_name_or_path="roberta-base" \
|
||||
--text_model_name_or_path="FacebookAI/roberta-base" \
|
||||
--vision_model_name_or_path="openai/clip-vit-base-patch32" \
|
||||
--tokenizer_name="roberta-base" \
|
||||
--tokenizer_name="FacebookAI/roberta-base" \
|
||||
--train_file="coco_dataset/train_dataset.json" \
|
||||
--validation_file="coco_dataset/validation_dataset.json" \
|
||||
--do_train --do_eval \
|
||||
|
||||
@@ -314,8 +314,6 @@ class FlaxHybridCLIP(FlaxPreTrainedModel):
|
||||
Information necessary to initiate the text model. Can be either:
|
||||
|
||||
- A string, the `model id` of a pretrained model hosted inside a model repo on huggingface.co.
|
||||
Valid model ids can be located at the root-level, like ``bert-base-uncased``, or namespaced under
|
||||
a user or organization name, like ``dbmdz/bert-base-german-cased``.
|
||||
- A path to a `directory` containing model weights saved using
|
||||
:func:`~transformers.FlaxPreTrainedModel.save_pretrained`, e.g., ``./my_model_directory/``.
|
||||
- A path or url to a `PyTorch checkpoint folder` (e.g, ``./pt_model``). In
|
||||
@@ -327,8 +325,6 @@ class FlaxHybridCLIP(FlaxPreTrainedModel):
|
||||
Information necessary to initiate the vision model. Can be either:
|
||||
|
||||
- A string, the `model id` of a pretrained model hosted inside a model repo on huggingface.co.
|
||||
Valid model ids can be located at the root-level, like ``bert-base-uncased``, or namespaced under
|
||||
a user or organization name, like ``dbmdz/bert-base-german-cased``.
|
||||
- A path to a `directory` containing model weights saved using
|
||||
:func:`~transformers.FlaxPreTrainedModel.save_pretrained`, e.g., ``./my_model_directory/``.
|
||||
- A path or url to a `PyTorch checkpoint folder` (e.g, ``./pt_model``). In
|
||||
@@ -354,7 +350,7 @@ class FlaxHybridCLIP(FlaxPreTrainedModel):
|
||||
>>> from transformers import FlaxHybridCLIP
|
||||
>>> # initialize a model from pretrained BERT and CLIP models. Note that the projection layers will be randomly initialized.
|
||||
>>> # If using CLIP's vision model the vision projection layer will be initialized using pre-trained weights
|
||||
>>> model = FlaxHybridCLIP.from_text_vision_pretrained('bert-base-uncased', 'openai/clip-vit-base-patch32')
|
||||
>>> model = FlaxHybridCLIP.from_text_vision_pretrained('google-bert/bert-base-uncased', 'openai/clip-vit-base-patch32')
|
||||
>>> # saving model after fine-tuning
|
||||
>>> model.save_pretrained("./bert-clip")
|
||||
>>> # load fine-tuned model
|
||||
|
||||
@@ -54,7 +54,7 @@ model.save_pretrained("gpt-neo-1.3B")
|
||||
```bash
|
||||
python run_clm_mp.py \
|
||||
--model_name_or_path gpt-neo-1.3B \
|
||||
--tokenizer_name gpt2 \
|
||||
--tokenizer_name openai-community/gpt2 \
|
||||
--dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 \
|
||||
--do_train --do_eval \
|
||||
--block_size 1024 \
|
||||
|
||||
@@ -36,7 +36,7 @@ def load_models():
|
||||
_ = s2s_model.eval()
|
||||
else:
|
||||
s2s_tokenizer, s2s_model = make_qa_s2s_model(
|
||||
model_name="t5-small", from_file="seq2seq_models/eli5_t5_model_1024_4.pth", device="cuda:0"
|
||||
model_name="google-t5/t5-small", from_file="seq2seq_models/eli5_t5_model_1024_4.pth", device="cuda:0"
|
||||
)
|
||||
return (qar_tokenizer, qar_model, s2s_tokenizer, s2s_model)
|
||||
|
||||
|
||||
@@ -32,7 +32,7 @@ to that word). This technique has been refined for Chinese in [this paper](https
|
||||
To fine-tune a model using whole word masking, use the following script:
|
||||
```bash
|
||||
python run_mlm_wwm.py \
|
||||
--model_name_or_path roberta-base \
|
||||
--model_name_or_path FacebookAI/roberta-base \
|
||||
--dataset_name wikitext \
|
||||
--dataset_config_name wikitext-2-raw-v1 \
|
||||
--do_train \
|
||||
@@ -83,7 +83,7 @@ export VALIDATION_REF_FILE=/path/to/validation/chinese_ref/file
|
||||
export OUTPUT_DIR=/tmp/test-mlm-wwm
|
||||
|
||||
python run_mlm_wwm.py \
|
||||
--model_name_or_path roberta-base \
|
||||
--model_name_or_path FacebookAI/roberta-base \
|
||||
--train_file $TRAIN_FILE \
|
||||
--validation_file $VALIDATION_FILE \
|
||||
--train_ref_file $TRAIN_REF_FILE \
|
||||
|
||||
@@ -10,7 +10,7 @@ Based on the script [`run_mmimdb.py`](https://github.com/huggingface/transformer
|
||||
python run_mmimdb.py \
|
||||
--data_dir /path/to/mmimdb/dataset/ \
|
||||
--model_type bert \
|
||||
--model_name_or_path bert-base-uncased \
|
||||
--model_name_or_path google-bert/bert-base-uncased \
|
||||
--output_dir /path/to/save/dir/ \
|
||||
--do_train \
|
||||
--do_eval \
|
||||
|
||||
@@ -61,7 +61,7 @@ python examples/movement-pruning/masked_run_squad.py \
|
||||
--predict_file dev-v1.1.json \
|
||||
--do_train --do_eval --do_lower_case \
|
||||
--model_type masked_bert \
|
||||
--model_name_or_path bert-base-uncased \
|
||||
--model_name_or_path google-bert/bert-base-uncased \
|
||||
--per_gpu_train_batch_size 16 \
|
||||
--warmup_steps 5400 \
|
||||
--num_train_epochs 10 \
|
||||
@@ -84,7 +84,7 @@ python examples/movement-pruning/masked_run_squad.py \
|
||||
--predict_file dev-v1.1.json \
|
||||
--do_train --do_eval --do_lower_case \
|
||||
--model_type masked_bert \
|
||||
--model_name_or_path bert-base-uncased \
|
||||
--model_name_or_path google-bert/bert-base-uncased \
|
||||
--per_gpu_train_batch_size 16 \
|
||||
--warmup_steps 5400 \
|
||||
--num_train_epochs 10 \
|
||||
@@ -104,7 +104,7 @@ python examples/movement-pruning/masked_run_squad.py \
|
||||
--predict_file dev-v1.1.json \
|
||||
--do_train --do_eval --do_lower_case \
|
||||
--model_type masked_bert \
|
||||
--model_name_or_path bert-base-uncased \
|
||||
--model_name_or_path google-bert/bert-base-uncased \
|
||||
--per_gpu_train_batch_size 16 \
|
||||
--warmup_steps 5400 \
|
||||
--num_train_epochs 10 \
|
||||
@@ -124,7 +124,7 @@ python examples/movement-pruning/masked_run_squad.py \
|
||||
--predict_file dev-v1.1.json \
|
||||
--do_train --do_eval --do_lower_case \
|
||||
--model_type masked_bert \
|
||||
--model_name_or_path bert-base-uncased \
|
||||
--model_name_or_path google-bert/bert-base-uncased \
|
||||
--per_gpu_train_batch_size 16 \
|
||||
--warmup_steps 5400 \
|
||||
--num_train_epochs 10 \
|
||||
|
||||
@@ -10,8 +10,8 @@ Paper authors: Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyo
|
||||
|
||||
## Examples
|
||||
|
||||
`sanity_script.sh` will launch performer fine-tuning from the bert-base-cased checkpoint on the Simple Wikipedia dataset (a small, easy-language English Wikipedia) from `datasets`.
|
||||
`full_script.sh` will launch performer fine-tuning from the bert-large-cased checkpoint on the English Wikipedia dataset from `datasets`.
|
||||
`sanity_script.sh` will launch performer fine-tuning from the google-bert/bert-base-cased checkpoint on the Simple Wikipedia dataset (a small, easy-language English Wikipedia) from `datasets`.
|
||||
`full_script.sh` will launch performer fine-tuning from the google-bert/bert-large-cased checkpoint on the English Wikipedia dataset from `datasets`.
|
||||
|
||||
Here are a few key arguments:
|
||||
- Remove the `--performer` argument to use a standard Bert model.
|
||||
|
||||
@@ -61,7 +61,7 @@ DISCRIMINATOR_MODELS_PARAMS = {
|
||||
"embed_size": 1024,
|
||||
"class_vocab": {"non_clickbait": 0, "clickbait": 1},
|
||||
"default_class": 1,
|
||||
"pretrained_model": "gpt2-medium",
|
||||
"pretrained_model": "openai-community/gpt2-medium",
|
||||
},
|
||||
"sentiment": {
|
||||
"url": "https://s3.amazonaws.com/models.huggingface.co/bert/pplm/discriminators/SST_classifier_head.pt",
|
||||
@@ -69,7 +69,7 @@ DISCRIMINATOR_MODELS_PARAMS = {
|
||||
"embed_size": 1024,
|
||||
"class_vocab": {"very_positive": 2, "very_negative": 3},
|
||||
"default_class": 3,
|
||||
"pretrained_model": "gpt2-medium",
|
||||
"pretrained_model": "openai-community/gpt2-medium",
|
||||
},
|
||||
}
|
||||
|
||||
@@ -585,7 +585,7 @@ def set_generic_model_params(discrim_weights, discrim_meta):
|
||||
|
||||
|
||||
def run_pplm_example(
|
||||
pretrained_model="gpt2-medium",
|
||||
pretrained_model="openai-community/gpt2-medium",
|
||||
cond_text="",
|
||||
uncond=False,
|
||||
num_samples=1,
|
||||
@@ -738,7 +738,7 @@ if __name__ == "__main__":
|
||||
"--pretrained_model",
|
||||
"-M",
|
||||
type=str,
|
||||
default="gpt2-medium",
|
||||
default="openai-community/gpt2-medium",
|
||||
help="pretrained model name or path to local checkpoint",
|
||||
)
|
||||
parser.add_argument("--cond_text", type=str, default="The lake", help="Prefix texts to condition on")
|
||||
|
||||
@@ -45,7 +45,7 @@ max_length_seq = 100
|
||||
class Discriminator(nn.Module):
|
||||
"""Transformer encoder followed by a Classification Head"""
|
||||
|
||||
def __init__(self, class_size, pretrained_model="gpt2-medium", cached_mode=False, device="cpu"):
|
||||
def __init__(self, class_size, pretrained_model="openai-community/gpt2-medium", cached_mode=False, device="cpu"):
|
||||
super().__init__()
|
||||
self.tokenizer = GPT2Tokenizer.from_pretrained(pretrained_model)
|
||||
self.encoder = GPT2LMHeadModel.from_pretrained(pretrained_model)
|
||||
@@ -218,7 +218,7 @@ def get_cached_data_loader(dataset, batch_size, discriminator, shuffle=False, de
|
||||
def train_discriminator(
|
||||
dataset,
|
||||
dataset_fp=None,
|
||||
pretrained_model="gpt2-medium",
|
||||
pretrained_model="openai-community/gpt2-medium",
|
||||
epochs=10,
|
||||
batch_size=64,
|
||||
log_interval=10,
|
||||
@@ -502,7 +502,10 @@ if __name__ == "__main__":
|
||||
help="File path of the dataset to use. Needed only in case of generic datadset",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--pretrained_model", type=str, default="gpt2-medium", help="Pretrained model to use as encoder"
|
||||
"--pretrained_model",
|
||||
type=str,
|
||||
default="openai-community/gpt2-medium",
|
||||
help="Pretrained model to use as encoder",
|
||||
)
|
||||
parser.add_argument("--epochs", type=int, default=10, metavar="N", help="Number of training epochs")
|
||||
parser.add_argument(
|
||||
|
||||
@@ -50,11 +50,11 @@ Calibrate the pretrained model and finetune with quantization awared:
|
||||
|
||||
```bash
|
||||
python3 run_quant_qa.py \
|
||||
--model_name_or_path bert-base-uncased \
|
||||
--model_name_or_path google-bert/bert-base-uncased \
|
||||
--dataset_name squad \
|
||||
--max_seq_length 128 \
|
||||
--doc_stride 32 \
|
||||
--output_dir calib/bert-base-uncased \
|
||||
--output_dir calib/google-bert/bert-base-uncased \
|
||||
--do_calib \
|
||||
--calibrator percentile \
|
||||
--percentile 99.99
|
||||
@@ -62,7 +62,7 @@ python3 run_quant_qa.py \
|
||||
|
||||
```bash
|
||||
python3 run_quant_qa.py \
|
||||
--model_name_or_path calib/bert-base-uncased \
|
||||
--model_name_or_path calib/google-bert/bert-base-uncased \
|
||||
--dataset_name squad \
|
||||
--do_train \
|
||||
--do_eval \
|
||||
@@ -71,8 +71,8 @@ python3 run_quant_qa.py \
|
||||
--num_train_epochs 2 \
|
||||
--max_seq_length 128 \
|
||||
--doc_stride 32 \
|
||||
--output_dir finetuned_int8/bert-base-uncased \
|
||||
--tokenizer_name bert-base-uncased \
|
||||
--output_dir finetuned_int8/google-bert/bert-base-uncased \
|
||||
--tokenizer_name google-bert/bert-base-uncased \
|
||||
--save_steps 0
|
||||
```
|
||||
|
||||
@@ -82,14 +82,14 @@ To export the QAT model finetuned above:
|
||||
|
||||
```bash
|
||||
python3 run_quant_qa.py \
|
||||
--model_name_or_path finetuned_int8/bert-base-uncased \
|
||||
--model_name_or_path finetuned_int8/google-bert/bert-base-uncased \
|
||||
--output_dir ./ \
|
||||
--save_onnx \
|
||||
--per_device_eval_batch_size 1 \
|
||||
--max_seq_length 128 \
|
||||
--doc_stride 32 \
|
||||
--dataset_name squad \
|
||||
--tokenizer_name bert-base-uncased
|
||||
--tokenizer_name google-bert/bert-base-uncased
|
||||
```
|
||||
|
||||
Use `--recalibrate-weights` to calibrate the weight ranges according to the quantizer axis. Use `--quant-per-tensor` for per tensor quantization (default is per channel).
|
||||
@@ -117,7 +117,7 @@ python3 evaluate-hf-trt-qa.py \
|
||||
--max_seq_length 128 \
|
||||
--doc_stride 32 \
|
||||
--dataset_name squad \
|
||||
--tokenizer_name bert-base-uncased \
|
||||
--tokenizer_name google-bert/bert-base-uncased \
|
||||
--int8 \
|
||||
--seed 42
|
||||
```
|
||||
@@ -128,14 +128,14 @@ Finetune a fp32 precision model with [transformers/examples/pytorch/question-ans
|
||||
|
||||
```bash
|
||||
python3 ../../pytorch/question-answering/run_qa.py \
|
||||
--model_name_or_path bert-base-uncased \
|
||||
--model_name_or_path google-bert/bert-base-uncased \
|
||||
--dataset_name squad \
|
||||
--per_device_train_batch_size 12 \
|
||||
--learning_rate 3e-5 \
|
||||
--num_train_epochs 2 \
|
||||
--max_seq_length 128 \
|
||||
--doc_stride 32 \
|
||||
--output_dir ./finetuned_fp32/bert-base-uncased \
|
||||
--output_dir ./finetuned_fp32/google-bert/bert-base-uncased \
|
||||
--save_steps 0 \
|
||||
--do_train \
|
||||
--do_eval
|
||||
@@ -147,13 +147,13 @@ python3 ../../pytorch/question-answering/run_qa.py \
|
||||
|
||||
```bash
|
||||
python3 run_quant_qa.py \
|
||||
--model_name_or_path ./finetuned_fp32/bert-base-uncased \
|
||||
--model_name_or_path ./finetuned_fp32/google-bert/bert-base-uncased \
|
||||
--dataset_name squad \
|
||||
--calibrator percentile \
|
||||
--percentile 99.99 \
|
||||
--max_seq_length 128 \
|
||||
--doc_stride 32 \
|
||||
--output_dir ./calib/bert-base-uncased \
|
||||
--output_dir ./calib/google-bert/bert-base-uncased \
|
||||
--save_steps 0 \
|
||||
--do_calib \
|
||||
--do_eval
|
||||
@@ -163,14 +163,14 @@ python3 run_quant_qa.py \
|
||||
|
||||
```bash
|
||||
python3 run_quant_qa.py \
|
||||
--model_name_or_path ./calib/bert-base-uncased \
|
||||
--model_name_or_path ./calib/google-bert/bert-base-uncased \
|
||||
--output_dir ./ \
|
||||
--save_onnx \
|
||||
--per_device_eval_batch_size 1 \
|
||||
--max_seq_length 128 \
|
||||
--doc_stride 32 \
|
||||
--dataset_name squad \
|
||||
--tokenizer_name bert-base-uncased
|
||||
--tokenizer_name google-bert/bert-base-uncased
|
||||
```
|
||||
|
||||
### Evaluate the INT8 PTQ ONNX model inference with TensorRT
|
||||
@@ -183,7 +183,7 @@ python3 evaluate-hf-trt-qa.py \
|
||||
--max_seq_length 128 \
|
||||
--doc_stride 32 \
|
||||
--dataset_name squad \
|
||||
--tokenizer_name bert-base-uncased \
|
||||
--tokenizer_name google-bert/bert-base-uncased \
|
||||
--int8 \
|
||||
--seed 42
|
||||
```
|
||||
|
||||
@@ -22,5 +22,5 @@ If you would like to list benchmark results on your favorite models of the [mode
|
||||
|
||||
| Benchmark description | Results | Environment info | Author |
|
||||
|:----------|:-------------|:-------------|------:|
|
||||
| PyTorch Benchmark on inference for `bert-base-cased` |[memory](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/inference_memory.csv) | [env](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/env.csv) | [Partick von Platen](https://github.com/patrickvonplaten) |
|
||||
| PyTorch Benchmark on inference for `bert-base-cased` |[time](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/inference_time.csv) | [env](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/env.csv) | [Partick von Platen](https://github.com/patrickvonplaten) |
|
||||
| PyTorch Benchmark on inference for `google-bert/bert-base-cased` |[memory](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/inference_memory.csv) | [env](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/env.csv) | [Partick von Platen](https://github.com/patrickvonplaten) |
|
||||
| PyTorch Benchmark on inference for `google-bert/bert-base-cased` |[time](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/inference_time.csv) | [env](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/env.csv) | [Partick von Platen](https://github.com/patrickvonplaten) |
|
||||
|
||||
@@ -65,7 +65,7 @@ Finally, we can run the example script to train the model:
|
||||
python examples/tensorflow/contrastive-image-text/run_clip.py \
|
||||
--output_dir ./clip-roberta-finetuned \
|
||||
--vision_model_name_or_path openai/clip-vit-base-patch32 \
|
||||
--text_model_name_or_path roberta-base \
|
||||
--text_model_name_or_path FacebookAI/roberta-base \
|
||||
--data_dir $PWD/data \
|
||||
--dataset_name ydshieh/coco_dataset_script \
|
||||
--dataset_config_name=2017 \
|
||||
|
||||
@@ -57,7 +57,7 @@ def parse_args():
|
||||
parser.add_argument(
|
||||
"--pretrained_model_config",
|
||||
type=str,
|
||||
default="roberta-base",
|
||||
default="FacebookAI/roberta-base",
|
||||
help="The model config to use. Note that we don't copy the model's weights, only the config!",
|
||||
)
|
||||
parser.add_argument(
|
||||
|
||||
@@ -43,7 +43,7 @@ This script trains a masked language model.
|
||||
### Example command
|
||||
```bash
|
||||
python run_mlm.py \
|
||||
--model_name_or_path distilbert-base-cased \
|
||||
--model_name_or_path distilbert/distilbert-base-cased \
|
||||
--output_dir output \
|
||||
--dataset_name wikitext \
|
||||
--dataset_config_name wikitext-103-raw-v1
|
||||
@@ -52,7 +52,7 @@ python run_mlm.py \
|
||||
When using a custom dataset, the validation file can be separately passed as an input argument. Otherwise some split (customizable) of training data is used as validation.
|
||||
```bash
|
||||
python run_mlm.py \
|
||||
--model_name_or_path distilbert-base-cased \
|
||||
--model_name_or_path distilbert/distilbert-base-cased \
|
||||
--output_dir output \
|
||||
--train_file train_file_path
|
||||
```
|
||||
@@ -64,7 +64,7 @@ This script trains a causal language model.
|
||||
### Example command
|
||||
```bash
|
||||
python run_clm.py \
|
||||
--model_name_or_path distilgpt2 \
|
||||
--model_name_or_path distilbert/distilgpt2 \
|
||||
--output_dir output \
|
||||
--dataset_name wikitext \
|
||||
--dataset_config_name wikitext-103-raw-v1
|
||||
@@ -74,7 +74,7 @@ When using a custom dataset, the validation file can be separately passed as an
|
||||
|
||||
```bash
|
||||
python run_clm.py \
|
||||
--model_name_or_path distilgpt2 \
|
||||
--model_name_or_path distilbert/distilgpt2 \
|
||||
--output_dir output \
|
||||
--train_file train_file_path
|
||||
```
|
||||
|
||||
@@ -36,7 +36,7 @@ README, but for more information you can see the 'Input Datasets' section of
|
||||
### Example command
|
||||
```bash
|
||||
python run_swag.py \
|
||||
--model_name_or_path distilbert-base-cased \
|
||||
--model_name_or_path distilbert/distilbert-base-cased \
|
||||
--output_dir output \
|
||||
--do_eval \
|
||||
--do_train
|
||||
|
||||
@@ -47,7 +47,7 @@ README, but for more information you can see the 'Input Datasets' section of
|
||||
### Example command
|
||||
```bash
|
||||
python run_qa.py \
|
||||
--model_name_or_path distilbert-base-cased \
|
||||
--model_name_or_path distilbert/distilbert-base-cased \
|
||||
--output_dir output \
|
||||
--dataset_name squad \
|
||||
--do_train \
|
||||
|
||||
@@ -334,11 +334,11 @@ def main():
|
||||
|
||||
# region T5 special-casing
|
||||
if data_args.source_prefix is None and model_args.model_name_or_path in [
|
||||
"t5-small",
|
||||
"t5-base",
|
||||
"t5-large",
|
||||
"t5-3b",
|
||||
"t5-11b",
|
||||
"google-t5/t5-small",
|
||||
"google-t5/t5-base",
|
||||
"google-t5/t5-large",
|
||||
"google-t5/t5-3b",
|
||||
"google-t5/t5-11b",
|
||||
]:
|
||||
logger.warning(
|
||||
"You're running a t5 model but didn't provide a source prefix, which is the expected, e.g. with "
|
||||
|
||||
@@ -107,7 +107,7 @@ class ExamplesTests(TestCasePlus):
|
||||
tmp_dir = self.get_auto_remove_tmp_dir()
|
||||
testargs = f"""
|
||||
run_text_classification.py
|
||||
--model_name_or_path distilbert-base-uncased
|
||||
--model_name_or_path distilbert/distilbert-base-uncased
|
||||
--output_dir {tmp_dir}
|
||||
--overwrite_output_dir
|
||||
--train_file ./tests/fixtures/tests_samples/MRPC/train.csv
|
||||
@@ -137,7 +137,7 @@ class ExamplesTests(TestCasePlus):
|
||||
tmp_dir = self.get_auto_remove_tmp_dir()
|
||||
testargs = f"""
|
||||
run_clm.py
|
||||
--model_name_or_path distilgpt2
|
||||
--model_name_or_path distilbert/distilgpt2
|
||||
--train_file ./tests/fixtures/sample_text.txt
|
||||
--validation_file ./tests/fixtures/sample_text.txt
|
||||
--do_train
|
||||
@@ -163,7 +163,7 @@ class ExamplesTests(TestCasePlus):
|
||||
tmp_dir = self.get_auto_remove_tmp_dir()
|
||||
testargs = f"""
|
||||
run_mlm.py
|
||||
--model_name_or_path distilroberta-base
|
||||
--model_name_or_path distilbert/distilroberta-base
|
||||
--train_file ./tests/fixtures/sample_text.txt
|
||||
--validation_file ./tests/fixtures/sample_text.txt
|
||||
--max_seq_length 64
|
||||
@@ -188,7 +188,7 @@ class ExamplesTests(TestCasePlus):
|
||||
tmp_dir = self.get_auto_remove_tmp_dir()
|
||||
testargs = f"""
|
||||
run_ner.py
|
||||
--model_name_or_path bert-base-uncased
|
||||
--model_name_or_path google-bert/bert-base-uncased
|
||||
--train_file tests/fixtures/tests_samples/conll/sample.json
|
||||
--validation_file tests/fixtures/tests_samples/conll/sample.json
|
||||
--output_dir {tmp_dir}
|
||||
@@ -212,7 +212,7 @@ class ExamplesTests(TestCasePlus):
|
||||
tmp_dir = self.get_auto_remove_tmp_dir()
|
||||
testargs = f"""
|
||||
run_qa.py
|
||||
--model_name_or_path bert-base-uncased
|
||||
--model_name_or_path google-bert/bert-base-uncased
|
||||
--version_2_with_negative
|
||||
--train_file tests/fixtures/tests_samples/SQUAD/sample.json
|
||||
--validation_file tests/fixtures/tests_samples/SQUAD/sample.json
|
||||
@@ -237,7 +237,7 @@ class ExamplesTests(TestCasePlus):
|
||||
tmp_dir = self.get_auto_remove_tmp_dir()
|
||||
testargs = f"""
|
||||
run_swag.py
|
||||
--model_name_or_path bert-base-uncased
|
||||
--model_name_or_path google-bert/bert-base-uncased
|
||||
--train_file tests/fixtures/tests_samples/swag/sample.json
|
||||
--validation_file tests/fixtures/tests_samples/swag/sample.json
|
||||
--output_dir {tmp_dir}
|
||||
@@ -261,7 +261,7 @@ class ExamplesTests(TestCasePlus):
|
||||
tmp_dir = self.get_auto_remove_tmp_dir()
|
||||
testargs = f"""
|
||||
run_summarization.py
|
||||
--model_name_or_path t5-small
|
||||
--model_name_or_path google-t5/t5-small
|
||||
--train_file tests/fixtures/tests_samples/xsum/sample.json
|
||||
--validation_file tests/fixtures/tests_samples/xsum/sample.json
|
||||
--output_dir {tmp_dir}
|
||||
|
||||
@@ -71,7 +71,7 @@ README, but for more information you can see the 'Input Datasets' section of
|
||||
### Example command
|
||||
```bash
|
||||
python run_text_classification.py \
|
||||
--model_name_or_path distilbert-base-cased \
|
||||
--model_name_or_path distilbert/distilbert-base-cased \
|
||||
--train_file training_data.json \
|
||||
--validation_file validation_data.json \
|
||||
--output_dir output/ \
|
||||
@@ -103,7 +103,7 @@ README, but for more information you can see the 'Input Datasets' section of
|
||||
### Example command
|
||||
```bash
|
||||
python run_glue.py \
|
||||
--model_name_or_path distilbert-base-cased \
|
||||
--model_name_or_path distilbert/distilbert-base-cased \
|
||||
--task_name mnli \
|
||||
--do_train \
|
||||
--do_eval \
|
||||
|
||||
@@ -27,7 +27,7 @@ The following example fine-tunes BERT on CoNLL-2003:
|
||||
|
||||
```bash
|
||||
python run_ner.py \
|
||||
--model_name_or_path bert-base-uncased \
|
||||
--model_name_or_path google-bert/bert-base-uncased \
|
||||
--dataset_name conll2003 \
|
||||
--output_dir /tmp/test-ner
|
||||
```
|
||||
@@ -36,7 +36,7 @@ To run on your own training and validation files, use the following command:
|
||||
|
||||
```bash
|
||||
python run_ner.py \
|
||||
--model_name_or_path bert-base-uncased \
|
||||
--model_name_or_path google-bert/bert-base-uncased \
|
||||
--train_file path_to_train_file \
|
||||
--validation_file path_to_validation_file \
|
||||
--output_dir /tmp/test-ner
|
||||
|
||||
@@ -29,11 +29,11 @@ can also be used by passing the name of the TPU resource with the `--tpu` argume
|
||||
|
||||
MBart and some T5 models require special handling.
|
||||
|
||||
T5 models `t5-small`, `t5-base`, `t5-large`, `t5-3b` and `t5-11b` must use an additional argument: `--source_prefix "translate {source_lang} to {target_lang}"`. For example:
|
||||
T5 models `google-t5/t5-small`, `google-t5/t5-base`, `google-t5/t5-large`, `google-t5/t5-3b` and `google-t5/t5-11b` must use an additional argument: `--source_prefix "translate {source_lang} to {target_lang}"`. For example:
|
||||
|
||||
```bash
|
||||
python run_translation.py \
|
||||
--model_name_or_path t5-small \
|
||||
--model_name_or_path google-t5/t5-small \
|
||||
--do_train \
|
||||
--do_eval \
|
||||
--source_lang en \
|
||||
|
||||
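The hunks above all apply the same substitution: a root-level model id is replaced by its namespaced canonical id. A minimal sketch of how such a rewrite could be scripted follows. It is not the script used for this commit. The `CANONICAL` table only contains pairs that appear in this diff, and the names `CANONICAL` and `canonicalize` are hypothetical. The sketch is deliberately conservative: it skips any id that directly follows a slash, so URLs and paths such as `calib/bert-base-uncased` are left alone and would still need a manual pass.

```python
import re

# Legacy root-level id -> canonical namespaced id (pairs taken from this diff only).
CANONICAL = {
    "gpt2": "openai-community/gpt2",
    "gpt2-medium": "openai-community/gpt2-medium",
    "roberta-base": "FacebookAI/roberta-base",
    "bert-base-uncased": "google-bert/bert-base-uncased",
    "bert-base-cased": "google-bert/bert-base-cased",
    "t5-small": "google-t5/t5-small",
    "distilgpt2": "distilbert/distilgpt2",
    "distilroberta-base": "distilbert/distilroberta-base",
    "distilbert-base-cased": "distilbert/distilbert-base-cased",
}

# Try longer ids first so "gpt2-medium" is preferred over "gpt2".
_alternation = "|".join(
    re.escape(name) for name in sorted(CANONICAL, key=len, reverse=True)
)

# Lookbehind: skip ids that are part of a longer name or already follow a slash
# (already namespaced, or part of a URL/path).
# Lookahead: skip ids that continue into a longer name (e.g. "gpt2" in "gpt2-xl").
_pattern = re.compile(rf"(?<![\w/-])({_alternation})(?![\w-])")


def canonicalize(text: str) -> str:
    """Return `text` with known legacy model ids replaced by canonical ids."""
    return _pattern.sub(lambda match: CANONICAL[match.group(1)], text)


if __name__ == "__main__":
    print(canonicalize("--tokenizer_name gpt2 \\"))
    print(canonicalize('default="gpt2-medium",'))
    print(canonicalize("--model_name_or_path google-bert/bert-base-uncased"))
```

Running the sketch twice over the same text should give the same result as running it once, because already namespaced ids are never matched again.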