Update all references to canonical models (#29001)
* Script & Manual edition
* Update
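The renames in this diff all follow one pattern: a bare legacy model id gains an organization prefix. The snippet below is a minimal sketch of how such a rewrite could be scripted. It is not the script used for this commit; the `CANONICAL` table and the `canonicalize` helper are hypothetical names, and the table only lists ids that appear in this diff.

```python
import re

# Hypothetical mapping, restricted to ids that appear in this diff.
CANONICAL = {
    "bert-base-cased": "google-bert/bert-base-cased",
    "bert-base-uncased": "google-bert/bert-base-uncased",
    "bert-large-cased": "google-bert/bert-large-cased",
    "bert-base-multilingual-cased": "google-bert/bert-base-multilingual-cased",
    "bert-large-uncased-whole-word-masking": "google-bert/bert-large-uncased-whole-word-masking",
    "bert-large-uncased-whole-word-masking-finetuned-squad": "google-bert/bert-large-uncased-whole-word-masking-finetuned-squad",
    "camembert-base": "almanach/camembert-base",
    "openai-gpt": "openai-community/openai-gpt",
    "transfo-xl-wt103": "transfo-xl/transfo-xl-wt103",
    "t5-base": "google-t5/t5-base",
}

# Match a legacy id only when it is not part of a longer identifier:
# not preceded by a word character, "/" or "-", and not followed by a
# word character or "-". Longer ids are tried first.
_PATTERN = re.compile(
    r"(?<![\w/-])("
    + "|".join(re.escape(k) for k in sorted(CANONICAL, key=len, reverse=True))
    + r")(?![\w-])"
)


def canonicalize(text: str) -> str:
    """Replace bare legacy model ids in `text` with their namespaced form."""
    return _PATTERN.sub(lambda m: CANONICAL[m.group(1)], text)
```

Because the lookbehind rejects a preceding `/`, ids that are already namespaced are left alone, but so are ids inside URLs such as `https://huggingface.co/<id>`; in this sketch those would still need a manual pass.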
@@ -22,5 +22,5 @@ If you would like to list benchmark results on your favorite models of the [mode
| Benchmark description | Results | Environment info | Author |
|:----------|:-------------|:-------------|------:|
| PyTorch Benchmark on inference for `bert-base-cased` |[memory](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/inference_memory.csv) | [env](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/env.csv) | [Patrick von Platen](https://github.com/patrickvonplaten) |
| PyTorch Benchmark on inference for `bert-base-cased` |[time](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/inference_time.csv) | [env](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/env.csv) | [Patrick von Platen](https://github.com/patrickvonplaten) |
| PyTorch Benchmark on inference for `google-bert/bert-base-cased` |[memory](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/inference_memory.csv) | [env](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/env.csv) | [Patrick von Platen](https://github.com/patrickvonplaten) |
| PyTorch Benchmark on inference for `google-bert/bert-base-cased` |[time](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/inference_time.csv) | [env](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/env.csv) | [Patrick von Platen](https://github.com/patrickvonplaten) |
@@ -1,7 +1,7 @@
#### Fine-tuning BERT on SQuAD1.0 with relative position embeddings
The following examples show how to fine-tune BERT models with different relative position embeddings. The BERT model
`bert-base-uncased` was pretrained with default absolute position embeddings. We provide the following pretrained
`google-bert/bert-base-uncased` was pretrained with default absolute position embeddings. We provide the following pretrained
models which were pre-trained on the same training data (BooksCorpus and English Wikipedia) as in the BERT model
training, but with different relative position embeddings.
@@ -10,7 +10,7 @@ Shaw et al., [Self-Attention with Relative Position Representations](https://arx
* `zhiheng-huang/bert-base-uncased-embedding-relative-key-query`, trained from scratch with relative embedding method 4
in Huang et al. [Improve Transformer Models with Better Relative Position Embeddings](https://arxiv.org/abs/2009.13658)
* `zhiheng-huang/bert-large-uncased-whole-word-masking-embedding-relative-key-query`, fine-tuned from model
`bert-large-uncased-whole-word-masking` with 3 additional epochs with relative embedding method 4 in Huang et al.
`google-bert/bert-large-uncased-whole-word-masking` with 3 additional epochs with relative embedding method 4 in Huang et al.
[Improve Transformer Models with Better Relative Position Embeddings](https://arxiv.org/abs/2009.13658)
@@ -61,7 +61,7 @@ torchrun --nproc_per_node=8 ./examples/question-answering/run_squad.py \
--gradient_accumulation_steps 3
```
Training with the above command leads to an F1 score of 93.52, which is slightly better than the F1 score of 93.15 for
`bert-large-uncased-whole-word-masking`.
`google-bert/bert-large-uncased-whole-word-masking`.
#### Distributed training
@@ -69,7 +69,7 @@ Here is an example using distributed training on 8 V100 GPUs and Bert Whole Word
```bash
torchrun --nproc_per_node=8 ./examples/question-answering/run_squad.py \
--model_name_or_path bert-large-uncased-whole-word-masking \
--model_name_or_path google-bert/bert-large-uncased-whole-word-masking \
--dataset_name squad \
--do_train \
--do_eval \
@@ -90,7 +90,7 @@ exact_match = 86.91
```
This fine-tuned model is available as a checkpoint under the reference
[`bert-large-uncased-whole-word-masking-finetuned-squad`](https://huggingface.co/bert-large-uncased-whole-word-masking-finetuned-squad).
[`google-bert/bert-large-uncased-whole-word-masking-finetuned-squad`](https://huggingface.co/google-bert/bert-large-uncased-whole-word-masking-finetuned-squad).
## Results
@@ -39,8 +39,8 @@ def fill_mask(masked_input, model, tokenizer, topk=5):
return topk_filled_outputs
tokenizer = CamembertTokenizer.from_pretrained("camembert-base")
model = CamembertForMaskedLM.from_pretrained("camembert-base")
tokenizer = CamembertTokenizer.from_pretrained("almanach/camembert-base")
model = CamembertForMaskedLM.from_pretrained("almanach/camembert-base")
model.eval()
masked_input = "Le camembert est <mask> :)"
@@ -20,7 +20,7 @@
This script with default values fine-tunes and evaluates a pretrained OpenAI GPT on the RocStories dataset:
python run_openai_gpt.py \
--model_name openai-gpt \
--model_name openai-community/openai-gpt \
--do_train \
--do_eval \
--train_dataset "$ROC_STORIES_DIR/cloze_test_val__spring2016 - cloze_test_ALL_val.csv" \
@@ -104,7 +104,7 @@ def pre_process_datasets(encoded_datasets, input_len, cap_length, start_token, d
def main():
parser = argparse.ArgumentParser()
parser.add_argument("--model_name", type=str, default="openai-gpt", help="pretrained model name")
parser.add_argument("--model_name", type=str, default="openai-community/openai-gpt", help="pretrained model name")
parser.add_argument("--do_train", action="store_true", help="Whether to run training.")
parser.add_argument("--do_eval", action="store_true", help="Whether to run eval on the dev set.")
parser.add_argument(
@@ -40,7 +40,7 @@ logger = logging.getLogger(__name__)
def main():
parser = argparse.ArgumentParser(description="PyTorch Transformer Language Model")
parser.add_argument("--model_name", type=str, default="transfo-xl-wt103", help="pretrained model name")
parser.add_argument("--model_name", type=str, default="transfo-xl/transfo-xl-wt103", help="pretrained model name")
parser.add_argument(
"--split", type=str, default="test", choices=["all", "valid", "test"], help="which split to evaluate"
)
@@ -170,7 +170,7 @@ If 'translation' is in your task name, the computed metric will be BLEU. Otherwi
For t5, you need to specify --task translation_{src}_to_{tgt} as follows:
```bash
export DATA_DIR=wmt_en_ro
./run_eval.py t5-base \
./run_eval.py google-t5/t5-base \
$DATA_DIR/val.source t5_val_generations.txt \
--reference_path $DATA_DIR/val.target \
--score_path enro_bleu.json \
@@ -28,7 +28,7 @@ from transformers.testing_utils import TestCasePlus, slow
from utils import FAIRSEQ_AVAILABLE, DistributedSortishSampler, LegacySeq2SeqDataset, Seq2SeqDataset
BERT_BASE_CASED = "bert-base-cased"
BERT_BASE_CASED = "google-bert/bert-base-cased"
PEGASUS_XSUM = "google/pegasus-xsum"
ARTICLES = [" Sam ate lunch today.", "Sams lunch ingredients."]
SUMMARIES = ["A very interesting story about what I ate for lunch.", "Avocado, celery, turkey, coffee"]
@@ -74,7 +74,7 @@ def pack_data_dir(tok, data_dir: Path, max_tokens, save_path):
def packer_cli():
parser = argparse.ArgumentParser()
parser.add_argument("--tok_name", type=str, help="like facebook/bart-large-cnn,t5-base, etc.")
parser.add_argument("--tok_name", type=str, help="like facebook/bart-large-cnn,google-t5/t5-base, etc.")
parser.add_argument("--max_seq_len", type=int, default=128)
parser.add_argument("--data_dir", type=str)
parser.add_argument("--save_path", type=str)
@@ -124,7 +124,7 @@ def run_generate():
parser.add_argument(
"--model_name",
type=str,
help="like facebook/bart-large-cnn,t5-base, etc.",
help="like facebook/bart-large-cnn,google-t5/t5-base, etc.",
default="sshleifer/distilbart-xsum-12-3",
)
parser.add_argument("--save_dir", type=str, help="where to save", default="tmp_gen")
@@ -100,7 +100,7 @@ def run_generate(verbose=True):
"""
parser = argparse.ArgumentParser()
parser.add_argument("model_name", type=str, help="like facebook/bart-large-cnn,t5-base, etc.")
parser.add_argument("model_name", type=str, help="like facebook/bart-large-cnn,google-t5/t5-base, etc.")
parser.add_argument("input_path", type=str, help="like cnn_dm/test.source")
parser.add_argument("save_path", type=str, help="where to save summaries")
parser.add_argument("--reference_path", type=str, required=False, help="like cnn_dm/test.target")
@@ -34,7 +34,7 @@ Let's define some variables that we need for further pre-processing steps and tr
```bash
export MAX_LENGTH=128
export BERT_MODEL=bert-base-multilingual-cased
export BERT_MODEL=google-bert/bert-base-multilingual-cased
```
Run the pre-processing script on training, dev and test datasets:
@@ -92,7 +92,7 @@ Instead of passing all parameters via commandline arguments, the `run_ner.py` sc
{
"data_dir": ".",
"labels": "./labels.txt",
"model_name_or_path": "bert-base-multilingual-cased",
"model_name_or_path": "google-bert/bert-base-multilingual-cased",
"output_dir": "germeval-model",
"max_seq_length": 128,
"num_train_epochs": 3,
@@ -222,7 +222,7 @@ Let's define some variables that we need for further pre-processing steps:
```bash
export MAX_LENGTH=128
export BERT_MODEL=bert-large-cased
export BERT_MODEL=google-bert/bert-large-cased
```
Here we use the English BERT large model for fine-tuning.
@@ -250,7 +250,7 @@ This configuration file looks like:
{
"data_dir": "./data_wnut_17",
"labels": "./data_wnut_17/labels.txt",
"model_name_or_path": "bert-large-cased",
"model_name_or_path": "google-bert/bert-large-cased",
"output_dir": "wnut-17-model-1",
"max_seq_length": 128,
"num_train_epochs": 3,
@@ -113,7 +113,7 @@ class TokenClassificationTask:
for word, label in zip(example.words, example.labels):
word_tokens = tokenizer.tokenize(word)
# bert-base-multilingual-cased sometimes outputs "nothing" ([]) when calling tokenize with just a space.
# google-bert/bert-base-multilingual-cased sometimes outputs "nothing" ([]) when calling tokenize with just a space.
if len(word_tokens) > 0:
tokens.extend(word_tokens)
# Use the real label id for the first token of the word, and padding ids for the remaining tokens