Update all references to canonical models (#29001)

* Script & Manual edition

* Update
This commit is contained in:
Lysandre Debut
2024-02-16 08:16:58 +01:00
committed by GitHub
parent 1e402b957d
commit f497f564bb
561 changed files with 2682 additions and 2687 deletions

View File

@@ -159,13 +159,13 @@ to be used, but that everybody in team is on the same page on what type of model
To give an example, a well-defined project would be the following:
- task: summarization
- model: [t5-small](https://huggingface.co/t5-small)
- model: [google-t5/t5-small](https://huggingface.co/google-t5/t5-small)
- dataset: [CNN/Daily mail](https://huggingface.co/datasets/cnn_dailymail)
- training script: [run_summarization_flax.py](https://github.com/huggingface/transformers/blob/main/examples/flax/summarization/run_summarization_flax.py)
- outcome: t5 model that can summarize news
- work flow: adapt `run_summarization_flax.py` to work with `t5-small`.
- work flow: adapt `run_summarization_flax.py` to work with `google-t5/t5-small`.
This example is a very easy and not the most interesting project since a `t5-small`
This example is a very easy and not the most interesting project since a `google-t5/t5-small`
summarization model exists already for CNN/Daily mail and pretty much no code has to be
written.
A well-defined project does not need to have the dataset be part of
@@ -335,7 +335,7 @@ dataset = load_dataset('oscar', "unshuffled_deduplicated_en", split='train', str
dummy_input = next(iter(dataset))["text"]
tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
tokenizer = RobertaTokenizerFast.from_pretrained("FacebookAI/roberta-base")
input_ids = tokenizer(dummy_input, return_tensors="np").input_ids[:, :10]
model = FlaxRobertaModel.from_pretrained("julien-c/dummy-unknown")
@@ -492,7 +492,7 @@ dataset = load_dataset('oscar', "unshuffled_deduplicated_en", split='train', str
dummy_input = next(iter(dataset))["text"]
tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
tokenizer = RobertaTokenizerFast.from_pretrained("FacebookAI/roberta-base")
input_ids = tokenizer(dummy_input, return_tensors="np").input_ids[:, :10]
model = FlaxRobertaModel.from_pretrained("julien-c/dummy-unknown")
@@ -518,7 +518,7 @@ be available in a couple of days.
- [BigBird](https://github.com/huggingface/transformers/blob/main/src/transformers/models/big_bird/modeling_flax_big_bird.py)
- [CLIP](https://github.com/huggingface/transformers/blob/main/src/transformers/models/clip/modeling_flax_clip.py)
- [ELECTRA](https://github.com/huggingface/transformers/blob/main/src/transformers/models/electra/modeling_flax_electra.py)
- [GPT2](https://github.com/huggingface/transformers/blob/main/src/transformers/models/gpt2/modeling_flax_gpt2.py)
- [GPT2](https://github.com/huggingface/transformers/blob/main/src/transformers/models/openai-community/gpt2/modeling_flax_gpt2.py)
- [(TODO) MBART](https://github.com/huggingface/transformers/blob/main/src/transformers/models/mbart/modeling_flax_mbart.py)
- [RoBERTa](https://github.com/huggingface/transformers/blob/main/src/transformers/models/roberta/modeling_flax_roberta.py)
- [T5](https://github.com/huggingface/transformers/blob/main/src/transformers/models/t5/modeling_flax_t5.py)
@@ -729,7 +729,7 @@ Let's use the base `FlaxRobertaModel` without any heads as an example.
from transformers import FlaxRobertaModel, RobertaTokenizerFast
import jax
tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
tokenizer = RobertaTokenizerFast.from_pretrained("FacebookAI/roberta-base")
inputs = tokenizer("JAX/Flax is amazing ", padding="max_length", max_length=128, return_tensors="np")
model = FlaxRobertaModel.from_pretrained("julien-c/dummy-unknown")
@@ -1011,7 +1011,7 @@ and run the following commands in a Python shell to save a config.
```python
from transformers import RobertaConfig
config = RobertaConfig.from_pretrained("roberta-base")
config = RobertaConfig.from_pretrained("FacebookAI/roberta-base")
config.save_pretrained("./")
```
@@ -1193,12 +1193,12 @@ All the widgets are open sourced in the `huggingface_hub` [repo](https://github.
**NLP**
* **Conversational:** To have the best conversations!. [Example](https://huggingface.co/microsoft/DialoGPT-large?).
* **Feature Extraction:** Retrieve the input embeddings. [Example](https://huggingface.co/sentence-transformers/distilbert-base-nli-mean-tokens?text=test).
* **Fill Mask:** Predict potential words for a mask token. [Example](https://huggingface.co/bert-base-uncased?).
* **Question Answering:** Given a context and a question, predict the answer. [Example](https://huggingface.co/bert-large-uncased-whole-word-masking-finetuned-squad).
* **Fill Mask:** Predict potential words for a mask token. [Example](https://huggingface.co/google-bert/bert-base-uncased?).
* **Question Answering:** Given a context and a question, predict the answer. [Example](https://huggingface.co/google-bert/bert-large-uncased-whole-word-masking-finetuned-squad).
* **Sentence Simmilarity:** Predict how similar a set of sentences are. Useful for Sentence Transformers.
* **Summarization:** Given a text, output a summary of it. [Example](https://huggingface.co/sshleifer/distilbart-cnn-12-6).
* **Table Question Answering:** Given a table and a question, predict the answer. [Example](https://huggingface.co/google/tapas-base-finetuned-wtq).
* **Text Generation:** Generate text based on a prompt. [Example](https://huggingface.co/gpt2)
* **Text Generation:** Generate text based on a prompt. [Example](https://huggingface.co/openai-community/gpt2)
* **Token Classification:** Useful for tasks such as Named Entity Recognition and Part of Speech. [Example](https://huggingface.co/dslim/bert-base-NER).
* **Zero-Shot Classification:** Too cool to explain with words. Here is an [example](https://huggingface.co/typeform/distilbert-base-uncased-mnli)
* ([WIP](https://github.com/huggingface/huggingface_hub/issues/99)) **Table to Text Generation**.

View File

@@ -31,7 +31,7 @@ without ever having to download the full dataset.
In the following, we demonstrate how to train a bi-directional transformer model
using masked language modeling objective as introduced in [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805).
More specifically, we demonstrate how JAX/Flax and dataset streaming can be leveraged
to pre-train [**`roberta-base`**](https://huggingface.co/roberta-base)
to pre-train [**`FacebookAI/roberta-base`**](https://huggingface.co/FacebookAI/roberta-base)
in English on a single TPUv3-8 pod for 10000 update steps.
The example script uses the 🤗 Datasets library. You can easily customize them to your needs if you need extra processing on your datasets.
@@ -80,8 +80,8 @@ from transformers import RobertaTokenizerFast, RobertaConfig
model_dir = "./english-roberta-base-dummy"
tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
config = RobertaConfig.from_pretrained("roberta-base")
tokenizer = RobertaTokenizerFast.from_pretrained("FacebookAI/roberta-base")
config = RobertaConfig.from_pretrained("FacebookAI/roberta-base")
tokenizer.save_pretrained(model_dir)
config.save_pretrained(model_dir)

View File

@@ -32,7 +32,7 @@ Models written in JAX/Flax are **immutable** and updated in a purely functional
way which enables simple and efficient model parallelism.
In this example we will use the vision model from [CLIP](https://huggingface.co/models?filter=clip)
as the image encoder and [`roberta-base`](https://huggingface.co/roberta-base) as the text encoder.
as the image encoder and [`FacebookAI/roberta-base`](https://huggingface.co/FacebookAI/roberta-base) as the text encoder.
Note that one can also use the [ViT](https://huggingface.co/models?filter=vit) model as image encoder and any other BERT or ROBERTa model as text encoder.
To train the model on languages other than English one should choose a text encoder trained on the desired
language and a image-text dataset in that language. One such dataset is [WIT](https://github.com/google-research-datasets/wit).
@@ -76,7 +76,7 @@ Here is an example of how to load the model using pre-trained text and vision mo
```python
from modeling_hybrid_clip import FlaxHybridCLIP
model = FlaxHybridCLIP.from_text_vision_pretrained("bert-base-uncased", "openai/clip-vit-base-patch32")
model = FlaxHybridCLIP.from_text_vision_pretrained("google-bert/bert-base-uncased", "openai/clip-vit-base-patch32")
# save the model
model.save_pretrained("bert-clip")
@@ -89,7 +89,7 @@ If the checkpoints are in PyTorch then one could pass `text_from_pt=True` and `v
PyTorch checkpoints convert them to flax and load the model.
```python
model = FlaxHybridCLIP.from_text_vision_pretrained("bert-base-uncased", "openai/clip-vit-base-patch32", text_from_pt=True, vision_from_pt=True)
model = FlaxHybridCLIP.from_text_vision_pretrained("google-bert/bert-base-uncased", "openai/clip-vit-base-patch32", text_from_pt=True, vision_from_pt=True)
```
This loads both the text and vision encoders using pre-trained weights, the projection layers are randomly
@@ -154,9 +154,9 @@ Next we can run the example script to train the model:
```bash
python run_hybrid_clip.py \
--output_dir ${MODEL_DIR} \
--text_model_name_or_path="roberta-base" \
--text_model_name_or_path="FacebookAI/roberta-base" \
--vision_model_name_or_path="openai/clip-vit-base-patch32" \
--tokenizer_name="roberta-base" \
--tokenizer_name="FacebookAI/roberta-base" \
--train_file="coco_dataset/train_dataset.json" \
--validation_file="coco_dataset/validation_dataset.json" \
--do_train --do_eval \

View File

@@ -314,8 +314,6 @@ class FlaxHybridCLIP(FlaxPreTrainedModel):
Information necessary to initiate the text model. Can be either:
- A string, the `model id` of a pretrained model hosted inside a model repo on huggingface.co.
Valid model ids can be located at the root-level, like ``bert-base-uncased``, or namespaced under
a user or organization name, like ``dbmdz/bert-base-german-cased``.
- A path to a `directory` containing model weights saved using
:func:`~transformers.FlaxPreTrainedModel.save_pretrained`, e.g., ``./my_model_directory/``.
- A path or url to a `PyTorch checkpoint folder` (e.g, ``./pt_model``). In
@@ -327,8 +325,6 @@ class FlaxHybridCLIP(FlaxPreTrainedModel):
Information necessary to initiate the vision model. Can be either:
- A string, the `model id` of a pretrained model hosted inside a model repo on huggingface.co.
Valid model ids can be located at the root-level, like ``bert-base-uncased``, or namespaced under
a user or organization name, like ``dbmdz/bert-base-german-cased``.
- A path to a `directory` containing model weights saved using
:func:`~transformers.FlaxPreTrainedModel.save_pretrained`, e.g., ``./my_model_directory/``.
- A path or url to a `PyTorch checkpoint folder` (e.g, ``./pt_model``). In
@@ -354,7 +350,7 @@ class FlaxHybridCLIP(FlaxPreTrainedModel):
>>> from transformers import FlaxHybridCLIP
>>> # initialize a model from pretrained BERT and CLIP models. Note that the projection layers will be randomly initialized.
>>> # If using CLIP's vision model the vision projection layer will be initialized using pre-trained weights
>>> model = FlaxHybridCLIP.from_text_vision_pretrained('bert-base-uncased', 'openai/clip-vit-base-patch32')
>>> model = FlaxHybridCLIP.from_text_vision_pretrained('google-bert/bert-base-uncased', 'openai/clip-vit-base-patch32')
>>> # saving model after fine-tuning
>>> model.save_pretrained("./bert-clip")
>>> # load fine-tuned model

View File

@@ -54,7 +54,7 @@ model.save_pretrained("gpt-neo-1.3B")
```bash
python run_clm_mp.py \
--model_name_or_path gpt-neo-1.3B \
--tokenizer_name gpt2 \
--tokenizer_name openai-community/gpt2 \
--dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 \
--do_train --do_eval \
--block_size 1024 \