[examples/flax] use Repository API for push_to_hub (#13672)

* use Repository for push_to_hub
* update readme
* update other flax scripts
* update readme
* update qa example
* fix push_to_hub call
* fix typo
* fix more typos
* update readme
* use absolute path to get repo name
* fix glue script
2021-09-30 16:38:07 +05:30
parent b90096fe14
commit 7db2a79b38
15 changed files with 183 additions and 292 deletions
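
Note on "use absolute path to get repo name": the script derives the default repo name from the last component of `--output_dir`. The sketch below uses only the standard library to show why the path is made absolute first: a relative path such as `.` has an empty `name`. The helper `default_repo_basename` is a hypothetical name used for illustration and is not part of the changed scripts, which pass this value on to `get_full_repo_name`.

```python
from pathlib import Path


def default_repo_basename(output_dir: str) -> str:
    """Hypothetical helper: last component of the absolute output path."""
    # Same expression as in the diff: Path(output_dir).absolute().name
    return Path(output_dir).absolute().name


# An ordinary relative directory yields its own name.
print(default_repo_basename("./bert-ner-conll2003"))

# Without .absolute(), "." has no final component, so the name is empty.
print(repr(Path(".").name))

# With .absolute(), "." falls back to the current working directory's name.
print(default_repo_basename(".") == Path.cwd().name)
```

`absolute()` does not resolve `..` or symlinks, so this only covers the case where the output directory is the current directory itself.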
--- a/examples/flax/token-classification/README.md
+++ b/examples/flax/token-classification/README.md
@@ -22,31 +22,6 @@ It will either run on a datasets hosted on our hub or with your own text files f

 The following example fine-tunes BERT on CoNLL-2003:

-To begin with it is recommended to create a model repository to save the trained model and logs.
-Here we call the model `"bert-ner-conll2003-test"`, but you can change the model name as you like.
-
-You can do this either directly on [huggingface.co](https://huggingface.co/new) (assuming that
-you are logged in) or via the command line:
-
-```
-huggingface-cli repo create bert-ner-conll2003-test
-```
-
-Next we clone the model repository to add the tokenizer and model files.
-
-```
-git clone https://huggingface.co/<your-username>/bert-ner-conll2003-test
-```
-
-Great, we have set up our model repository. During training, we will automatically
-push the training logs and model weights to the repo.
-
-Next, let's add a symbolic link to the `run_flax_ner.py`.
-
-```bash
-export MODEL_DIR="./bert-ner-conll2003-test"
-ln -s ~/transformers/examples/flax/token-classification/run_flax_ner.py run_flax_ner.py
-```

 ```bash
 python run_flax_ner.py \
@@ -56,7 +31,7 @@ python run_flax_ner.py \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --per_device_train_batch_size 4 \
-  --output_dir ${MODEL_DIR} \
+  --output_dir ./bert-ner-conll2003 \
  --eval_steps 300 \
  --push_to_hub
 ```
--- a/examples/flax/token-classification/run_flax_ner.py
+++ b/examples/flax/token-classification/run_flax_ner.py
@@ -21,6 +21,7 @@ import sys
 import time
 from dataclasses import dataclass, field
 from itertools import chain
+from pathlib import Path
 from typing import Any, Callable, Dict, Optional, Tuple

 import datasets
@@ -37,6 +38,7 @@ from flax.jax_utils import replicate, unreplicate
 from flax.metrics import tensorboard
 from flax.training import train_state
 from flax.training.common_utils import get_metrics, onehot, shard
+from huggingface_hub import Repository
 from transformers import (
    AutoConfig,
    AutoTokenizer,
@@ -44,6 +46,7 @@ from transformers import (
    HfArgumentParser,
    TrainingArguments,
 )
+from transformers.file_utils import get_full_repo_name
 from transformers.utils import check_min_version
 from transformers.utils.versions import require_version

@@ -304,6 +307,16 @@ def main():
        datasets.utils.logging.set_verbosity_error()
        transformers.utils.logging.set_verbosity_error()

+    # Handle the repository creation
+    if training_args.push_to_hub:
+        if training_args.hub_model_id is None:
+            repo_name = get_full_repo_name(
+                Path(training_args.output_dir).absolute().name, token=training_args.hub_token
+            )
+        else:
+            repo_name = training_args.hub_model_id
+        repo = Repository(training_args.output_dir, clone_from=repo_name)
+
    # Get the datasets: you can either provide your own CSV/JSON/TXT training and evaluation files (see below)
    # or just provide the name of one of the public datasets for token classification task available on the hub at https://huggingface.co/datasets/
    # (the dataset will be downloaded automatically from the datasets Hub).
@@ -656,12 +669,10 @@ def main():
                # save checkpoint after each epoch and push checkpoint to the hub
                if jax.process_index() == 0:
                    params = jax.device_get(unreplicate(state.params))
-                    model.save_pretrained(
-                        training_args.output_dir,
-                        params=params,
-                        push_to_hub=training_args.push_to_hub,
-                        commit_message=f"Saving weights and logs of step {cur_step}",
-                    )
+                    model.save_pretrained(training_args.output_dir, params=params)
+                    tokenizer.save_pretrained(training_args.output_dir)
+                    if training_args.push_to_hub:
+                        repo.push_to_hub(commit_message=f"Saving weights and logs of step {cur_step}", blocking=False)
        epochs.desc = f"Epoch ... {epoch + 1}/{num_epochs}"