Trainer push to hub (#11328)

* Initial support for upload to hub

* push -> upload

* Fixes + examples

* Fix torchhub test

* Torchhub test I hate you

* push_model_to_hub -> push_to_hub

* Apply mixin to other pretrained models

* Remove ABC inheritance

* Add tests

* Typo

* Run tests

* Install git-lfs

* Change approach

* Add push_to_hub to all

* Staging test suite

* Typo

* Maybe like this?

* More deps

* Cache

* Adapt name

* Quality

* MOAR tests

* Put it in testing_utils

* Docs + torchhub last hope

* Styling

* Wrong method

* Typos

* Update src/transformers/file_utils.py

Co-authored-by: Julien Chaumond <julien@huggingface.co>

* Address review comments

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
This commit is contained in:
Sylvain Gugger
2021-04-23 09:17:37 -04:00
committed by GitHub
parent 7bc86bea68
commit bf2e0cf70b
31 changed files with 766 additions and 31 deletions

View File

@@ -73,3 +73,10 @@ Generation
.. autoclass:: transformers.generation_tf_utils.TFGenerationMixin
:members:
Pushing to the Hub
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.file_utils.PushToHubMixin
:members:

View File

@@ -22,8 +22,6 @@ the `model hub <https://huggingface.co/models>`__.
Optionally, you can join an existing organization or create a new one.
Prepare your model for uploading
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We have seen in the :doc:`training tutorial <training>`: how to fine-tune a model on a given task. You have probably
done something similar on your task, either using the model directly in your own training loop or using the
@@ -31,7 +29,7 @@ done something similar on your task, either using the model directly in your own
`model hub <https://huggingface.co/models>`__.
Model versioning
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Since version v3.5.0, the model hub has built-in model versioning based on git and git-lfs. It is based on the paradigm
that one model *is* one repo.
@@ -54,6 +52,106 @@ For instance:
>>> revision="v2.0.1" # tag name, or branch name, or commit hash
>>> )
Push your model from Python
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Preparation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The first step is to make sure your credentials to the hub are stored somewhere. This can be done in two ways. If you
have access to a terminal, you cam just run the following command in the virtual environment where you installed 🤗
Transformers:
.. code-block:: bash
transformers-cli login
It will store your access token in the Hugging Face cache folder (by default :obj:`~/.cache/`).
If you don't have an easy access to a terminal (for instance in a Colab session), you can find a token linked to your
acount by going on `huggingface.co <https://huggingface.co/>`, click on your avatar on the top left corner, then on
`Edit profile` on the left, just beneath your profile picture. In the submenu `API Tokens`, you will find your API
token that you can just copy.
Directly push your model to the hub
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Once you have an API token (either stored in the cache or copied and pasted in your notebook), you can directly push a
finetuned model you saved in :obj:`save_drectory` by calling:
.. code-block:: python
finetuned_model.push_to_hub("my-awesome-model")
If you have your API token not stored in the cache, you will need to pass it with :obj:`use_auth_token=your_token`.
This is also be the case for all the examples below, so we won't mention it again.
This will create a repository in your namespace name :obj:`my-awesome-model`, so anyone can now run:
.. code-block:: python
from transformers import AutoModel
model = AutoModel.from_pretrained("your_username/my-awesome-model")
Even better, you can combine this push to the hub with the call to :obj:`save_pretrained`:
.. code-block:: python
finetuned_model.save_pretrained(save_directory, push_to_hub=True, repo_name="my-awesome-model")
If you are a premium user and want your model to be private, just add :obj:`private=True` to this call.
If you are a member of an organization and want to push it inside the namespace of the organization instead of yours,
just add :obj:`organization=my_amazing_org`.
Add new files to your model repo
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Once you have pushed your model to the hub, you might want to add the tokenizer, or a version of your model for another
framework (TensorFlow, PyTorch, Flax). This is super easy to do! Let's begin with the tokenizer. You can add it to the
repo you created before like this
.. code-block:: python
tokenizer.push_to_hub("my-awesome-model")
If you know its URL (it should be :obj:`https://huggingface.co/username/repo_name`), you can also do:
.. code-block:: python
tokenizer.push_to_hub(repo_url=my_repo_url)
And that's all there is to it! It's also a very easy way to fix a mistake if one of the files online had a bug.
To add a model for another backend, it's also super easy. Let's say you have fine-tuned a TensorFlow model and want to
add the pytorch model files to your model repo, so that anyone in the community can use it. The following allows you to
directly create a PyTorch version of your TensorFlow model:
.. code-block:: python
from transfomers import AutoModel
model = AutoModel.from_pretrained(save_directory, from_tf=True)
You can also replace :obj:`save_directory` by the identifier of your model (:obj:`username/repo_name`) if you don't
have a local save of it anymore. Then, just do the same as before:
.. code-block:: python
model.push_to_hub("my-awesome-model")
or
.. code-block:: python
model.push_to_hub(repo_url=my_repo_url)
Use your terminal and git
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Basic steps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^