Expand tutorial for custom models (#15587)

* Expand tutorial for custom models
* Style
* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
2022-02-09 17:44:28 -05:00
parent a86ee2261e
commit c722753afd
1 changed files with 253 additions and 82 deletions
--- a/docs/source/custom_models.mdx
+++ b/docs/source/custom_models.mdx
@@ -15,68 +15,219 @@ specific language governing permissions and limitations under the License.
 The 🤗 Transformers library is designed to be easily extensible. Every model is fully coded in a given subfolder
 of the repository with no abstraction, so you can easily copy a modeling file and tweak it to your needs.
-Once you are happy with those tweaks and trained a model you want to share with the community, there are simple steps
+If you are writing a brand new model, it might be easier to start from scratch. In this tutorial, we will show you
-to push on the Model Hub not only the weights of your model, but also the code it relies on, so that anyone in the
+how to write a custom model and its configuration so it can be used inside Transformers, and how you can share it
-community can use it, even if it's not present in the 🤗 Transformers library.
+with the community (with the code it relies on) so that anyone can use it, even if it's not present in the 🤗
 Transformers library.
-This also applies to configurations and tokenizers (support for feature extractors and processors is coming soon).
+We will illustrate all of this on a ResNet model, by wrapping the ResNet class of the
 [timm library](https://github.com/rwightman/pytorch-image-models/tree/master/timm) into a [`PreTrainedModel`].
 ## Writing a custom configuration
 Before we dive into the model, let's first write its configuration. The configuration of a model is an object that
 will contain all the necessary information to build the model. As we will see in the next section, the model can only
 take a `config` to be initialized, so we really need that object to be as complete as possible.
 In our example, we will take a couple of arguments of the ResNet class that we might want to tweak. Different
 configurations will then give us the different types of ResNets that are possible. We then just store those arguments,
 after checking the validity of a few of them.
 ```python
 from transformers import PretrainedConfig
 from typing import List
 class ResnetConfig(PretrainedConfig):
    model_type = "resnet"
    def __init__(
        self,
        block_type="bottleneck",
        layers: List[int] = [3, 4, 6, 3],
        num_classes: int = 1000,
        input_channels: int = 3,
        cardinality: int = 1,
        base_width: int = 64,
        stem_width: int = 64,
        stem_type: str = "",
        avg_down: bool = False,
        **kwargs,
    ):
        if block_type not in ["basic", "bottleneck"]:
             raise ValueError(f"`block_type` must be 'basic' or 'bottleneck', got {block_type}.")
         if stem_type not in ["", "deep", "deep-tiered"]:
             raise ValueError(f"`stem_type` must be '', 'deep' or 'deep-tiered', got {stem_type}.")
        self.block_type = block_type
        self.layers = layers
        self.num_classes = num_classes
        self.input_channels = input_channels
        self.cardinality = cardinality
        self.base_width = base_width
        self.stem_width = stem_width
        self.stem_type = stem_type
        self.avg_down = avg_down
        super().__init__(**kwargs)
 ```
 The three important things to remember when writing your own configuration are the following:
 - you have to inherit from `PretrainedConfig`,
 - the `__init__` of your `PretrainedConfig` must accept any kwargs,
 - those `kwargs` need to be passed to the superclass `__init__`.
 The inheritance is to make sure you get all the functionality from the 🤗 Transformers library, while the two other
 constraints come from the fact a `PretrainedConfig` has more fields than the ones you are setting. When reloading a
 config with the `from_pretrained` method, those fields need to be accepted by your config and then sent to the
 superclass.
 Defining a `model_type` for your configuration (here `model_type="resnet"`) is not mandatory, unless you want to
 register your model with the auto classes (see last section).
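To see why the two `kwargs` rules matter, here is a minimal sketch using a simplified stand-in for the base class. `BaseConfig` and `ToyResnetConfig` are hypothetical names invented for this illustration and only mimic the relevant behavior; they are not the real `PretrainedConfig`.

```python
class BaseConfig:
    """Simplified stand-in for a config base class (NOT the real PretrainedConfig)."""

    def __init__(self, **kwargs):
        # The base class owns fields that subclasses never mention explicitly.
        self.name_or_path = kwargs.pop("name_or_path", "")
        self.id2label = kwargs.pop("id2label", None)
        # Any remaining keys are stored as plain attributes.
        for key, value in kwargs.items():
            setattr(self, key, value)

    def to_dict(self):
        # Serialize every attribute, including the base-class ones.
        return dict(vars(self))

    @classmethod
    def from_dict(cls, data):
        # Reloading hands *every* saved field back to __init__.
        return cls(**data)


class ToyResnetConfig(BaseConfig):
    def __init__(self, block_type="bottleneck", num_classes=1000, **kwargs):
        self.block_type = block_type
        self.num_classes = num_classes
        # Forward the fields we do not handle ourselves to the base class.
        super().__init__(**kwargs)


config = ToyResnetConfig(block_type="basic", id2label={0: "cat"})
reloaded = ToyResnetConfig.from_dict(config.to_dict())
```

If `ToyResnetConfig.__init__` did not accept `**kwargs`, the `from_dict` call would fail with a `TypeError` on `name_or_path`, and if it did not forward them, `id2label` would be silently lost on reload.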
 With this done, you can easily create and save your configuration like you would do with any other model config of the
 library. Here is how we can create a resnet50d config and save it:
 ```py
 resnet50d_config = ResnetConfig(block_type="bottleneck", stem_width=32, stem_type="deep", avg_down=True)
 resnet50d_config.save_pretrained("custom-resnet")
 ```
 This will save a file named `config.json` inside the folder `custom-resnet`. You can then reload your config with the
 `from_pretrained` method:
 ```py
 resnet50d_config = ResnetConfig.from_pretrained("custom-resnet")
 ```
 You can also use any other method of the [`PretrainedConfig`] class, like [`~PretrainedConfig.push_to_hub`] to
 directly upload your config to the Hub.
 ## Writing a custom model
 Now that we have our ResNet configuration, we can go on writing the model. We will actually write two: one that
 extracts the hidden features from a batch of images (like [`BertModel`]) and one that is suitable for image
 classification (like [`BertForSequenceClassification`]).
 As we mentioned before, we'll only write a loose wrapper of the model to keep it simple for this example. The only
 thing we need to do before writing this class is a map between the block types and actual block classes. Then the
 model is defined from the configuration by passing everything to the `ResNet` class:
 ```py
 import torch
 from transformers import PreTrainedModel
 from timm.models.resnet import BasicBlock, Bottleneck, ResNet
 from .configuration_resnet import ResnetConfig
 BLOCK_MAPPING = {"basic": BasicBlock, "bottleneck": Bottleneck}
 class ResnetModel(PreTrainedModel):
    config_class = ResnetConfig
    def __init__(self, config):
        super().__init__(config)
        block_layer = BLOCK_MAPPING[config.block_type]
        self.model = ResNet(
            block_layer,
            config.layers,
            num_classes=config.num_classes,
            in_chans=config.input_channels,
            cardinality=config.cardinality,
            base_width=config.base_width,
            stem_width=config.stem_width,
            stem_type=config.stem_type,
            avg_down=config.avg_down,
        )
    def forward(self, tensor):
        return self.model.forward_features(tensor)
 ```
 For the model that will classify images, we just change the forward method:
 ```py
 class ResnetModelForImageClassification(PreTrainedModel):
    config_class = ResnetConfig
    def __init__(self, config):
        super().__init__(config)
        block_layer = BLOCK_MAPPING[config.block_type]
        self.model = ResNet(
            block_layer,
            config.layers,
            num_classes=config.num_classes,
            in_chans=config.input_channels,
            cardinality=config.cardinality,
            base_width=config.base_width,
            stem_width=config.stem_width,
            stem_type=config.stem_type,
            avg_down=config.avg_down,
        )
    def forward(self, tensor, labels=None):
        logits = self.model(tensor)
        if labels is not None:
             loss = torch.nn.functional.cross_entropy(logits, labels)
            return {"loss": loss, "logits": logits}
        return {"logits": logits}
 ```
 In both cases, notice how we inherit from `PreTrainedModel` and call the superclass initialization with the `config`
 (a bit like when you write a regular `torch.nn.Module`). The line that sets the `config_class` is not mandatory, unless
 you want to register your model with the auto classes (see last section).
 <Tip>
 If your model is very similar to a model inside the library, you can re-use the same configuration as this model.
 </Tip>
 You can have your model return anything you want, but returning a dictionary like we did for
 `ResnetModelForImageClassification`, with the loss included when labels are passed, will make your model directly
 usable inside the [`Trainer`] class. Using another output format is fine as long as you are planning on using your own
 training loop or another library for training.
 Now that we have our model class, let's create one:
 ```py
 resnet50d = ResnetModelForImageClassification(resnet50d_config)
 ```
 Again, you can use any of the methods of [`PreTrainedModel`], like [`~PreTrainedModel.save_pretrained`] or
 [`~PreTrainedModel.push_to_hub`]. We will use the second in the next section, and see how to push the model weights
 with the code of our model. But first, let's load some pretrained weights inside our model.
 In your own use case, you will probably be training your custom model on your own data. To go fast for this tutorial,
 we will use the pretrained version of the resnet50d. Since our model is just a wrapper around it, it's going to be
 easy to transfer those weights:
 ```py
 import timm
 pretrained_model = timm.create_model("resnet50d", pretrained=True)
 resnet50d.model.load_state_dict(pretrained_model.state_dict())
 ```
 Now let's see how to make sure that when we do [`~PreTrainedModel.save_pretrained`] or [`~PreTrainedModel.push_to_hub`], the
 code of the model is saved.
 ## Sending the code to the Hub
 First, make sure your model is fully defined in a `.py` file. It can rely on relative imports to some other files as
-long as all the files are in the same directory (we don't support submodules for this feature yet). For instance,
+long as all the files are in the same directory (we don't support submodules for this feature yet). For our example,
-let's say you have a `modeling.py` file and a `configuration.py` file in a folder of the current working directory
+we'll define a `modeling_resnet.py` file and a `configuration_resnet.py` file in a folder of the current working
-named `awesome_model`, and that the modeling file defines an `AwesomeModel`, the configuration file a `AwesomeConfig`.
+directory named `resnet_model`. The configuration file contains the code for `ResnetConfig` and the modeling file
 contains the code of `ResnetModel` and `ResnetModelForImageClassification`.
 ```
 .
-└── awesome_model
+└── resnet_model
    ├── __init__.py
-    ├── configuration.py
+    ├── configuration_resnet.py
-    └── modeling.py
+    └── modeling_resnet.py
 ```
-The `__init__.py` can be empty, it's just there so that Python detects `awesome_model` can be use as a module.
+The `__init__.py` can be empty; it's just there so that Python detects `resnet_model` can be used as a module.
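As a rough sketch, the layout above could be created with a few lines of standard-library Python. The `create_skeleton` helper is a hypothetical name for this illustration, and a temporary directory stands in for your working directory; the files are created empty and still need the code from the previous sections.

```python
import tempfile
from pathlib import Path


def create_skeleton(root: Path) -> Path:
    """Create the `resnet_model` folder with its three (empty) Python files."""
    package = root / "resnet_model"
    package.mkdir(parents=True, exist_ok=True)
    for name in ("__init__.py", "configuration_resnet.py", "modeling_resnet.py"):
        (package / name).touch()
    return package


# A temporary directory is used here so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    package = create_skeleton(Path(tmp))
    created = sorted(path.name for path in package.iterdir())
    print(created)
```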
 Here is an example of what the configuration file could look like:
 ```py
 from transformers import PretrainedConfig
 class AwesomeConfig(PretrainedConfig):
    model_type = "awesome"
    def __init__(self, attribute=1, hidden_size=42, **kwargs):
        self.attribute = attribute
        self.hidden_size = hidden_size
        super().__init__(**kwargs)
 ```
 and the modeling file could have content like this:
 ```py
 import torch
 from transformers import PreTrainedModel
 from .configuration import AwesomeConfig
 class AwesomeModel(PreTrainedModel):
    config_class = AwesomeConfig
    base_model_prefix = "base"
    def __init__(self, config):
        super().__init__(config)
        self.linear = torch.nn.Linear(config.hidden_size, config.hidden_size)
    def forward(self, x):
        return self.linear(x)
 ```
 `AwesomeModel` should subclass [`PreTrainedModel`] and `AwesomeConfig` should subclass [`PretrainedConfig`]. The
 easiest way to achieve this is to copy the modeling and configuration files of the model closest to the one you're
 coding, and then tweaking them.
 <Tip warning={true}>
@@ -87,51 +238,44 @@ to import from the `transformers` package.
 Note that you can re-use (or subclass) an existing configuration/model.
-To share your model with the community, follow those steps: first import the custom objects.
+To share your model with the community, follow these steps: first import the ResNet model and config from the newly
 created files:
 ```py
-from awesome_model.configuration import AwesomeConfig
+from resnet_model.configuration_resnet import ResnetConfig
-from awesome_model.modeling import AwesomeModel
+from resnet_model.modeling_resnet import ResnetModel, ResnetModelForImageClassification
 ```
 Then you have to tell the library you want to copy the code files of those objects when using the `save_pretrained`
 method and properly register them with a given Auto class (especially for models). To do so, just run:
 ```py
-AwesomeConfig.register_for_auto_class()
+ResnetConfig.register_for_auto_class()
-AwesomeModel.register_for_auto_class("AutoModel")
+ResnetModel.register_for_auto_class("AutoModel")
 ResnetModelForImageClassification.register_for_auto_class("AutoModelForImageClassification")
 ```
 Note that there is no need to specify an auto class for the configuration (there is only one auto class for them,
-[`AutoConfig`]) but it's different for models. Your custom model could be suitable for sequence classification (in
+[`AutoConfig`]) but it's different for models. Your custom model could be suitable for many different tasks, so you
-which case you should do `AwesomeModel.register_for_auto_class("AutoModelForSequenceClassification")`) or any other
+have to specify which one of the auto classes is the correct one for your model.
 task, so you have to specify which one of the auto classes is the correct one for your model.
-Next, just create the config and models as you would any other Transformer models:
+Next, let's create the config and models as we did before:
 ```py
-config = AwesomeConfig()
+resnet50d_config = ResnetConfig(block_type="bottleneck", stem_width=32, stem_type="deep", avg_down=True)
-model = AwesomeModel(config)
+resnet50d = ResnetModelForImageClassification(resnet50d_config)
 pretrained_model = timm.create_model("resnet50d", pretrained=True)
 resnet50d.model.load_state_dict(pretrained_model.state_dict())
 ```
-then train your model. Alternatively, you could load a pretrained checkpoint you have already trained in your model.
+Now to send the model to the Hub, make sure you are logged in. Either run in your terminal:
 Once everything is ready, you just have to do:
 ```py
 model.save_pretrained("save_dir")
 ```
 which will not only save the model weights and the configuration in json format, but also copy the modeling and
 configuration `.py` files in this folder, so you can directly upload the result to the Hub.
 If you have already logged in to Hugging Face with
 ```bash
 huggingface-cli login
 ```
-or in a notebook with
+or from a notebook:
 ```py
 from huggingface_hub import notebook_login
@@ -139,11 +283,15 @@ from huggingface_hub import notebook_login
 notebook_login()
 ```
-you can push your model and its code to the Hub with the following:
+You can then push to your own namespace (or an organization you are a member of) like this:
 ```py
-model.push_to_hub("model-identifier")
+resnet50d.push_to_hub("custom-resnet50d")
-``` 
+```
 On top of the modeling weights and the configuration in json format, this also copied the modeling and
 configuration `.py` files in the folder `custom-resnet50d` and uploaded the result to the Hub. You can check the result
 in this [model repo](https://huggingface.co/sgugger/custom-resnet50d).
 See the [sharing tutorial](model_sharing) for more information on the push to Hub method.
@@ -154,18 +302,41 @@ the `from_pretrained` method. The only thing is that you have to add an extra ar
 online code and trust the author of that model, to avoid executing malicious code on your machine:
 ```py
-from transformers import AutoModel
+from transformers import AutoModelForImageClassification
-model = AutoModel.from_pretrained("model-checkpoint", trust_remote_code=True)
+model = AutoModelForImageClassification.from_pretrained("sgugger/custom-resnet50d", trust_remote_code=True)
 ```
 It is also strongly encouraged to pass a commit hash as a `revision` to make sure the author of the models did not
 update the code with some malicious new lines (unless you fully trust the authors of the models).
 ```py
-commit_hash = "b731e5fae6d80a4a775461251c4388886fb7a249"
+commit_hash = "ed94a7c6247d8aedce4647f00f20de6875b5b292"
-model = AutoModel.from_pretrained("model-checkpoint", trust_remote_code=True, revision=commit_hash)
+model = AutoModelForImageClassification.from_pretrained(
    "sgugger/custom-resnet50d", trust_remote_code=True, revision=commit_hash
 )
 ```
 Note that when browsing the commit history of the model repo on the Hub, there is a button to easily copy the commit
 hash of any commit.
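Since `revision` can also be a branch or tag name, which may later point to different code, it can be worth checking that the value you pin is a full commit hash. The helper below is a hypothetical sketch, not part of 🤗 Transformers, and it assumes the usual 40-character hexadecimal Git SHA-1 format.

```python
import re

# Assumption: a full Git commit hash is 40 lowercase hexadecimal characters (SHA-1).
FULL_SHA = re.compile(r"[0-9a-f]{40}")


def is_full_commit_hash(revision: str) -> bool:
    """Return True only if `revision` looks like a full commit hash."""
    return FULL_SHA.fullmatch(revision) is not None


pinned = is_full_commit_hash("ed94a7c6247d8aedce4647f00f20de6875b5b292")
branch = is_full_commit_hash("main")
```

Here `pinned` is `True` while `branch` is `False`, so a script could refuse to pass `trust_remote_code=True` unless the revision is pinned to a commit.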
 ## Registering a model with custom code to the auto classes
 If you are writing a library that extends 🤗 Transformers, you may want to extend the auto classes to include your own
 model. This is different from pushing the code to the Hub in the sense that users will need to import your library to
 get the custom models (as opposed to automatically downloading the model code from the Hub).
 As long as your config has a `model_type` attribute that is different from existing model types, and your model
 classes have the right `config_class` attributes, you can just add them to the auto classes like this:
 ```py
 from transformers import AutoConfig, AutoModel, AutoModelForImageClassification
 AutoConfig.register("resnet", ResnetConfig)
 AutoModel.register(ResnetConfig, ResnetModel)
 AutoModelForImageClassification.register(ResnetConfig, ResnetModelForImageClassification)
 ```
 Note that the first argument used when registering your custom config to [`AutoConfig`] needs to match the `model_type`
 of your custom config, and the first argument used when registering your custom models to any auto model class needs
 to match the `config_class` of those models.
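The two matching rules can be pictured with a toy registry. This is only an illustrative sketch: `ToyAutoRegistry` and the `Toy*` classes are hypothetical, and the real auto classes are implemented differently.

```python
class ToyAutoRegistry:
    """Simplified stand-in for the auto-class mappings (NOT the real implementation)."""

    def __init__(self):
        self.configs = {}  # model_type -> config class
        self.models = {}   # config class -> model class

    def register_config(self, model_type, config_class):
        # The key must agree with the `model_type` declared on the config class.
        if getattr(config_class, "model_type", None) != model_type:
            raise ValueError(f"`model_type` mismatch for {config_class.__name__}.")
        if model_type in self.configs:
            raise ValueError(f"'{model_type}' is already registered.")
        self.configs[model_type] = config_class

    def register_model(self, config_class, model_class):
        # The key must agree with the `config_class` declared on the model class.
        if getattr(model_class, "config_class", None) is not config_class:
            raise ValueError(f"`config_class` mismatch for {model_class.__name__}.")
        self.models[config_class] = model_class

    def model_for(self, model_type):
        # Resolve in two hops: model_type -> config class -> model class.
        return self.models[self.configs[model_type]]


class ToyResnetConfig:
    model_type = "resnet"


class ToyResnetModel:
    config_class = ToyResnetConfig


registry = ToyAutoRegistry()
registry.register_config("resnet", ToyResnetConfig)
registry.register_model(ToyResnetConfig, ToyResnetModel)
resolved = registry.model_for("resnet")
```

The two-hop lookup is why both keys have to line up: a saved config only carries its `model_type` string, and the model class is found through the config class that string maps to.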