Modular transformers: modularity and inheritance for new model additions (#33248)
* update exampel * update * push the converted diff files for testing and ci * correct one example * fix class attributes and docstring * nits * oups * fixed config! * update * nitd * class attributes are not matched against the other, this is missing * fixed overwriting self.xxx now onto the attributes I think * partial fix, now order with docstring * fix docstring order? * more fixes * update * fix missing docstrings! * examples don't all work yet * fixup * nit * updated * hick * update * delete * update * update * update * fix * all default * no local import * fix more diff * some fix related to "safe imports" * push fixed * add helper! * style * add a check * all by default * add the * update * FINALLY! * nit * fix config dependencies * man that is it * fix fix * update diffs * fix the last issue * re-default to all * alll the fixes * nice * fix properties vs setter * fixup * updates * update dependencies * make sure to install what needs to be installed * fixup * quick fix for now * fix! * fixup * update * update * updates * whitespaces * nit * fix * simplify everything, and make it file agnostic (should work for image processors) * style * finish fixing all import issues * fixup * empty modeling should not be written! * Add logic to find who depends on what * update * cleanup * update * update gemma to support positions * some small nits * this is the correct docstring for gemma2 * fix merging of docstrings * update * fixup * update * take doc into account * styling * update * fix hidden activation * more fixes * final fixes! * fixup * fixup instruct blip video * update * fix bugs * align gemma2 with the rest as well * updats * revert * update * more reversiom * grind * more * arf * update * order will matter * finish del stuff * update * rename to modular * fixup * nits * update makefile * fixup * update order of the checks! 
* fix * fix docstring that has a call inside * fiix conversion check * style * add some initial documentation * update * update doc * some fixup * updates * yups * Mostly todo gimme a minut * update * fixup * revert some stuff * Review docs for the modular transformers (#33472) Docs * good update * fixup * mmm current updates lead to this code * okay, this fixes it * cool * fixes * update * nit * updates * nits * fix doc * update * revert bad changes * update * updates * proper update * update * update? * up * update * cool * nits * nits * bon bon * fix * ? * minimise changes * update * update * update * updates? * fixed gemma2 * kind of a hack * nits * update * remove `diffs` in favor of `modular` * fix make fix copies --------- Co-authored-by: Lysandre Debut <hi@lysand.re>
@@ -5,6 +5,8 @@
    title: Quick tour
  - local: installation
    title: Installation
  - local: add_new_model
    title: Adding a new model to `transformers`
  title: Get started
- sections:
  - local: pipeline_tutorial
@@ -149,6 +151,8 @@
    title: Interoperability with GGUF files
  - local: tiktoken
    title: Interoperability with TikToken files
  - local: modular_transformers
    title: Modularity in `transformers`
  title: Developer guides
- sections:
  - local: quantization/overview
121
docs/source/en/modular_transformers.md
Normal file
@@ -0,0 +1,121 @@
# Modular transformers

`transformers` is an opinionated framework; our philosophy is defined in the following [conceptual guide](./philosophy).

The core of that philosophy is exemplified by the [single model, single file](https://huggingface.co/blog/transformers-design-philosophy)
aspect of the library. The downside of this design is that it limits inheritance and the importing of components
from one file into another.

As a result, model components tend to be repeated across many files. There are as many attention layers defined
in `transformers` as there are models, and a significant number of those are identical to each other.
The unfortunate consequence is that independent implementations tend to diverge as fixes and changes get applied
to specific parts of the code.

To mitigate this issue, we introduced the concept of "copies" across the library. By adding a comment indicating
that code is a copy of another, we can enforce through CI and local commands that copies do not diverge. However,
while the complexity is low, this is often quite tedious to do.

Finally, this adds significant overhead to contributing models, which we would like to remove.
This approach often requires model contributions to add modeling code (~1k lines), processor (~500 lines), tests, docs,
etc. Model contribution PRs rarely add fewer than 3-5k lines of code, with much of this code being boilerplate.

This raises the bar for contributions, and with Modular Transformers, we're aiming to lower the bar to a much more
acceptable point.

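
To make the "copies" mechanism mentioned above more concrete, here is a minimal sketch of what such a check could look like. It assumes a marker of the form `# Copied from <source> with <Old>-><New>`; the helper name `copy_is_in_sync` and the regular expression are illustrative assumptions, and the real checker used in CI is more involved.

```python
import re

# Assumed marker shape, modelled on the `# Copied from` comments; hypothetical regex.
MARKER = re.compile(r"# Copied from (?P<source>\S+) with (?P<old>\w+)->(?P<new>\w+)")


def copy_is_in_sync(marker: str, original_code: str, copied_code: str) -> bool:
    """Return True if `copied_code` equals `original_code` after the rename in `marker`."""
    match = MARKER.fullmatch(marker.strip())
    if match is None:
        raise ValueError(f"Not a copy marker: {marker!r}")
    # Apply the declared rename to the original and compare with the copy.
    expected = original_code.replace(match["old"], match["new"])
    return expected == copied_code


marker = "# Copied from transformers.models.bert.modeling_bert.BertSelfOutput with Bert->Roberta"
original = "class BertSelfOutput(nn.Module):\n    pass\n"
in_sync = "class RobertaSelfOutput(nn.Module):\n    pass\n"
diverged = "class RobertaSelfOutput(nn.Module):\n    dropout = 0.1\n"

print(copy_is_in_sync(marker, original, in_sync))
print(copy_is_in_sync(marker, original, diverged))
```

The point of the sketch is that every copy has to be kept textually identical to its source (modulo the rename), which is exactly the tedium the modular approach tries to remove.
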
## What is it?

Modular Transformers introduces the concept of a "modular" file to a model folder. This modular file accepts code
that isn't typically accepted in modeling/processing files, as it allows importing from neighbouring models as well
as inheriting from their classes.

This modular file defines models, processors, and the configuration class that would otherwise be defined in their
respective modules.

Finally, this feature introduces a new `linter` which will "unravel" the modular file into the "single model, single
file" directory structure. These files will get auto-generated every time the script is run, so contributors only
need to write the modular file, and therefore only the changes between the contributed model and others.

Model users will end up importing and using the single-file interface, so no change is expected here. Doing this, we
hope to combine the best of both worlds: enabling simple contributions while sticking to our philosophy.

This is therefore a replacement for the `# Copied from` markers, and previously contributed models can be expected to
be moved to the new Modular Transformers format in the coming months.

### Details

The "linter", which unravels the inheritance and creates all the single files from the modular file, will flatten the
inheritance while trying to be invisible to Python users. At this time, the linter flattens a **single** level of
inheritance.

For example:
- If a configuration class inherits from another and adds/deletes an argument, the generated file will either directly
  reference it (in case of addition) or completely remove it (in case of deletion).
- If a class inherits from another, for example: `class GemmaModel(LlamaModel):`, dependencies are automatically
  inferred. All submodules will be automatically inferred from the superclass.

You should be able to write everything (the tokenizer, the image processor, the model, the config) in this `modular`
file, and the corresponding files will be created for you.

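
As an illustration of what flattening a single level of inheritance means, here is a toy sketch built on Python's standard `ast` module (it needs Python 3.9+ for `ast.unparse`). It is not the converter shipped with `transformers`: the function `flatten_one_level` and the classes `ParentConfig` and `ChildConfig` are hypothetical, and the sketch deliberately ignores `super()` calls, imports and docstring merging.

```python
import ast


def flatten_one_level(parent_src: str, child_src: str) -> str:
    """Inline a parent class body into its direct child and drop the base class."""
    parent = ast.parse(parent_src).body[0]
    child = ast.parse(child_src).body[0]
    if not (isinstance(parent, ast.ClassDef) and isinstance(child, ast.ClassDef)):
        raise TypeError("both sources must start with a class definition")

    def member_name(node):
        # Methods expose `.name`; simple class attributes are keyed by their target.
        if isinstance(node, ast.FunctionDef):
            return node.name
        if isinstance(node, ast.Assign) and isinstance(node.targets[0], ast.Name):
            return node.targets[0].id
        return None

    overrides = {member_name(n): n for n in child.body if member_name(n) is not None}
    merged = []
    for node in parent.body:
        # A member redefined by the child replaces the parent's version in place.
        merged.append(overrides.pop(member_name(node), node))
    # Members that only exist on the child are appended at the end.
    merged.extend(overrides.values())

    child.bases = []  # the generated class no longer inherits from the parent
    child.body = merged
    return ast.unparse(child)


parent_src = '''
class ParentConfig:
    model_type = "parent"
    hidden_size = 64

    def describe(self):
        return f"{self.model_type}:{self.hidden_size}"
'''
child_src = '''
class ChildConfig(ParentConfig):
    model_type = "child"
'''

flattened = flatten_one_level(parent_src, child_src)
print(flattened)
```

The generated `ChildConfig` is a standalone class: it carries the parent's `hidden_size` and `describe`, keeps its own `model_type`, and no longer references `ParentConfig`, which is the property the single-file layout relies on.
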
### Enforcement

[TODO] We are introducing a new test that makes sure the generated content matches what is present in the `modular_xxxx.py` file.

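
Since the test itself is still marked as a TODO, the following is only a sketch of the idea, under the assumption that the check regenerates a file from the modular source and compares it with the file on disk. The helper `generated_file_diff` and the placeholder name `modeling_xxxx.py` are hypothetical; only the standard-library `difflib` is used.

```python
import difflib


def generated_file_diff(regenerated: str, on_disk: str, filename: str = "modeling_xxxx.py") -> list:
    """Return unified-diff lines; an empty list means the file on disk is up to date."""
    return list(
        difflib.unified_diff(
            on_disk.splitlines(),
            regenerated.splitlines(),
            fromfile=f"{filename} (on disk)",
            tofile=f"{filename} (regenerated)",
            lineterm="",
        )
    )


up_to_date = generated_file_diff("class A:\n    pass\n", "class A:\n    pass\n")
stale = generated_file_diff("class A:\n    x = 1\n", "class A:\n    pass\n")

print(up_to_date)
print("\n".join(stale))
```

A CI job built on this idea would fail whenever the returned list is non-empty, which would catch hand edits to generated files as well as modular files that were changed without rerunning the script.
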
### Examples

Here is a quick example with BERT and RoBERTa. The two models are intimately related: their modeling implementation
differs solely by a change in the embedding layer.

Instead of redefining the model entirely, here is what the `modular_roberta.py` file looks like for the modeling &
configuration classes (for the sake of the example, the tokenizer is ignored at this time as it is very different).

```python
from torch import nn
from ..bert.configuration_bert import BertConfig
from ..bert.modeling_bert import (
    BertModel,
    BertEmbeddings,
    BertForMaskedLM
)

# The RoBERTa config is identical to BERT's config
class RobertaConfig(BertConfig):
    model_type = 'roberta'

# We redefine the embeddings here to highlight the padding ID difference, and we redefine the position embeddings
class RobertaEmbeddings(BertEmbeddings):
    def __init__(self, config):
        # Pass the config object itself; it is not callable
        super().__init__(config)

        self.padding_idx = config.pad_token_id
        self.position_embeddings = nn.Embedding(
            config.max_position_embeddings, config.hidden_size, padding_idx=self.padding_idx
        )

# The RoBERTa model is identical to the BERT model, except for the embedding layer.
# We redefine the embeddings above, so here there is no need to do additional work
class RobertaModel(BertModel):
    def __init__(self, config):
        super().__init__(config)
        self.embeddings = RobertaEmbeddings(config)


# The heads now only need to redefine the model inside to the correct `RobertaModel`
class RobertaForMaskedLM(BertForMaskedLM):
    def __init__(self, config):
        super().__init__(config)
        self.model = RobertaModel(config)
```

Note that if you define a dependency but do not use it, you will get the following error:

```bash
ValueError: You defined `RobertaEmbeddings` in the modular_roberta.py, it should be used
when you define `BertModel`, as it is one of it's direct dependencies. Make sure
you use it in the `__init__` function.
```

Additionally, you may find a list of examples here:

## What it is not

It is not a replacement for the modeling code (yet?), and if your model is not based on anything else that ever existed, then you can add a `modeling` file as usual.