diff --git a/docs/source/en/add_new_model.md b/docs/source/en/add_new_model.md
index 6f8e5499ba..419b1dced4 100644
--- a/docs/source/en/add_new_model.md
+++ b/docs/source/en/add_new_model.md
@@ -476,7 +476,7 @@ When both implementations produce the same output, verify the outputs are within
 torch.allclose(original_output, output, atol=1e-3)
 ```
 
-This is typically the most difficult part of the process. Congratulations if you've made it this far!
+This is typically the most difficult part of the process. Congratulations if you've made it this far!
 
 And if you're stuck or struggling with this step, don't hesitate to ask for help on your pull request.
 
@@ -541,6 +541,48 @@ input_ids = tokenizer(input_str).input_ids
 
 When both implementations have the same `input_ids`, add a tokenizer test file. This file is analogous to the modeling test files. The tokenizer test files should contain a couple of hardcoded integration tests.
 
+## Implement image processor
+
+> [!TIP]
+> Fast image processors use the [torchvision](https://pytorch.org/vision/stable/index.html) library and can perform image processing on the GPU, significantly improving processing speed.
+> We recommend adding a fast image processor ([`BaseImageProcessorFast`]) in addition to the "slow" image processor ([`BaseImageProcessor`]) to provide users with the best performance. Feel free to tag [@yonigozlan](https://github.com/yonigozlan) for help adding a [`BaseImageProcessorFast`].
+
+While this example doesn't include an image processor, you may need to implement one if your model requires image inputs. The image processor is responsible for converting images into a format suitable for your model. Before implementing a new one, check whether an existing image processor in the Transformers library can be reused, as many models share similar image processing techniques. Note that you can also use [modular](./modular_transformers) for image processors to reuse existing components.
+
+If you do need to implement a new image processor, refer to an existing image processor to understand the expected structure. Slow image processors ([`BaseImageProcessor`]) and fast image processors ([`BaseImageProcessorFast`]) are designed differently, so make sure you follow the correct structure based on the processor type you're implementing.
+
+Run the following command, unless you already created the fast image processor with the `transformers-cli add-new-model-like` command.
+
+```bash
+transformers-cli add-fast-image-processor --model-name your_model_name
+```
+
+This command will generate the necessary imports and provide a pre-filled template for the fast image processor. You can then modify it to fit your model's needs.
+
+Add tests for the image processor in `tests/models/your_model_name/test_image_processing_your_model_name.py`. These tests should be similar to those for other image processors and should verify that the image processor correctly handles image inputs. If your image processor includes unique features or processing methods, ensure you add specific tests for those as well.
+
+## Implement processor
+
+If your model accepts multiple modalities, like text and images, you need to add a processor. The processor centralizes the preprocessing of different modalities before passing them to the model.
+
+The processor should call the appropriate modality-specific processors within its `__call__` function to handle each type of input correctly. Be sure to check existing processors in the library to understand their expected structure. Transformers uses the following convention in the `__call__` function signature.
+
+```python
+def __call__(
+    self,
+    images: ImageInput = None,
+    text: Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]] = None,
+    audio=None,
+    videos=None,
+    **kwargs: Unpack[YourModelProcessorKwargs],
+) -> BatchFeature:
+    ...
+```
+
+`YourModelProcessorKwargs` is a `TypedDict` that includes all the typical processing arguments and any extra arguments a specific processor may require.
+
+Add tests for the processor in `tests/models/your_model_name/test_processor_your_model_name.py`. These tests should be similar to those for other processors and should verify that the processor correctly handles the different modalities.
+
 ## Integration tests
 
 Now that you have a model and tokenizer, add end-to-end integration tests for the model and tokenizer to `tests/models/brand_new_llama/test_modeling_brand_new_llama.py`.
@@ -620,4 +662,4 @@ There are four timelines for model additions depending on the model contributor
 
 - **Hub-first release**: Transformers [remote-code](./models#custom-models) feature allows Transformers-based projects to be shared directly on the Hub. This is a good option if you don't have the bandwidth to add a model directly to Transformers.
 
-  If a model ends up being very popular, then it's very likely that we'll integrate it in Transformers ourselves to enable better support (documentation, maintenance, optimization, etc.) for it. A Hub-first release is the most frictionless way to add a model.
\ No newline at end of file
+  If a model ends up being very popular, then it's very likely that we'll integrate it in Transformers ourselves to enable better support (documentation, maintenance, optimization, etc.) for it. A Hub-first release is the most frictionless way to add a model.
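Reviewer note on the `## Implement processor` section added in the docs diff above: the following is a minimal sketch of how a multimodal processor's `__call__` might route each modality to its own component. `ToyProcessor`, `ToyProcessorKwargs`, `toy_image_processor`, and `toy_tokenizer` are hypothetical stand-ins and not Transformers classes; a real processor would follow the existing processors in the library and return a `BatchFeature`.

```python
from typing import Any, Callable, Dict, List, Optional, TypedDict


class ToyProcessorKwargs(TypedDict, total=False):
    # Hypothetical stand-in for YourModelProcessorKwargs. In the documented
    # signature this type annotates **kwargs through Unpack[...].
    padding: bool
    size: int


class ToyProcessor:
    """Hypothetical processor that dispatches each modality to its own component."""

    def __init__(
        self,
        image_processor: Callable[..., Dict[str, Any]],
        tokenizer: Callable[..., Dict[str, Any]],
    ):
        self.image_processor = image_processor
        self.tokenizer = tokenizer

    def __call__(
        self,
        images: Optional[List[Any]] = None,
        text: Optional[List[str]] = None,
        **kwargs: Any,
    ) -> Dict[str, Any]:
        if images is None and text is None:
            raise ValueError("Provide at least one of `images` or `text`.")
        features: Dict[str, Any] = {}
        # Each modality is handled by its dedicated component.
        if images is not None:
            features.update(self.image_processor(images, size=kwargs.get("size", 224)))
        if text is not None:
            features.update(self.tokenizer(text, padding=kwargs.get("padding", False)))
        return features


def toy_image_processor(images: List[Any], size: int) -> Dict[str, Any]:
    # Toy behaviour: report the target size instead of resizing real pixels.
    return {"pixel_values": [(size, size) for _ in images]}


def toy_tokenizer(text: List[str], padding: bool) -> Dict[str, Any]:
    # Toy behaviour: use word lengths as fake token ids.
    return {"input_ids": [[len(word) for word in t.split()] for t in text]}


processor = ToyProcessor(toy_image_processor, toy_tokenizer)
batch = processor(images=["img"], text=["hello world"], size=32)
```

The sketch merges both outputs into one dictionary so the caller gets a single batch regardless of which modalities were passed.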
diff --git a/src/transformers/commands/add_fast_image_processor.py b/src/transformers/commands/add_fast_image_processor.py
index 72b0f07865..a78fc2a7cf 100644
--- a/src/transformers/commands/add_fast_image_processor.py
+++ b/src/transformers/commands/add_fast_image_processor.py
@@ -414,11 +414,35 @@ def get_fast_image_processing_content_header(content: str) -> str:
     """
     Get the header of the slow image processor file.
     """
-    # get all lines before and including the line containing """Image processor
-    content_header = re.search(r"^(.*?\n)*?\"\"\"Image processor.*", content)
+    # get all the commented lines at the beginning of the file
+    content_header = re.search(r"^# coding=utf-8\n(#[^\n]*\n)*", content, re.MULTILINE)
+    if not content_header:
+        logger.warning("Couldn't find the content header in the slow image processor file. Using a default header.")
+        return (
+            f"# coding=utf-8\n"
+            f"# Copyright {CURRENT_YEAR} The HuggingFace Team. All rights reserved.\n"
+            f"#\n"
+            f'# Licensed under the Apache License, Version 2.0 (the "License");\n'
+            f"# you may not use this file except in compliance with the License.\n"
+            f"# You may obtain a copy of the License at\n"
+            f"#\n"
+            f"#     http://www.apache.org/licenses/LICENSE-2.0\n"
+            f"#\n"
+            f"# Unless required by applicable law or agreed to in writing, software\n"
+            f'# distributed under the License is distributed on an "AS IS" BASIS,\n'
+            f"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n"
+            f"# See the License for the specific language governing permissions and\n"
+            f"# limitations under the License.\n"
+            f"\n"
+        )
     content_header = content_header.group(0)
+    # replace the year in the copyright
     content_header = re.sub(r"# Copyright (\d+)\s", f"# Copyright {CURRENT_YEAR} ", content_header)
-    content_header = content_header.replace("Image processor", "Fast Image processor")
+    # get the line starting with """Image processor in content if it exists
+    match = re.search(r'^"""Image processor.*$', content, re.MULTILINE)
+    if match:
+        content_header += match.group(0).replace("Image processor", "Fast Image processor")
+
     return content_header
 
 
diff --git a/src/transformers/commands/add_new_model_like.py b/src/transformers/commands/add_new_model_like.py
index 31156aa884..badf6f0a40 100644
--- a/src/transformers/commands/add_new_model_like.py
+++ b/src/transformers/commands/add_new_model_like.py
@@ -29,6 +29,7 @@ from ..models import auto as auto_module
 from ..models.auto.configuration_auto import model_type_to_module_name
 from ..utils import is_flax_available, is_tf_available, is_torch_available, logging
 from . import BaseTransformersCLICommand
+from .add_fast_image_processor import add_fast_image_processor
 
 
 logger = logging.get_logger(__name__)  # pylint: disable=invalid-name
@@ -66,6 +67,9 @@ class ModelPatterns:
         image_processor_class (`str`, *optional*):
             The image processor class associated with this model (leave to `None` for models that don't use an image
             processor).
+        image_processor_fast_class (`str`, *optional*):
+            The fast image processor class associated with this model (leave to `None` for models that don't use a fast
+            image processor).
         feature_extractor_class (`str`, *optional*):
             The feature extractor class associated with this model (leave to `None` for models that don't use a feature
             extractor).
@@ -82,6 +86,7 @@ class ModelPatterns:
     config_class: Optional[str] = None
     tokenizer_class: Optional[str] = None
     image_processor_class: Optional[str] = None
+    image_processor_fast_class: Optional[str] = None
     feature_extractor_class: Optional[str] = None
     processor_class: Optional[str] = None
 
@@ -107,6 +112,7 @@ ATTRIBUTE_TO_PLACEHOLDER = {
     "config_class": "[CONFIG_CLASS]",
     "tokenizer_class": "[TOKENIZER_CLASS]",
     "image_processor_class": "[IMAGE_PROCESSOR_CLASS]",
+    "image_processor_fast_class": "[IMAGE_PROCESSOR_FAST_CLASS]",
     "feature_extractor_class": "[FEATURE_EXTRACTOR_CLASS]",
     "processor_class": "[PROCESSOR_CLASS]",
     "checkpoint": "[CHECKPOINT]",
@@ -339,7 +345,13 @@ def replace_model_patterns(
     # contains the camel-cased named, but will be treated before.
     attributes_to_check = ["config_class"]
     # Add relevant preprocessing classes
-    for attr in ["tokenizer_class", "image_processor_class", "feature_extractor_class", "processor_class"]:
+    for attr in [
+        "tokenizer_class",
+        "image_processor_class",
+        "image_processor_fast_class",
+        "feature_extractor_class",
+        "processor_class",
+    ]:
         if getattr(old_model_patterns, attr) is not None and getattr(new_model_patterns, attr) is not None:
             attributes_to_check.append(attr)
 
@@ -763,10 +775,10 @@ def retrieve_info_for_model(model_type, frameworks: Optional[List[str]] = None):
         tokenizer_class = None
     image_processor_classes = auto_module.image_processing_auto.IMAGE_PROCESSOR_MAPPING_NAMES.get(model_type, None)
     if isinstance(image_processor_classes, tuple):
-        image_processor_class = image_processor_classes[0]  # we take the slow image processor class.
+        image_processor_class, image_processor_fast_class = image_processor_classes
     else:
         image_processor_class = image_processor_classes
-
+        image_processor_fast_class = None
     feature_extractor_class = auto_module.feature_extraction_auto.FEATURE_EXTRACTOR_MAPPING_NAMES.get(model_type, None)
     processor_class = auto_module.processing_auto.PROCESSOR_MAPPING_NAMES.get(model_type, None)
 
@@ -800,6 +812,7 @@ def retrieve_info_for_model(model_type, frameworks: Optional[List[str]] = None):
         config_class=config_class,
         tokenizer_class=tokenizer_class,
         image_processor_class=image_processor_class,
+        image_processor_fast_class=image_processor_fast_class,
         feature_extractor_class=feature_extractor_class,
         processor_class=processor_class,
     )
@@ -957,6 +970,7 @@ def add_model_to_main_init(
                 processing_classes = [
                     old_model_patterns.tokenizer_class,
                     old_model_patterns.image_processor_class,
+                    old_model_patterns.image_processor_fast_class,
                     old_model_patterns.feature_extractor_class,
                     old_model_patterns.processor_class,
                 ]
@@ -1034,7 +1048,7 @@ AUTO_CLASSES_PATTERNS = {
         '        ("{model_type}", "{pretrained_archive_map}"),',
     ],
     "feature_extraction_auto.py": ['        ("{model_type}", "{feature_extractor_class}"),'],
-    "image_processing_auto.py": ['        ("{model_type}", "{image_processor_class}"),'],
+    "image_processing_auto.py": ['        ("{model_type}", "{image_processor_classes}"),'],
     "modeling_auto.py": ['        ("{model_type}", "{any_pt_class}"),'],
     "modeling_tf_auto.py": ['        ("{model_type}", "{any_tf_class}"),'],
     "modeling_flax_auto.py": ['        ("{model_type}", "{any_flax_class}"),'],
@@ -1068,14 +1082,27 @@ def add_model_to_auto_classes(
                     )
             elif "{config_class}" in pattern:
                 new_patterns.append(pattern.replace("{config_class}", old_model_patterns.config_class))
-            elif "{image_processor_class}" in pattern:
+            elif "{image_processor_classes}" in pattern:
                 if (
                     old_model_patterns.image_processor_class is not None
                     and new_model_patterns.image_processor_class is not None
                 ):
-                    new_patterns.append(
-                        pattern.replace("{image_processor_class}", old_model_patterns.image_processor_class)
-                    )
+                    if (
+                        old_model_patterns.image_processor_fast_class is not None
+                        and new_model_patterns.image_processor_fast_class is not None
+                    ):
+                        new_patterns.append(
+                            pattern.replace(
+                                '"{image_processor_classes}"',
+                                f'("{old_model_patterns.image_processor_class}", "{old_model_patterns.image_processor_fast_class}")',
+                            )
+                        )
+                    else:
+                        new_patterns.append(
+                            pattern.replace(
+                                '"{image_processor_classes}"', f'("{old_model_patterns.image_processor_class}",)'
+                            )
+                        )
             elif "{feature_extractor_class}" in pattern:
                 if (
                     old_model_patterns.feature_extractor_class is not None
@@ -1101,7 +1128,6 @@ def add_model_to_auto_classes(
             new_model_line = new_model_line.replace(
                 old_model_patterns.model_camel_cased, new_model_patterns.model_camel_cased
             )
-
             add_content_to_file(full_name, new_model_line, add_after=old_model_line)
 
     # Tokenizers require special handling
@@ -1198,6 +1224,10 @@ def duplicate_doc_file(
                 # We only add the image processor if necessary
                 if old_model_patterns.image_processor_class != new_model_patterns.image_processor_class:
                     new_blocks.append(new_block)
+            elif "ImageProcessorFast" in block_class:
+                # We only add the image processor if necessary
+                if old_model_patterns.image_processor_fast_class != new_model_patterns.image_processor_fast_class:
+                    new_blocks.append(new_block)
             elif "FeatureExtractor" in block_class:
                 # We only add the feature extractor if necessary
                 if old_model_patterns.feature_extractor_class != new_model_patterns.feature_extractor_class:
@@ -1281,6 +1311,7 @@ def create_new_model_like(
     add_copied_from: bool = True,
     frameworks: Optional[List[str]] = None,
     old_checkpoint: Optional[str] = None,
+    create_fast_image_processor: bool = False,
 ):
     """
     Creates a new model module like a given model of the Transformers library.
@@ -1295,6 +1326,8 @@ def create_new_model_like(
         old_checkpoint (`str`, *optional*):
             The name of the base checkpoint for the old model. Should be passed along when it can't be automatically
             recovered from the `model_type`.
+        create_fast_image_processor (`bool`, *optional*, defaults to `False`):
+            Whether or not to add a fast image processor to the new model, if the old model had only a slow one.
     """
     # Retrieve all the old model info.
     model_info = retrieve_info_for_model(model_type, frameworks=frameworks)
@@ -1309,7 +1342,13 @@ def create_new_model_like(
         )
 
     keep_old_processing = True
-    for processing_attr in ["image_processor_class", "feature_extractor_class", "processor_class", "tokenizer_class"]:
+    for processing_attr in [
+        "image_processor_class",
+        "image_processor_fast_class",
+        "feature_extractor_class",
+        "processor_class",
+        "tokenizer_class",
+    ]:
         if getattr(old_model_patterns, processing_attr) != getattr(new_model_patterns, processing_attr):
             keep_old_processing = False
 
@@ -1416,7 +1455,11 @@ def create_new_model_like(
     duplicate_doc_file(doc_file, old_model_patterns, new_model_patterns, frameworks=frameworks)
     insert_model_in_doc_toc(old_model_patterns, new_model_patterns)
 
-    # 6. Warn the user for duplicate patterns
+    # 6. Add fast image processor if necessary
+    if create_fast_image_processor:
+        add_fast_image_processor(model_name=new_model_patterns.model_lower_cased)
+
+    # 7. Warn the user for duplicate patterns
     if old_model_patterns.model_type == old_model_patterns.checkpoint:
         print(
             "The model you picked has the same name for the model type and the checkpoint name "
@@ -1484,6 +1527,7 @@ class AddNewModelLikeCommand(BaseTransformersCLICommand):
                 self.add_copied_from,
                 self.frameworks,
                 self.old_checkpoint,
+                self.create_fast_image_processor,
             ) = get_user_input()
 
         self.path_to_repo = path_to_repo
@@ -1503,6 +1547,7 @@ class AddNewModelLikeCommand(BaseTransformersCLICommand):
             add_copied_from=self.add_copied_from,
             frameworks=self.frameworks,
             old_checkpoint=self.old_checkpoint,
+            create_fast_image_processor=self.create_fast_image_processor,
         )
 
 
@@ -1594,6 +1639,7 @@ def get_user_input():
     old_model_info = retrieve_info_for_model(old_model_type)
     old_tokenizer_class = old_model_info["model_patterns"].tokenizer_class
     old_image_processor_class = old_model_info["model_patterns"].image_processor_class
+    old_image_processor_fast_class = old_model_info["model_patterns"].image_processor_fast_class
     old_feature_extractor_class = old_model_info["model_patterns"].feature_extractor_class
     old_processor_class = old_model_info["model_patterns"].processor_class
     old_frameworks = old_model_info["frameworks"]
@@ -1634,7 +1680,13 @@ def get_user_input():
 
     old_processing_classes = [
         c if not isinstance(c, tuple) else c[0]
-        for c in [old_image_processor_class, old_feature_extractor_class, old_tokenizer_class, old_processor_class]
+        for c in [
+            old_image_processor_class,
+            old_image_processor_fast_class,
+            old_feature_extractor_class,
+            old_tokenizer_class,
+            old_processor_class,
+        ]
         if c is not None
     ]
     old_processing_classes = ", ".join(old_processing_classes)
@@ -1645,9 +1697,11 @@ def get_user_input():
     )
     if keep_processing:
         image_processor_class = old_image_processor_class
+        image_processor_fast_class = old_image_processor_fast_class
         feature_extractor_class = old_feature_extractor_class
         processor_class = old_processor_class
         tokenizer_class = old_tokenizer_class
+        create_fast_image_processor = False
     else:
         if old_tokenizer_class is not None:
             tokenizer_class = get_user_field(
@@ -1663,6 +1717,13 @@ def get_user_input():
             )
         else:
             image_processor_class = None
+        if old_image_processor_fast_class is not None:
+            image_processor_fast_class = get_user_field(
+                "What will be the name of the fast image processor class for this model? ",
+                default_value=f"{model_camel_cased}ImageProcessorFast",
+            )
+        else:
+            image_processor_fast_class = None
         if old_feature_extractor_class is not None:
             feature_extractor_class = get_user_field(
                 "What will be the name of the feature extractor class for this model? ",
@@ -1677,6 +1738,16 @@ def get_user_input():
             )
         else:
             processor_class = None
+        if old_image_processor_class is not None and old_image_processor_fast_class is None:
+            create_fast_image_processor = get_user_field(
+                "A fast image processor can be created from the slow one, but modifications might be needed. "
+                "Should we add a fast image processor class for this model (recommended, yes/no)? ",
+                convert_to=convert_to_bool,
+                default_value="yes",
+                fallback_message="Please answer yes/no, y/n, true/false or 1/0.",
+            )
+        else:
+            create_fast_image_processor = False
 
     model_patterns = ModelPatterns(
         model_name,
@@ -1688,6 +1759,7 @@ def get_user_input():
         config_class=config_class,
         tokenizer_class=tokenizer_class,
         image_processor_class=image_processor_class,
+        image_processor_fast_class=image_processor_fast_class,
         feature_extractor_class=feature_extractor_class,
         processor_class=processor_class,
     )
@@ -1706,6 +1778,7 @@ def get_user_input():
         default_value="yes",
         fallback_message="Please answer yes/no, y/n, true/false or 1/0.",
     )
+
     if all_frameworks:
         frameworks = None
     else:
@@ -1715,4 +1788,4 @@ def get_user_input():
         )
         frameworks = list(set(frameworks.split(" ")))
 
-    return (old_model_type, model_patterns, add_copied_from, frameworks, old_checkpoint)
+    return (old_model_type, model_patterns, add_copied_from, frameworks, old_checkpoint, create_fast_image_processor)
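
Reviewer note on the `get_fast_image_processing_content_header` change in `add_fast_image_processor.py`: the sketch below reproduces the new header extraction in isolation, using the three regular expressions shown in the diff. `extract_header`, `DEFAULT_HEADER`, and the sample file contents are illustrative names that are not part of the patch, `CURRENT_YEAR` is computed here as an assumption about how the module defines it, and the fallback header is abbreviated.

```python
import re
from datetime import date

# Assumption: the real module defines CURRENT_YEAR in a similar way.
CURRENT_YEAR = date.today().year

# Abbreviated stand-in for the full Apache license fallback in the patch.
DEFAULT_HEADER = (
    "# coding=utf-8\n"
    f"# Copyright {CURRENT_YEAR} The HuggingFace Team. All rights reserved.\n"
    "\n"
)


def extract_header(content: str) -> str:
    """Hypothetical standalone version of the patched header logic."""
    # Grab the leading run of comment lines, starting at the coding line.
    header = re.search(r"^# coding=utf-8\n(#[^\n]*\n)*", content, re.MULTILINE)
    if not header:
        return DEFAULT_HEADER
    # Refresh the copyright year.
    result = re.sub(r"# Copyright (\d+)\s", f"# Copyright {CURRENT_YEAR} ", header.group(0))
    # Append the module docstring line, if there is one, marked as "Fast".
    docstring = re.search(r'^"""Image processor.*$', content, re.MULTILINE)
    if docstring:
        result += docstring.group(0).replace("Image processor", "Fast Image processor")
    return result


slow_file = (
    "# coding=utf-8\n"
    "# Copyright 2022 The HuggingFace Team. All rights reserved.\n"
    "#\n"
    '"""Image processor class for Foo."""\n'
    "\n"
    "import numpy as np\n"
)
result = extract_header(slow_file)
```

Because `$` stops before the newline, the docstring line is appended without a trailing line break, so whatever is written after the header needs to supply it.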