SmolVLM2 (#36126)
Some checks failed
Release - Conda / build_and_package (push) Has been cancelled
Secret Leaks / trufflehog (push) Has been cancelled

* smolvlm init

* updates

* fixing bugs

* minimal run, no checks

* minimal run, no checks

* passing first check + adding url support

* updating video dataloading logic

* fixing image logic

* trying modular, but fails

* modular is working, changing processor to match PR comments and general transformers logic

* fixing kwargs

* offloading video loading logic to  image_util

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* update

* add idefics3-based tests

* add keyword to all

* add PreTrainedModel

* updateing video loading logic

* working inference

* updates for PR comments

* updates for PR comments

* moving SmolVLMPretrainedModel higher to fix import error

* CI test pass

* CI test pass

* removing lambda

* CI test pass

* CI test pass

* CI test pass

* CI test pass

* CI test pass

* CI test pass

* processor tests

* add example in docs

* typo

* fix copies

* skip compile tests - sdpa for VisionTransformer

* fix init

* raise import error for num2words

* update doc for FA2

* more doc fix

* CI

* updates for PR comments

* Update docs/source/en/model_doc/smolvlm.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/model_doc/smolvlm.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/model_doc/smolvlm.md

Co-authored-by: Joshua Lochner <admin@xenova.com>

* Update docs/source/en/model_doc/smolvlm.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/model_doc/smolvlm.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* fixing processor -- tokenizer not defined properly, (gpt2 tokenizer), and does not have the attributes of fake image token, etc

* adding smolvlm to VQA models

* removing vqa auto class

* Update src/transformers/models/smolvlm/processing_smolvlm.py

Co-authored-by: Joshua Lochner <admin@xenova.com>

* removing smolvlmvisiontransformer from index.md

* my bad, video processing had typos

* fixing docs

* renaming params in SmolVLMModel.inputs_merger

* removing un-needed dtype/device in model forward

* ruff for CI

* update docs

* Update docs/source/en/model_doc/smolvlm.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* return cache position

* return cache position

* return cache also in modular

* needed to run modular again

* fix training tests

* push vectorized inputs merger

* format

* format

* reduce number of mappings

* addressing PR comments

* happy CI, happy me :)

* skip non-nested images

* adjust integration test for smaller GPUs

* format

* fix kwargs in chat template apply

* skip this for now

---------

Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: Pablo <pablo.montalvo.leroux@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Joshua Lochner <admin@xenova.com>
This commit is contained in:
Orr Zohar
2025-02-20 06:00:26 -08:00
committed by GitHub
parent f2ab182dca
commit 4397dfcb71
29 changed files with 5108 additions and 4 deletions

View File

@@ -95,6 +95,7 @@ FlashAttention-2 is currently supported for the following architectures:
* [PhiMoE](https://huggingface.co/docs/transformers/model_doc/phimoe#transformers.PhimoeModel)
* [StableLm](https://huggingface.co/docs/transformers/model_doc/stablelm#transformers.StableLmModel)
* [Starcoder2](https://huggingface.co/docs/transformers/model_doc/starcoder2#transformers.Starcoder2Model)
* [SmolVLM](https://huggingface.co/docs/transformers/model_doc/smolvlm#transformers.SmolVLMModel)
* [Qwen2](https://huggingface.co/docs/transformers/model_doc/qwen2#transformers.Qwen2Model)
* [Qwen2Audio](https://huggingface.co/docs/transformers/model_doc/qwen2_audio#transformers.Qwen2AudioEncoder)
* [Qwen2MoE](https://huggingface.co/docs/transformers/model_doc/qwen2_moe#transformers.Qwen2MoeModel)
@@ -301,6 +302,7 @@ For now, Transformers supports SDPA inference and training for the following arc
* [Mixtral](https://huggingface.co/docs/transformers/model_doc/mixtral#transformers.MixtralModel)
* [StableLm](https://huggingface.co/docs/transformers/model_doc/stablelm#transformers.StableLmModel)
* [Starcoder2](https://huggingface.co/docs/transformers/model_doc/starcoder2#transformers.Starcoder2Model)
* [SmolVLM](https://huggingface.co/docs/transformers/model_doc/smolvlm#transformers.SmolVLMModel)
* [Qwen2](https://huggingface.co/docs/transformers/model_doc/qwen2#transformers.Qwen2Model)
* [Qwen2Audio](https://huggingface.co/docs/transformers/model_doc/qwen2_audio#transformers.Qwen2AudioEncoder)
* [Qwen2MoE](https://huggingface.co/docs/transformers/model_doc/qwen2_moe#transformers.Qwen2MoeModel)