SmolVLM2 (#36126)
* smolvlm init * updates * fixing bugs * minimal run, no checks * minimal run, no checks * passing first check + adding url support * updating video dataloading logic * fixing image logic * trying modular, but fails * modular is working, changing processor to match PR comments and general transformers logic * fixing kwargs * offloading video loading logic to image_util * fixing circleci code formatting errors * fixing circleci code formatting errors * fixing circleci code formatting errors * fixing circleci code formatting errors * fixing circleci code formatting errors * fixing circleci code formatting errors * fixing circleci code formatting errors * fixing circleci code formatting errors * fixing circleci code formatting errors * fixing circleci code formatting errors * fixing circleci code formatting errors * fixing circleci code formatting errors * fixing circleci code formatting errors * fixing circleci code formatting errors * update * add idefics3-based tests * add keyword to all * add PreTrainedModel * updateing video loading logic * working inference * updates for PR comments * updates for PR comments * moving SmolVLMPretrainedModel higher to fix import error * CI test pass * CI test pass * removing lambda * CI test pass * CI test pass * CI test pass * CI test pass * CI test pass * CI test pass * processor tests * add example in docs * typo * fix copies * skip compile tests - sdpa for VisionTransformer * fix init * raise import error for num2words * update doc for FA2 * more doc fix * CI * updates for PR comments * Update docs/source/en/model_doc/smolvlm.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update docs/source/en/model_doc/smolvlm.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update docs/source/en/model_doc/smolvlm.md Co-authored-by: Joshua Lochner <admin@xenova.com> * Update docs/source/en/model_doc/smolvlm.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update docs/source/en/model_doc/smolvlm.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * fixing processor -- tokenizer not defined properly, (gpt2 tokenizer), and does not have the attributes of fake image token, etc * adding smolvlm to VQA models * removing vqa auto class * Update src/transformers/models/smolvlm/processing_smolvlm.py Co-authored-by: Joshua Lochner <admin@xenova.com> * removing smolvlmvisiontransformer from index.md * my bad, video processing had typos * fixing docs * renaming params in SmolVLMModel.inputs_merger * removing un-needed dtype/device in model forward * ruff for CI * update docs * Update docs/source/en/model_doc/smolvlm.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * return cache position * return cache position * return cache also in modular * needed to run modular again * fix training tests * push vectorized inputs merger * format * format * reduce number of mappings * addressing PR comments * happy CI, happy me :) * skip non-nested images * adjust integration test for smaller GPUs * format * fix kwargs in chat template apply * skip this for now --------- Co-authored-by: raushan <raushan@huggingface.co> Co-authored-by: Pablo <pablo.montalvo.leroux@gmail.com> Co-authored-by: Pedro Cuenca <pedro@huggingface.co> Co-authored-by: Joshua Lochner <admin@xenova.com>
This commit is contained in:
@@ -317,6 +317,7 @@ Flax), PyTorch, and/or TensorFlow.
|
||||
| [SEW](model_doc/sew) | ✅ | ❌ | ❌ |
|
||||
| [SEW-D](model_doc/sew-d) | ✅ | ❌ | ❌ |
|
||||
| [SigLIP](model_doc/siglip) | ✅ | ❌ | ❌ |
|
||||
| [SmolVLM](model_doc/smolvlm) | ✅ | ❌ | ❌ |
|
||||
| [Speech Encoder decoder](model_doc/speech-encoder-decoder) | ✅ | ❌ | ✅ |
|
||||
| [Speech2Text](model_doc/speech_to_text) | ✅ | ✅ | ❌ |
|
||||
| [SpeechT5](model_doc/speecht5) | ✅ | ❌ | ❌ |
|
||||
|
||||
Reference in New Issue
Block a user