🔴 Video processors as a separate class (#35206)
* initial design * update all video processors * add tests * need to add qwen2-vl (not tested yet) * add qwen2-vl in auto map * fix copies * isort * resolve confilicts kinda * nit: * qwen2-vl is happy now * qwen2-5 happy * other models are happy * fix copies * fix tests * add docs * CI green now? * add more tests * even more changes + tests * doc builder fail * nit * Update src/transformers/models/auto/processing_auto.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * small update * imports correctly * dump, otherwise this is getting unmanagebale T-T * dump * update * another update * update * tests * move * modular * docs * test * another update * init * remove flakiness in tests * fixup * clean up and remove commented lines * docs * skip this one! * last fix after rebasing * run fixup * delete slow files * remove unnecessary tests + clean up a bit * small fixes * fix tests * more updates * docs * fix tests * update * style * fix qwen2-5-vl * fixup * fixup * unflatten batch when preparing * dump, come back soon * add docs and fix some tests * how to guard this with new dummies? * chat templates in qwen * address some comments * remove `Fast` suffix * fixup * oops should be imported from transforms * typo in requires dummies * new model added with video support * fixup once more * last fixup I hope * revert image processor name + comments * oh, this is why fetch test is failing * fix tests * fix more tests * fixup * add new models: internvl, smolvlm * update docs * imprt once * fix failing tests * do we need to guard it here again, why? * new model was added, update it * remove testcase from tester * fix tests * make style * not related CI fail, lets' just fix here * mark flaky for now, filas 15 out of 100 * style * maybe we can do this way? * don't download images in setup class --------- Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
This commit is contained in:
committed by
GitHub
parent
716819b830
commit
a31fa218ad
@@ -74,6 +74,10 @@ Likewise, if your `NewModel` is a subclass of [`PreTrainedModel`], make sure its
|
||||
|
||||
[[autodoc]] AutoImageProcessor
|
||||
|
||||
## AutoVideoProcessor
|
||||
|
||||
[[autodoc]] AutoVideoProcessor
|
||||
|
||||
## AutoProcessor
|
||||
|
||||
[[autodoc]] AutoProcessor
|
||||
|
||||
@@ -58,6 +58,12 @@ The attributes can be obtained from model config, as `model.config.num_query_tok
|
||||
|
||||
[[autodoc]] InstructBlipVideoProcessor
|
||||
|
||||
|
||||
## InstructBlipVideoVideoProcessor
|
||||
|
||||
[[autodoc]] InstructBlipVideoVideoProcessor
|
||||
- preprocess
|
||||
|
||||
## InstructBlipVideoImageProcessor
|
||||
|
||||
[[autodoc]] InstructBlipVideoImageProcessor
|
||||
|
||||
@@ -353,3 +353,7 @@ This example showcases how to handle a batch of chat conversations with interlea
|
||||
## InternVLProcessor
|
||||
|
||||
[[autodoc]] InternVLProcessor
|
||||
|
||||
## InternVLVideoProcessor
|
||||
|
||||
[[autodoc]] InternVLVideoProcessor
|
||||
|
||||
@@ -262,6 +262,10 @@ model = LlavaNextVideoForConditionalGeneration.from_pretrained(
|
||||
|
||||
[[autodoc]] LlavaNextVideoImageProcessor
|
||||
|
||||
## LlavaNextVideoVideoProcessor
|
||||
|
||||
[[autodoc]] LlavaNextVideoVideoProcessor
|
||||
|
||||
## LlavaNextVideoModel
|
||||
|
||||
[[autodoc]] LlavaNextVideoModel
|
||||
|
||||
@@ -303,6 +303,7 @@ model = LlavaOnevisionForConditionalGeneration.from_pretrained(
|
||||
## LlavaOnevisionImageProcessor
|
||||
|
||||
[[autodoc]] LlavaOnevisionImageProcessor
|
||||
- preprocess
|
||||
|
||||
## LlavaOnevisionImageProcessorFast
|
||||
|
||||
@@ -313,6 +314,10 @@ model = LlavaOnevisionForConditionalGeneration.from_pretrained(
|
||||
|
||||
[[autodoc]] LlavaOnevisionVideoProcessor
|
||||
|
||||
## LlavaOnevisionVideoProcessor
|
||||
|
||||
[[autodoc]] LlavaOnevisionVideoProcessor
|
||||
|
||||
## LlavaOnevisionModel
|
||||
|
||||
[[autodoc]] LlavaOnevisionModel
|
||||
|
||||
@@ -287,6 +287,11 @@ model = Qwen2VLForConditionalGeneration.from_pretrained(
|
||||
[[autodoc]] Qwen2VLImageProcessor
|
||||
- preprocess
|
||||
|
||||
## Qwen2VLVideoProcessor
|
||||
|
||||
[[autodoc]] Qwen2VLVideoProcessor
|
||||
- preprocess
|
||||
|
||||
## Qwen2VLImageProcessorFast
|
||||
|
||||
[[autodoc]] Qwen2VLImageProcessorFast
|
||||
|
||||
@@ -197,6 +197,9 @@ print(generated_texts[0])
|
||||
[[autodoc]] SmolVLMImageProcessor
|
||||
- preprocess
|
||||
|
||||
## SmolVLMVideoProcessor
|
||||
[[autodoc]] SmolVLMVideoProcessor
|
||||
- preprocess
|
||||
|
||||
## SmolVLMProcessor
|
||||
[[autodoc]] SmolVLMProcessor
|
||||
|
||||
@@ -211,6 +211,11 @@ model = VideoLlavaForConditionalGeneration.from_pretrained(
|
||||
|
||||
[[autodoc]] VideoLlavaImageProcessor
|
||||
|
||||
|
||||
## VideoLlavaVideoProcessor
|
||||
|
||||
[[autodoc]] VideoLlavaVideoProcessor
|
||||
|
||||
## VideoLlavaProcessor
|
||||
|
||||
[[autodoc]] VideoLlavaProcessor
|
||||
|
||||
Reference in New Issue
Block a user