[WIP] Emu3: add model (#33770)
* model can convert to HF and be loaded back * nit * works in single batch generation but hallucinates * use the image tokens * add image generation * now it works * add tests * update * add modulare but it doesn't work for porting docstring :( * skip some tests * add slow tests * modular removed the import? * guess this works * update * update * fix copies * fix test * fix copies * update * docs * fix tests * last fix tests? * pls * repo consistency * more style * style * remove file * address comments * tiny bits * update after the new modular * fix tests * add one more cond in check attributes * decompose down/up/mid blocks * allow static cache generation in VLMs * nit * fix copies * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * fix VAE upsampling * Update src/transformers/models/emu3/modular_emu3.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * address comments * state overwritten stuff explicitly * fix copies * add the flag for flex attn --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
This commit is contained in:
committed by
GitHub
parent
ccc0381d36
commit
52e1f87c7d
@@ -1626,7 +1626,7 @@ class GenerationTesterMixin:
|
||||
# checks without adding test complexity. Ditto for `pixel_values_videos` and `pixel_values_images`
|
||||
pixel_values_is_mutually_exclusive = any(
|
||||
model_name in model_class.__name__.lower()
|
||||
for model_name in ["llava", "idefics2", "idefics3", "mllama", "paligemma"]
|
||||
for model_name in ["llava", "idefics2", "idefics3", "mllama", "paligemma", "emu3"]
|
||||
)
|
||||
if pixel_values_is_mutually_exclusive:
|
||||
inputs_dict.pop("pixel_values", None)
|
||||
@@ -1700,6 +1700,18 @@ class GenerationTesterMixin:
|
||||
if "inputs_embeds" not in inspect.signature(model.prepare_inputs_for_generation).parameters.keys():
|
||||
self.skipTest(reason="This model does not support `inputs_embeds` in generation")
|
||||
|
||||
# Some VLMs assume `inputs_embeds` and `pixel_values` are mutually exclusive AND fall in the
|
||||
# exception above (complex `inputs_embeds` computation). Popping `pixel_values` allow us to run the
|
||||
# checks without adding test complexity. Ditto for `pixel_values_videos` and `pixel_values_images`
|
||||
pixel_values_is_mutually_exclusive = any(
|
||||
model_name in model_class.__name__.lower()
|
||||
for model_name in ["llava", "idefics2", "idefics3", "mllama", "paligemma", "emu3"]
|
||||
)
|
||||
if pixel_values_is_mutually_exclusive:
|
||||
inputs_dict.pop("pixel_values", None)
|
||||
inputs_dict.pop("pixel_values_videos", None)
|
||||
inputs_dict.pop("pixel_values_images", None)
|
||||
|
||||
input_ids = inputs_dict.pop("input_ids")
|
||||
|
||||
model.config.use_cache = True
|
||||
@@ -1941,6 +1953,10 @@ class GenerationTesterMixin:
|
||||
|
||||
for dtype in (torch.float32, torch.float16):
|
||||
model = model_class(config).to(torch_device).to(dtype).eval()
|
||||
inputs_dict = {
|
||||
k: v.to(dtype) if isinstance(v, torch.Tensor) and torch.is_floating_point(v) else v
|
||||
for k, v in inputs_dict.items()
|
||||
}
|
||||
set_model_for_less_flaky_test(model)
|
||||
|
||||
generation_kwargs = {
|
||||
|
||||
Reference in New Issue
Block a user