[Llava] Add Llava to transformers (#27662)
* add model like
* logits match
* minor fixes
* fixes
* up
* up
* add todo
* llava processor
* keep the processor simple
* add conversion script
* fixup
* fix copies
* up
* add to index
* fix config + logits
* fix
* refactor
* more refactor
* more refactor
* fix copies
* add authors
* v1 tests
* add `LlavaProcessor` in init
* remove unneeded import
* up
* up
* docs
* up
* fix CI
* fix CI
* add attention mask in test
* make fixup
* remove the vision model
* that's the dirty way to do it
* nits
* nits
* updates
* add more tests
* add input tests
* fixup
* more styling
* nits
* updates and cleanup
* fixup the generation expected results
* fix the testing script
* some cleanup and simplification which does not work yet but almost there!
* make correct dispatch operations
* vectorize works for batch of images and text
* last todos
* nits
* update test and modeling code
* remove useless function for now
* fix few issues
* fix generation
* some nits
* add bakllava
* nits
* remove duplicated code
* finish merge
* cleanup
* missed this line
* fill the todos
* add left padding offset
* add left and right padding logic
* bool to properly index
* make sure
* more cleanups
* batch is fixed 😉
* add correct device for tensor creation
* fix some dtype mismatch
* ruff
* update conversion script
* Update src/transformers/__init__.py
* fa 2 support + fix conversion script
* more
* correct reshaping
* fix test dict
* fix copies by ignoring
* fix nit
* skip clip vision model
* fixup
* fixup
* LlavaForVisionText2Text -> LlavaForCausalLM
* update
* fix
* raise correct errors
* fix
* docs
* nuke for now
* nits here and there
* fixup
* fix remaining tests
* update LlavaForConditionalGeneration instead of CausalLM
* fixups
* pipeline support
* slow and pipeline tests
* supports batch
* nits
* cleanup
* fix first integration tests
* add pad token where needed
* correct tests
* fixups
* update pipeline test
* fix quality
* nits
* revert unneeded change
* nit
* use BatchFeature
* from ...feature_extraction_utils import BatchFeature
* nits
* nits
* properly update
* more f*** nits
* fix copies
* comment
* keep slow test slow
* Update src/transformers/models/llava/processing_llava.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add pipeline example
* add pixel values in docstring
* update pr doctest
* fix
* fix slow tests
* remove hack
* fixup
* small note
* forward contrib credits from PR25789
* forward contrib credits from original implementation and work
* add arthur
* Update src/transformers/models/llava/processing_llava.py
  Co-authored-by: Lysandre Debut <hi@lysand.re>
* update docstring
* nit
* move to not doctested because of timeout issues
* fixup
* add description
* more
* fix-copies
* fix docs
* add beam search
* add more comments
* add typehints on processor
* add speedup plot
* update slow tests and docs
* push test
* push batched test
* fix batched generation with different number of images
* remove benchmark due to a bug
* fix test
* fix copies
* add gcolab demo

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: shauray8 <shauray8@users.noreply.github.com>
Co-authored-by: haotian-liu <haotian-liu@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
@@ -673,6 +673,7 @@ MODELS_NOT_IN_README = [
     "TimmBackbone",
     "Vision Encoder decoder",
     "VisionTextDualEncoder",
+    "CLIPVisionModel",
 ]
 
 # Template for new entries to add in the main README when we have missing models.
@@ -171,6 +171,7 @@ MODEL_NAMES_WITH_SAME_CONFIG = {
     "XLS-R": "Wav2Vec2",
     "XLSR-Wav2Vec2": "Wav2Vec2",
 }
+MODEL_NAMES_TO_IGNORE = ["CLIPVisionModel"]
 
 
 def get_model_table_from_auto_modules() -> str:
@@ -243,6 +244,8 @@ def get_model_table_from_auto_modules() -> str:
     check = {True: "✅", False: "❌"}
 
     for name in model_names:
+        if name in MODEL_NAMES_TO_IGNORE:
+            continue
         if name in MODEL_NAMES_WITH_SAME_CONFIG.keys():
             prefix = model_name_to_prefix[MODEL_NAMES_WITH_SAME_CONFIG[name]]
         else:
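Review note: the hunk above makes the model-table loop skip any name listed in `MODEL_NAMES_TO_IGNORE` before it resolves a class prefix. The following is a minimal standalone sketch of that control flow, not the script itself. The helper `resolve_prefixes`, the sample names, and the sample `model_name_to_prefix` mapping are hypothetical, and the body of the `else:` branch is an assumption because the hunk is cut off there.

```python
# Small stand-ins for the lookup tables touched by the diff.
MODEL_NAMES_WITH_SAME_CONFIG = {"XLS-R": "Wav2Vec2", "XLSR-Wav2Vec2": "Wav2Vec2"}
MODEL_NAMES_TO_IGNORE = ["CLIPVisionModel"]


def resolve_prefixes(model_names, model_name_to_prefix):
    """Hypothetical helper: map each model name to a class prefix, skipping ignored names."""
    resolved = {}
    for name in model_names:
        # New in this commit: ignored names never reach the table.
        if name in MODEL_NAMES_TO_IGNORE:
            continue
        if name in MODEL_NAMES_WITH_SAME_CONFIG:
            # Names that share a config borrow the prefix of their base model.
            prefix = model_name_to_prefix[MODEL_NAMES_WITH_SAME_CONFIG[name]]
        else:
            # Assumption: the truncated branch looks the name up directly.
            prefix = model_name_to_prefix[name]
        resolved[name] = prefix
    return resolved


# Illustrative data only; the real script derives these from the auto modules.
prefixes = resolve_prefixes(
    ["CLIP", "CLIPVisionModel", "XLS-R"],
    {"CLIP": "CLIP", "CLIPVisionModel": "CLIPVisionModel", "Wav2Vec2": "Wav2Vec2"},
)
print(prefixes)
```

With this sample input, `CLIPVisionModel` is dropped even though a prefix exists for it, while `XLS-R` resolves through its shared config.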
@@ -146,6 +146,7 @@ docs/source/en/model_doc/levit.md
 docs/source/en/model_doc/lilt.md
 docs/source/en/model_doc/llama.md
 docs/source/en/model_doc/llama2.md
+docs/source/en/model_doc/llava.md
 docs/source/en/model_doc/longformer.md
 docs/source/en/model_doc/longt5.md
 docs/source/en/model_doc/luke.md
@@ -294,7 +295,7 @@ docs/source/en/serialization.md
 docs/source/en/tasks/asr.md
 docs/source/en/tasks/audio_classification.md
 docs/source/en/tasks/document_question_answering.md
-docs/source/en/tasks/idefics.md # causes other tests to fail
+docs/source/en/tasks/idefics.md
 docs/source/en/tasks/image_captioning.md
 docs/source/en/tasks/image_classification.md
 docs/source/en/tasks/language_modeling.md
@@ -432,7 +433,7 @@ src/transformers/models/blip/modeling_blip_text.py
 src/transformers/models/blip/modeling_tf_blip_text.py
 src/transformers/models/blip_2/configuration_blip_2.py
 src/transformers/models/blip_2/convert_blip_2_original_to_pytorch.py
-src/transformers/models/blip_2/modeling_blip_2.py # causes other tests to fail
+src/transformers/models/blip_2/modeling_blip_2.py
 src/transformers/models/bloom/convert_bloom_original_checkpoint_to_pytorch.py
 src/transformers/models/bloom/modeling_bloom.py
 src/transformers/models/bloom/modeling_flax_bloom.py
@@ -634,6 +635,8 @@ src/transformers/models/lilt/configuration_lilt.py
 src/transformers/models/llama/configuration_llama.py
 src/transformers/models/llama/convert_llama_weights_to_hf.py
 src/transformers/models/llama/modeling_llama.py
+src/transformers/models/llava/configuration_llava.py
+src/transformers/models/llava/modeling_llava.py
 src/transformers/models/longformer/configuration_longformer.py
 src/transformers/models/longformer/convert_longformer_original_pytorch_lightning_to_pytorch.py
 src/transformers/models/longt5/configuration_longt5.py