[Llava] Add Llava to transformers (#27662)

* add model like
* logits match
* minor fixes
* fixes
* up
* up
* add todo
* llava processor
* keep the processor simple
* add conversion script
* fixup
* fix copies
* up
* add to index
* fix config + logits
* fix
* refactor
* more refactor
* more refactor
* fix copies
* add authors
* v1 tests
* add `LlavaProcessor` in init
* remove unneeded import
* up
* up
* docs
* up
* fix CI
* fix CI
* add attention mask in test
* make fixup
* remove the vision model
* that's the dirty way to do it
* nits
* nits
* updates
* add more tests
* add input tests
* fixup
* more styling
* nits
* updates and cleanup
* fixup the generation expected results
* fix the testing script
* some cleanup and simplification which does not work yet but almost there!
* make correct dispatch operations
* vectorize works for batch of images and text
* last todos
* nits
* update test and modeling code
* remove useless function for now
* fix few issues
* fix generation
* some nits
* add bakllava
* nits
* remove duplicated code
* finish merge
* cleanup
* missed this line
* fill the todos
* add left padding offset
* add left and right padding logic
* bool to properly index
* make sure
* more cleanups
* batch is fixed 😉
* add correct device for tensor creation
* fix some dtype mismatch
* ruff
* update conversion script
* Update src/transformers/__init__.py
* fa 2 support + fix conversion script
* more
* correct reshaping
* fix test dict
* fix copies by ignoring
* fix nit
* skip clip vision model
* fixup
* fixup
* LlavaForVisionText2Text -> LlavaForCausalLM
* update
* fix
* raise correct errors
* fix
* docs
* nuke for now
* nits here and there
* fixup
* fix remaining tests
* update LlavaForConditionalGeneration instead of CausalLM
* fixups
* pipeline support
* slow and pipeline tests
* supports batch
* nits
* cleanup
* fix first integration tests
* add pad token where needed
* correct tests
* fixups
* update pipeline test
* fix quality
* nits
* revert unneeded change
* nit
* use BatchFeature
* from ...feature_extraction_utils import BatchFeature
* nits
* nits
* properly update
* more f*** nits
* fix copies
* comment
* keep slow test slow
* Update src/transformers/models/llava/processing_llava.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add pipeline example
* add pixel values in docstring
* update pr doctest
* fix
* fix slow tests
* remove hack
* fixup
* small note
* forward contrib credits from PR25789
* forward contrib credits from original implementation and work
* add arthur
* Update src/transformers/models/llava/processing_llava.py
  Co-authored-by: Lysandre Debut <hi@lysand.re>
* update docstring
* nit
* move to not doctested because of timeout issues
* fixup
* add description
* more
* fix-copies
* fix docs
* add beam search
* add more comments
* add typehints on processor
* add speedup plot
* update slow tests and docs
* push test
* push batched test
* fix batched generation with different number of images
* remove benchmark due to a bug
* fix test
* fix copies
* add gcolab demo

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: shauray8 <shauray8@users.noreply.github.com>
Co-authored-by: haotian-liu <haotian-liu@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
2023-12-07 09:30:47 +01:00
parent 0410a29a2d
commit 44b5506d29
29 changed files with 1408 additions and 7 deletions
--- a/utils/check_copies.py
+++ b/utils/check_copies.py
@@ -673,6 +673,7 @@ MODELS_NOT_IN_README = [
    "TimmBackbone",
    "Vision Encoder decoder",
    "VisionTextDualEncoder",
+    "CLIPVisionModel",
 ]

 # Template for new entries to add in the main README when we have missing models.
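The hunk above adds `CLIPVisionModel` to `MODELS_NOT_IN_README`, so the README consistency check no longer expects a dedicated entry for it. The sketch below is a hypothetical illustration of how such an exclusion list is typically applied; the helper `models_missing_from_readme` is an assumption for illustration and is not the actual logic in `utils/check_copies.py`. The list shows only the entries visible in this hunk's context, not the full list.

```python
# Hypothetical sketch: `models_missing_from_readme` is illustrative only and is
# not the real implementation in utils/check_copies.py.
# Only the entries visible in the diff context are reproduced here.
MODELS_NOT_IN_README = [
    "TimmBackbone",
    "Vision Encoder decoder",
    "VisionTextDualEncoder",
    "CLIPVisionModel",
]


def models_missing_from_readme(all_models, readme_models):
    """Return models lacking a README entry, skipping the excluded names."""
    excluded = set(MODELS_NOT_IN_README)
    present = set(readme_models)
    # A model is reported only if it is neither excluded nor already listed.
    return sorted(m for m in all_models if m not in excluded and m not in present)


print(models_missing_from_readme(["Llava", "CLIPVisionModel", "Llama"], ["Llama"]))
```

With `CLIPVisionModel` in the exclusion list, only `Llava` is reported as missing in this toy example.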
--- a/utils/check_table.py
+++ b/utils/check_table.py
@@ -171,6 +171,7 @@ MODEL_NAMES_WITH_SAME_CONFIG = {
    "XLS-R": "Wav2Vec2",
    "XLSR-Wav2Vec2": "Wav2Vec2",
 }
+MODEL_NAMES_TO_IGNORE = ["CLIPVisionModel"]


 def get_model_table_from_auto_modules() -> str:
@@ -243,6 +244,8 @@ def get_model_table_from_auto_modules() -> str:
    check = {True: "✅", False: "❌"}

    for name in model_names:
+        if name in MODEL_NAMES_TO_IGNORE:
+            continue
        if name in MODEL_NAMES_WITH_SAME_CONFIG.keys():
            prefix = model_name_to_prefix[MODEL_NAMES_WITH_SAME_CONFIG[name]]
        else:
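The two `check_table.py` hunks add an ignore list and an early `continue` so that `CLIPVisionModel` never reaches the prefix lookup. A minimal sketch of that control flow follows, under stated assumptions: `resolve_prefixes` is a hypothetical wrapper, the mapping only includes the two `MODEL_NAMES_WITH_SAME_CONFIG` entries visible in the diff, and the `else` branch (cut off in the hunk) is assumed to look the name up directly.

```python
# Minimal sketch, assuming simplified inputs. `resolve_prefixes` is a
# hypothetical wrapper around the loop shown in the diff; the real function
# in utils/check_table.py also builds the markdown table rows.
MODEL_NAMES_WITH_SAME_CONFIG = {"XLS-R": "Wav2Vec2", "XLSR-Wav2Vec2": "Wav2Vec2"}
MODEL_NAMES_TO_IGNORE = ["CLIPVisionModel"]


def resolve_prefixes(model_names, model_name_to_prefix):
    prefixes = {}
    for name in model_names:
        # Added in this commit: ignored models get no row at all.
        if name in MODEL_NAMES_TO_IGNORE:
            continue
        # Models sharing a config reuse the prefix of the model they alias.
        if name in MODEL_NAMES_WITH_SAME_CONFIG:
            prefix = model_name_to_prefix[MODEL_NAMES_WITH_SAME_CONFIG[name]]
        else:
            # Assumption: the truncated else branch looks the name up directly.
            prefix = model_name_to_prefix[name]
        prefixes[name] = prefix
    return prefixes


print(resolve_prefixes(["Llava", "CLIPVisionModel", "XLS-R"], {"Llava": "Llava", "Wav2Vec2": "Wav2Vec2"}))
```

In this toy mapping `CLIPVisionModel` has no prefix entry, so without the ignore check the lookup would raise `KeyError`; the `continue` skips it before any lookup happens.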
--- a/utils/not_doctested.txt
+++ b/utils/not_doctested.txt
@@ -146,6 +146,7 @@ docs/source/en/model_doc/levit.md
 docs/source/en/model_doc/lilt.md
 docs/source/en/model_doc/llama.md
 docs/source/en/model_doc/llama2.md
+docs/source/en/model_doc/llava.md
 docs/source/en/model_doc/longformer.md
 docs/source/en/model_doc/longt5.md
 docs/source/en/model_doc/luke.md
@@ -294,7 +295,7 @@ docs/source/en/serialization.md
 docs/source/en/tasks/asr.md
 docs/source/en/tasks/audio_classification.md
 docs/source/en/tasks/document_question_answering.md
-docs/source/en/tasks/idefics.md  # causes other tests to fail
+docs/source/en/tasks/idefics.md
 docs/source/en/tasks/image_captioning.md
 docs/source/en/tasks/image_classification.md
 docs/source/en/tasks/language_modeling.md
@@ -432,7 +433,7 @@ src/transformers/models/blip/modeling_blip_text.py
 src/transformers/models/blip/modeling_tf_blip_text.py
 src/transformers/models/blip_2/configuration_blip_2.py
 src/transformers/models/blip_2/convert_blip_2_original_to_pytorch.py
-src/transformers/models/blip_2/modeling_blip_2.py  # causes other tests to fail
+src/transformers/models/blip_2/modeling_blip_2.py
 src/transformers/models/bloom/convert_bloom_original_checkpoint_to_pytorch.py
 src/transformers/models/bloom/modeling_bloom.py
 src/transformers/models/bloom/modeling_flax_bloom.py
@@ -634,6 +635,8 @@ src/transformers/models/lilt/configuration_lilt.py
 src/transformers/models/llama/configuration_llama.py
 src/transformers/models/llama/convert_llama_weights_to_hf.py
 src/transformers/models/llama/modeling_llama.py
+src/transformers/models/llava/configuration_llava.py
+src/transformers/models/llava/modeling_llava.py
 src/transformers/models/longformer/configuration_longformer.py
 src/transformers/models/longformer/convert_longformer_original_pytorch_lightning_to_pytorch.py
 src/transformers/models/longt5/configuration_longt5.py
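The `not_doctested.txt` hunks add the new Llava doc page and modules (the commit message notes they were moved there "because of timeout issues") and drop two trailing `# causes other tests to fail` comments. The sketch below assumes, without confirmation from the diff, that the file is consumed as one path per line compared verbatim. Under that assumption, a trailing comment would stop an entry from matching its path. `load_not_doctested` and `is_doctested` are hypothetical helpers, not functions from the transformers repository.

```python
# Hypothetical helpers illustrating an assumed consumer of not_doctested.txt:
# one path per line, compared verbatim, with no inline-comment stripping.
def load_not_doctested(text):
    """Parse the file contents into a set of excluded paths."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def is_doctested(path, not_doctested):
    """A file is doctested unless its exact path appears in the exclusion set."""
    return path not in not_doctested


old = load_not_doctested("docs/source/en/tasks/idefics.md  # causes other tests to fail\n")
new = load_not_doctested(
    "docs/source/en/tasks/idefics.md\n"
    "src/transformers/models/llava/modeling_llava.py\n"
)
print(is_doctested("docs/source/en/tasks/idefics.md", old))  # comment blocks the match
print(is_doctested("docs/source/en/tasks/idefics.md", new))
```

Under this exact-match assumption, the commented entry never excludes `idefics.md`, while the cleaned entry and the new Llava path both do.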