Add Idefics2 (#30253)

* Initial add model additions * Test * All weights loading * Can perform full forward pass * Local and remote the same * Matching local and remote * Fixup * Idefics2Model importable; fixup docstrings * Don't skip by default * Remove deprecated use_resampler arg * Remove self.config * DecoupledLinear takes config * Tidy up * Enable eager attention and tidy up * Most tests passing * Update for batch of processed images * Add image processor * Update doc pages * Update conversion script * Remove erroneous breakpoint * Remove accidendtal spelling change * Update to reflect changes on hub - make generate work * Fix up * Image processor tests * Update tests * Add a processor * Add a processor * Update convert script * Update modeling file - remove fixmes * Bug fix * Add processing test * Use processor * Fix up * Update src/transformers/models/idefics2/modeling_idefics2.py Co-authored-by: Victor SANH <victorsanh@gmail.com> * Update src/transformers/models/idefics2/modeling_idefics2.py Co-authored-by: Victor SANH <victorsanh@gmail.com> * Fix test * Update config - PR comments and defaults align with checkpoint * Reviewer comments * Add copied froms for flahs attention * Update src/transformers/models/idefics2/modeling_idefics2.py Co-authored-by: Victor SANH <victorsanh@gmail.com> * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Remove qk_layer_norm and freeze_layers functionality * Fix * Remove freeze_layer options from config * Sync with upstream main * Fix attention shapes siglip * Remove Llava-next refs - TO REBASE * Use AutoModel for text model * Add comment to explain vision embeddings * Fix issue with tie_word_embeddings * Address review comments * Fix and fix up * Chat templates for idefics * Fix copies * Fix * Add layer norms to FA2 * Fix tests * Apply suggestions from code review Co-authored-by: Victor SANH <victorsanh@gmail.com> * Fix * Review comments * Update src/transformers/models/idefics2/modeling_idefics2.py Co-authored-by: Victor SANH <victorsanh@gmail.com> * Update inputs merger * Merge weights in correct order * Update convert script * Update src/transformers/models/idefics2/processing_idefics2.py Co-authored-by: Victor SANH <victorsanh@gmail.com> * Update template * Model code examples (fix idefics too) * More review comments * Tidy up * Update processing * Fix attention mask preparation * Update inputs_merger inputs * Vectorize inputs_merger * Update src/transformers/models/idefics2/__init__.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/idefics2/modeling_idefics2.py * Review comments * saying bye to the `qk_layer_norms` * Simplify * Update latents * Remove erroneuous readme changes * Return images when applying chat template * Fix bug - prompt images are for a single sample * Update src/transformers/models/idefics2/modeling_idefics2.py * image splitting * fix test * some more comment * some comment * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/idefics2/image_processing_idefics2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update processor * Update model tests * Update src/transformers/models/idefics2/processing_idefics2.py Co-authored-by: Victor SANH <victorsanh@gmail.com> * Update src/transformers/models/idefics2/processing_idefics2.py Co-authored-by: Victor SANH <victorsanh@gmail.com> * Don't add BOS in template * Update src/transformers/models/idefics2/processing_idefics2.py Co-authored-by: Victor SANH <victorsanh@gmail.com> * Remove index in examples * Update tests to reflect #13 * Update src/transformers/models/idefics2/processing_idefics2.py Co-authored-by: Victor SANH <victorsanh@gmail.com> * PR comment - consistent typing * Update readme and model doc * Update docs * Update checkpoint references * Update examples * Fix and update tests * Small addition * Update tests - remove copied from as no ignore placement copy could be found * Update example * small fixes * Update docs/source/en/model_doc/idefics2.md Co-authored-by: Victor SANH <victorsanh@gmail.com> * Update docs/source/en/model_doc/idefics2.md Co-authored-by: Victor SANH <victorsanh@gmail.com> * Update README.md Co-authored-by: Victor SANH <victorsanh@gmail.com> * Connector model as bridge * Fix up * Fix up * Don't pass model inputs for generation kwargs update * IDEFICS-2 -> Idefics2 * Remove config archive name * IDEFICS-2 -> Idefics2 * Add back llava-next * Update readmes * Add requirements for processor tester * Use custom convert_to_rgb to avoid possible BC * Fix doc example * Fix doc example * Skip model doc tests - as model to large * More doc example - account for image splitting * Update src/transformers/image_transforms.py * Fix config doctest --------- Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> Co-authored-by: ArthurZucker <arthur.zucker@gmail.com> Co-authored-by: Victor SANH <victorsanh@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-04-15 17:03:03 +01:00
parent 667939a2d3
commit 6b78360e6d
41 changed files with 4692 additions and 38 deletions
--- a/docs/source/en/perf_infer_gpu_one.md
+++ b/docs/source/en/perf_infer_gpu_one.md
@@ -47,6 +47,7 @@ FlashAttention-2 is currently supported for the following architectures:
 * [GPTNeo](https://huggingface.co/docs/transformers/model_doc/gpt_neo#transformers.GPTNeoModel)
 * [GPTNeoX](https://huggingface.co/docs/transformers/model_doc/gpt_neox#transformers.GPTNeoXModel)
 * [GPT-J](https://huggingface.co/docs/transformers/model_doc/gptj#transformers.GPTJModel)
+* [Idefics2](https://huggingface.co/docs/transformers/model_doc/idefics2#transformers.Idefics2Model)
 * [Falcon](https://huggingface.co/docs/transformers/model_doc/falcon#transformers.FalconModel)
 * [Llama](https://huggingface.co/docs/transformers/model_doc/llama#transformers.LlamaModel)
 * [Llava](https://huggingface.co/docs/transformers/model_doc/llava)
@@ -96,8 +97,8 @@ model_id = "tiiuae/falcon-7b"
 tokenizer = AutoTokenizer.from_pretrained(model_id)

 model = AutoModelForCausalLM.from_pretrained(
-    model_id, 
-    torch_dtype=torch.bfloat16, 
+    model_id,
+    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
 )
 ```
@@ -109,7 +110,7 @@ FlashAttention-2 can only be used when the model's dtype is `fp16` or `bf16`. Ma
 <br>

 You can also set `use_flash_attention_2=True` to enable FlashAttention-2 but it is deprecated in favor of `attn_implementation="flash_attention_2"`.
-  
+
 </Tip>

 FlashAttention-2 can be combined with other optimization techniques like quantization to further speedup inference. For example, you can combine FlashAttention-2 with 8-bit or 4-bit quantization:
@@ -123,14 +124,14 @@ tokenizer = AutoTokenizer.from_pretrained(model_id)

 # load in 8bit
 model = AutoModelForCausalLM.from_pretrained(
-    model_id, 
+    model_id,
    load_in_8bit=True,
    attn_implementation="flash_attention_2",
 )

 # load in 4bit
 model = AutoModelForCausalLM.from_pretrained(
-    model_id, 
+    model_id,
    load_in_4bit=True,
    attn_implementation="flash_attention_2",
 )