Add SigLIP (#26522)
* Add first draft * Use appropriate gelu function * More improvements * More improvements * More improvements * Convert checkpoint * More improvements * Improve docs, remove print statements * More improvements * Add link * remove unused masking function * begin tokenizer * do_lower_case * debug * set split_special_tokens=True * Remove script * Fix style * Fix rebase * Use same design as CLIP * Add fast tokenizer * Add SiglipTokenizer to init, remove extra_ids * Improve conversion script * Use smaller inputs in conversion script * Update conversion script * More improvements * Add processor to conversion script * Add tests * Remove print statements * Add tokenizer tests * Fix more tests * More improvements related to weight initialization * More improvements * Make more tests pass * More improvements * More improvements * Add copied from * Add canonicalize_text * Enable fast tokenizer tests * More improvements * Fix most slow tokenizer tests * Address comments * Fix style * Remove script * Address some comments * Add copied from to tests * Add more copied from * Add more copied from * Add more copied from * Remove is_flax_available * More updates * Address comment * Remove SiglipTokenizerFast for now * Add caching * Remove umt5 test * Add canonicalize_text inside _tokenize, thanks Arthur * Fix image processor tests * Skip tests which are not applicable * Skip test_initialization * More improvements * Compare pixel values * Fix doc tests, add integration test * Add do_normalize * Remove causal mask and leverage ignore copy * Fix attention_mask * Fix remaining tests * Fix dummies * Rename temperature and bias * Address comments * Add copied from to tokenizer tests * Add SiglipVisionModel to auto mapping * Add copied from to image processor tests * Improve doc * Remove SiglipVisionModel from index * Address comments * Improve docs * Simplify config * Add first draft * Make it like mistral * More improvements * Fix attention_mask * Fix output_attentions * Add note in docs * Convert multilingual model * Convert large checkpoint * Convert more checkpoints * Add pipeline support, correct image_mean and image_std * Use padding=max_length by default * Make processor like llava * Add code snippet * Convert more checkpoints * Set keep_punctuation_string=None as in OpenCLIP * Set normalized=False for special tokens * Fix doc test * Update integration test * Add figure * Update organization * Happy new year * Use AutoModel everywhere --------- Co-authored-by: patil-suraj <surajp815@gmail.com>
This commit is contained in:
@@ -1061,6 +1061,7 @@ MODELS_NOT_IN_README = [
|
||||
"Vision Encoder decoder",
|
||||
"VisionTextDualEncoder",
|
||||
"CLIPVisionModel",
|
||||
"SiglipVisionModel",
|
||||
]
|
||||
|
||||
# Template for new entries to add in the main README when we have missing models.
|
||||
|
||||
@@ -214,7 +214,6 @@ IGNORE_NON_AUTO_CONFIGURED = PRIVATE_MODELS.copy() + [
|
||||
"ChineseCLIPVisionModel",
|
||||
"CLIPTextModel",
|
||||
"CLIPTextModelWithProjection",
|
||||
"CLIPVisionModel",
|
||||
"CLIPVisionModelWithProjection",
|
||||
"ClvpForCausalLM",
|
||||
"ClvpModel",
|
||||
@@ -309,6 +308,8 @@ IGNORE_NON_AUTO_CONFIGURED = PRIVATE_MODELS.copy() + [
|
||||
"SeamlessM4Tv2NARTextToUnitForConditionalGeneration",
|
||||
"SeamlessM4Tv2CodeHifiGan",
|
||||
"SeamlessM4Tv2ForSpeechToSpeech", # no auto class for speech-to-speech
|
||||
"SiglipVisionModel",
|
||||
"SiglipTextModel",
|
||||
]
|
||||
|
||||
# DO NOT edit this list!
|
||||
|
||||
@@ -171,7 +171,7 @@ MODEL_NAMES_WITH_SAME_CONFIG = {
|
||||
"XLS-R": "Wav2Vec2",
|
||||
"XLSR-Wav2Vec2": "Wav2Vec2",
|
||||
}
|
||||
MODEL_NAMES_TO_IGNORE = ["CLIPVisionModel"]
|
||||
MODEL_NAMES_TO_IGNORE = ["CLIPVisionModel", "SiglipVisionModel"]
|
||||
|
||||
|
||||
def get_model_table_from_auto_modules() -> str:
|
||||
|
||||
Reference in New Issue
Block a user