Add X-CLIP (#18852)
* First draft * Improve conversion script * Make vision encoder work * More improvements * Improve conversion script * Fix quality * Add MultiframeIntegrationTransformer * More improvements * Make MiT output work * Fix quality * Add prompts generator * Add tests * Fix some tests * Fix some more tests * Fix more tests * Improve conversion script * Fix model outputs * Fix more tests * Add XClipProcessor * Use processor in conversion script * Fix integration test * Update README, fix docs * Fix all tests * Add MIT output to XClipOutput * Create better variable names * Rename XClip to XCLIP * Extend conversion script * Add support for large models * Add support for 16 frame models * Add another model' * Fix module issue * Apply suggestions from code review * Add figure to docs * Fix CLIPProcessor issue * Apply suggestions from code review * Delete file * Convert more checkpoints * Convert last checkpoint * Update nielsr to microsoft
This commit is contained in:
@@ -49,6 +49,7 @@ CONFIG_CLASSES_TO_IGNORE_FOR_DOCSTRING_CHECKPOINT_CHECK = {
|
||||
"SpeechEncoderDecoderConfig",
|
||||
"VisionEncoderDecoderConfig",
|
||||
"VisionTextDualEncoderConfig",
|
||||
"XCLIPConfig",
|
||||
}
|
||||
|
||||
|
||||
|
||||
@@ -207,6 +207,8 @@ IGNORE_NON_AUTO_CONFIGURED = PRIVATE_MODELS.copy() + [
|
||||
"TFWav2Vec2ForCTC",
|
||||
"TFHubertForCTC",
|
||||
"MaskFormerForInstanceSegmentation",
|
||||
"XCLIPVisionModel",
|
||||
"XCLIPTextModel",
|
||||
]
|
||||
|
||||
# Update this list for models that have multiple model types for the same
|
||||
|
||||
Reference in New Issue
Block a user