Adding GroupViT Models (#17313)
* add group vit and fixed test (except slow) * passing slow test * addressed some comments * fixed test * fixed style * fixed copy * fixed segmentation output * fixed test * fixed relative path * fixed copy * add ignore non auto configured * fixed docstring, add doc * fixed copies * Apply suggestions from code review merge suggestions Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * resolve comment, renaming model * delete unused attr * use fix copies * resolve comments * fixed attn * remove unused vars * refactor tests * resolve final comments * add demo notebook * fixed inconsitent default * Apply suggestions from code review Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * rename stage->stages * Create single GroupViTEncoderLayer class * Update conversion script * Simplify conversion script * Remove cross-attention class in favor of GroupViTAttention * Convert other model as well, add processor to conversion script * addressing final comment * fixed args * Update src/transformers/models/groupvit/modeling_groupvit.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
This commit is contained in:
@@ -41,6 +41,7 @@ _re_checkpoint = re.compile("\[(.+?)\]\((https://huggingface\.co/.+?)\)")
|
||||
|
||||
CONFIG_CLASSES_TO_IGNORE_FOR_DOCSTRING_CHECKPOINT_CHECK = {
|
||||
"CLIPConfig",
|
||||
"GroupViTConfig",
|
||||
"DecisionTransformerConfig",
|
||||
"EncoderDecoderConfig",
|
||||
"RagConfig",
|
||||
|
||||
@@ -142,6 +142,8 @@ IGNORE_NON_AUTO_CONFIGURED = PRIVATE_MODELS.copy() + [
|
||||
"BeitForMaskedImageModeling",
|
||||
"CLIPTextModel",
|
||||
"CLIPVisionModel",
|
||||
"GroupViTTextModel",
|
||||
"GroupViTVisionModel",
|
||||
"TFCLIPTextModel",
|
||||
"TFCLIPVisionModel",
|
||||
"FlaxCLIPTextModel",
|
||||
|
||||
Reference in New Issue
Block a user