Add training support for SigLIP (#31495)
* Add siglip loss function
* Update docs
* Enable training tests
[experimental] enable GC training tests as it has worked for my own data
* Remove test_training* overrides to enable training tests
[run_slow] siglip
* Skip training tests for Siglip text model and ImageClassificationModel
[run_slow] siglip
* Skip GC training tests for SiglipForImageClassification
* Explicitly skip training tests for SiglipVisionModel
Add skip reason for training tests for SiglipTextModel
* Remove copied from to fix CI
@@ -27,7 +27,7 @@ The abstract from the paper is the following:
## Usage tips
- Usage of SigLIP is similar to [CLIP](clip). The main difference is the training loss, which does not require a global view of all the pairwise similarities of images and texts within a batch. One needs to apply the sigmoid activation function to the logits, rather than the softmax.
-- Training is not yet supported. If you want to fine-tune SigLIP or train from scratch, refer to the loss function from [OpenCLIP](https://github.com/mlfoundations/open_clip/blob/73ad04ae7fb93ede1c02dc9040a828634cb1edf1/src/open_clip/loss.py#L307), which leverages various `torch.distributed` utilities.
+- Training is supported but does not use `torch.distributed` utilities, which may limit the scalability of batch size. However, DDP and FSDP work on a single-node multi-GPU setup.
- When using the standalone [`SiglipTokenizer`] or [`SiglipProcessor`], make sure to pass `padding="max_length"` as that's how the model was trained.
- To get the same results as the pipeline, a prompt template of "This is a photo of {label}." should be used.
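
The sigmoid-versus-softmax point in the tips above can be illustrated with a small, dependency-free sketch of a pairwise sigmoid loss as described in the SigLIP paper. It is not the code added by this commit: the function names are hypothetical, and `logits` is assumed to be a square text-by-image similarity matrix with the learned temperature and bias already applied. Each pair is scored on its own (matching pairs on the diagonal get label +1, all others -1), so no normalization across the batch is needed.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    # log(sigmoid(x)) = min(x, 0) - log(1 + exp(-|x|))
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def sigmoid_pairwise_loss(logits: list[list[float]]) -> float:
    """Pairwise sigmoid loss over an n x n similarity matrix (illustrative).

    logits[i][j] is the scaled similarity between text i and image j.
    Diagonal entries are positive pairs; off-diagonal entries are negatives.
    """
    n = len(logits)
    total = 0.0
    for i, row in enumerate(logits):
        for j, logit in enumerate(row):
            label = 1.0 if i == j else -1.0
            # Each pair is an independent binary classification problem.
            total -= log_sigmoid(label * logit)
    # Sum over pairs, averaged over the batch size.
    return total / n


# A well-aligned batch should score lower than a misaligned one.
aligned = sigmoid_pairwise_loss([[10.0, -10.0], [-10.0, 10.0]])
misaligned = sigmoid_pairwise_loss([[-10.0, 10.0], [10.0, -10.0]])
```

Because every term depends on a single pair, the loss can be computed on whatever pairs are available locally, which is why a global view of the batch is not required.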
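
The padding and prompt-template tips can be combined into a short zero-shot scoring sketch. This is an illustration under assumptions rather than part of the commit: the helper names `build_prompts` and `score_labels` are hypothetical, and the checkpoint `google/siglip-base-patch16-224` is assumed only as an example; any SigLIP checkpoint should behave the same way.

```python
def build_prompts(labels):
    """Fill the prompt template quoted in the tips for each candidate label."""
    return [f"This is a photo of {label}." for label in labels]


def score_labels(image, labels, checkpoint="google/siglip-base-patch16-224"):
    """Return one independent probability per label for a PIL image.

    The checkpoint name is an assumption made for illustration.
    """
    # Imported lazily so build_prompts works without torch/transformers installed.
    import torch
    from transformers import AutoModel, AutoProcessor

    model = AutoModel.from_pretrained(checkpoint)
    processor = AutoProcessor.from_pretrained(checkpoint)

    # padding="max_length" matches how the model was trained.
    inputs = processor(
        text=build_prompts(labels),
        images=image,
        padding="max_length",
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # shape: (1, num_labels)

    # Sigmoid rather than softmax: the probabilities need not sum to 1.
    probs = torch.sigmoid(logits)[0]
    return dict(zip(labels, probs.tolist()))
```

Calling `score_labels(image, ["2 cats", "2 dogs"])` with a PIL image would return a dictionary mapping each label to its probability.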