Joao Gante
f4014e75db
Docs: update example with assisted generation + sample ( #30853 )
2024-05-16 14:32:21 +01:00
Raushan Turganbay
95b3c3814d
Video-LLaVa: Fix docs ( #30855 )
...
fix model id in docs
2024-05-16 17:23:01 +05:00
Yih-Dar
1b3dba9417
Make Gemma work with torch.compile ( #30775 )
...
* fix
* [run-slow] gemma
* add test
* add `test_compile_static_cache`
* fix
* style
* remove subprocess
* use attribute
* fix
* style
* update
* [run-slow] dbrx,gemma,jetmoe,phi3,recurrent_gemma
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
2024-05-16 13:41:33 +02:00
Mohit Sharma
0753134f4d
Disable the FA backend for SDPA on AMD GPUs ( #30850 )
...
* disable fa
* disable fa
* update warning
* update warning
2024-05-16 13:31:14 +02:00
Joao Gante
9d889f870e
Cache: add new flag to distinguish models that Cache but not static cache ( #30800 )
...
* jamba cache
* new flag
* generate exception
2024-05-16 12:08:35 +01:00
NielsRogge
17cc71e149
[Idefics2] Improve docs, add resources ( #30717 )
...
* Add resources
* Address comment
* Address comments
* Update docs/source/en/model_doc/idefics2.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* Update figure
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
2024-05-16 12:22:13 +02:00
hyenal
1c21f48a50
add sdpa to ViT [follow up of #29325 ] ( #30555 )
...
remove blank line (+1 squashed commit)
Squashed commits:
[24ccd2061] [run-slow]vit_msn,vision_encoder_decoder (+24 squashed commits)
Squashed commits:
[08bd27e7a] [run-slow]vit_msn,vision_encoder_decoder
[ec96a8db3] [run-slow]vit_msn
[ead817eca] fix vit msn multi gpu
[d12cdc8fd] [run-slow]audio_spectrogram_transformer,deit,vision_encoder_decoder,vision_text_dual_encoder,vit,vit_hybrid,vit_mae,vit_msn,videomae,yolos
[3fdbfa88f] doc
[a3ff33e4a] finish implementation
[e20b7b7fb] Update test_modeling_common.py
[e290c5810] Update test_modeling_flax_common.py
[d3af86f46] comment
[ff7dd32d8] more comments
[59b137889] suggestion
[7e2ba6d67] attn_implementation as attribute of the class
[fe66ab71f] minor
[38642b568] Apply suggestions from code review
Accept comments
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
[22cde7d52] Update tests/test_modeling_common.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
[48e137cc6] Update tests/test_modeling_common.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
[99f4c679f] Update tests/test_modeling_common.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
[96cf20a6d] Update src/transformers/models/vit_msn/modeling_vit_msn.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
[c59377d23] Update src/transformers/models/vit_mae/modeling_vit_mae.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
[b70a47259] Update tests/models/vision_text_dual_encoder/test_modeling_vision_text_dual_encoder.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
[00c84d216] [run-slow]audio_spectrogram_transformer,deit,vision_encoder_decoder,vision_text_dual_encoder,vit,vit_hybrid,vit_mae,vit_msn,videomae,yolos
[61f00ebb0] all tests are passing locally
[e9e0b82b7] vision encoder/decoder
[4d5076b56] test-vision (+20 squashed commits)
Squashed commits:
[d1add8db9] yolo
[9fde65716] fix flax
[986566c28] minor
[ca2f21d1f] vit
[3333efd7a] easy models change
[ebfc21402] [run-slow]audio_spectrogram_transformer,deit,vision_encoder_decoder,vision_text_dual_encoder,vit,vit_hybrid,vit_mae,vit_msn,videomae,yolos
[b8b8603ed] [run-slow]vision_encoder_decoder,vision_text_dual_encoder,yolos
[48ecc7e26] all tests are passing locally
[bff7fc366] minor
[62f88306f] fix yolo and text_encoder tests
[121507555] [run-slow]audio_spectrogram_transformer,deit,vit,vit_hybrid,vit_mae,vit_msn,videomae
[1064cae0a] [run-slow]vision_encoder_decoder,vision_text_dual_encoder,yolos
[b7f52ff3a] [run-slow]audio_spectrogram_transformer,deit,vit,vit_hybrid,vit_mae,vit_msn,videomae
[cffaa10dd] fix-copies
[ef6c511c4] test vit hybrid
[7d4ba8644] vit hybrid
[66f919033] [run-slow]audio_spectrogram_transformer,deit,vit,vit_hybrid,vit_mae,vit_msn,videomae
[1fcc0a031] fixes
[cfde6eb21] fixup
[e77df1ed3] all except yolo end encoder decoder (+17 squashed commits)
Squashed commits:
[602913e22] vit + vit_mae are working
[547f6c4cc] RUN_SLOW=1 pytest tests/models/audio_spectrogram_transformer/ tests/models/deit/ tests/models/videomae/ passes
[61a97dfa9] it s the complete opposite...
[aefab37d4] fix more tests
[71802a1b9] fix all torch tests
[40b12eb58] encoder - decoder tests
[941552b69] slow decorator where appropriate
[14d055d80] has_attentions to yolo and msn
[3381fa19f] add correct name
[e261316a7] repo consistency
[31c6d0c08] fixup
[9d214276c] minor fix
[11ed2e1b7] chore
[eca6644c4] add sdpa to vit-based models
[cffbf390b] make fix-copies result
[6468319b0] fix style
[d324cd02a] add sdpa for vit
Co-authored-by: Liubov Yaronskaya <luba.yaronskaya@gmail.com >
2024-05-16 10:56:11 +01:00
NielsRogge
9fd606dbdb
[LLaVa-NeXT] Small fixes ( #30841 )
...
* First draft
* Update docstring
2024-05-16 08:19:15 +02:00
Edoardo Cetin
4b3eb19fa7
Fix llama model sdpa attention forward function masking bug when output_attentions=True ( #30652 )
...
* Fix llama model forward function with attention=True, same-length encoded sequence.
* Fix style
* propagate fix to modeling_cohere, gemma, dbrx, and olmo (which copy the same sdpa masking logic from llama)
* Fix style
* ignore unnecessary sdpa mask converter when output_attentions=True
* add tests checking sdpa and eager outputs match when output_attentions=True
* Split if statements in two lines
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* Fix formatting
* Add fix to new jetmoe model
* Add missing output_attentions argument to jetmoe mask creation
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
2024-05-15 19:48:19 +02:00
Yih-Dar
2d83324ecf
Use torch 2.3 for CI ( #30837 )
...
2.3
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
2024-05-15 19:31:52 +02:00
Younes Belkada
3f435823e0
FEAT / Bitsandbytes: Add dequantize API for bitsandbytes quantized models ( #30806 )
...
* add method
* change method name
* more comments
* Apply suggestions from code review
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* fixup
* add docstrings and fix comment
* warn users on the de-quantized dtype
* Update src/transformers/quantizers/base.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
* Update src/transformers/integrations/bitsandbytes.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* final suggestion - use private method
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
2024-05-15 17:17:09 +02:00
amyeroberts
58faa7b824
Deprecate models script - correctly set the model name for the doc file ( #30785 )
...
* Correctly set the moel name for the doc file
* Fix up
2024-05-15 15:14:11 +01:00
Xuan-Phi Nguyen
5ca085b882
Better llava next. ( #29850 )
...
* Better llava next.
- Batched forward with multiple image of different sizes (number of patches).
- Support training, for cases without any image.
- Support multi-image in same sequence. e.g: ["<image> <image> the first image is a dog while the second is a cat", "<image> <image> <image> <image> these 4 image are..."]
Current limitation:
- Haven't done testing
- Only support right padding (for training)
- left padding (batched generation) is not ready yet.
- PR not ready.
* fix bugs in batched generation
* add tests
* fix batch-gen bugs, left-padding positions and incorrect attention mask
* remove better modeling llava
* fix formatting
* fix test
* fix test
* fix testing
* fix test
* fix formatting
* Update src/transformers/models/llava_next/modeling_llava_next.py
add clarity
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* Update modeling_llava_next.py
remove assert
* fix bug modeling_llava_next.py
* update modeling
* fix bugs
* fix format
* fix error
* fix new_token_positions
* Update modeling_llava_next.py
* update formatting
* add args
* removecomments
* add slow tests for batched inference
* failing tf/flax tests
* this one ic correct
* Update src/transformers/models/llava_next/modeling_llava_next.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* fix docs
* make fixup
* more fixup
* add test for batch equivalence
* Update tests/models/llava_next/test_modeling_llava_next.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* Update src/transformers/models/llava_next/image_processing_llava_next.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* Update src/transformers/models/llava_next/image_processing_llava_next.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* Update src/transformers/models/llava_next/modeling_llava_next.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* Update src/transformers/models/llava_next/modeling_llava_next.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* Update src/transformers/models/llava_next/modeling_llava_next.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* Update src/transformers/models/llava_next/modeling_llava_next.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* Update src/transformers/models/llava_next/modeling_llava_next.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* Update src/transformers/models/llava_next/modeling_llava_next.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* pr comments
* hardcode padding side for bs=1
* update
* [run-slow] llava_next
* [run-slow] llava_next
* make fix-copies
---------
Co-authored-by: NGUYEN, Xuan Phi <x.nguyen@alibaba-inc.com >
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
Co-authored-by: raushan <raushan@huggingface.co >
Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz >
2024-05-15 19:02:56 +05:00
Sourab Mangrulkar
bdfefbadaf
Update ds_config_zero3.json ( #30829 )
2024-05-15 10:02:31 -04:00
xkszltl
92544cb8f3
Missing Optional in typing. ( #30821 )
...
The function checks for None in its first line.
2024-05-15 15:00:43 +01:00
amyeroberts
64c06df325
Jamba - Skip 4d custom attention mask test ( #30826 )
...
* Jamba - Skip 4d custom attention mask test
* Skip assistant greedy test
2024-05-15 13:57:28 +01:00
Lysandre Debut
a42844955f
Loading GGUF files support ( #30391 )
...
* Adds support for loading GGUF files
Co-authored-by: Younes Belkada <younesbelkada@gmail.com >
Co-authored-by: 99991 <99991@users.noreply.github.com >
* add q2_k q3_k q5_k support from @99991
* fix tests
* Update doc
* Style
* Docs
* fix CI
* Update docs/source/en/gguf.md
* Update docs/source/en/gguf.md
* Compute merges
* change logic
* add comment for clarity
* add comment for clarity
* Update src/transformers/models/auto/tokenization_auto.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* change logic
* Update src/transformers/modeling_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* change
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* Update src/transformers/modeling_gguf_pytorch_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* put back comment
* add comment about mistral
* comments and added tests
* fix unconsistent type
* more
* fix tokenizer
* Update src/transformers/modeling_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* address comments about tests and tokenizer + add added_tokens
* from_gguf -> gguf_file
* replace on docs too
---------
Co-authored-by: Younes Belkada <younesbelkada@gmail.com >
Co-authored-by: 99991 <99991@users.noreply.github.com >
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com >
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
2024-05-15 14:28:20 +02:00
Raushan Turganbay
bd9f4d7951
Add Video Llava ( #29733 )
...
* add model draft
* update docstring
* add tests
* support image and video as input
* update for better handling of mixed input and clean-up a bit
* bug when mixed inputs & add tests
* Update README.md
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com >
* Merge remote-tracking branch 'upstream/main' into video_llava
* link to abstract of paper in README
* fix test
* fix-copies
* make tests happy
* skip docstest for now
* do not run doctest for now
* Update src/transformers/models/video_llava/processing_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* Update src/transformers/models/video_llava/image_processing_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* Update src/transformers/models/video_llava/image_processing_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* Update src/transformers/models/video_llava/image_processing_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* Update src/transformers/models/video_llava/image_processing_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* Update tests/models/video_llava/test_modeling_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* Update src/transformers/models/video_llava/image_processing_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* address review comments
* failing tests
* Fix vocab_size in common tests for VLMs
* codestyle
* Update src/transformers/models/video_llava/configuration_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* Update src/transformers/models/video_llava/configuration_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* Update src/transformers/models/video_llava/modeling_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* Update src/transformers/models/video_llava/modeling_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* Update docs/source/en/model_doc/video_llava.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* Update docs/source/en/model_doc/video_llava.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* Update src/transformers/models/video_llava/image_processing_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* Update docs/source/en/model_doc/video_llava.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* Update src/transformers/models/video_llava/processing_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* Update tests/models/video_llava/test_modeling_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* Update tests/models/video_llava/test_modeling_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* Update tests/models/video_llava/test_modeling_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* PR suggestions
* fix-copies
* Update src/transformers/models/video_llava/configuration_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* Update src/transformers/models/video_llava/configuration_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* add full example in docs
* clean-up with new model-id
* [run-slow] video_llava
* update docstring
* [run-slow] video_llava
* remove all achive maps
* fix some tests
* test was supposed to be skipped for llava :)
---------
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com >
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
2024-05-15 16:42:29 +05:00
David
b8aee2e918
Remove unused module DETR based models ( #30823 )
...
* removing heads for classification from DETR models.
* quality fix
2024-05-15 11:19:43 +01:00
Ondřej Cífka
be3aa43e5f
Support mixed-language batches in WhisperGenerationMixin ( #29688 )
...
* Add support for mixing languages in a single batch
* Update docstring
* Enable different detected languages in batch
* Do not require input_features
* Test list of languages
* Fix comment
* Make init_tokens length-1 if possible, broadcast at the end
* Test for ValueError with language list of incorrect length
* Slow test for batched multilingual transcription
* fixup
* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com >
* Address review, refactor
* Second attempt to move this line where it was originally
* Split test, fix a bug
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com >
2024-05-15 09:53:17 +02:00
Jacky Lee
37543bad3c
Add missing dependencies in image classification example ( #30820 )
...
fix: missing dependencies
2024-05-15 08:38:30 +02:00
Jacky Lee
99e16120ab
Add support for custom checkpoints in MusicGen ( #30011 )
...
* feat: support custom checkpoint
* update: revert changes and add TODO
* update: docs and exception handling
* fix: ah, extra space
2024-05-15 08:30:33 +02:00
Pablo Montalvo
1360801a69
Add PaliGemma ( #30814 )
...
* add new model like
* add state dict slicing + new model config
* update palma config and weights, passes vision activations
* fix
* update
* reorder loading/unpacking
* clean up
* add debug statements
* change device
* fix
* debugging
* fix noncausal mask
* fixup sdpa + causal mask
* fix activation function
* remove debug before changing modeling file
* add variants
* debug attention mask in generate
* revert to non-debug sdpa
* revert gemma modifications
* add custom language modeling
* use Processor
* add language modeling file to init
* try thin wrapper around generate
* Update
* update mask
* breakpoints galore
* remove conflict
* switch to left-padding
* add incomplete model doc
* add paligemma global files
* batch rename paligemma
* make generation match outputs and captioning
* style
* style
* remove copied from + doc
* remove more copied from
* remove copy from projector
* minor fix
* update config and style
* add readme - dummy
* CORRECT image captioning
* moving to args
* add siglip proper + fix merging image + text features
* take update_causal_mask from upstream
* remove breakpoint
* leverage AutoModel
* fix input_ids slicing
* make siglip head conditional
* remove encoder_decoder value
* remove unneeded modeling file
* add commented 4d attention mask
* FIXED generation with 4D mask
* Update src/transformers/models/siglip/modeling_siglip.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* fix left padding detection
* shuffle order of verifications
* fix missing labels for training
* fix
* vectorize merging of features, improve slicing
* improve testing before conversion
* handle merging in processor
* image token index depends on checkpoint
* add variants, save processor too
* save processors, base tokenizer off spm file
* expand model embeddings due to additional image token
* pass image processing args
* add convert rgb to siglip processor
* add \n token separately
* fix tokenizer and prompts
* fix docstrings
* change to camel
* fix casing
* debug pos_ids and sdpa
* pass and use cache_position
* add flag for newline tokenization
* Update src/transformers/models/paligemma/processing_paligemma.py
Co-authored-by: Merve Noyan <merveenoyan@gmail.com >
* simplify conversion script
* add copied from
* add precision to conversion script
* Update src/transformers/models/paligemma/modeling_paligemma.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co >
* clean up
* Shift attention mask from `1:`
After discussion with @molbap
* add docs, fix quality
* quality, tied weights inheritance, and logits/label alignment
* fix more tests
* pass attn_implementation to language model correctly
* add SiglipVisionTransformer to no split modules
* skip paligemma test for sdpa dispatch to flash
* skip incompatible tests
* quality
* [broken archive maps]
* Apply suggestions
- remove archive lists
- style
- take shape of inputs_embeds for batch
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* Update src/transformers/utils/dummy_pt_objects.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* simplify conversion script
* add suggestions
* add suggestions
* add copied from
* fix
* move labels out
* revert
* fix
* remove placeholder labels if None
* use cache_position
* fix quality + docstrings
* fix quality
* fix paligemma 4d gemma mask incompatibility
* fix config docstring
* fix query and attn_mask dtype
---------
Co-authored-by: ArthurZucker <arthur.zucker@gmail.com >
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
Co-authored-by: Merve Noyan <merveenoyan@gmail.com >
Co-authored-by: Pedro Cuenca <pedro@huggingface.co >
2024-05-14 22:07:15 +02:00
Ankur Singh
c96aca3a8d
Added the necessay import of module ( #30804 )
2024-05-14 18:45:06 +01:00
Yikang Shen
ccdabc5642
Add JetMoE model ( #30005 )
...
* init jetmoe code
* update archive maps
* remove flax import
* fix import error
* update README
* ruff fix
* update readme
* fix
* update config
* fix issue
* merge files
* fix model bug
* fix test
* auto fix
* model size
* add comments
* fix form
* add flash attention support
* fix attention head number
* fix init
* fix support list
* sort auto mapping
* fix test
* fix docs
* update test
* fix test
* fix test
* change variable name
* fix config
* fix init
* update format
* clean code
* fix config
* fix config
* change default config
* update config
* fix issues
* update formate
* update config argument
* update format
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* change to mixtral aux loss
* change to cache_position
* debug
* fix bugs
* debug
* fix format
* fix format
* fix copy
* fix format
* fix format
* fix sort
* fix sort
* fix sort
* add copy comment
* add copy from
* remove debug code
* revert readme update
* add copy
* debug
* remove debug code
* fix flash attention
* add comments
* clean code
* clean format
* fix format
* fix format
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com >
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com >
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com >
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com >
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com >
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com >
* change variable name
* add copied from
* fix variable name
* remove deprecated functinos
* sync to llama implementation
* fix format
* fix copy
* fix format
* update format
* remove repr
* add comment for moe weight
* fix copy
* Update src/transformers/models/jetmoe/configuration_jetmoe.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* add comments and reformat config
* fix format
* fix format
* fix format
* update test
* update doc string in config
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* update config doc
* update attention cache
* fix format
* fix copy
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com >
2024-05-14 16:32:01 +02:00
Masahiro Suzuki
d84f34ad77
[T5] Adding model_parallel = False to T5ForTokenClassification and MT5ForTokenClassification ( #30763 )
...
* Adding model_parallel = False
* Revert "Adding model_parallel = False"
This reverts commit ba1d99976acb598824ce3347dbe7d848daa21e79.
* Trainer: circumvent error for model in which is_parallelizable is True but does not have model_parallel attribute
2024-05-14 14:39:25 +01:00
Matt
9ef3884046
Deprecate TF weight conversion since we have full Safetensors support now ( #30786 )
2024-05-14 13:48:17 +01:00
Joao Gante
d8f8a9cd61
CI: more models wo cache support ( #30780 )
2024-05-14 10:43:03 +01:00
Raushan Turganbay
5ad960f1f4
Add Watermarking LogitsProcessor and WatermarkDetector ( #29676 )
...
* add watermarking processor
* remove the other hashing (context width=1 always)
* make style
* Update src/transformers/generation/logits_process.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com >
* Update src/transformers/generation/logits_process.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com >
* Update src/transformers/generation/logits_process.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com >
* Update src/transformers/generation/configuration_utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com >
* update watermarking process
* add detector
* update tests to use detector
* fix failing tests
* rename `input_seq`
* make style
* doc for processor
* minor fixes
* docs
* make quality
* Update src/transformers/generation/configuration_utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com >
* Update src/transformers/generation/logits_process.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com >
* Update src/transformers/generation/watermarking.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com >
* Update src/transformers/generation/watermarking.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com >
* Update src/transformers/generation/watermarking.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com >
* add PR suggestions
* let's use lru_cache's default max size (128)
* import processor if torch available
* maybe like this
* lets move the config to torch independet file
* add docs
* tiny docs fix to make the test happy
* Update src/transformers/generation/configuration_utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com >
* Update src/transformers/generation/watermarking.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com >
* PR suggestions
* add docs
* fix test
* fix docs
* address pr comments
* style
* Revert "style"
This reverts commit 7f33cc34ff08b414f8e7f90060889877606b43b2.
* correct style
* make doctest green
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com >
2024-05-14 13:31:39 +05:00
Pashmina Cameron
65ea1904ff
PEFT: Access active_adapters as a property in Trainer ( #30790 )
...
Access active_adapters as a property
2024-05-14 09:31:18 +01:00
Raushan Turganbay
c02d302e6b
Fix cache type in Idefics2 ( #30729 )
...
standardize cache in idefics2
2024-05-14 13:30:53 +05:00
Jacky Lee
449894d2e5
Fix OWLv2 Doc ( #30794 )
...
fix: owlv2 doc
2024-05-14 08:36:11 +02:00
fxmarty
37bba2a32d
CI: update to ROCm 6.0.2 and test MI300 ( #30266 )
...
* update to ROCm 6.0.2 and test MI300
* add callers for mi300
* update dockerfile
* fix trainer tests
* remove apex
* style
* Update tests/trainer/test_trainer_seq2seq.py
* Update tests/trainer/test_trainer_seq2seq.py
* Update tests/trainer/test_trainer_seq2seq.py
* Update tests/trainer/test_trainer_seq2seq.py
* update to torch 2.3
* add workflow dispatch target
* we may need branches: mi300-ci after all
* nit
* fix docker build
* nit
* add check runner
* remove docker-gpu
* fix issues
* fix
---------
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com >
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
2024-05-13 18:14:36 +02:00
Marc Sun
539ed75d50
skip low_cpu_mem_usage tests ( #30782 )
2024-05-13 18:00:43 +02:00
amyeroberts
0f8fefd481
Deprecate models script ( #30184 )
...
* Add utility for finding candidate models for deprecation
* Update model init
* Make into configurable script
* Fix path
* Add sorting of base object alphabetically
* Tidy
* Refactor __init__ alpha ordering
* Update script with logging
* fix import
* Fix logger
* Fix logger
* Get config file before moving files
* Take models from CLI
* Split models into lines to make easier to feed to deprecate_models script
* Update
* Use posix path
* Print instead
* Add example in module docstring
* Fix up
* Add clarifying comments; add models to DEPRECATE_MODELS
* Address PR comments
* Don't update relative paths on the same level
2024-05-13 16:30:55 +01:00
Yih-Dar
82c1625ec3
Save other CI jobs' result (torch/tf pipeline, example, deepspeed etc) ( #30699 )
...
* update
* update
* update
* update
* update
* update
* update
* update
* Update utils/notification_service.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
2024-05-13 17:27:44 +02:00
Joao Gante
2e27291ce4
Generate: assistant should be greedy in assisted decoding ( #30778 )
...
* assistant should be greedy
* better comment
* Update src/transformers/generation/candidate_generator.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
2024-05-13 16:08:45 +01:00
Alazar
94306352f4
Port IDEFICS to tensorflow ( #26870 )
...
* Initial commit
* Just a copy of modeling_idefics.py that will be ported to TF
* - Prepend TF to the name of all classes
- Convert pytorch ops to TF (not all operations are converted yet)
* Add TF imports
* Add autotranslated files
* Add TF classes to model_tf_auto.py
* Add the TF classes in model_doc
* include auto-translated code
* Adopted from auto-translated version
* Add a forgotten super().build
* Add test code for TF version.
* Fix indentation and load pytorch weights for now
* Some fixes. Many tests are still failing but some are passing now.
- I have added TODO's for some of the hacks I made to unblock me
and I will address them soon
- I have the processing_idefics.py hacked in my view to support TF temporarily
* Add ALL_LAYERNORM_LAYERS to match pytorch
* Revert "Add ALL_LAYERNORM_LAYERS to match pytorch"
This reverts commit 7e0a35119b4d7a6284d04d8c543fba1b29e573c9 as it
is not needed in the tf implementation.
* Fix freeze_relevant_params()
* Some more fixes
* Fix test_attention_outputs
* Add tf stuff to processing_idefics.py
processing_idefics.py supports both pytorch and tf now.
test_processor_idefics.py for pytorch is passing, so i didn't break anything
but still some issues with tf. I also need to add tf tests in
test_processor_idefics.py.
* Pass return_tensors to image processing code and fix test
* Pass return_tensors to the image processor __init__
* Fix several test cases
- Make input to some of the forward pass of type `TFModelInputType`
- Decorate main layer forward pass with `@unpack_inputs`
- Decorate main layer with `@keras_serializable`
- Pass `inputs` to TFIdeficsModel
* Some more fixes forgotten in last commit
* Fix processing code and vision_tf.py
* Fix perceiver bug
* Import from
* Auto-add build() methods + style pass
* Fix build() errors due to `None` being passed as shape to some layers
* Change name in TFIdeficsForVisionText2Text to attribute in IdeficsForVisionText2Text
* Fix pytorch weights load for tf2
There were a lot of `name=` missing in weight initialization code.
* Attempt to fix CI
* Add back accidently removed line
* Remove torch-specific stuff from the TF test file
* make fix-copies, make style, remove autotranslated files
* Fixes to imports/docstrings
* Let's try the from future import in desperation
* Fix the core random_attention_mask fn to match the torch/flax behaviour
* Clean random_attention_mask up correctly
* Remove torch-only test
* Fix loss shape, couple of nits
* make style
* Don't test for OOB embeddings because IDEFICS uses those deliberately
* Fix loss computation to handle masking
* Fix test failures when flattening
* Fix some test failures
- Add cross attention gate which was missing and wasn't being passed arround
- Fix overwriting of image_attention_mask due to hack I had for dummy inputs
* Add a proper stateless scaled_dot_product_attention
* make style
* Adding missing attribute from the PyTorch version
* Small cleanups to decoupledlinearlayer in case that helps
* Pass epsilon to LayerNormalization
* Attemp to fix pytorch weight cross-loading for TFIdeficsEmbedding
* Fix a bug in TFIdeficsGatedCrossAttentionLayer
* Patching up build() methods
* Constant self.inv_freq
* Constant self.inv_freq
* First working version
The TF implementation works now, there was a bug in the TFIdeficsDecoupledLinear
where the weights were mis-intialized (in_features,out_features)
when it should be: (out_features, in_features)
I have tested this so far with tiny-random and idefics-9b-instruct
and gives correct output.
I also dumped the final outputs for both pytorch and TF
and they are identical.
* Fix some test failures
* remove print statement
* Fix return_tensors
* Fix CI test failure check_code_quality
* Attempt to fix CI failures by running `make fixup`
The hardcoded IDs in test_modeling_tf_idefics.py are for the integration
test and makes that file unreadable and should probably be moved to a seperate file.
* Attempt to fix tests_pr_documentation_tests
* Fix a test failure in test_image_processing_idefics.py
* Fix test test_pt_tf_model_equivalence
* Fix a few failures
* Tiny fix
* Some minor fixes
* Remove a duplicate test
* Override a few test failures for IDEFICS
- `test_keras_save_load` is passing now
- `test_compile_tf_model` is still failing
* Fix processing_idefics.py after rebase
* Guard import keras with is_tf_available
* fix check code quality
* fix check code quality
* Minor fixes
* Skip test_save_load temporarily
This test passed on my local box but fails on the CI, skipping
for now to see if there are other remaining failures on the CI.
* Run `ruff format tests src utils`
* Fix last failing test, `test_compile_tf_model`
* Add fixes for vision_tf.py
I forgot to add this file in last commit.
* Minor fixes
* Replace "<<<" with "<<" for doc tests
IDEFICS-9B is too big for doctest runner, so don't run it there
* Make code more readable
* Fix bug after code review
I added a layer_norm_eps to IdeficsConfig but I don't even need it
since the vision config has a layer_norm_eps.
* Fix after code review
Use original code tokenizer.convert_tokens_to_ids
* Keep PyTorch as the default return_tensors
* Fixes to modeling_tf after code review
* Fixes from code review
- Remove all references of `TF_IDEFICS_PRETRAINED_MODEL_ARCHIVE_LIST`
- Pass 1e-5 to LayerNormalization in perceiver
* Run ruff
* Undo a change
* Refactor processing code after Matt's suggestion
* Remove TODO's that aren't needed anymore
* For pytorch, Use original pytorch processing code from main
Since this PR is a TF port it shouldn't make any modifications
to pytorch IDEFICS code. This changes undo's the pytorch processing
modifications I made and uses original code from main.
* Update tests/models/idefics/test_modeling_idefics.py
* Update tests/models/idefics/test_modeling_tf_idefics.py
* Add missing imports for is_pt_tf_cross_test
* [DO NOT MERGE]: This is a commit for debugging and will be reverted
The cross test `test_pt_tf_model_equivalence` passes locally but
fails when running on the CI. This commit is to help debug that
and will be reverted.
* Revert "[DO NOT MERGE]: This is a commit for debugging and will be reverted"
This reverts commit 8f0d709ec5bd46685fb0b4259d914ffee794875b.
* [DO NOT MERGE]: This commit is for debugging a CI failure and will be reverted
* [DO NOT MERGE]: This commit is for debugging a CI failure and will be reverted
* Revert "[DO NOT MERGE]: This commit is for debugging a CI failure and will be reverted"
This reverts commit 998cc38b8c3d313bf5e5eb55a7f5b7b881897b89.
* Revert "[DO NOT MERGE]: This commit is for debugging a CI failure and will be reverted"
This reverts commit 1c695ac4219c4ae4d39b330b01744dc27deb7dd4.
* Don't skip test_save_load
IIRC test_save_load was also failing on the CI but not on my local
box, it might be easier to debug that on the CI first than the cross tests
* Debugging commit, will be reverted
* Revert "Debugging commit, will be reverted"
This reverts commit 8eafc8e41e20c4e95a3a90834f06a6e9f445e2d5.
* Override `test_save_load` and push model to save
Maybe this will help me repro this weird bug
* pass my repo_id
* add endpoint
* Pass a temp (write) token just for this CI
* Undo last few commits, still pushing to hub for model debugging
The issue seems to be with save_pretrained(), when I looked at the model saved
from the CI test failure it is basically empty and has no weights.
`self.save_weights(..)` seems to be failing in save_pretrained but needs
more debugging
* Add logging to modeling tf utils, will be reverted just for debugging
* Debugging, will revert
* Revert "Debugging, will revert"
This reverts commit 9d0d3075fb7c82d8cde3a5c76bc8f3876c5c55d3.
* Revert "Add logging to modeling tf utils, will be reverted just for debugging"
This reverts commit 774b6b7b1c17b3ce5d7634ade768f2f686cee617.
* Remove `test_save_load`
The CI failures are gone after my latest rebase, no idea why
but I was still saving the model to my hub on HF and the tf_model.h5
file now has everything.
* Run make fix-copies
* Run ruff format tests src utils
* Debugging commit, will be reverted
* Run ruff, also trigger CI run
* Run ruff again
* Undo debugging commit
---------
Co-authored-by: Matt <rocketknight1@gmail.com >
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com >
2024-05-13 15:59:46 +01:00
Joao Gante
de2f722172
Generate: remove near-duplicate sample/greedy copy ( #30773 )
2024-05-13 15:48:20 +01:00
NielsRogge
ce87dca1d7
[Object detection pipeline] Lower threshold ( #30710 )
...
* Lower threshold
* Address comment
2024-05-13 16:47:58 +02:00
Fanli Lin
69d9bca55a
enable Pipeline to get device from model ( #30534 )
...
* check model.device
* fix
* style fix
* move model device
* remove print
* add comment
* fix
* add unit test
* optimize
* change test names and add more cases
* Update tests/pipelines/test_pipelines_common.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
2024-05-13 15:00:39 +01:00
Joao Gante
f4dc26d466
Qwen: incorrect setup flag ( #30776 )
...
qwen does not support the new cache classes
2024-05-13 14:12:58 +01:00
Younes Belkada
f823fec53e
Generation / FIX: Fix multi-device generation ( #30746 )
...
* attempt to fix multi-device generation
* fix
* final fix
* final fix
* fix
* fix
* fix
* fix
* add joao suggestion
* fix
2024-05-13 14:35:45 +02:00
Poedator
a0779b9e19
Llama: fix custom 4D masks, v2 ( #30348 )
...
* 4d mask fixes
* Update custom 4D mask logic
* test moved to mixin
* extra tests 4d mask
* upd 4d mask and StaticCache handling
* added Mask4DTestHard to mistral tests
* post-rebase fixes
* test fixes for StaticCache
* make fix-copies
* upd 1 after #30476
* fix common tests
* rm elif attention_mask.dim() == 4:
* tests combined, fixed, mixtral supported
* bigbird style chg reverted
* rm if attention_mask.dim() == 2
* modeling_llama formatting chg
---------
Co-authored-by: Joao Gante <joao@huggingface.co >
2024-05-13 13:46:06 +02:00
Eduardo Pacheco
453893ed15
[GroundingDino] Adding ms_deform_attn kernels ( #30768 )
...
* Adding ms_deform_attn kernels to GroundingDino
* Pointing to deformable detr kernels
2024-05-13 12:34:45 +01:00
Nilabhra Roy Chowdhury
e52741f601
Support for Falcon2-11B ( #30771 )
...
* remove unrelated changes
* remove unrelated changes on phi and stable LM
* add: Test for Falcon 10B
* fix: formatting
* fix: loading the falcon 10B in 8 bit precision using bitsanbytes.
* fix: device placement
* fix: broken tests.
* fix: backwards compatibility for falcon 1B architecture.
* chore: updated test.
* chore: test_modeling_falcon.py to use the 11B model.
* chore: minor edit
* chore: formating.
---------
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com >
Co-authored-by: ArthurZucker <arthur.zucker@gmail.com >
2024-05-13 13:32:43 +02:00
Zafir Stojanovski
f63d822242
Blip dynamic input resolution ( #30722 )
...
* blip with interpolated pos encoding
* feat: Add interpolate_pos_encoding option to other models from `BLIP` family.
* include check for textual generated content in tests
2024-05-13 12:20:16 +01:00
Younes Belkada
a4e530e3c8
Workflow: Replace actions/post-slack with centrally defined workflow ( #30737 )
...
* Remove commit details
* remove old workflow
2024-05-13 12:08:48 +02:00
Marc Sun
de6e0db184
[awq] replace scale when we have GELU ( #30074 )
...
* fix awq test
* style
* add log
* new fix
* style
* only modifying impacted model in the end
* rename function
2024-05-13 11:41:03 +02:00
mobicham
e0c3cee170
hqq - fix weight check in check_quantized_param ( #30748 )
...
* hqq - fix weight check in check_quantized_param
* ruff format
2024-05-10 19:29:35 +02:00