HuggingFace_transformer

Author	SHA1	Message	Date
Raushan Turganbay	e1688d28d3	[Model] Cohere2 Vision (#39810 ) * Add cohere2_vision to support CohereLabs/command-a-vision-07-2025 * update and add modualr file * update processors and check with orig impl later * delete unused files * image processor reduce LOC and re-use GotOCR2 * update the config to use modular * model tests pass * processor fixes * check model outputs decorator * address one more comment * Update tokens. Temp - need to read from tokenizer' * fix for multi-gpu * Fix image token handling * upadte image token expansion logic * fix a few issues with remote code loading * not related but modular forces us to change all files now * Add overview and code sample to cohere vision docs * add scripts. TMP. * Update inference script * Create script * set dtype in export script * TO revert: modular export fix * Fix scripts * Revert "TO revert: modular export fix" This reverts commit bdb2f305b61027a05f0032ce70d6ca698879191c. * Use modular weights * Upload to hub Removed OOD weights ad script * Updated docs * fix import error Update docs Added pipeline test * Updated docs * Run modular script remove modular for config Added patch_size Added docstrings in modular Fix OOM Add docs, fixup integration tests. 8-gpu passing * tiny updates * address comments + fixup * add test for chat template * check model outputs workaround * aya vision fix check model inputs * Revert "add test for chat template" This reverts commit 42c756e397f588d76b449ff1f93292d8ee0202d8. * reveert more changes * last revert * skip and merge * faulty copy from --------- Co-authored-by: Julian Mack <julian.mack@cohere.com> Co-authored-by: kyle-cohere <kyle@cohere.com>	2025-07-31 10:57:34 +00:00
Raushan Turganbay	ccb2e0e03b	Fix GPT2 with cross attention (#39754 ) * fix * use new mask API * style * fix copies and attention tests * fix head pruning tests	2025-07-29 15:40:31 +02:00
Yuanyuan Chen	95faabf0a6	Apply several ruff SIM rules (#37283 ) * Apply ruff SIM118 fix Signed-off-by: cyy <cyyever@outlook.com> * Apply ruff SIM910 fix Signed-off-by: cyy <cyyever@outlook.com> * Apply ruff SIM101 fix Signed-off-by: cyy <cyyever@outlook.com> * Format code Signed-off-by: cyy <cyyever@outlook.com> * More fixes Signed-off-by: cyy <cyyever@outlook.com> --------- Signed-off-by: cyy <cyyever@outlook.com>	2025-07-29 11:40:34 +00:00
Eric Bezzam	7623aa3e5f	Fix `Qwen2AudioForConditionalGeneration.forward()` and `test_flash_attn_kernels_inference_equivalence` (#39503 ) * Add missing cache_position argument. * Pass cache_position to language model. * Overwrite prepare_inputs_for_generation. * Set model to half precision for Flash Attention test. * Cast model to bfloat16.	2025-07-28 16:35:08 +02:00
Yih-Dar	28f2619868	skip `Glm4MoeModelTest::test_torch_compile_for_training` (#39670 ) * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-07-28 16:30:40 +02:00
Cyril Vallez	18a7c29ff8	More robust tied weight test (#39681 ) * Update test_modeling_common.py * remove old ones * Update test_modeling_common.py * Update test_modeling_common.py * add * Update test_modeling_musicgen_melody.py	2025-07-25 22:03:21 +02:00
Cyril Vallez	d6e9f71a6e	Fix tied weight test (#39680 ) Update test_modeling_common.py	2025-07-25 20:09:33 +02:00
Raushan Turganbay	c392d47c9b	[attention] fix test for packed padfree masking (#39582 ) * fix most tests * skip a few more tests * address comments * fix chameleon tests * forgot to uncomment * qwen has its own tests with images, rename it as well	2025-07-25 07:44:52 +00:00
Pablo Montalvo	ea56eb6bed	Fix important models CI (#39576 ) * relax test boundaries and fix from config * eager is always supported.	2025-07-23 16:24:29 +02:00
Raushan Turganbay	eb1a007f7f	Rename `supports_static_cache` to `can_compile_fullgraph` (#39505 ) * update all * Apply suggestions from code review Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * apply suggestions * fix copies --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>	2025-07-23 09:35:18 +00:00
Arthur	efceeaf267	Kernels flash attn (#39474 ) * use partial to wrap around `transformers` utils! * try to refactor? * revert one wrong change * just a nit * push * reverter watever was wrong! * some nits * fixes when there is no attention mask * bring the licence back * some fixes * nit * style * remove prints * correct dtype * fa flags for testing * update * use paged attention if requested! * updates * a clone was needed, not sure why * automatically create cu seq lens when input is flash, this at least makes sure layers don't re-compute * simplify and improve? * flash attention is kinda broken on recent cuda version so allow the opportunity to use something else * fix! * protect kernels import * update * properly parse generation config being passed * revert and update * add two tests * some fixes * fix test FA2 * takes comment into account * fixup * revert changes * revert the clone, it is only needed because the metal kernel is not doing it? * [docs] update attention implementation and cache docs (#39547) * update docs * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * applu suggestions --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * fix mps on our side for now * Update src/transformers/integrations/flash_paged.py * no qa --------- Co-authored-by: Vasqu <antonprogamer@gmail.com> Co-authored-by: Raushan Turganbay <raushan@huggingface.co> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-07-22 15:41:06 +02:00
Anton Vlasjuk	b4115a426e	[`Ernie 4.5`] Add ernie text models (#39228 ) Some checks failed Release - Conda / build_and_package (push) Has been cancelled Details Secret Leaks / trufflehog (push) Has been cancelled Details * init * copied from remote * add proper structure and llama like structure * fixup * revert to state that works * get closer to llama * slow and steady * some removal * masks work * it is indeed the rope implementation, how dafuq does it mesh with the cache now hmm * nice * getting closer * closer to transformers style * let's simplify this, batching works now * simplified * working version with modular * it is indeed the rotation per weights, make it complete llama style * cleanup conversion, next to look at -> tokenizer * remove llama artefacts * fix modeling tests (common ones) * style * integration test + first look into tokenization (will need more work, focussing on modeling other models first) * style * working moe version, based on remote * lets keep it simple and go step by step - transformers annotations for modular and transformers style rope (complex view) * more cleanup * refactor namings and remove addition forXXX classes * our moe won't cut it it seems, correction bias seems to be missing in remote code version * tokenization change (remote) * our moe version works when adding normalization :D * cleanup moe * nits * cleanup modeling -> let's get to modular next * style * modular v1 * minor things + attempt at conversion (which doesn't work) * no conversion follow glm, fixup modular and other nits * modular cleanup * fixes * tests, tests, tests + some moe dtype forcing * simplify modular, fix fatal fa2 bug, remaining tests * fix import issue? * some initial docs, fix bnb faulty behavior --> needs to fix some tests because of gate needing to be float * fix sdpa test, load on init dtype only * fixup post merge * style * fix doc links * tokenization cleanup beginnings * simplify tokenizer by a lot as its basically llama * tokenizer is full llama with different defaults + extra special tokens * sync og special tokens of ernie * fix decoding with numbers (also in remote done what a timing), begin of tok tests * align with remote and preserve special tokens, adjust tests to ernie legacy behavior, warning for questionable behavior (also in llama) * nits * docs * my daily post merge it is * check * tokenization update with explanations and conversion script * review on modular (til), revert some tokenizer things i did prior, remove mtp comment (low prio) * post merge fixes * fixup tokenization, llama fast is the way to go * more fixups * check * import fixes * correction bias following the paddle code * fix * fix TP plan, fix correction bias sharding during forward * style * whoops * fix tied weights * docs and last nit * license * flasky tests * move repo id, update when merged on the hub	2025-07-21 19:51:49 +02:00
BUI Van Tuan	6b3a1f2f51	Fix missing initializations for models created in 2023 (#39239 ) * fix SwiftFormer * fix Kosmos2 * fix Owlv2 * fix Sam * fix Vits * fix Pvt * fix MobileViTV2 * fix PatchTST * fix Bros * fix Informer * fix BridgeTower * fix Mra and Yoso * fix Rwkv * fix EfficientNet * fix NllbMoe * fix Tvp * fix Clap * fix Autoformer * fix SwiftFormer * fix Mgpstr * fix Align * fix VitMatte * fix SpeechT5 * add conditional check for parameters * fix SpeechT5 * fix TimmBackbone and Clvp * fix SwiftFormer * fix SeamlessM4T and SeamlessM4Tv2 * fix Align * fix Owlv2 and OwlViT * add reviewed changes * add reviewed changes * fix typo --------- Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>	2025-07-21 14:43:52 +02:00
Raushan Turganbay	8c102e2eb1	Rename `_supports_flash_attn_2` in examples and tests (#39471 ) * delete `_supports_flash_attn_2` from examples and tests * simplify docs	2025-07-21 14:02:57 +02:00
Cyril Vallez	3a152e3a5c	Fix the check in flex test (#39548 ) * fix the check * fix flags * flags	2025-07-21 13:29:44 +02:00
Raushan Turganbay	344012b3a6	[qwen2 vl] fix packing with all attentions (#39447 ) * fix qwen2 vl packing in FA2 * why? delete! * qwen2-5-vl seems to work now * update * fix tests * start by adapting FA2 tests * add similar tests for sdpa/eager * address comments * why is this even in conditional model and not base model?	2025-07-21 12:19:15 +02:00
Cyril Vallez	4ded9a4113	🚨🚨 Fix and simplify attention implementation dispatch and subconfigs handling (#39423 ) * first try * Update modeling_utils.py * Update modeling_utils.py * big refactor * Update modeling_utils.py * style * docstrings and simplify inner workings of configs * remove all trace of _internal * Update modeling_utils.py * fix logic error * Update modeling_utils.py * recursive on config * Update configuration_utils.py * fix * Update configuration_dpt.py * Update configuration_utils.py * Update configuration_utils.py * Update modeling_idefics.py * Update modeling_utils.py * fix for old models * more old models fixup * Update modeling_utils.py * Update configuration_utils.py * Remove outdated test * remove the deepcopy!! 🥵🥵 * Update test_modeling_gpt_bigcode.py * fix qwen dispatch * restrict to only models supporting it * style * switch name * Update modeling_utils.py * Update modeling_utils.py * add tests! * fix * rypo * remove bad copies * fix * Update modeling_utils.py * additional check * Update modeling_utils.py * Update modeling_utils.py * Update modeling_utils.py * Update modeling_utils.py * Update modeling_utils.py * fix * skip	2025-07-18 13:41:54 +02:00
Yuanyuan Chen	60b5471da3	Enable some ruff checks for performance and readability (#39383 ) * Fix inefficient sequence tests Signed-off-by: cyy <cyyever@outlook.com> * Enable PERF102 Signed-off-by: cyy <cyyever@outlook.com> * Enable PLC1802 Signed-off-by: cyy <cyyever@outlook.com> * Enable PLC0208 Signed-off-by: cyy <cyyever@outlook.com> --------- Signed-off-by: cyy <cyyever@outlook.com>	2025-07-17 13:21:59 +00:00
Raushan Turganbay	c8524aeb07	[cache] make all classes cache compatible finally (#38635 ) * dump * push other models * fix simple greedy generation * xmod * add fmst and clean up some mentions of old cache format * gpt-bigcode now follows standards * delete tuple cache reference in generation * fix some models * fix some models * fix mambas and support cache in tapas * fix some more tests * fix copies * delete `_reorder_cache` * another fix copies * fix typos and delete unnecessary test * fix rag generate, needs special cache reordering * fix tapas and superglue * reformer create special cache * recurrent gemma `reorder_cache` was a no-op, delete * fix-copies * fix blio and musicgen pipeline tests * fix reformer * fix reformer, again... * delete `_supports_cache_class` * delete `supports_quantized_cache` * fix failing tests * fix copies * some minor clean up * style * style * fix copies * fix tests * fix copies * create causal mask now needs positions? * fixc copies * style * Update tests/test_modeling_common.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * clean-up of non-generative model after merging main * check `is_decoder` for cache * delete transpose for scores * remove tuple cache from docs everywhere * fix tests * fix copies * fix copies once more * properly deprecate `encoder_attention_mask` in Bert-like models * import `deprecate_kwarg` where needed * fix copies again * fix copies * delete `nex_decoder_cache` * fix copies asks to update for PLM * fix copies * rebasing had a few new models, fix them and merge asap! * fix copies once more * fix slow tests * fix tests and updare PLM checkpoint * add read token and revert accidentally removed line * oh com -on, style * just skip it, read token has no access to PLM yet --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>	2025-07-16 14:00:17 +02:00
Orion Weller	0e4b7938d0	Add ModernBERT Decoder Models - ModernBERT, but trained with CLM! (#38967 ) Some checks failed Release - Conda / build_and_package (push) Has been cancelled Details Secret Leaks / trufflehog (push) Has been cancelled Details * working locally; need to style and test * added docs and initial tests; need to debug and flesh out * fixed tests * working long context; batches * working fa2 and eager * update tests * add missing confnigs * remove default autoset * fix spacing * fix most tests * fixed tests * fix to init * refactor to match new transformers updates * remove static cache option * fa2 fix * fix docs * in progress * working on tests * fixed issue with attn outputs * remove debug * fix local config attr * update doc string * fix docstring * add docs to toc * correct typo in toc * add new updates from main w.r.t. ModernBERT RoPE * fix local param --------- Co-authored-by: oweller2 <oweller2@dsailogin.mgmt.ai.cluster> Co-authored-by: oweller2 <oweller2@l07.mgmt.ai.cluster> Co-authored-by: oweller2 <oweller2@n02.mgmt.ai.cluster> Co-authored-by: oweller2 <oweller2@l08.mgmt.ai.cluster> Co-authored-by: oweller2 <oweller2@l01.mgmt.ai.cluster> Co-authored-by: oweller2 <oweller2@l02.mgmt.ai.cluster>	2025-07-15 10:40:41 +02:00
Raushan Turganbay	8d6259b0b8	[refactor] set attention implementation (#38974 ) * update * fix some tests * init from config, changes it in-place, add deepcopy in tests * fix modernbert * don't delete thsi config attr * update * style and copies * skip tests in generation * fix style * accidentally removed flash-attn-3, revert * docs * forgot about flags set to False * fix copies * address a few comments * fix copies * custom code BC	2025-07-15 09:34:06 +02:00
Cyril Vallez	8165c703ab	Fix Lfm2 and common tests (#39398 ) * fix * better fix * typo	2025-07-14 12:02:59 +02:00
Yih-Dar	4798c05c64	skip `test_torchscript_*` for now until the majority of the community ask for it (#39307 ) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-07-09 15:35:48 +02:00
Rémi Ouazan	a325409a50	Expectations re-order and corrected FA3 skip (#39195 ) * Fix Expectations and a FA3 skip * Fixed docstring * Added context for Default expectation	2025-07-07 11:42:33 +02:00
Arthur	ca7e1a3756	Refactor the way we handle outputs for new llamas and new models (#39120 ) * just update 2 files * update other models as well just making fix-copies * also add the changes needed to modeling utils * put this on the pretrained model instead * nits and fixes * update generic, fix to use config value * update other modelings * use transformers kwargs instead * update * update * update other models * update * updates * update * update * update * fix * finally * very small nits * this fixes more tests * fix other models as well! * update modularqwen2 * update models based on qwen2 * update * update * remove the *flash stuff in favor of noraml kwargs update * propagate gemma? * remove output attentions * propagate * support cross attention edge case * same * test this * fixes * more fix * update * update * fix conflicts * update * fix emu3 * fix emu3 * move the fix a bit * quel enfer * some fixes, loss_kwargs should never had been * finish fixing gemma3n * fix small lm3 * fix another one * fix csm now * fux csm and mistral * fix mistral now * small fixes * fix janusss * only for some models * fixup * phix phi3 * more fixes? * dose this fix it? * update * holy shit it was just graph breaks * protect torch * updates * fix samhq? * fix moonshine * more moonshine fixes, 3 failures left! * nits * generic needs to support more * more fixes to moonshine! * fix cross attention outputs! * fix csm! * nits * fix stupid kosmos2 * current updates * fixes * use output recorder? * nicer! * a little bit of magic * update * fix protect * fix * small fixes * protect import * fix a bunch of more models * fix fixups * fix some of the last ones * nit * partly fix phi * update * fix import path * make something that is fullgraph compatible just to be sure * typing was wrong on llama so the rest was wrong as well * fucking ugly but at least it is still exportable * syle * supposed to fix moonshine, it still breaks * fix some default * fix the last bits of sam * update samhq * more fixes to am hq * nit * fix all output+hidden states and output_attentions! * fix? * fix diffllama * updates to fix initialization on the sam pips * ups there was a bug * fix the last sam hq test * fix gotocr * fix gotocr2! * fixes * skip stupid tests * there was one left :) * fixup * fix fix copies issues with this test file * fix copies for sam_hq * rm some comments * skip 2 more failing tests * fix * fix everything * Apply suggestions from code review Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com> Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> * add more doc! * fix public init * fix modular qwen3 --------- Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com> Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>	2025-07-05 11:34:28 +02:00
BUI Van Tuan	e355c0a11c	Fix missing initializations for models created in 2024 (#38987 ) * fix GroundingDino * fix SuperGlue * fix GroundingDino * fix MambaModel * fix OmDetTurbo * fix SegGpt * fix Qwen2Audio * fix Mamba2 * fix DabDetr * fix Dac * fix FalconMamba * skip timm initialization * fix Encodec and MusicgenMelody * fix Musicgen * skip timm initialization test * fix OmDetTurbo * clean the code Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> * add reviewed changes * add back timm * style * better check for parametrizations --------- Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>	2025-07-02 15:03:57 +02:00
Raushan Turganbay	f8b88866f5	[VLMs] support passing embeds along with pixels (#38467 ) * VLMs can work with embeds now * update more models * fix tests * fix copies * fixup * fix * style * unskip tests * fix copies * fix tests * style * omni modality models * qwen models had extra indentation * fix some other tests * fix copies * fix test last time * unrelated changes revert * we can't rely only on embeds * delete file * de-flake mistral3 * fix qwen models * fix style * fix tests * fix copies * deflake the test * modular reverted by fixes, fix again * flaky test, overwritten * fix copies * style	2025-07-01 11:33:20 +00:00
Cyril Vallez	dbc98328da	Several fixes for Gemma3n (#39135 ) * remove the skips * fix the epsilon to a small value (does not make sense otherwise) * safeguard * overload test_eager_matches_sdpa * Update test_modeling_common.py * skip appropriate tests * correct no_split_layer * fix all devices issue * fix backward * fix	2025-07-01 10:34:53 +02:00
Yih-Dar	ccf2ca162e	skip some `test_sdpa_can_dispatch_on_flash` (#39092 ) * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-27 23:08:14 +02:00
Yih-Dar	2f50230c59	fix `t5gemma` tests (#39052 ) * fix * fix * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-26 18:48:14 +02:00
Joao Gante	3e5cc12855	[tests] remove tests from libraries with deprecated support (flax, tensorflow_text, ...) (#39051 ) * rm tf/flax tests * more flax deletions * revert fixture change * reverted test that should not be deleted; rm tf/flax test * revert * fix a few add-model-like tests * fix add-model-like checkpoint source * a few more * test_get_model_files_only_pt fix * fix test_retrieve_info_for_model_with_xxx * fix test_retrieve_model_classes * relative paths are the devil * add todo	2025-06-26 16:25:00 +01:00
Jaeyong Sung	583db52bc6	Add Dia model (#38405 ) * add dia model * add tokenizer files * cleanup some stuff * brut copy paste code * rough cleanup of the modeling code * nuke some stuff * more nuking * more cleanups * updates * add mulitLayerEmbedding vectorization * nits * more modeling simplifications * updates * update rope * update rope * just fixup * update configuration files * more cleanup! * default config values * update * forgotten comma * another comma! * update, more cleanups * just more nits * more config cleanups * time for the encoder * fix * sa=mall nit * nits * n * refacto a bit * cleanup * update cv scipt * fix last issues * fix last nits * styling * small fixes * just run 1 generation * fixes * nits * fix conversion * fix * more fixes * full generate * ouf! * fixes! * updates * fix * fix cvrt * fixup * nits * delete wrong test * update * update * test tokenization * let's start changing things bit by bit - fix encoder step * removing custom generation, moving to GenerationMixin * add encoder decoder attention masks for generation * mask changes, correctness checked against ad29837 in dia repo * refactor a bit already --> next cache * too important not to push :) * minimal cleanup + more todos * make main overwrite modeling utils * add cfg filter & eos filter * add eos countdown & delay pattern * update eos countdown * add max step eos countdown * fix tests * fix some things * fix generation with testing * move cfg & eos stuff to logits processor * make RepetitionPenaltyLogitsProcessor flexible - can accept 3D scores like (batch_size, channel, vocab) * fix input_ids concatenation dimension in GenerationMixin for flexibility * Add DiaHangoverLogitsProcessor and DiaExponentialDecayLengthPenalty classes; refactor logits processing in DiaForConditionalGeneration to utilize new configurations and improve flexibility. * Add stopping criteria * refactor * move delay pattern from processor to modeling like musicgen. - add docs - change eos countdown to eos delay pattern * fix processor & fix tests * refactor types * refactor imports * format code * fix docstring to pass ci * add docstring to DiaConfig & add DiaModel to test * fix docstring * add docstring * fix some bugs * check * porting / merging results from other branch - IMPORTANT: it very likely breaks generation, the goal is to have a proper forward path first * experimental testing of left padding for first channel * whoops * Fix merge to make generation work * fix cfg filter * add position ids * add todos, break things * revert changes to generation --> we will force 2d but go 3d on custom stuff * refactor a lot, change prepare decoder ids to work with left padding (needs testing), add todos * some first fixes to get to 10. in generation * some more generation fixes / adjustment * style + rope fixes * move cfg out, simplify a few things, more todos * nit * start working on custom logit processors * nit * quick fixes * cfg top k * more refactor of logits processing, needs a decision if gen config gets the new attributes or if we move it to config or similar * lets keep changes to core code minimal, only eos scaling is questionable atm * simpler eos delay logits processor * that was for debugging :D * proof of concept rope * small fix on device mismatch * cfg fixes + delay logits max len * transformers rope * modular dia * more cleanup * keep modeling consistently 3D, generate handles 2D internally * decoder starts with bos if nothing * post processing prototype * style * lol * force sample / greedy + fixes on padding * style * fixup tokenization * nits * revert * start working on dia tests * fix a lot of tests * more test fixes * nit * more test fixes + some features to simplify code more * more cleanup * forgot that one * autodocs * small consistency fixes * fix regression * small fixes * dia feature extraction * docs * wip processor * fix processor order * processing goes brrr * transpose before * small fix * fix major bug but needs now a closer look into the custom processors esp cfg * small thing on logits * nits * simplify indices and shifts * add simpler version of padding tests back (temporarily) * add logit processor tests * starting tests on processor * fix mask application during generation * some fixes on the weights conversion * style + fixup logits order * simplify conversion * nit * remove padding tests * nits on modeling * hmm * fix tests * trigger * probably gonna be reverted, just a quick design around audio tokenizer * fixup typing * post merge + more typing * initial design for audio tokenizer * more design changes * nit * more processor tests and style related things * add to init * protect import * not sure why tbh * add another protect * more fixes * wow * it aint stopping :D * another missed type issue * ... * change design around audio tokenizer to prioritize init and go for auto - in regards to the review * change to new causal mask function + docstrings * change ternary * docs * remove todo, i dont think its essential tbh * remove pipeline as current pipelines do not fit in the current scheme, same as csm * closer to wrapping up the processor * text to audio, just for demo purposes (will likely be reverted) * check if it's this * save audio function * ensure no grad * fixes on prefixed audio, hop length is used via preprocess dac, device fixes * integration tests (tested locally on a100) + some processor utils / fixes * style * nits * another round of smaller things * docs + some fixes (generate one might be big) * msytery solved * small fix on conversion * add abstract audio tokenizer, change init check to abstract class * nits * update docs + fix some processing :D * change inheritance scheme for audio tokenizer * delete dead / unnecessary code in copied generate loop * last nits on new pipeline behavior (+ todo on tests) + style * trigger --------- Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Vasqu <antonprogamer@gmail.com>	2025-06-26 11:04:23 +00:00
EduardDurech	a2eb75c891	Support for Flash Attention 3 (#38972 ) * Support `flash_attn_3` Implements fwd and tests for Flash Attention 3 https://github.com/Dao-AILab/flash-attention/commits/main/hopper - Includes checks for dropout>0 and ALiBi in `modeling_utils.PreTrainedModel._check_and_enable_flash_attn_3` (Dropout will likely be supported soon, so this will need to be updated and `modeling_flash_attention_utils._flash_attention_forward` at the `if _IS_FLASH_ATTN_3_AVAILABLE: ...` An example Llama implementation is included in `modeling_llama.py` but other models would still need to be updated Based on https://github.com/huggingface/transformers/pull/36190 which has model implementations and examples which could be merged * Add tests for Flash Attention 2 and 3 parity * ci fix * FA2 compatibiity - `_prepare_flash_attention_from_position_ids` ->`prepare_fa2_from_position_ids` - Remove bettertransformer check in Flash Attention 3 - Merge tests - Add licensing * ci fix * Test naming consistency * ci fix * Deprecation warning for `prepare_fa2_from_position_ids` * ci fix	2025-06-25 14:39:27 +02:00
efsotr	3ee72af6b6	Fix graph break in torch.compile when using FA2 with attention_mask=None and batch size > 1 (#37332 ) * Fix graph break in torch.compile when using FA2 with attention_mask=None and batch size > 1 * fix code format * add test; replace position_ids with query_states becasue position_ids.shape[0] is always 1 * add assert loss is not nan	2025-06-25 07:58:34 +00:00
eustlb	6bdd4ec952	Add kyutai stt (#38909 ) Some checks failed Release - Conda / build_and_package (push) Has been cancelled Details Secret Leaks / trufflehog (push) Has been cancelled Details * first draft * cleaner version * udpate tests + modeling * add tests * init * udpate test_modeling_common * fix tests * csm Processor draft * convertion update * mimi cache padding convolutions draft * mimi streaming udpates * update mimi padding cache test * udpate cache padding mimi test * make style mimi * updates generate moshi asr * moshi asr integration tests (single + batched) * update tests * update conversion script * good default sliding window value * udpdate generate * update test checkpoint * nit * fix mimi * fix codec prefix * revert * revert * update config * update config * unnecessary mimi input restriction * remove delay in tokens * remove _prepare_4d_causal_attention_mask_with_cache_position and _update_causal_mask * test update * modular update * make style * nit * rename * create codec model generation config at init * remove delay * max_new_tokens/length warning * correct conv1 padding cache import for modular * nit * fix on encoder_past_key_values * convert modular * move frame_size to config * move frame_size to config * update test name * handle first token is bos * better handling of max_new_tokens * fix * fix batch size in test input prep * update docstring * convert modular * make style * make style * add feature extractor * correct modular convention name for feature_extraction file * update convertion script * doc processor * update doc * udpate init * update model type * fixes * update tests * fix * make * add doc * nit * fix * doc * auto mappings * doc * nit * convert modular * doc * nit * extend _keep_in_fp32_modules to enforce fp32 * renaming to stt * doc update + test update * doc fixes * doc fix * doc fix * fix musicgen tests * fix musicgen tests * make style * fix musicgen tests * correct frame_rate config param for mimi * update mimi test * revert update mimi test * enforce cpu test * move cache init in cache class * convert modular * docstring update * update model id * feature_extractor -> feature_extraction (SEW) * convert modular * update model id	2025-06-24 18:01:15 +02:00
Rémi Ouazan	1a96127e46	Break tie in Expectations and gemma3 fixes (#38943 ) * Added major / minor version to Expectations ordering * Added fixes to gemma3 * Style	2025-06-23 15:13:27 +02:00
Ilyas Moutawwakil	984ff89e73	Gaudi3 CI (#38790 )	2025-06-23 10:56:51 +02:00
Rémi Ouazan	9bcdd5cde9	Modernbert fixes (#38912 ) * Removed deprecated argument in modernbert RotaryEmbedding * Skip test_sdpa_can_dispatch_on_flash for modernbert --------- Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com> Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>	2025-06-20 11:22:32 +02:00
ivarflakstad	af6120b3eb	Skip sdpa tests if submodule does not support sdpa (#38907 )	2025-06-19 13:11:01 +00:00
Rémi Ouazan	9ff246db00	Expectation fixes and added AMD expectations (#38729 )	2025-06-13 16:14:58 +02:00
Cyril Vallez	4b8ec667e9	Remove all traces of `low_cpu_mem_usage` (#38792 ) * remove it from all py files * remove it from the doc * remove it from examples * style * remove traces of _fast_init * Update test_peft_integration.py * CIs	2025-06-12 16:39:33 +02:00
Yih-Dar	89c46b648d	Skip some export tests on torch 2.7 (#38677 ) * skip * fix * better check * Update import_utils.py --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>	2025-06-12 12:47:15 +02:00
Marc Sun	11ad9be153	Better typing for num_items_in_batch (#38728 ) * fix * style * type checking ? * maybe this ? * fix * can't be an int anymore * fix	2025-06-11 16:26:41 +02:00
Pavel Iakubovskii	84710a4291	Add V-JEPA 2 (#38746 ) Some checks failed Release - Conda / build_and_package (push) Has been cancelled Details Secret Leaks / trufflehog (push) Has been cancelled Details * adding model and conversion scripts * add imports to test vjepa conversion * fix imports and make conversion work * fix computation for short side * replace attention with library attention function * cleanup more attention classes * remove config overrides * add test cases, fix some of the failing ones * fix the model outputs * fix outputs of the model per review * fix too big model test case * fix styling __init__.py * fix initialization test * remove all asserts per review * update sorting unsorting logic as per feedback * remove is_video per review * remove another is_video segment * remove unwanted stuff * small fixes * add docstrings for the model * revert adding vjepa2 config here * update styling * add config docstrings (wip) * fix dpr issue * removed test failing issues * update styles * merge predictor configs into main config * remove processing code, add video processor * remove permute which is not necessary now * fix styles * updated vjepa2 to be in video_processing_auto * update comment for preprocessing * test integration test and fix the outputs * update test values, change test to look at repeated frames for a given image * add a simple video processing test * refactoring pixel_values_videos and upload ckpts to original * fix torch_fx test cases * remove unused config * add all config docstrings * add more integration tests * add basic doc * revert unwanted styling changes * working make fixup * Fix model_type in config * update attention implementation to fit new hf standards * fix the preprocessing logic, ensure it matches the original model * remove use_rope logic, cleanup * fix docstrings * Further cleanup, update doc * Fix model prefix * fix get_vision_features * VJEPA2Embeddings style refactor * nit, style comment * change modules default values * Only `str` activation in config * GradientCheckpointingLayer * fixup * fix conversion script * Remove return_dict * remove None return typehint * Refactor VJEPA2Layer, remove use_SiLU * Fix fx tests * dpr -> drop_path_rates * move ModelOutput on top format docs bit * update docs * update docs * update doc example * remove prune_heads from model * remove unused config params * refactor embed signature * Add vjepa to docs * Fix config docstring * update defaults * Update docs/source/en/model_doc/vjepa2.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update docs/source/en/model_doc/vjepa2.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Fix import * Min refactoring * Update HUB_SOURCE and HUB_REPO in conversion script * Add missing headers * VJEPA -> V-JEPA in docs * Add image to doc * fix style * fix init weights * change checkpoint name in modeling tests --------- Co-authored-by: Koustuv Sinha <koustuv.sinha@mail.mcgill.ca> Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co> Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com> Co-authored-by: Koustuv Sinha <koustuvsinha@gmail.com> Co-authored-by: Pedro Cuenca <pedro@huggingface.co>	2025-06-11 15:00:08 +01:00
Yih-Dar	3e35ea1782	Improve `test_initialization` (#38607 ) * fix flaky init tests * fix flaky init tests --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-06 10:08:05 +02:00
Armaghan Shakir	55736eea99	Add support for MiniMax's MiniMax-Text-01 (#35831 ) * end-to-end architecture * lightning-attn: refactor, clean, optimize * put minimax_text_01 in other files * use latest __init__ standards and auto-generate modular * support attention_mask for lightning-attn * Revert "use latest __init__ standards and auto-generate modular" This reverts commit d8d3c409d89e335c98a8cd36f47304a76eac7493. * fix modular conversion * pass both attention masks instead of tuple * formatting * Updated Dynamic Cache * created MiniMaxText01Cache * fix hardcoded slope_rate * update attn_type_list in config * fix lightning when use_cache=False * copy tests from mixtral * (checkpoint) all tests pass for normal attention * fix all unittests * fix import sorting * fix consistency and formatting tests * fix config * update tests, since changes in main * fix seq_len error * create dummy docs * fix checkpoint * add checkpoint in config docstring * run modular_conversion * update docs * fix checkpoint path and update tests * fix ruff * remove repeated expected_slice * update docs * rename "minimax-text-01" to "minimax" * inherit config from mixtral * remove from docs in other languages * undo files that should be untouched * move minimax to end in conversation docs * use MiniMaxForCausalLM as it is * ruff fixes * run modular * fix docstring example in causallm * refactor attention loop and decay factors * refactor config in modular * run modular * refactor cache * rename static_cache to linear_cache * make positional embeddings necessary * remove unnecessary layernorms declarations * fix import in tests * refactor attention in next tokens * remove outdated code * formatting and modular * update tests * rename layernorm alpha/beta factors * register decay factors as buffers * remove unused declarations of decay factors * update config for alpha/beta factors * run modular * remove head_dim in tests * remove minimax from fx.py * remove stuff that is not really needed * update __init__ * update qkv torch.split Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> * fix qkv torch.split * quality fixes * remove mistakenly added dummy * purge unused ModelTester code * fix-copies * run fix-copies * fix head_dim * write cache formatting tests * remove postnorm * avoid contiguous in attention current states * update expected_slice * add generation test for integration * fix dtype in generation test * update authors * update with changes in main * update graident checkpointing and minor fixes * fix mutable attn_type_list * rename: attn_type -> layer_type * update for layer_types * update integration tests * update checkpoint * clean overview in docs --------- Co-authored-by: Shakib-IO <shakib.khan17@northsouth.edu> Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>	2025-06-04 09:38:40 +02:00
Raushan Turganbay	55ec319de6	Don't use default attn if pre-set in sub-config (#38526 ) * don't use default attn if pre-set in sib-config * style * add a test maybe	2025-06-03 07:53:07 +00:00
Raushan Turganbay	bf68dd9e6e	[tests] expand flex-attn test for vision models (#38434 ) * expand the test for VLMs * typo * mark models `supports_flex` + expand test for additional kwargs * flex attn for refactored vision models * fix copies * fix * unskip * style * address comments	2025-06-03 07:40:44 +00:00
Yaswanth Gali	1094dd34f7	Remove type annotation in Siglip Attention Module (#38503 ) * Remove type annotation * remove print statement	2025-06-02 17:51:07 +02:00
Anton Vlasjuk	badc71b9f6	🔴[`Attention`] Attention refactor for Whisper-based models (#38235 ) * start refactoring whisper * revert for now * first step * carry over attn fixes * check if this works * whisper has an off by one somewhere - cutting mask in any interface * make it based on interface * remove some tests that were skipped but now work * some fixes for whisper tests * interface changes * change the order of fix * some attention adjustments for eager + TP * fix scaling * mask changes * why does whisper contain those extra seq lens? * fix from config for fa2 as input_ids is invalid * fix another test * another fix * disable flex attn due to compile issues * copies and refactor for qwen audio since it somewhat relies on whisper * fix scaling and smaller things * retrigger * new new interface version + more fixups * adjust qwen * add comment * forgot this one * change copies as whisper cuts on the mask * add guard * add flex attention * switch to new mask function + add skips for torchscript * remove old api with cache position * last changes? * trigger ci	2025-05-28 13:32:38 +02:00

1 2 3 4 5 ...

498 Commits