* just update 2 files
* update other models as well just making fix-copies
* also add the changes needed to modeling utils
* put this on the pretrained model instead
* nits and fixes
* update generic, fix to use config value
* update other modelings
* use transformers kwargs instead
* update
* update
* update other models
* update
* updates
* update
* update
* update
* fix
* finally
* very small nits
* this fixes more tests
* fix other models as well!
* update modularqwen2
* update models based on qwen2
* update
* update
* remove the **flash stuff in favor of noraml kwargs
* update
* propagate gemma?
* remove output attentions
* propagate
* support cross attention edge case
* same
* test this
* fixes
* more fix
* update
* update
* fix conflicts
* update
* fix emu3
* fix emu3
* move the fix a bit
* quel enfer
* some fixes, loss_kwargs should never had been
* finish fixing gemma3n
* fix small lm3
* fix another one
* fix csm now
* fux csm and mistral
* fix mistral now
* small fixes
* fix janusss
* only for some models
* fixup
* phix phi3
* more fixes?
* dose this fix it?
* update
* holy shit it was just graph breaks
* protect torch
* updates
* fix samhq?
* fix moonshine
* more moonshine fixes, 3 failures left!
* nits
* generic needs to support more
* more fixes to moonshine!
* fix cross attention outputs!
* fix csm!
* nits
* fix stupid kosmos2
* current updates
* fixes
* use output recorder?
* nicer!
* a little bit of magic
* update
* fix protect
* fix
* small fixes
* protect import
* fix a bunch of more models
* fix fixups
* fix some of the last ones
* nit
* partly fix phi
* update
* fix import path
* make something that is fullgraph compatible just to be sure
* typing was wrong on llama so the rest was wrong as well
* fucking ugly but at least it is still exportable
* syle
* supposed to fix moonshine, it still breaks
* fix some default
* fix the last bits of sam
* update samhq
* more fixes to am hq
* nit
* fix all output+hidden states and output_attentions!
* fix?
* fix diffllama
* updates to fix initialization on the sam pips
* ups there was a bug
* fix the last sam hq test
* fix gotocr
* fix gotocr2!
* fixes
* skip stupid tests
* there was one left :)
* fixup
* fix fix copies issues with this test file
* fix copies for sam_hq
* rm some comments
* skip 2 more failing tests
* fix
* fix everything
* Apply suggestions from code review
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
* add more doc!
* fix public init
* fix modular qwen3
---------
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
* remove trust_remote_code
* again
* Revert "Skip some tests for now (#38931)"
This reverts commit 31d30b7224.
* again
* style
* again
* again
* style
* fix integration test
* fix tests
* style
* fix
* fix
* fix the last ones
* style
* last one
* fix last
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* No more Tuple, List, Dict
* make fixup
* More style fixes
* Docstring fixes with regex replacement
* Trigger tests
* Redo fixes after rebase
* Fix copies
* [test all]
* update
* [test all]
* update
* [test all]
* make style after rebase
* Patch the hf_argparser test
* Patch the hf_argparser test
* style fixes
* style fixes
* style fixes
* Fix docstrings in Cohere test
* [test all]
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* remove it from all py files
* remove it from the doc
* remove it from examples
* style
* remove traces of _fast_init
* Update test_peft_integration.py
* CIs
* stash for now
* initial commit
* small updated
* up
* up
* works!
* nits and fixes
* don't loop too much
* finish working example
* update
* fix the small freeblocks issue
* feat: stream inputs to continuous batch
* fix: update attn from `eager` to `sdpa`
* refactor: fmt
* refactor: cleanup unnecessary code
* feat: add `update` fn to `PagedAttentionCache`
* feat: broken optimal block size computation
* fix: debugging invalid cache logic
* fix: attention mask
* refactor: use custom prompts for example
* feat: add streaming output
* fix: prefill split
refactor: add doc strings and unsound/redundant logic
fix: compute optimal blocks logic
* fix: send decoded tokens when `prefilling_split` -> `decoding`
* refactor: move logic to appropriate parent class
* fix: remove truncation as we split prefilling anyways
refactor: early return when we have enough selected requests
* feat: add paged attention forward
* push Ggraoh>
* add paged sdpa
* update
* btter mps defaults
* feat: add progress bar for `generate_batch`
* feat: add opentelemetry metrics (ttft + batch fill %age)
* feat: add tracing
* Add cuda graphs (#38059)
* draft cudagraphs addition
* nits
* styling
* update
* fix
* kinda draft of what it should look like
* fixes
* lol
* not sure why inf everywhere
* can generate but output is shit
* some fixes
* we should have a single device synch
* broken outputs but it does run
* refactor
* updates
* updates with some fixes
* fix mask causality
* another commit that casts after
* add error
* simplify example
* update
* updates
* revert llama changes
* fix merge conflicts
* fix: tracing and metrics
* my updates
* update script default values
* fix block allocation issue
* fix prefill split attnetion mask
* no bugs
* add paged eager
* fix
* update
* style
* feat: add pytorch traces
* fix
* fix
* refactor: remove pytorch profiler data
* style
* nits
* cleanup
* draft test file
* fix
* fix
* fix paged and graphs
* small renamings
* cleanups and push
* refactor: move tracing and metrics logic to utils
* refactor: trace more blocks of code
* nits
* nits
* update
* to profile or not to profile
* refactor: create new output object
* causal by default
* cleanup but generations are still off for IDK what reason
* simplifications but not running still
* this does work.
* small quality of life updates
* nits
* updaet
* fix the scheduler
* fix warning
* ol
* fully fixed
* nits
* different generation parameters
* nice
* just style
* feat: add cache memory usage
* feat: add kv cache free memory
* feat: add active/waiting count & req latency
* do the sampling
* fix: synchronize CUDA only if available and improve error handling in ContinuousBatchingManager
* fix on mps
* feat: add dashboard & histogram buckets
* perf: improve waiting reqs data structures
* attempt to compile, but we should only do it on mps AFAIK
* feat: decouple scheduling logic
* just a draft
* c;eanup and fixup
* optional
* style
* update
* update
* remove the draft documentation
* fix import as well
* update
* fix the test
* style doomed
---------
Co-authored-by: Luc Georges <luc.sydney.georges@gmail.com>
* accept custom device_mesh
* fix device_map
* assert that num_heads % tp_size == 0
* todo.
* ReplicateParallel
* handle tied weights
* handle dtensor in save_pretrained with safe_serialization
* tp test works
* doesnt work
* fix shard_and_distribute_module's rank should be local_rank
* tp=4 is correct
* dp+tp is broken
* todo allreduce with dtensors on another dim is annoying
* workaround to sync dp grads when using dtensors
* loading a checkpoint works
* wandb and compare losses with different tp/dp
* cleaning
* cleaning
* .
* .
* logs
* CP2 DP2 no mask works after commenting attn_mask and is_causal from scaled_dot_product_attention
* DP=2 TP=2 now works even with tied embeddings
* model.parameters() and model.module.parameters() are empty..
* reformat sanity_check_tensor_sync
* set atol=1e-4 for CP to pass
* try populate _parameters from named_modules
* refactors
TP2 DP2 works
CP2 DP2 works
* is_causal=True and pack sequences, no attn mask, and preshuffle dataset
* fix packing
* CP=4 doesn't work
* fix labels and position_ids for CP
* DP CP works with transformers 🥳🥳🥳
* refactor
* add example cp
* fixup
* revert sdpa changes
* example cleared
* add CP, DP to the mesh init
* nit
* clean
* use `ALL_PARALLEL_STYLES`
* style
* FSDP works
* log on 1 rank
* .
* fix?
* FSDP1 also has .parameters() bug
* reported gradnorm when using FSDP1 is wrong, but loss is correct so it's okay
* .
* style and fixup
* move stuff around
* fix tests
* style
* let's make it a check
* add missing licences
* warning should be an info
* tp plan should not be NONE
* test all
* god damn it
* test all
---------
Co-authored-by: nouamanetazi <nouamane98@gmail.com>
* accept custom device_mesh
* fix device_map
* assert that num_heads % tp_size == 0
* todo.
* ReplicateParallel
* handle tied weights
* handle dtensor in save_pretrained with safe_serialization
* tp test works
* doesnt work
* fix shard_and_distribute_module's rank should be local_rank
* tp=4 is correct
* dp+tp is broken
* todo allreduce with dtensors on another dim is annoying
* workaround to sync dp grads when using dtensors
* loading a checkpoint works
* wandb and compare losses with different tp/dp
* cleaning
* cleaning
* .
* .
* logs
* CP2 DP2 no mask works after commenting attn_mask and is_causal from scaled_dot_product_attention
* DP=2 TP=2 now works even with tied embeddings
* model.parameters() and model.module.parameters() are empty..
* reformat sanity_check_tensor_sync
* set atol=1e-4 for CP to pass
* try populate _parameters from named_modules
* refactors
TP2 DP2 works
CP2 DP2 works
* is_causal=True and pack sequences, no attn mask, and preshuffle dataset
* fix packing
* CP=4 doesn't work
* fix labels and position_ids for CP
* DP CP works with transformers 🥳🥳🥳
* refactor
* add example cp
* fixup
* revert sdpa changes
* example cleared
* add CP, DP to the mesh init
* nit
* clean
* use `ALL_PARALLEL_STYLES`
* style
* FSDP works
* log on 1 rank
* .
* fix?
* FSDP1 also has .parameters() bug
* reported gradnorm when using FSDP1 is wrong, but loss is correct so it's okay
* .
* style and fixup
* move stuff around
* fix tests
* style
* let's make it a check
* warning should be an info
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
* fix issue that some example with no trainer use accelerator.end_training in a wrong way
* reformat code
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Just import torch AdamW instead
* Update docs too
* Make AdamW undocumented
* make fixup
* Add a basic wrapper class
* Add it back to the docs
* Just remove AdamW entirely
* Remove some AdamW references
* Drop AdamW from the public init
* make fix-copies
* Cleanup some references
* make fixup
* Delete lots of transformers.AdamW references
* Remove extra references to adamw_hf