* fix qwen2 vl packing in FA2
* why? delete!
* qwen2-5-vl seems to work now
* update
* fix tests
* start by adapting FA2 tests
* add similar tests for sdpa/eager
* address comments
* why is this even in conditional model and not base model?
* fix type order
* change all Union[str, dict] to Union[dict, str]
* add hf_parser test && fix test order
* add deepspeed dependency
* replace deepspeed with accelerator
* Scaffolding
* Explicit content
* Naïve Responses API streaming implementation
* Cleanup
* Scaffolding
* Explicit content
* Naïve Responses API streaming implementation
* Cleanup
* use openai
* validate request, including detecting unused fields
* dict indexing
* dict var access
* tmp commit (tests failing)
* add slow
* use oai output type in completions
* (little rebase errors)
* working spec?
* guard type hint
* type hints. fix state (CB can now load different models)
* type hints; fn names; error type
* add docstrings
* responses + kv cache
* metadata support; fix kv cache; error event
* add output_index and content_index
* docstrings
* add test_build_response_event
* docs/comments
* gate test requirements; terminate cb manager on model switch
* nasty type hints
* more type hints
* disable validation by default; enable force models
* todo
* experiment: base model from typed dict
* audio working
* fix bad rebase
* load audio with librosa
* implement timed models
* almost working
* make fixup
* fix tests
* transcription request type
* tokenizer -> processor
* add example in docs
---------
Co-authored-by: Lysandre <hi@lysand.re>
* Add the `device` option for `generate()`
* Add device for default tensors to avoid tensor mismatch
* [test] Enable test_static_cache_exportability for torch_device
* infer device from the prompt_token_ids
* Add device for generated tensor
* [Test] Make `test_export_static_cache` tests to run on devices rather than only CPU
* fix format
* infer device from the model
* dump
* push other models
* fix simple greedy generation
* xmod
* add fmst and clean up some mentions of old cache format
* gpt-bigcode now follows standards
* delete tuple cache reference in generation
* fix some models
* fix some models
* fix mambas and support cache in tapas
* fix some more tests
* fix copies
* delete `_reorder_cache`
* another fix copies
* fix typos and delete unnecessary test
* fix rag generate, needs special cache reordering
* fix tapas and superglue
* reformer create special cache
* recurrent gemma `reorder_cache` was a no-op, delete
* fix-copies
* fix blio and musicgen pipeline tests
* fix reformer
* fix reformer, again...
* delete `_supports_cache_class`
* delete `supports_quantized_cache`
* fix failing tests
* fix copies
* some minor clean up
* style
* style
* fix copies
* fix tests
* fix copies
* create causal mask now needs positions?
* fixc copies
* style
* Update tests/test_modeling_common.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* clean-up of non-generative model after merging main
* check `is_decoder` for cache
* delete transpose for scores
* remove tuple cache from docs everywhere
* fix tests
* fix copies
* fix copies once more
* properly deprecate `encoder_attention_mask` in Bert-like models
* import `deprecate_kwarg` where needed
* fix copies again
* fix copies
* delete `nex_decoder_cache`
* fix copies asks to update for PLM
* fix copies
* rebasing had a few new models, fix them and merge asap!
* fix copies once more
* fix slow tests
* fix tests and updare PLM checkpoint
* add read token and revert accidentally removed line
* oh com -on, style
* just skip it, read token has no access to PLM yet
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Added StableAdamW as an optimizer option for Trainer. Also wrote tests to verify its behaviour.
* Fixed issue with
* Added docs for StableAdamW. Also fixed a typo in schedule free optimizers
---------
Co-authored-by: Gautham Krithiwas <gauthamkrithiwas2003@gmail.com>
* add cosine_with_min_lr_schedule_with_warmup_lr_rate scheduler in trainer
* Update src/transformers/optimization.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update optimization.py
fix the error of the unclosed "("
* Update optimization.py
remove whitespace in line 402 in order to pass the quality test
* Update src/transformers/optimization.py
* Update src/transformers/optimization.py
* Apply style fixes
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* fix vlm with retrieval
* we can't use AutoModel because new ColQwen was released after refactor
* no need for colqwen
* tied weight keys are necessary, if using IMageTextToText
* need to apply renaming in tied weights, only for ColPali
* overwrite tied keys in ColPali
* fix copies, modular can't handle if-statements
* working locally; need to style and test
* added docs and initial tests; need to debug and flesh out
* fixed tests
* working long context; batches
* working fa2 and eager
* update tests
* add missing confnigs
* remove default autoset
* fix spacing
* fix most tests
* fixed tests
* fix to init
* refactor to match new transformers updates
* remove static cache option
* fa2 fix
* fix docs
* in progress
* working on tests
* fixed issue with attn outputs
* remove debug
* fix local config attr
* update doc string
* fix docstring
* add docs to toc
* correct typo in toc
* add new updates from main w.r.t. ModernBERT RoPE
* fix local param
---------
Co-authored-by: oweller2 <oweller2@dsailogin.mgmt.ai.cluster>
Co-authored-by: oweller2 <oweller2@l07.mgmt.ai.cluster>
Co-authored-by: oweller2 <oweller2@n02.mgmt.ai.cluster>
Co-authored-by: oweller2 <oweller2@l08.mgmt.ai.cluster>
Co-authored-by: oweller2 <oweller2@l01.mgmt.ai.cluster>
Co-authored-by: oweller2 <oweller2@l02.mgmt.ai.cluster>
* plm template
* A working plm with fixed image features
* hacked processor
* First version that reproduced PLM output using PE from timm.
* Simplify and fix tie_word_embeddings
* Use PIL resize. Simplify converstion.
* First version that works with video input.
* simplifed image preprocessing (not batched)
* Minor fixes after rebasing on main.
* Video processor based on new API.
* Revert to use _preprocess for image processor.
* refactor with modular
* fix tie_word_embedding
* Testing with timm PE
* check in missed converstion from modular to model.py
* First working version of PLM with Eva PE. PLM-1B and 3B outputs are exactly the same as before. PLM-8B output has some differences.
* address review comments
* Fixed batching if video and image examples mixed.
* Simplify PE configuration.
* Enable AutoModel for PerceptionEncoder.
* Update PE config style.
* update all headers
* Minor fixes.
* Move lm_head to PerceptionLMForConditionalGeneration.
Fix vit_G model specification.
* Fix for testing_modeling_perception_lm.py
* Image processing refactoring to use more common parts.
* Fix processor test.
* update tests to use model from hub
* More test fixes.
* integration test GT update after rebasing; probably due to video preprocessing
* update test media path to hub
* Stop tracking local scripts
* address some review comments
* refactor image processing.
* small fixes
* update documentation and minor fixes
* remove scripts
* Minor fix for CI
* Fix image processing
* CI and doc fix
* CI formatting fix
* ruff fix
* ruff formatting
* ran utils/sort_auto_mappings.py
* update docstring
* more docstring udpates
* add vision_input_type default fallback for image processing
* more verbose variable naming
* test update
* Remove PE and PEConfig use AutoModel(TimmWrapper) instead
* Minor cleanup.
* Minor Fix: remove any ref to PE. Ruff format and check.
* fix docstring
* Fix modular/model consistency.Improvex docstringfor .
* Fix PerceptionLMForConditionalGenerationModelTest
* ruff fix
* fix for check_repo
* minor formatting
* dummy size arg to fix for processor test.
* Update docstring for PerceptionLMConfig
* Minor fixes from review feedback.
* Revert some minor changes per reviewer feedback.
* update base_model_prefix
* address reviewer feedback
* fix comment in modeling file
* address reviewer feedback
* ruff format
* Pre-merge test update.
* reapply modular and fix checkpoint name
* processor test path
* use modular a bit more
* remove dead code
* add token decorator
---------
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* changes for video
* update modular
* change get_video_features
* update video token replacement
* update modular
* add test and fix typo
* lint
* fix order
* lint
* fix
* remove dependency
* lint
* lint
* remove todo
* resize video for test
* lint..
* fix test
* new a processor for video_test
* fix test
* add initial structure
* doc fixes, add model base logic
* update init files
* some fixes to config and modular
* some improvements for attention
* format
* remove unused attn
* some fixes for moe layer and for decoder
* adapt _compute_yarn_parameters for deepseek
* format
* small fix
* fix for decoder forward
* add tests, small refactoring
* fix dummies
* fix init
* fix doc
* fix config docs
* add sequce doc, fix init for gate
* fix issues in tests
* fix config doc
* remove unused args
* some fixes and refactoring after review
* fix doc for config
* small fixes for config args
* revert config refactoring
* small refactoring
* minor fixes after rebase
* small fix after merge
* fix modular
* remove rotaryembd from public init
* small test fix
* some rotary pos calculation improvement
* fix format
* some improvements and fixes
* fix config
* some refactoring
* adjust some unit tests
* skip test
* small fixes and tests adjustment
* reapply modular
* fix all tests except Integration
* fix integration testzs
* cleanup BC stuff
* rope
* fix integrations tests based on a10
* style
---------
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>