* Add expectation classes + tests
* Use typing Union instead of |
* Use bits to track score in properties cmp method
* Add exceptions and tests + comments
* Remove compute cap minor as it is not needed currently
* Simplify. Remove Properties class
* Add example Exceptions usage
* Expectations as dict subclass
* Update example Exceptions usage
* Refactor. Improve type name. Document score fn.
* Rename to DeviceProperties.
Mistaken use of De Morgan's law. Fixed "not (X or Y)"
to correct "not (X and Y)" check to raise a ValueError.
Added corresponding test to check "positive int or None" condition.
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* fall back to eager if output_attentions
* improve relative position embeddings
* run modular on got_ocr2
* run-slow: sam
* fix run-length encoding
* fix tf processor errors
* update tf_sam
* fix compile error
* re-run tests
* Try working around the processor registration bugs
* oops
* Update error message
* Clarify error
* Docstring docstring docstring
* The extra content is indexed by config class, so let's grab some values out of there
* Commit my confusion as a TODO
* Resolve my confusion
* Cleanup and mostly revert to the original
* Better autoclass fallback
* Don't nest f-strings you lunatic
* Clearer error message
* Less getattr()
* Revert a lot of changes to try a different approach!
* Try the global registry
* Check the dynamic list as well as the transformers root
* Move the dynamic list somewhere safer
* Move the dynamic list somewhere even safer
* More import cleanup
* Simplify all the register_for_auto_class methods
* Set _auto_class in the register() methods
* Stop setting the cls attribute in register()
* Restore specifying the model class for Model derivatives only
* Fix accidentally taking the .__class__ of a class
* Revert register_for_auto_class changes
* Fix get_possibly_dynamic_module
* No more ALL_CUSTOM_CLASSES
* Fix up get_possibly_dynamic_module as well
* Revert unnecessary formatting changes
* Trigger tests
* Set best_model_checkpoint only when ckpt exists.
Rather than set it explicitly without checking if the checkpoint directory even exists as before, now we moved the setting logic inside of _save_checkpoint and are only setting it if it exists.
* Added best_global_step to TrainerState.
* Added tests for best_model_checkpoint.
* Fixed hard-coded values in test to prevent fail.
* Added helper func and removed hard-coded best_step.
* Added side effect patch generator for _eval.
* Added evaluate side effect func.
* Removed erroneous patching.
* Fixed minor bug.
* Applied Ruff.
* Fixed Ruff problem in make style.
* Used Trainer.set_initial_training_values.
* Don't accidentally mutate the base_model_tp_plan
* Co-authored by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Trigger tests
* Marking grad accum test as slow
* Add a flaky decorator
* Add a flaky decorator
* Use cyril's codeblock
* Don't copy() when it's None
* Use cyril's new codeblock
* make fixup
* test
* fix
* fix
* skip some and run some first
* test fsdp
* fix
* patches for generate
* test distributed
* copy
* don't test distributed loss for hpu
* require fp16 and run first
* changes from marc's PR fixing zero3
* better alternative
* return True when fp16 support on gaudi without creating bridge
* fix
* fix tested dtype in deepspeed inference test
* test
* fix
* test
* fix
* skip
* require fp16
* run first fsdp
* Apply suggestions from code review
* address comments
* address comments and refactor test
* reduce precison
* avoid doing gaudi1 specific stuff in the genreation loop
* document test_gradient_accumulation_loss_alignment_with_model_loss test a bit more
* Fix converter
* [Broken] Adds Gemma 3 to Hugging Face Transformers
* Consolidating Config and Processor params across impls
* Sorting out configuration parameters. Adds qk_norm before RoPE. Still not sure if RoPE is right.
* Additional plumbing for CausalLM and ConditionalGeneration variants
* incomplete draft of Orbax conversion script
* More complete checkpoint conversion
* Supporting Gemma 3 1B checkpoints
* Updating RoPE for multiple frequencies
* Adjustments to rotary embedder
* Proof of life for text-only operation
* Updating the conversion script to handle multimodal projection weights
* Fixing tet-only conversions
* Cleaner conversion script with multimodal support and a simpler processor
* Additional refatcors to the Gemma3Processor
* Simplified Processor to work over text representations
* Updated conversion script to join text and vision embeddings at converion time
* Logging for debugging
* Update src/transformers/models/gemma2/modeling_gemma2.py
Co-authored-by: Joshua Lochner <admin@xenova.com>
* Removed extraneous Config params
* Switching to fast tokenizer for checkpoint conversions
* isolating siglip for performance tetsing
* Minor changes for debugging tests against baselines
* Adding average pooling for soft tokens
* Updating processor code to enable simpler embedding interleaving for arbitrary number of images in prompts
* Updating conversion script for ShieldGemma 2 conversion compatibility
* Allow disable_compile to be provided as a kwarg
* Refresh from modular
* Updated conversion script and corrected sliding window
* Fix type mismatch in cache_position (#4)
* Fix dtype (#5)
* Fix type mismatch in cache_position
* Actually fix in the modular file
Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>
---------
Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>
* fixes for embedding table overflow and missing image_soft_token_mask from Gemma3Processor
* Adding 2D pooling for image embeddings
* Revert "Adding 2D pooling for image embeddings"
This reverts commit 65350cf531296f050b2078a5b8e46f61642b2648.
* Gemma3 average pooling changed from 1D to 2D
* Major refactor to Gemma3MultimodalInputProjection
* Updating Gemm 3 Auto* registrations
* Add option to save Gemma 3 chat template with tokenizer during weights conversion
* Removing unused imports
* Moving out-of-vocab handling from Gemma3Processor to Gemma3ForConditionalGeneration
* Removing duplicate config property
* Removing final logit softcapping and 1-indexing of position ids
* Fixing image processor config and none --> None typo
* Fixing sliding window size for 1B
* Updating image_mean and image_std in Image Processor
* Attention masking changed to lower triangular
* Moving image special tokens to conversion script
* Mirror image processor defaults from conversion script into Gemma3ProcessorKwargs
* Remove special token variables from symbol space
* Moving image soft token mask computation from Gemma3Processor to Gemma3ForConditionalGeneration
* tie lm_head and embedding weights
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
* Correct tied weights in Gemma3CausalLM
* iterative bidirectional attention
* resolving merge conflicts
* Reverting to Gemma 2 HybridCache with sldiing window support and a sliding_window_pattern of 6
* Correcting RoPE scaling
* clean up first pass, dummy model geenration works
* final clean up before fixing tests
* causal lm test works, so fine
* Fix conversion
* Update src/transformers/models/gemma3/processing_gemma3.py
* model tests are happy
* processor tests are happy
* image processing tests added
* fixup
* Fix pre-processing in conversion
* Inputs merging
* Do not normalize vision embeddings
* Apply Ryan's (and team) changes to attention
* token type ids + mask
* template
* move embed scale, add rope scale, fix tests
* Add chat template to tokenizer
* Use prefix for causal model loading
* use existing code for sliding mask from gemma2
* self.embed_tokens already normalizes
* Correcting Gemma3TextConfig parameters in conversion script
* typo, modular overwrites my fixes
* enable device map for text model
* Conversion updates
* ultra nit: no einsums
* update image token
* copy deepcopy config + some docs
* add some test, still WIP
* Refactoring --include_chat_tempalte logic in converter
* Update src/transformers/models/gemma3/modular_gemma3.py
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
* Add eos tokens for instruct models
* dump so i can work on dgx
* Removing add_bos by default
* dump
* add fast im proc
* docs for PaS + fixup
* another fixup
* one more fixup
* fix tests
* Inverting prior BOS change
* ultra nit
* Reverting to Tokenizer saved with add_bos_token=True and chat template starting with BOS
* resize embeds, remove sqrt, add slow test outputs
* FA2 but quality is meh
* nit
* skip FA2, no idea what happened
* last bit for green CI
* please, green CI for docs
* T_T
* Fix for Gemma3 logits
* Support both options for system prompt
* Update src/transformers/models/gemma3/image_processing_gemma3_fast.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update docs/source/en/model_doc/gemma3.md
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update docs/source/en/model_doc/gemma3.md
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update docs/source/en/model_doc/gemma3.md
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update docs/source/en/model_doc/gemma3.md
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update docs/source/en/model_doc/gemma3.md
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Docs updates now that assets are live
* Style fixes
---------
Co-authored-by: Joshua Lochner <admin@xenova.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>
Co-authored-by: Mayank Chaturvedi <imayank@google.com>
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Co-authored-by: Lysandre <hi@lysand.re>
* fix: handle input_channel_dim == channels_last
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
* fix: default PIL images to channels_last
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
* Apply suggestions from code review
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* fixup from review batch
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
* test: add 1x1 PIL image to ambiguous channel test
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
* fix(mllama): avoid 0 dimension for image with impractical aspect ratio
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
---------
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
Fixed 2 issues regarding `tests/trainer/test_data_collator.py::TFDataCollatorIntegrationTest::test_all_mask_replacement`:
1. I got the error `RuntimeError: "bernoulli_tensor_cpu_p_" not implemented for 'Long'`. This is because the `mask_replacement_prob=1` and `torch.bernoulli` doesn't accept this type (which would be a `torch.long` dtype instead. I fixed this by manually casting the probability arguments in the `__post_init__` function of `DataCollatorForLanguageModeling`.
2. I also got the error `tensorflow.python.framework.errors_impl.InvalidArgumentError: cannot compute Equal as input #1(zero-based) was expected to be a int64 tensor but is a int32 tensor [Op:Equal]` due to the line `tf.reduce_all((batch["input_ids"] == inputs) | (batch["input_ids"] == tokenizer.mask_token_id))` in `test_data_collator.py`. This occurs because the type of the `inputs` variable is `tf.int32`. Solved this by manually casting it to `tf.int64` in the test, as the expected return type of `batch["input_ids"]` is `tf.int64`.
* initial commit
* small fix
* move stuff to image processing file
* remove stuff in validate turn and fix return tensor
* remove liquid stuff
* in the process of addressing comments
* changes to get the right tokenization
* new __init__ works
* fixing defulat std and mean
* works
* small testing scipt -- to be deleted before merge
* remove redundant code
* addressing comments
* fix inits, add docs templates
* refactor processor, switch to gotocr image processor
* remove image proc from init
* refactor to working llava-style architecture
* Change AyaVisionModel to AyaVisionForConditionalGeneration
* add tests
* fixups
* update doc
* Adding logits_to_keep explicitly in ayavision forward to enable compatibility with cohere model
* better variable names + remove code paths
* Updates to aya_vision.md
* address comments
* adding copied from
* make style and remove unused projector_hidden_act from config
* sort init
* include usage of fast image proc and proc on cuda in doc
* update checkpoint iin test processor
* update checkpoint in test processor 2
* remove test_model and update docstring
* skip failing tests
---------
Co-authored-by: Saurabh Dash <saurabh@cohere.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
* Fix edge case for continue_final_message
* lstrip() correctly
* Add regression test
* Add a clearer error message when the final message is not present
* Add a clearer error message when the final message is not present
* Fix massive bug!
* Fix pipeline-peft interaction
* once again you have committed a debug breakpoint
* Remove extra testing line
* Add a test to check adapter loading
* Correct adapter path
* make fixup
* Remove unnecessary check
* Make check a little more stringent
* refactor image processor slow got ocr
* add working image processor fast
* fix fast image processor, update doc
* use one big loop for processing patches