HuggingFace_transformer

Author	SHA1	Message	Date
Yoni Gozlan	d8500cd229	Uniformize kwargs for Pixtral processor (#33521 ) * add uniformized pixtral and kwargs * update doc * fix _validate_images_text_input_order * nit	2024-09-17 14:44:27 -04:00
Nikita Krasnytskyi	c29a8694b0	Fix missing `sequences_scores` in the Whisper beam search output (#32970 ) * added sequences_scores to the output * added beam_indices to output * added test to check for beam_indices, sequences_scores and their shape * removed redundant whitespaces * make fixup	2024-09-17 19:36:11 +01:00
ErezSC42	46c27577b3	fix to jamba config, asserting attention and expert offset (#33316 ) * fix to jamba config, asserting attention and expert offset * fix foramtting * fix foramtting * fix foramtting * changed to error raise instead of assertion, added unittests * fix * changed t_ to property_ * changed t_ to property_ * quickfix * ran code styler	2024-09-17 19:29:27 +01:00
Wang, Yi	74026b473e	idefics2 enable_input_require_grads not aligned with disable_input_re… (#33194 ) * idefics2 enable_input_require_grads not aligned with disable_input_require_grads make peft+idefics2 checkpoints disable fail Signed-off-by: Wang, Yi <yi.a.wang@intel.com> * split test case Signed-off-by: Wang, Yi <yi.a.wang@intel.com> * fix ci failure Signed-off-by: Wang, Yi <yi.a.wang@intel.com> * refine test Signed-off-by: Wang, Yi <yi.a.wang@intel.com> --------- Signed-off-by: Wang, Yi <yi.a.wang@intel.com>	2024-09-17 10:39:34 +01:00
Insu Jang	bcf8946f0a	Fix number of patch check for different vision feature select strategy (#32494 ) * Fix number of patch check for different vision feature select strategy * add test --------- Co-authored-by: raushan <raushan@huggingface.co>	2024-09-17 09:33:07 +02:00
Yoach Lacombe	18e1a9c719	Fix parametrization-based weight norm (#33275 ) * refactor weight_norm + propose uniformed solution to reconcile meta load_state_dict with classic loading * make style * fix sew * fix sew and sew_d tests	2024-09-17 08:05:21 +02:00
Steven Shimizu	ba1f1dc132	Updated Trainer's liger-kernel integration to call correct patching API (#33502 ) * Updated liger-kernel integration in Trainer to call correct patching API * Fixed styling	2024-09-17 02:40:24 +02:00
Yoach Lacombe	98adf24883	[Whisper test] Fix some failing tests (#33450 ) * Fix failing tensor placement in Whisper * fix long form generation tests * more return_timestamps=True * make fixup * [run_slow] whisper * [run_slow] whisper	2024-09-16 19:05:17 +02:00
Yoni Gozlan	2f62146f0e	Uniformize kwargs for LLaVa processor and update docs (#32858 ) * Uniformize kwargs for LlaVa and update docs * Change order of processor inputs in docstring * Improve BC support for reversed images and text inputs * cleanup llava processor call docstring * Add encoded inputs as valid text inputs in reverse input check, add deprecation version in warning * Put function check reversed images text outside base processor class * Refactor _validate_images_text_input_order * Add ProcessingUtilTester * fix processing and test_processing	2024-09-16 11:26:26 -04:00
Arthur	8bd2b1e8c2	Add support for Pixtral (#33449 ) * initial commit * gloups * updates * work * weights match * nits * nits * updates to support the tokenizer :) * updates * Pixtral processor (#33454) * rough outline * Add in image break and end tokens * Fix * Udo some formatting changes * Set patch_size default * Fix * Fix token expansion * nit in conversion script * Fix image token list creation * done * add expected results * Process list of list of images (#33465) * updates * working image and processor * this is the expected format * some fixes * push current updated * working mult images! * add a small integration test * Uodate configuration docstring * Formatting * Config docstring fix * simplify model test * fixup modeling and etests * Return BatchMixFeature in image processor * fix some copies * update * nits * Update model docstring * Apply suggestions from code review * Fix up * updates * revert modeling changes * update * update * fix load safe * addd liscence * update * use pixel_values as required by the model * skip some tests and refactor * Add pixtral image processing tests (#33476) * Image processing tests * Add processing tests * woops * defaults reflect pixtral image processor * fixup post merge * images -> pixel values * oups sorry Mr docbuilder * isort * fix * fix processor tests * small fixes * nit * update * last nits * oups this was really breaking! * nits * is composition needs to be true --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-09-14 12:28:39 +02:00
Marc Sun	6cc4dfe3f1	Fix the initialization of the cache when we have multi gpu (#33303 ) * init cache multi-gpu * Update src/transformers/generation/utils.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * switch to execution device map * naming more consistant * fix * mutually exclusive device * added an integration example * remove useless check * suggestion from joao + typing * fix couple of typo and add test * revert check --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>	2024-09-13 15:06:08 +02:00
Amit Garg	dfd31158ee	[Phi-3] Bug on stale kv cache (#33129 ) * fix long seq bug * fixed format * fixed fn copy inconsistency * fix long seq bug * fixed format * fixed fn copy inconsistency * Addressed comments * added a unit test * fixed cache position * Added a warning msg to the forward fn * fixed test case	2024-09-13 14:07:19 +02:00
Alvaro Moran	7a5659872a	Mitigate a conflict when using sentencepiece (#33327 ) * test(tokenizers): add a test showing conflict with sentencepiece This is due to the fact that protobuf C implementation uses a global pool for all added descriptors, so if two different files add descriptors, they will end up conflicting. * fix(tokenizers): mitigate sentencepiece/protobuf conflict When sentencepiece is available, use that protobuf instead of the internal one. * chore(style): fix with ruff	2024-09-13 13:19:06 +02:00
Raushan Turganbay	4b0418df11	Enable `padding_side` as call time kwargs (#33385 ) * fix * add padding-side kwarg * add padding side in all models & fix tests * fix copies * fix tests	2024-09-13 11:58:38 +01:00
Wing Lian	1027a532c5	add a callback hook right before the optimizer step (#33444 )	2024-09-13 10:43:45 +02:00
Raushan Turganbay	9c4639b622	Return image hidden states (#33426 ) * fix * return image hidden states * fix copies * fix test	2024-09-13 10:20:03 +02:00
benniekiss	5c6257d1fc	[whisper] Clarify error message when setting max_new_tokens (#33324 ) * clarify error message when setting max_new_tokens * sync error message in test_generate_with_prompt_ids_max_length * there is no self	2024-09-12 18:48:36 +02:00
Raushan Turganbay	2f611d30d9	Qwen2-VL: clean-up and add more tests (#33354 ) * clean-up on qwen2-vl and add generation tests * add video tests * Update tests/models/qwen2_vl/test_processing_qwen2_vl.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * fix and add better tests * Update src/transformers/models/qwen2_vl/image_processing_qwen2_vl.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * update docs and address comments * Update docs/source/en/model_doc/qwen2_vl.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2_vl.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * update * remove size at all --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-09-12 18:24:04 +02:00
Hannan Komari	8ed635258c	Fix flax whisper tokenizer bug (#33151 ) * Update tokenization_whisper.py Fix issue with flax whisper model * Update tokenization_whisper_fast.py Fix issue with flax whisper model * Update tokenization_whisper.py just check len of token_ids * Update tokenization_whisper_fast.py just use len of token_ids * Update tokenization_whisper_fast.py and revert changes in _strip_prompt and add support to jax arrays in _convert_to_list * Update tokenization_whisper.py and revert changes in _strip_prompt and add support to jax arrays in _convert_to_list * Update test_tokenization_whisper.py to add test for _convert_to_list method * Update test_tokenization_whisper.py to fix code style issues * Fix code style * Fix code check again * Update test_tokenization)whisper.py to Improve code style * Update test_tokenization_whisper.py to run each of jax, tf and flax modules if available * Update tests/models/whisper/test_tokenization_whisper.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update test_tokenization_whisper.py and use require_xxx decorators instead of `is_xxx_available()` method * Revert the changes automatically applied by formatter and was unrelated to PR * Format for minimal changes --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-09-12 12:21:59 +01:00
Jonathan Mamou	7a51cbc65f	Dynamic number of speculative tokens in order to accelerate speculative decoding (#33258 ) * optimal Speculation Lookahead based on probability * update peer finished condition * add support to do_sample True * add stopping criteria * gitignore * add print * remove prints * minor * minor * git ignore * adding test to stopping ConfidenceCriteria * doc + format * add doc * Update .gitignore * update docstring and default value of assistant_confidence_threshold * add docstring * Update src/transformers/generation/configuration_utils.py implicit default value (None) Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * style fix --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>	2024-09-11 14:22:28 +02:00
Theia Vogel	e719b65c31	Fix `FbgemmFp8Linear` not preserving tensor shape (#33239 ) * add tests for linear shape behavior * fix linear shape behavior ended up adding the reshape at the end, after f8f8bf16_rowwise, because adding it directly after quantize_fp8_per_row caused f8f8bf16_rowwise to drop the seq_len dimension. (i.e., (17, 23, 1014) -> (17, 1024)) * save shape up front + comment	2024-09-11 13:26:44 +02:00
Ita Zaporozhets	781bbc4d98	use diff internal model in tests (#33387 ) * use diff internal model in tests * use diff internal model in tests	2024-09-11 11:27:00 +02:00
Guang Yang	f38590dade	Make StaticCache configurable at model construct time (#32830 ) * Make StaticCache configurable at model construct time * integrations import structure * add new doc file to toc --------- Co-authored-by: Guang Yang <guangyang@fb.com> Co-authored-by: Joao Gante <joao@huggingface.co>	2024-09-10 16:35:57 +01:00
Alazar	96429e74a8	Add support for GGUF Phi-3 (#31844 ) * Update docs for GGUF supported models * Add tensor mappings and define class GGUFPhi3Converter * Fix tokenizer * Working version * Attempt to fix some CI failures * Run ruff format * Add vocab, merges, decoder methods like LlamaConverter * Resolve conflicts since Qwen2Moe was added to gguf - I missed one place when resolving conflict - I also made a mistake with tests_ggml.py and now has been fixed to reflect its master version.	2024-09-10 13:32:38 +02:00
Maciej Adamiak	8e8e7d8558	fixed Mask2Former image processor segmentation maps handling (#33364 ) * fixed mask2former image processor segmentation maps handling * introduced review suggestions * introduced review suggestions	2024-09-10 11:19:56 +01:00
Raushan Turganbay	7d2d6ce9cb	VLM: fixes after refactor (#32907 ) * leave only half of the changes * fix tests * [run-slow] llava, llava_next, llava_next_video, vipllava, video_llava * fix tests, first try * [run-slow] llava, llava_next, llava_next_video, vipllava, video_llava * fix, second try * [run-slow] llava, llava_next, llava_next_video, vipllava, video_llava * fix * [run-slow] llava, llava_next, llava_next_video, vipllava, video_llava	2024-09-10 12:02:37 +02:00
Lysandre Debut	f24f084329	Import structure & first three model refactors (#31329 ) * Import structure & first three model refactors * Register -> Export. Export all in __all__. Sensible defaults according to filename. * Apply most comments from Amy and some comments from Lucain Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Lucain Pouget <lucainp@gmail.com> * Style * Add comment * Clearer .py management * Raise if not in backend mapping * More specific type * More efficient listdir * Misc fixes --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Lucain Pouget <lucainp@gmail.com>	2024-09-10 11:10:53 +02:00
amyeroberts	f745e7d3f9	Remove repeated prepare_images in processor tests (#33163 ) * Remove repeated prepare_images * Address comments - update docstring; explanatory comment	2024-09-09 13:20:27 +01:00
Raushan Turganbay	65bb284448	Compile compatibilty for decoder-only models (#32617 ) * squash into one commit * add qwen2-vl for rope standardization * fix mistral compile * fix qwen2-vl * fix-copies	2024-09-09 10:59:04 +02:00
Wing Lian	62aecd85ff	schedulefree optimizers (#30079 ) * schedulefree optimizers * fix train instead of eval for optimizer * fixes and update docs * chore: lint * add tests and drop overly-verbose _32bit suffix * chore: lint * fix for docs * fix code review issues * use duck-typing to avoid per-optimizer patches * fixup style * fixup style * warn if incorrect accelerate version with schedule free Co-authored-by: Aman Gupta Karmani <aman@tmm1.net> --------- Co-authored-by: Aman Karmani <aman@tmm1.net>	2024-09-09 09:51:39 +02:00
Ita Zaporozhets	e48e5f1f13	Support reading tiktoken tokenizer.model file (#31656 ) * use existing TikTokenConverter to read tiktoken tokenizer.model file * del test file * create titktoken integration file * adding tiktoken llama test * ALTNATIVE IMPLEMENTATION: supports llama 405B * fix one char * remove redundant line * small fix * rm unused import * flag for converting from tiktokeng * remove unneeded file * ruff * remove llamatiktokenconverter, stick to general converter * tiktoken support v2 * update test * remove stale changes * udpate doc * protect import * use is_protobuf_available * add templateprocessor in tiktokenconverter * reverting templateprocessor from tiktoken support * update test * add require_tiktoken * dev-ci * trigger build * trigger build again * dev-ci * [build-ci-image] tiktoken * dev-ci * dev-ci * dev-ci * dev-ci * change tiktoken file name * feedback review * feedback rev * applying feedback, removing tiktoken converters * conform test * adding docs for review * add doc file for review * add doc file for review * add doc file for review * support loading model without config.json file * Revert "support loading model without config.json file" This reverts commit 2753602e51c34cef2f184eb11f36d2ad1b02babb. * remove dev var * updating docs * safely import protobuf * fix protobuf import error * fix protobuf import error * trying isort to fix ruff error * fix ruff error * try to fix ruff again * try to fix ruff again * try to fix ruff again * doc table of contents * add fix for consistency.dockerfile torchaudio * ruff * applying feedback * minor typo * merging with push-ci-image * clean up imports * revert dockerfile consistency	2024-09-06 14:24:02 +02:00
Shiyu	342e800086	support 3D attention mask in bert (#32105 ) * support 3D/4D attention mask in bert * test cases * update doc * fix doc	2024-09-06 14:20:48 +02:00
GeLee	2b18354106	add self.head_dim for VisionAttention in Qwen2-VL (#33211 ) * add self.head_dim for VisionAttention in Qwen2-VL * add self.head_dim for VisionAttention in Qwen2-VL * fix ci * black the test_modeling_qwen2_vl.py * use ruff to format test_modeling_qwen2_vl.py * [run-slow] qwen2_vl * use tying for python3.8 * fix the import format * use ruff to fix the ci error I001 * [run-slow] qwen2_vl * remove unused import * commit for rebase * use ruff fix ci * [run-slow] qwen2_vl --------- Co-authored-by: root <liji>	2024-09-06 17:19:29 +05:00
Amir Mohammad Fakhimi	3314fe1760	Add validation for maximum sequence length in modeling_whisper.py (#33196 ) * Add validation for maximum sequence length in modeling_whisper.py Added a validation check to ensure that the sequence length of labels does not exceed the maximum allowed length of 448 tokens. If the sequence length exceeds this limit, a ValueError is raised with a descriptive error message. This change prevents the model from encountering errors or unexpected behavior due to excessively long sequences during training or fine-tuning, ensuring consistent input dimensions and improving overall robustness. * Change exception message in src/transformers/models/whisper/modeling_whisper.py The exception message is for whisper's label's sequence max length. Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * Change 448 to config.max_target_positions in src/transformers/models/whisper/modeling_whisper.py It's for whisper's config.max_target_positions. Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * Change method's documentation in src/transformers/models/whisper/modeling_whisper.py * Add test for maximum label's sequence length in test_modeling_whisper.py * Add self to modeling_whisper.py * Update test_modeling_whisper.py with respect to automatic validations * Update modeling_whisper.py with respect to ci/circleci: check_code_quality * Update test_modeling_whisper.py with respect to ci/circleci: check_code_quality * Update test_modeling_whisper.py with respect to ci/circleci: tests_generate * Update test_modeling_whisper.py with respect to ci/circleci: tests_generate * Update test_modeling_whisper.py with respect to ci/circleci: check_code_quality * Separate test_labels_sequence_max_length tests in test_modeling_whisper.py * Update test_modeling_whisper.py with respect to ci/circleci: check_code_quality * Remove assert from test_modeling_whisper.py * Add max_target_positions to WhisperModelTester in test_modeling_whisper.py * Update test_modeling_whisper.py with respect to ci/circleci: check_code_quality * Update test_modeling_whisper.py with respect to ci/circleci: tests_generate * Update test_modeling_whisper.py * Change test_labels_sequence_max_length_error_after_changing_config in test_modeling_whisper.py * Change self.config.max_target_positions to self.max_target_positions modeling_whisper.py * Add new tests in test_modeling_whisper.py * Update test_modeling_whisper.py --------- Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>	2024-09-06 14:09:49 +02:00
Ita Zaporozhets	363301f221	support loading model without config.json file (#32356 ) * support loading model without config.json file * fix condition * update tests * add test * ruff * ruff * ruff	2024-09-06 13:49:47 +02:00
Xuehai Pan	e1c2b69c34	Load dynamic module (remote code) only once if code isn't change (#33162 ) * Load remote code only once * Use hash as load indicator * Add a new option `force_reload` for old behavior (i.e. always reload) * Add test for dynamic module is cached * Add more type annotations to improve code readability * Address comments from code review	2024-09-06 12:49:35 +01:00
Sanchit Gandhi	51d15eb1c1	[whisper] alternative fix for long-form timestamps (#32131 ) * [whisper] alternative fix for long-form timestamps * update test	2024-09-06 12:57:08 +02:00
Raushan Turganbay	1759bb9126	Fix: StaticCache & `inputs_embeds` (#32932 ) squash commit	2024-09-06 12:56:59 +05:00
Shijie	21fac7abba	simple align qwen2vl kv_seq_len calculation with qwen2 (#33161 ) * qwen2vl_align_kv_seqlen_to_qwen2 * flash att test * [run-slow] qwen2_vl * [run-slow] qwen2_vl fix OOM * [run-slow] qwen2_vl * Update tests/models/qwen2_vl/test_modeling_qwen2_vl.py Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz> * Update tests/models/qwen2_vl/test_modeling_qwen2_vl.py Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz> * code quality --------- Co-authored-by: baishuai.bs <1051314669@qq.com> Co-authored-by: ShuaiBai623 <baishuai623@icloud.com> Co-authored-by: ShuaiBai623 <43326198+ShuaiBai623@users.noreply.github.com> Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>	2024-09-05 21:19:30 +05:00
Vladislav Bronzov	5d11de4a2f	Add Qwen2Moe GGUF loading support (#33264 ) * update gguf doc, config and tensor mapping * add qwen2moe architecture support, GGUFQwen2MoeConverter and q4 unit tests * apply code style fixes * reformat files * assign GGUFQwen2Converter to qwen2_moe	2024-09-05 17:42:03 +02:00
Joshua Lochner	c6d2848a23	🚨 Fix `torch.jit.trace` for `interpolate_pos_encoding` in all vision models (#33226 ) * Fix `torch.jit.tracing` for `interpolate_pos_encoding` in all vision models * Apply formatting * Add missing `self.config = config` * Fix copies * Fix hiera interpolation unit test * Formatting * Update `_import_structure` * make style * Fix docstring * Use `# Copied from` instead of utils * DeiT variable renaming (`class_and_dist_pos_embed`) * Fix Hiera `interpolate_pos_encoding`	2024-09-05 16:17:34 +02:00
Younes Belkada	47b096412d	Fix: Fix `FalconMamba` training issues due to incompatible kernels (#33195 ) * fix FM training kernels * fix copies * fix copies * propagate to slow path * make it BC * add comment * fix test	2024-09-05 11:55:08 +02:00
Raushan Turganbay	43df47d8e7	Llava Onevision: add model (#32673 ) * working version * fix copies * update * tests * update docs * codestyle * add more tests * add returns for docs * clean up * Update src/transformers/models/llava_onevision/processing_llava_onevision.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * updates * codestyle * style * shouldn't be reversed * [run-slow] llava_onevision * [run-slow] llava_onevision * add pooling in videos * [run-slow] llava_onevision * num-logits-to-keep * [run-slow] llava_onevision * [run-slow] llava_onevision * Update tests/test_modeling_common.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * video matched orig impl * fix tests * chat template was modified * Update docs/source/en/model_doc/llava_onevision.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * add morer info in the doc page --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-09-05 14:43:20 +05:00
Yoni Gozlan	9230d78e76	Add validate images and text inputs order util for processors and test_processing_utils (#33285 ) * Add validate images and test processing utils * Remove encoded text from possible inputs in tests * Removed encoded inputs as valid in processing_utils * change text input check to be recursive * change text check to all element of lists and not just the first one in recursive checks	2024-09-04 13:50:31 -04:00
Aymeric Roucher	2cb543db77	Multi agents with manager (#32687 ) * Add Multi agents with a hierarchical system	2024-09-04 17:30:54 +02:00
amyeroberts	d2dcff96f8	[InstructBLIP] qformer_tokenizer is required input (#33222 ) * [InstructBLIP] qformer_tokenizer is required input * Bit safer * Add to instructblipvideo processor * Fix up * Use video inputs * Update tests/models/instructblipvideo/test_processor_instructblipvideo.py	2024-09-04 16:18:06 +01:00
Alex Sherstinsky	122ded0a11	Bugfix/alexsherstinsky/fix none check for attention factor in rope scaling 2024 08 28 0 (#33188 ) * Fixing a bug in the way "attention_factor" is validated in ROPE utilities. * Fixing a bug in the way "attention_factor" is validated in ROPE utilities. * Fixing a bug in the way "attention_factor" is validated in ROPE utilities.	2024-09-04 17:01:12 +02:00
laurentd-lunit	d703477265	[fix] LlavaNextProcessor '_get_unpadded_features' method (#33263 ) * [fix] LlavaNextProcessor '_get_unpadded_features' method * [tests] add test_image_token_filling * [chore] style + comment * [minor] improve readability * [chore] run make fix-copies	2024-09-04 17:41:51 +05:00
Joao Gante	d750b509fc	Config: unified logic to retrieve text config (#33219 )	2024-09-04 12:03:30 +01:00
Raushan Turganbay	ebbe8d8014	Cache docs: update (#32929 ) * some changes * more updates * fix cache copy * nits * nits * add tests	2024-09-04 15:05:31 +05:00

4058 Commits