HuggingFace_transformer

Author	SHA1	Message	Date
Joaquin Caballero	031ef8802c	fix FSDP + torch.compile bug when saving pretrained model (#37725 ) * args keep_torch_compile=False in _save and _wwrap_method * Fix FSDP execution on evaluation for torch_compile mode * add test trainer FSDP + Torch Compile * fix quality code * make style * Revert " make style" This reverts commit 77e797f8829c50992cc21496be3d9a3e480e1c97. * make style	2025-05-06 17:51:28 +02:00
Yao Matrix	5534b80b7f	enable xpu in test_trainer (#37774 ) * enable xpu in test_trainer Signed-off-by: YAO Matrix <matrix.yao@intel.com> * fix style Signed-off-by: YAO Matrix <matrix.yao@intel.com> * enhance _device_agnostic_dispatch to cover value Signed-off-by: Yao Matrix <matrix.yao@intel.com> * add default values for torch not available case Signed-off-by: YAO Matrix <matrix.yao@intel.com> --------- Signed-off-by: YAO Matrix <matrix.yao@intel.com> Signed-off-by: Yao Matrix <matrix.yao@intel.com>	2025-05-06 17:13:35 +02:00
Cyril Vallez	0cfbf9c95b	Force torch>=2.6 with torch.load to avoid vulnerability issue (#37785 ) * fix all main files * fix test files * oups forgot modular * add link * update message	2025-04-25 16:57:09 +02:00
Yao Matrix	19e9079dc1	enable 4 test_trainer cases on XPU (#37645 ) Signed-off-by: YAO Matrix <matrix.yao@intel.com>	2025-04-23 21:29:42 +02:00
youngrok cha	31ea547b7a	[fix] make legacy bnb code work (#37331 ) * [fix] make legacy bnb code work * [fix] use get with default instead of getter * add test for bnb 8bit optim skip embed * [fix] style * add require annotation of bnb --------- Co-authored-by: jaycha <jaycha@ncsoft.com> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-04-22 11:17:29 +02:00
Garrett Goon	503541d7ef	add FlashAttentionKwargs and seq_idx to flat collator (#36456 ) * add flash attn kwargs to flattening collator * add return_seq_idx option * doc string edits * cleaner max len updates * various fixes * temp testing code * return int32 seq_idx and FlashAttnKwargs * DataCollatorIntegrationTest impl * fix batch dims and dtypes * fill out remaining collator tests * test name change and fmt * rm unused var * fmt * minor change * fmt * add missing pos_ids check * consistent {np,pt,tf} tests * split pt tests into 3, like np/tf tests * mv comment, rename fa test * remove batch dim comment * simply wrapping * compute cu_seq_len/max_length once * fmt * remove tf code * rm warning * move separator_id back to 2nd pos * use cleaner lists in tests * ret -> batch * fmt * attr ordering * use py ints for max_length_{k,q}	2025-04-16 15:45:03 +02:00
cyyever	371c44d0ef	Remove old code for PyTorch, Accelerator and tokenizers (#37234 ) * Remove unneeded library version checks Signed-off-by: cyy <cyyever@outlook.com> * Remove PyTorch condition Signed-off-by: cyy <cyyever@outlook.com> * Remove PyTorch condition Signed-off-by: cyy <cyyever@outlook.com> * Fix ROCm get_device_capability Signed-off-by: cyy <cyyever@outlook.com> * Revert "Fix ROCm get_device_capability" This reverts commit 0e756434bd7e74ffd73de5500476072b096570a6. * Remove unnecessary check Signed-off-by: cyy <cyyever@outlook.com> * Revert changes Signed-off-by: cyy <cyyever@outlook.com> --------- Signed-off-by: cyy <cyyever@outlook.com>	2025-04-10 20:54:21 +02:00
cyyever	1e6b546ea6	Use Python 3.9 syntax in tests (#37343 ) Signed-off-by: cyy <cyyever@outlook.com>	2025-04-08 14:12:08 +02:00
cyyever	41a0e58e5b	Set weights_only in torch.load (#36991 )	2025-03-27 14:55:50 +00:00
cyyever	2b550c47b2	Remove deprecated training arguments (#36946 ) * Remove deprecated training arguments * More fixes * More fixes * More fixes	2025-03-26 16:44:48 +00:00
Yih-Dar	c6814b4ee8	Update ruff to `0.11.2` (#36962 ) * update * update * update --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-03-25 16:00:11 +01:00
gautham	48385aa4f4	Added support for seed in `DataCollatorForWholeWordMask` (#36903 ) * Added support for seed in `DataCollatorForWholeWordMask`, and also wrote tests. Also fixed bugs where the code hardcoded values for mask replacement probability and random replacement probability, instead of using the values passed by the user. * formatting issues * Used better way to generate seed in TF. Made tests more consistent.	2025-03-24 16:57:17 +00:00
Afanti	26c83490d2	chore: fix typos in the tests directory (#36813 ) * chore: fix typos in the tests * chore: fix typos in the tests * chore: fix typos in the tests * chore: fix typos in the tests * chore: fix typos in the tests * chore: fix typos in the tests * chore: fix typos in the tests * chore: fix typos in the tests * chore: fix typos in the tests * chore: fix typos in the tests * chore: fix typos in the tests * chore: fix typos in the tests * chore: fix typos in the tests * fix: format codes * chore: fix copy mismatch issue * fix: format codes * chore: fix copy mismatch issue * chore: fix copy mismatch issue * chore: fix copy mismatch issue * chore: restore previous words * chore: revert unexpected changes	2025-03-21 10:20:05 +01:00
gautham	9e771bf402	Add support for seed in `DataCollatorForLanguageModeling` (#36497 ) Add support for `seed` in `DataCollatorForLanguageModeling`. Also wrote tests for verifying behaviour.	2025-03-20 18:27:43 +00:00
yutong_liu	8b479e39bb	Saving `Trainer.collator.tokenizer` in when `Trainer.processing_class` is `None` (#36552 ) * feat: Saving tokenizer in collator when processing_class is None * chore: Style issue * chore: Typo * dbg: Check why test failed * dbg: Remove logics and another test failed which successed before, so should be the stablibility issue * test: Init unit-test * chore: Style * chore: Add err log * fix: Case * Update tests/trainer/test_trainer.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * chore: Try to use get_regression_trainer * fix: Impl and style * fix: Style * fix: Case * fix: Import err * fix: Missed import * fix: Import block un-sorted problem * fix: Try another tokenizer * fix: Test logic * chore: Light updates * chore: Reformat --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-03-20 11:27:47 +01:00
Matt	9be4728af8	Just import torch AdamW instead (#36177 ) * Just import torch AdamW instead * Update docs too * Make AdamW undocumented * make fixup * Add a basic wrapper class * Add it back to the docs * Just remove AdamW entirely * Remove some AdamW references * Drop AdamW from the public init * make fix-copies * Cleanup some references * make fixup * Delete lots of transformers.AdamW references * Remove extra references to adamw_hf	2025-03-19 18:29:40 +00:00
Afanti	7f5077e536	fix typos in the tests directory (#36717 )	2025-03-17 17:45:57 +00:00
Ilyas Moutawwakil	6f3e0b68e0	Fix grad accum arbitrary value (#36691 )	2025-03-14 22:03:01 +01:00
Sean (Seok-Won) Yi	691d1b52c3	Fix/best model checkpoint fix (#35885 ) * Set best_model_checkpoint only when ckpt exists. Rather than set it explicitly without checking if the checkpoint directory even exists as before, now we moved the setting logic inside of _save_checkpoint and are only setting it if it exists. * Added best_global_step to TrainerState. * Added tests for best_model_checkpoint. * Fixed hard-coded values in test to prevent fail. * Added helper func and removed hard-coded best_step. * Added side effect patch generator for _eval. * Added evaluate side effect func. * Removed erroneous patching. * Fixed minor bug. * Applied Ruff. * Fixed Ruff problem in make style. * Used Trainer.set_initial_training_values.	2025-03-14 14:24:53 +01:00
bd793fcb	87b30c3589	fix wandb hp search unable to resume from sweep_id (#35883 ) * fix wandb hp search unable to resume from sweep_id * format styles --------- Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-03-13 12:32:26 +01:00
Matt	c7eb95581a	Don't accidentally mutate the base_model_tp_plan (#36677 ) * Don't accidentally mutate the base_model_tp_plan * Co-authored by: Joao Gante <joaofranciscocardosogante@gmail.com> * Trigger tests * Marking grad accum test as slow * Add a flaky decorator * Add a flaky decorator * Use cyril's codeblock * Don't copy() when it's None * Use cyril's new codeblock * make fixup	2025-03-12 18:59:13 +00:00
Ilyas Moutawwakil	89f6956015	HPU support (#36424 ) * test * fix * fix * skip some and run some first * test fsdp * fix * patches for generate * test distributed * copy * don't test distributed loss for hpu * require fp16 and run first * changes from marc's PR fixing zero3 * better alternative * return True when fp16 support on gaudi without creating bridge * fix * fix tested dtype in deepspeed inference test * test * fix * test * fix * skip * require fp16 * run first fsdp * Apply suggestions from code review * address comments * address comments and refactor test * reduce precison * avoid doing gaudi1 specific stuff in the genreation loop * document test_gradient_accumulation_loss_alignment_with_model_loss test a bit more	2025-03-12 09:08:12 +01:00
gautham	a1cf9f3390	Fixed datatype related issues in `DataCollatorForLanguageModeling` (#36457 ) Fixed 2 issues regarding `tests/trainer/test_data_collator.py::TFDataCollatorIntegrationTest::test_all_mask_replacement`: 1. I got the error `RuntimeError: "bernoulli_tensor_cpu_p_" not implemented for 'Long'`. This is because the `mask_replacement_prob=1` and `torch.bernoulli` doesn't accept this type (which would be a `torch.long` dtype instead. I fixed this by manually casting the probability arguments in the `__post_init__` function of `DataCollatorForLanguageModeling`. 2. I also got the error `tensorflow.python.framework.errors_impl.InvalidArgumentError: cannot compute Equal as input #1(zero-based) was expected to be a int64 tensor but is a int32 tensor [Op:Equal]` due to the line `tf.reduce_all((batch["input_ids"] == inputs) \| (batch["input_ids"] == tokenizer.mask_token_id))` in `test_data_collator.py`. This occurs because the type of the `inputs` variable is `tf.int32`. Solved this by manually casting it to `tf.int64` in the test, as the expected return type of `batch["input_ids"]` is `tf.int64`.	2025-03-07 14:09:27 +00:00
Manny Cortes	082834dd79	fix: prevent model access error during Optuna hyperparameter tuning (#36395 ) * fix: prevent model access error during Optuna hyperparameter tuning The `transformers.integrations.integration_utils.run_hp_search_optuna` function releases model memory and sets trainer.model to None after each trial. This causes an AttributeError when subsequent Trainer.train calls attempt to access the model before reinitialization. This is only an issue when `fp16_full_eval` or `bf16_full_eval` flags are enabled. * Update src/transformers/trainer.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-02-26 17:06:48 +01:00
Nosimus	effaef334b	fix: prevent second save in the end of training if last step was saved already (#36219 ) * fix: prevent second save in the end of training * fix: prevent second save in the end of training * test: added test for no duplicate save on epoch save strategy * fix: removed TrainerControl * chore: style formatting --------- Co-authored-by: JaktensTid <jaktenstid1@gmail.com>	2025-02-20 17:38:52 +01:00
Thomas Bauwens	8f137b2427	Move `DataCollatorForMultipleChoice` from the docs to the package (#34763 ) * Add implementation for DataCollatorForMultipleChoice based on docs. * Add DataCollatorForMultipleChoice to import structure. * Remove custom DataCollatorForMultipleChoice implementations from example scripts. * Remove custom implementations of DataCollatorForMultipleChoice from docs in English, Spanish, Japanese and Korean. * Refactor torch version of DataCollatorForMultipleChoice to be more easily understandable. * Apply suggested changes and run make fixup. * fix copies, style and fixup * add missing documentation * nits * fix docstring * style * nits * isort --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>	2025-02-13 12:01:28 +01:00
Zach Mueller	1fae54c721	Add more rigerous non-slow grad accum tests (#35668 ) * Add more rigerous non-slow grad accum tests * Further nits * Re-add space * Readbility * Use tinystories instead * Revert transformer diff * tweak threshs	2025-02-12 10:26:21 -05:00
hsilva664	281c0c8b5b	adding option to save/reload scaler (#34932 ) * Adding option to save/reload scaler * Removing duplicate variable * Adding save/reload test * Small fixes on deterministic algorithm call * Moving LLM test to another file to isolate its environment * Moving back to old file and using subprocess to run test isolated * Reverting back accidental change * Reverting back accidental change	2025-02-12 15:48:16 +01:00
kang sheng	a33ac830af	Fix multi gpu loss sync condition, add doc and test (#35743 ) * Fix multi gpu loss sync condition, add doc and test * rename function and class * loss should not scale during inference * fix typo	2025-02-12 15:41:31 +01:00
zhuHQ	08c4959a23	Optim: APOLLO optimizer integration (#36062 ) * Added APOLLO optimizer integration * fix comment * Remove redundancy: Modularize low-rank optimizer construction * Remove redundancy: Remove useless comment * Fix comment: Add typing * Fix comment: Rewrite apollo desc	2025-02-12 15:33:43 +01:00
nhamanasu	377d8e2b9c	add RAdamScheduleFree optimizer (#35313 ) * add RAdamScheduleFree optimizer * revert schedulefree version to the minimum requirement * refine is_schedulefree_available so that it can take min_version * refine documents --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2025-02-12 11:31:51 +01:00
Ryoo Kwangrok	b1954fd64a	layernorm_decay_fix (#35927 ) * layernorm_decay_fix * W293 fix * ruff format fix * black format * ruff format * erase last layer * add test_get_parameter_names_rmsnorm * rmsnorm fix	2025-02-04 11:01:49 +01:00
Arthur	b912f5ee43	use torch.testing.assertclose instead to get more details about error in cis (#35659 ) * use torch.testing.assertclose instead to get more details about error in cis * fix * style * test_all * revert for I bert * fixes and updates * more image processing fixes * more image processors * fix mamba and co * style * less strick * ok I won't be strict * skip and be done * up	2025-01-24 16:55:28 +01:00
fzyzcjy	dc10f7906a	Support adamw_torch_8bit (#34993 ) * var * more * test	2025-01-21 14:17:49 +01:00
kang sheng	2cbcc5877d	Fix condition when GA loss bug fix is not performed (#35651 ) * fix condition when GA loss bug fix is not performed * max loss diff is 2.29 * fix typo * add an extra validation that loss should not vary too much	2025-01-16 13:59:53 +01:00
Mahdi Baghbanzadeh	c61fcde910	Enhance DataCollatorForLanguageModeling with Configurable Token Replacement Probabilities (#35251 ) * DataCollatorForLanguageModeling class was updated with new parameters that provides more control over the token masking and relacing * DataCollatorForLanguageModeling class was updated with new parameters that provides more control over the token masking and relacing * Addressed review comments, modified the docstring and made a test for the DataCollatorForLanguageModeling	2025-01-14 17:01:10 +00:00
Fanli Lin	2fa876d2d8	[tests] make cuda-only tests device-agnostic (#35607 ) * intial commit * remove unrelated files * further remove * Update test_trainer.py * fix style	2025-01-13 14:48:39 +01:00
Zach Mueller	b02828e4af	Let `EarlyStoppingCallback` not require `load_best_model_at_end` (#35101 ) * Bookmark * Add warning	2025-01-10 10:25:32 -05:00
Zach Mueller	1211e616a4	Use inherit tempdir makers for tests + fix failing DS tests (#35600 ) * Use existing APIs to make tempdir folders * Fixup deepspeed too * output_dir -> tmp_dir	2025-01-10 10:01:58 -05:00
nhamanasu	b32938aeee	Fix all output_dir in test_trainer.py to use tmp_dir (#35266 ) * update codecarbon * replace directly-specified-test-dirs with tmp_dir * pass tmp_dir to all get_regression_trainer * test_trainer.py: Use tmp_dir consistently for all output_dir arguments * fix some with...as tmp_dir blocks * reflect the comments to improve test_trainer.py * refresh .gitignore	2025-01-08 19:44:39 +01:00
Sean (Seok-Won) Yi	88e18b3c63	Update doc for `metric_for_best_model` when `save_strategy="best"`. (#35389 ) * Updated docstring for _determine_best_metric. * Updated docstring for metric_for_best_model. * Added test case for save strategy. * Updated incorrect test case. * Changed eval_strategy to match save_strategy. * Separated test cases for metric. * Allow load_best_model when save_strategy == "best". * Updated docstring for metric_for_best_model.	2025-01-08 16:32:35 +01:00
kang sheng	1ccca8f48c	Fix GA loss bugs and add unit test (#35121 ) * fix GA bugs and add unit test * narrow down model loss unit test diff gap * format code to make ruff happy * send num_items_in_batch argument to decoder * fix GA loss bug in BertLMHeadModel * use TinyStories-33M to narrow down diff gap * fotmat code * missing .config * avoid add extra args --------- Co-authored-by: kangsheng <kangsheng@meituan.com>	2024-12-09 09:57:41 +01:00
Yih-Dar	b0a51e5cff	Fix flaky Hub CI (`test_trainer.py`) (#35062 ) * fix * Update src/transformers/testing_utils.py Co-authored-by: Lucain <lucainp@gmail.com> * fix * fix * fix * fix * fix * fix * fix * fix * check * check * check * check * check * check * Update src/transformers/testing_utils.py Co-authored-by: Lucain <lucainp@gmail.com> * Update src/transformers/testing_utils.py Co-authored-by: Lucain <lucainp@gmail.com> * check * check * check * Final space * Final adjustment --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: Lucain <lucainp@gmail.com>	2024-12-05 17:02:27 +01:00
AbdelKarim ELJANDOUBI	8d50fda644	Remove FSDP wrapping from sub-models. (#34452 ) * Remove FSDP wrapping from sub-models. * solve conflict trainer.py * make fixup * add unit test for fsdp_auto_wrap_policy when using auto_find_batch_size * put back extract_model_from_parallel * use transformers unwrap_model	2024-11-15 23:00:03 +01:00
Raushan Turganbay	187439c3fa	VLM: special multimodal Tokenizer (#34461 ) * kinda works * update * add tests * update * use special tokens in processors * typo * fix copies * fix * fix moshi after rebase * update * fix tests * update * Update docs/source/en/main_classes/tokenizer.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * update docs * test for load time adding tokens * fix some more tests which are now fetched better * one more fix --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-11-04 16:37:51 +01:00
Zach Mueller	ef976a7e18	Update trainer for easier handling of accumulate, compile fixes, and proper reporting (#34511 ) * Update trainer for easier handling of accumulate + proper reporting * test * Fixup tests * Full fix * Fix style * rm comment * Fix tests * Minimize test + remove py 311 check * Unused import * Forward contrib credits from discussions * Fix reported metrics * Refactor, good as it's going to get * rm pad tok id check * object detection and audio are being annoying * Fin * Fin x2 --------- Co-authored-by: Gyanateet Dutta <Ryukijano@users.noreply.github.com>	2024-11-04 07:47:34 -05:00
Sean (Seok-Won) Yi	c1753436db	New option called `"best"` for `args.save_strategy`. (#31817 ) * Add _determine_best_metric and new saving logic. 1. Logic to determine the best logic was separated out from `_save_checkpoint`. 2. In `_maybe_log_save_evaluate`, whether or not a new best metric was achieved is determined after each evaluation, and if the save strategy is "best' then the TrainerControl is updated accordingly. * Added SaveStrategy. Same as IntervalStrategy, but with a new attribute called BEST. * IntervalStrategy -> SaveStrategy * IntervalStratgy -> SaveStrategy for save_strat. * Interval -> Save in docstring. * Updated docstring for save_strategy. * Added SaveStrategy and made according changes. `save_strategy` previously followed `IntervalStrategy` but now follows `SaveStrategy`. Changes were made accordingly to the code and the docstring. * Changes from `make fixup`. * Removed redundant metrics argument. * Added new test_save_best_checkpoint test. 1. Checks for both cases where `metric_for_best_model` is explicitly provided and when it's not provided. 2. The first case should have two checkpoints saved, whereas the second should have three saved. * Changed should_training_end saving logic. The Trainer saves a checkpoints at the end of training by default as long as `save_strategy != SaveStrategy.NO`. This condition was modified to include `SaveStrategy.BEST` because it would be counterintuitive that we'd only want the best checkpoint to be saved but the last one is as well. * `args.metric_for_best_model` default to loss. * Undo metric_for_best_model update. * Remove checking metric_for_best_model. * Added test cases for loss and no metric. * Added error for metric and changed default best_metric. * Removed unused import. * `new_best_metric` -> `is_new_best_metric` Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Applied `is_new_best_metric` to all. Changes were made for consistency and also to fix a potential bug. --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Zach Mueller <muellerzr@gmail.com>	2024-10-28 16:02:22 +01:00
AbdelKarim ELJANDOUBI	8b3b9b48fc	exclude fsdp from delay_optimizer_creation (#34140 ) * exclude fsdp from delay_optimizer_creation * add test case for trainer: FSDP mode and fp8 as mixed precision * rearrange imports * ruff formatted * adapt _init_fsdp to fp8 * use _init_fsdp only when resume_from_checkpoint * In case of FDP, self.layer will be CheckpointWrapper which has no len() method * delete _init_fsdp * solve conflict * fix conflict * make fixup	2024-10-28 13:50:16 +01:00
Zach Mueller	6ba31a8a94	Enable users to use their own loss functions + deal with prefetching for grad accum (#34198 ) * bookmark * Bookmark * Bookmark * Actually implement * Pass in kwarg explicitly * Adjust for if we do or don't have labels * Bookmark fix for od * bookmark * Fin * closer * Negate accelerate grad accum div * Fixup not training long enough * Add in compute_loss to take full model output * Document * compute_loss -> compute_loss_fn * Add a test * Refactor * Refactor * Uncomment tests * Update tests/trainer/test_trainer.py Co-authored-by: Daniel Han <danielhanchen@gmail.com> --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2024-10-17 17:01:56 -04:00
Marc Sun	3f06f95ebe	Revert "Fix FSDP resume Initialization issue" (#34193 ) Revert "Fix FSDP resume Initialization issue (#34032)" This reverts commit `4de1bdbf63`.	2024-10-16 15:25:18 -04:00

1 2 3 4

184 Commits