HuggingFace_transformer

Author	SHA1	Message	Date
Lucain	6232c380f2	Fix `.push_to_hub` and cleanup `get_full_repo_name` usage (#25120 ) * Fix .push_to_hub and cleanup get_full_repo_name usage * Do not rely on Python bool conversion magic * request changes	2023-07-28 11:40:08 +02:00
Zach Mueller	a1c4954d25	🚨🚨🚨Change default from `adamw_hf` to `adamw_torch` 🚨🚨🚨 (#25109 ) * Change defaults * Sylvain's comments	2023-07-27 09:11:28 -04:00
Xuehai Pan	6bc61aa7af	Set `TF32` flag for PyTorch cuDNN backend (#25075 )	2023-07-25 08:04:48 -04:00
Zach Mueller	3b734f5042	Add dispatch_batches to training arguments (#25038 ) * Dispatch batches * Copy items	2023-07-24 09:27:19 -04:00
Sylvain Gugger	a6484c89b9	Fix type annotation for deepspeed training arg (#24988 )	2023-07-21 09:42:05 -04:00
Sourab Mangrulkar	f4eb459ef2	fsdp fixes and enhancements (#24980 ) * fix fsdp prepare to remove the warnings and fix excess memory usage * Update training_args.py * parity for FSDP+XLA * Update trainer.py	2023-07-21 17:52:48 +05:30
Shauray Singh	e75cb0cb3c	fix type annotations for arguments in training_args (#24550 ) * testing * example script * fix typehinting * some tests * make test * optional update * Union of arguments * does this fix the issue * remove reports * set default to False * documentation change * None support * does not need None * Fix typing annotations for FSDP and DeepSpeed in TrainingArguments (#24549) * Fix typing annotations for FSDP and DeepSpeed in TrainingArguments * Change dict to Dict * Revert "Fix typing annotations for FSDP and DeepSpeed in TrainingArguments" (#24574) Revert "Fix typing annotations for FSDP and DeepSpeed in TrainingArguments (#24549)" This reverts commit `c5e29d4381`. * Fix typing annotations for FSDP and DeepSpeed in TrainingArguments (#24549) * Fix typing annotations for FSDP and DeepSpeed in TrainingArguments * Change dict to Dict * merge * hacky fix * fixup --------- Co-authored-by: Max Ryabinin <mryabinin0@gmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2023-07-20 10:13:13 -04:00
Madhava Jay	aa4afa67f3	Fixed issue where ACCELERATE_USE_CPU="False" results in bool(True) (#24907 ) - This results in cpu mode on Apple Silicon mps	2023-07-19 07:30:01 -04:00
Zach Mueller	476be08c4a	Check for accelerate env var when doing CPU only (#24890 ) Check for use-cpu	2023-07-18 18:40:37 -04:00
Zach Mueller	a982c0225e	Disable ipex env var if false (#24885 ) Disable ipex if in use	2023-07-18 16:07:02 -04:00
statelesshz	9c875839c0	add ascend npu accelerator support (#24879 ) * Add Ascend NPU accelerator support * fix style warining	2023-07-18 08:20:32 -04:00
Marc Sun	9dc965bb40	deprecate no_cuda (#24863 ) * deprecate no_cuda * style * remove doc * remove doc 2 * fix style	2023-07-17 14:52:28 -04:00
statelesshz	0f4502d335	Remove deprecated codes (#24837 ) * remove `xpu_backend` training argument * always call `contextlib.nullcontext()` since transformers updated to python3.8 * these codes will not be executed	2023-07-17 14:45:59 -04:00
statelesshz	8ba26c18cf	deprecate `sharded_ddp` training argument (#24825 ) * deprecate fairscale's ShardedDDP * fix code style * roll back * deprecate the `sharded_ddp` training argument --------- Co-authored-by: jihuazhong <jihuazhong1@huawei.com>	2023-07-17 06:57:42 -04:00
Bram Vanroy	6ba4d5de3a	[DOC] Clarify relationshi load_best_model_at_end and save_total_limit (#24614 ) * Update training_args.py Clarify the relationship between `load_best_model_at_end` and `save_total_limit`. * fix: faulty quotes * make quality * Update src/transformers/training_args.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * DOCS: add explicit `True` * DOCS: make style/quality --------- Co-authored-by: Bram Vanroy <Bram.Vanroy@UGent.be> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2023-07-13 07:36:16 -04:00
Zach Mueller	0284285501	Fix pad across processes dim in trainer and not being able to set the timeout (#24775 ) * dim, and rm copy * Don't rm copy for now * Oops * pad index * Should be a working test * Tickle down ddp timeout * Put fix back in now that testing locally is done * Better comment specifying timeout Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> --------- Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2023-07-12 10:01:51 -04:00
Sylvain Gugger	2dc5e1a120	Revert "Fix typing annotations for FSDP and DeepSpeed in TrainingArguments" (#24574 ) Revert "Fix typing annotations for FSDP and DeepSpeed in TrainingArguments (#24549)" This reverts commit `c5e29d4381`.	2023-06-29 08:14:43 -04:00
Max Ryabinin	c5e29d4381	Fix typing annotations for FSDP and DeepSpeed in TrainingArguments (#24549 ) * Fix typing annotations for FSDP and DeepSpeed in TrainingArguments * Change dict to Dict	2023-06-28 10:36:17 -04:00
Meghan Cowan	be2d9f2e47	Fix tpu_metrics_debug (#24452 ) fix for tpu metrics debugs string	2023-06-26 10:59:07 +01:00
Zach Mueller	127e81c272	Remove redundant code from TrainingArgs (#24401 ) Remove redundant code	2023-06-21 11:51:27 -04:00
Zach Mueller	1a6fb930fb	Clean up dist import (#24402 )	2023-06-21 11:19:42 -04:00
Bearnardd	4c6e429589	fix type annotation for debug arg (#24033 ) * fix type annotation for debug arg * fix TypeErorr	2023-06-21 11:42:21 +01:00
Teven	ee88ae5994	Adding ddp_broadcast_buffers argument to Trainer (#24326 ) adding ddp_broadcast_buffers argument	2023-06-16 15:14:03 -04:00
Sourab Mangrulkar	3723329d01	deprecate `use_mps_device` (#24239 )	2023-06-13 19:48:36 +05:30
Zachary Mueller	5eb3d3c702	Up pinned accelerate version (#24089 ) * Min accelerate * Also min version * Min accelerate * Also min version * To different minor version * Empty	2023-06-07 16:21:51 -04:00
Zachary Mueller	84bac652f3	Move import check to before state reset (#23906 ) * Move import check to before state reset * Guard better	2023-05-31 10:49:43 -04:00
Sourab Mangrulkar	a73b1d59a3	accelerate deepspeed and gradient accumulation integrate (#23236 ) * mixed precision support via accelerate * fix issues * fix for the sharded ddp case * fix flax and tf failing tests * `refactor the place to create `Accelerator` object * move ddp prep to accelerate * fix 😅 * resolving comments * move fsdp handling to accelerate * fixex * fix saving * shift torch dynamo handling to accelerate * shift deepspeed integration and save & load utils to accelerate * fix accelerate launcher support * oops * fix 🐛 * save ckpt fix * Trigger CI * nasty 🐛 😅 * as deepspeed needs grad_acc fixes, transfer grad_acc to accelerate * make tests happy * quality ✨ * loss tracked needs to account for grad_acc * fixing the deepspeed tests * quality ✨ * 😅😅😅 * tests 😡 * quality ✨ * Trigger CI * resolve comments and fix the issue with the previous merge from branch * Trigger CI * accelerate took over deepspeed integration --------- Co-authored-by: Stas Bekman <stas@stason.org>	2023-05-31 15:16:22 +05:30
Sourab Mangrulkar	03db591047	shift torch dynamo handling to accelerate (#23168 ) * mixed precision support via accelerate * fix issues * fix for the sharded ddp case * fix flax and tf failing tests * `refactor the place to create `Accelerator` object * move ddp prep to accelerate * fix 😅 * resolving comments * move fsdp handling to accelerate * fixex * fix saving * shift torch dynamo handling to accelerate	2023-05-31 14:42:07 +05:30
Sourab Mangrulkar	0b774074a5	move fsdp handling to accelerate (#23158 ) * mixed precision support via accelerate * fix issues * fix for the sharded ddp case * fix flax and tf failing tests * `refactor the place to create `Accelerator` object * move ddp prep to accelerate * fix 😅 * resolving comments * move fsdp handling to accelerate * fixex * fix saving	2023-05-31 14:10:46 +05:30
Sourab Mangrulkar	9f0646a555	Smangrul/accelerate mp integrate (#23148 ) * mixed precision support via accelerate * fix issues * fix for the sharded ddp case * fix flax and tf failing tests * `refactor the place to create `Accelerator` object * address comments by removing debugging print statements	2023-05-31 12:27:51 +05:30
Wang, Yi	b7b729b38d	no_cuda does not take effect in non distributed environment (#23795 ) Signed-off-by: Wang, Yi <yi.a.wang@intel.com>	2023-05-26 10:47:51 -04:00
Zachary Mueller	75bbf20bce	Fix sagemaker DP/MP (#23681 ) * Check for use_sagemaker_dp * Add a check for is_sagemaker_mp when setting _n_gpu again. Should be last broken thing * Try explicit check? * Quality	2023-05-24 15:51:09 -04:00
Tim Dettmers	796162c512	Paged Optimizer + Lion Optimizer for Trainer (#23217 ) * Added lion and paged optimizers and made original tests pass. * Added tests for paged and lion optimizers. * Added and fixed optimizer tests. * Style and quality checks. --------- Co-authored-by: younesbelkada <younesbelkada@gmail.com>	2023-05-24 12:53:28 +02:00
Zachary Mueller	fe34486f12	Muellerzr fix deepspeed (#23657 ) * Fix deepspeed recursion * Better fix	2023-05-22 11:22:54 -04:00
Zachary Mueller	b455ad0a64	Fix parallel mode check (#23409 ) * Fix sagemaker/distributed state * Fix correctly * Bring back -1 * Bring back local rank for distributed check * better version * Cleanest option	2023-05-19 12:44:24 -04:00
Zachary Mueller	45e3d6496a	Update error message when Accelerate isn't installed (#23373 ) Update error	2023-05-17 11:16:02 -04:00
Konstantin Dobler	650a71e157	Support ratios for `logging_steps`, `eval_steps`, and `save_steps` (#23235 ) * Ratio option for `logging_steps`, `eval_steps`, `save_steps` * Add guards if arguments are not set * Add more detailed comments + formatting * Update src/transformers/training_args.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/training_args.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/training_args.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Convert args values to `int` if bigger than 1 * `black` * `make fixup` --------- Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2023-05-09 13:05:13 -04:00
Zachary Mueller	9884862383	Depricate xpu_backend for ddp_backend (#23085 ) * Depricate xpu_backend for ddp_backend * Typo * Only do a minor deprecation, no need for major Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> --------- Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2023-05-01 09:44:47 -04:00
Maxime Méloux	9b435204b1	Add Trainer support for ReduceLROnPlateau (#23010 ) * Add Trainer support for ReduceLROnPlateau Fixes #16503 * Remove training argument and add default instance --------- Co-authored-by: mmeloux <maxime.meloux@loria.fr>	2023-04-28 09:17:30 -04:00
Zachary Mueller	8b129030cb	Bring back PartialState DeepSpeed (#22921 ) * Bring back deepspeed integration * Branchname * Self-scheduled * newline * Use deepspeed env var * Remove comment * Del env var after partialstate	2023-04-26 15:35:59 -04:00
Zachary Mueller	5764e67cee	Revert DeepSpeed stuff from accelerate integration (#22899 )	2023-04-20 14:23:59 -04:00
Zachary Mueller	a8aad0ec93	Fixup multigpu local_rank (#22869 ) Fixup multigpu tests	2023-04-19 14:37:16 -04:00
Zachary Mueller	5bb4ec6233	Raise err if minimum Accelerate version isn't available (#22841 ) * Add warning about accelerate * Version block Accelerate * Include parse * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Check partial state * Update param --------- Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2023-04-18 14:25:02 -04:00
Zachary Mueller	aec10d162f	Update accelerate version + warning check fix (#22833 )	2023-04-18 12:51:32 -04:00
Zachary Mueller	03462875cc	Introduce `PartialState` as the device handler in the `Trainer` (#22752 ) * Use accelerate for device management * Add accelerate to setup Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2023-04-17 15:09:45 -04:00
Stas Bekman	d85bf95436	[trainer] update url (#22747 ) * [trainer] update url * style	2023-04-13 09:23:55 -07:00
Michael Benayoun	10fab90fe2	`torch.distributed` group initialization for `torch_neuron` disabled when `optimum-neuron` is installed (#22728 ) * Make the process group initialization not happen if optimum_neuron is installed * Add warning * Remove list and added warning	2023-04-12 17:42:50 +02:00
Viktor Scherbakov	871598be55	Implemented safetensors checkpoints save/load for Trainer (#22498 ) * implemented safetensors save/load * remove duplicated file * added tests * more tests * style fix * fix tf tests * change to list comprehension Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * review fixes + safe load for sharded checkpoint * style fix * remove rogue import * remove partial to avoid undefined exception * use naming alias instead of safetensors.torch * fix safe sharding in tests * grammar Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * update docs Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * update docs Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * minor corrections * style --------- Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2023-04-04 09:05:04 -04:00
Stas Bekman	500fce073b	[Trainer] add disclaimer that full_determinism is slow (#22368 )	2023-03-24 12:46:41 -07:00
heya5	cf0af9a31b	[Trainer] Add optional communication backends for torch.distributed when using GPU (#22247 ) Update training_args.py	2023-03-20 09:17:34 -04:00

268 Commits