Sourab Mangrulkar
|
350c5d1566
|
Add support for FSDP+QLoRA and DeepSpeed ZeRO3+QLoRA (#29587)
* fsdp+qlora related changes
* fixes
* Update quantization_config.py
* support fsdp+qlora and dsz3+qlora
* Update quantization_config.py
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* handle fsdp+qlora and dsz3+qlora correctly while model loading
* fix param count
* quality
* fsdp related changes
* fsdp changes only when using LoRA/QLoRA
* add accelerate version check
* refactor, update min accelerate version and add tests
1. Update minimum accelerate version to 0.26.0
2. Clean the trainer wrt accelerate version checks
3. FSDP refactor and test for fsdp config
4. Use `itemsize` instead of `dtype2bytes` dict
* fix test
* Address comments
Co-Authored-By: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* fix the conditional flag
* fix conditional flag
* address comments
Co-Authored-By: Zach Mueller <7831895+muellerzr@users.noreply.github.com>
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Zach Mueller <7831895+muellerzr@users.noreply.github.com>
|
2024-03-13 22:03:02 +05:30 |
|
Lysandre Debut
|
f497f564bb
|
Update all references to canonical models (#29001)
* Script & Manual edition
* Update
|
2024-02-16 08:16:58 +01:00 |
|
Sourab Mangrulkar
|
238d2e3c44
|
fix resuming from ckpt when using FSDP with FULL_STATE_DICT (#27891)
* fix resuming from ckpt when using FSDP with FULL_STATE_DICT
* update tests
* fix tests
|
2023-12-16 19:41:43 +05:30 |
|
Hz, Ji
|
82c7e87987
|
device agnostic fsdp testing (#27120)
* make fsdp test cases device agnostic
* make style
|
2023-11-01 07:17:06 +01:00 |
|
Yih-Dar
|
3e93dd295b
|
Skip TrainerIntegrationFSDP::test_basic_run_with_cpu_offload if torch < 2.1 (#26764)
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
|
2023-10-12 18:22:09 +02:00 |
|
Sourab Mangrulkar
|
86ffd5ffa2
|
fix name error when accelerate is not available (#26278)
* fix name error when accelerate is not available
* fix `is_fsdp_available`
|
2023-09-20 08:02:55 +02:00 |
|
Sourab Mangrulkar
|
382ba670ed
|
FSDP tests and checkpointing fixes (#26180)
* add fsdp tests
* Update test_fsdp.py
* Update test_fsdp.py
* fixes
* checks
* Update trainer.py
* fix
* fixes for saving/resuming checkpoints
* fixes
* add tests and delete debug statements
* fixing tests
* Update test_fsdp.py
* fix tests
* fix tests
* minor nits
* fix code style and quality
* refactor and modularize test code
* reduce the time of tests
* reduce the test time
* fix test
* reduce test time
* reduce test time
* fix failing tests
* fix
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* resolve comments
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
|
2023-09-20 10:26:16 +05:30 |
|