Enhancing SFT Training Efficiency Using Packing and FlashAttention2 with Position IDs (#31629)

* add DataCollatorBatchFlattening

* Update data_collator.py

* change name

* new FA2 flow if position_ids is provided

* add comments

* minor fix

* minor fix data collator

* add test cases for models

* add test case for data collator

* remove extra code

* formatting for ruff check and check_repo.py

* ruff format

ruff format tests src utils

* custom_init_isort.py
RhuiDih
2024-07-23 21:56:41 +08:00
committed by GitHub
parent 7d92009af6
commit 9cf4f2aa9a
20 changed files with 226 additions and 0 deletions

@@ -66,3 +66,8 @@ Examples of use can be found in the [example scripts](../examples) or [example n
- numpy_mask_tokens
- tf_mask_tokens
- torch_mask_tokens

## DataCollatorWithFlattening

[[autodoc]] data.data_collator.DataCollatorWithFlattening
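
To make the idea behind the commit concrete, here is a minimal pure-Python sketch of batch flattening with position IDs. It is an illustration under stated assumptions, not the code added in `data_collator.py`: the helper names `flatten_batch` and `cu_seqlens_from_position_ids` are hypothetical, and the choice to mask the first label of each example with `-100` is an assumption about how sequence boundaries are kept out of the loss. The second helper shows how an attention kernel that supports variable-length sequences could recover sequence boundaries from the points where `position_ids` resets to 0.

```python
from typing import Dict, List

IGNORE_INDEX = -100  # assumption: label value ignored by the loss


def flatten_batch(features: List[Dict[str, List[int]]]) -> Dict[str, List[int]]:
    """Concatenate examples into one padding-free sequence (hypothetical helper).

    position_ids restart at 0 for every example, so the boundaries between
    packed examples remain recoverable after concatenation.
    """
    packed: Dict[str, List[int]] = {"input_ids": [], "labels": [], "position_ids": []}
    for example in features:
        ids = example["input_ids"]
        if not ids:
            continue  # skip empty examples so no stray label is emitted
        labels = example.get("labels", ids)
        packed["input_ids"] += ids
        # Mask the first token of each example so that no loss is computed
        # across the boundary with the previous example.
        packed["labels"] += [IGNORE_INDEX] + list(labels[1:])
        packed["position_ids"] += list(range(len(ids)))
    return packed


def cu_seqlens_from_position_ids(position_ids: List[int]) -> List[int]:
    """Return cumulative sequence lengths: each index where position_ids is 0,
    followed by the total length (hypothetical helper)."""
    starts = [i for i, p in enumerate(position_ids) if p == 0]
    return starts + [len(position_ids)]


batch = flatten_batch([{"input_ids": [1, 2, 3]}, {"input_ids": [4, 5]}])
boundaries = cu_seqlens_from_position_ids(batch["position_ids"])
```

Because no padding tokens are produced, every token in the flattened batch contributes to compute, which is where the efficiency gain named in the commit title would come from.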