Enhancing SFT Training Efficiency Using Packing and FlashAttention2 with Position IDs (#31629)
* add DataCollatorBatchFlattening * Update data_collator.py * change name * new FA2 flow if position_ids is provided * add comments * minor fix * minor fix data collator * add test cases for models * add test case for data collator * remove extra code * formating for ruff check and check_repo.py * ruff format ruff format tests src utils * custom_init_isort.py
This commit is contained in:
@@ -66,3 +66,8 @@ Examples of use can be found in the [example scripts](../examples) or [example n
|
||||
- numpy_mask_tokens
|
||||
- tf_mask_tokens
|
||||
- torch_mask_tokens
|
||||
|
||||
## DataCollatorWithFlattening
|
||||
|
||||
[[autodoc]] data.data_collator.DataCollatorWithFlattening
|
||||
|
||||
|
||||
Reference in New Issue
Block a user