feat: add flexible Liger Kernel configuration to TrainingArguments (#38911)
* feat: add flexible Liger Kernel configuration to TrainingArguments
Add support for granular Liger Kernel configuration through a new
`liger_kernel_config` parameter in TrainingArguments. This allows users
to selectively enable/disable specific kernels (rope, swiglu, cross_entropy,
etc.) instead of the current approach that rely on default configuration.
Features:
- Add `liger_kernel_config` dict parameter to TrainingArguments
- Support selective kernel application for all supported models
- Maintain full backward compatibility with existing `use_liger_kernel` flag
Example usage:
```python
TrainingArguments(
use_liger_kernel=True,
liger_kernel_config={
"rope": True,
"swiglu": True,
"cross_entropy": False,
"fused_linear_cross_entropy": True
}
)
Closes #38905
* Address comments and update Liger section in Trainer docs
This commit is contained in:
committed by
GitHub
parent
89b35be618
commit
797860c68c
@@ -493,6 +493,33 @@ training_args = TrainingArguments(
|
||||
)
|
||||
```
|
||||
|
||||
You can also configure which specific kernels to apply using the `liger_kernel_config` parameter. This dict is passed as keyword arguments to the `_apply_liger_kernel_to_instance` function, allowing fine-grained control over kernel usage. Available options vary by model but typically include: `rope`, `swiglu`, `cross_entropy`, `fused_linear_cross_entropy`, `rms_norm`, etc.
|
||||
|
||||
```py
|
||||
from transformers import TrainingArguments
|
||||
|
||||
# Apply only specific kernels
|
||||
training_args = TrainingArguments(
|
||||
output_dir="your-model",
|
||||
learning_rate=2e-5,
|
||||
per_device_train_batch_size=16,
|
||||
per_device_eval_batch_size=16,
|
||||
num_train_epochs=2,
|
||||
weight_decay=0.01,
|
||||
eval_strategy="epoch",
|
||||
save_strategy="epoch",
|
||||
load_best_model_at_end=True,
|
||||
push_to_hub=True,
|
||||
use_liger_kernel=True,
|
||||
liger_kernel_config={
|
||||
"rope": True,
|
||||
"cross_entropy": True,
|
||||
"rms_norm": False, # Don't apply Liger's RMSNorm kernel
|
||||
"swiglu": True,
|
||||
}
|
||||
)
|
||||
```
|
||||
|
||||
### NEFTune
|
||||
|
||||
[NEFTune](https://hf.co/papers/2310.05914) adds noise to the embedding vectors during training to improve model performance. Enable it in [`Trainer`] with the `neftune_noise_alpha` parameter in [`TrainingArguments`] to control how much noise is added.
|
||||
|
||||
Reference in New Issue
Block a user