[PEFT] Peft integration alternative design (#25077)
* a draft version
* v2 integration
* fix
* make it more generic and works for IA3
* add set adapter and multiple adapters support
* fixup
* adapt a bit
* oops
* oops
* oops
* adapt more
* fix
* add more refactor
* now works with model class
* change it to instance method as it causes issues with `jit`.
* add CR
* change method name
* add `add_adapter` method
* clean up
* Update src/transformers/adapters/peft_mixin.py
  Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* add moe utils
* fixup
* Update src/transformers/adapters/peft_mixin.py
  Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* adapt
* oops
* fixup
* add is_peft_available
* remove `requires_backend`
* trainer compatibility
* fixup + docstring
* more details
* trigger CI
* Apply suggestions from code review
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/modeling_utils.py
* fixup + is_main_process
* added `save_peft_format` in save_pretrained
* up
* fix nits here and there
* nits here and there.
* docs
* revert `encoding="utf-8"`
* comment
* added slow tests before the PEFT release.
* fixup and nits
* let's be on the safe zone
* added more comments
* v1 docs
* add remaining docs
* Apply suggestions from code review
  Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* move to `lib_integrations`
* fixup
* this time fixup
* Apply suggestions from code review
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* address final comments
* refactor to use `token`
* add PEFT to DockerFile for slow tests.
* added pipeline support.

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
@@ -19,6 +19,8 @@
    title: Train with a script
  - local: accelerate
    title: Set up distributed training with 🤗 Accelerate
  - local: peft
    title: Load and train adapters with 🤗 PEFT
  - local: model_sharing
    title: Share your model
  - local: transformers_agents

216
docs/source/en/peft.md
Normal file
@@ -0,0 +1,216 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->

# Load adapters with 🤗 PEFT

[[open-in-colab]]

[Parameter-Efficient Fine Tuning (PEFT)](https://huggingface.co/blog/peft) methods freeze the pretrained model parameters during fine-tuning and add a small number of trainable parameters (the adapters) on top of them. The adapters are trained to learn task-specific information. This approach has been shown to be very memory-efficient and to use less compute while producing results comparable to a fully fine-tuned model.

Adapters trained with PEFT are also usually an order of magnitude smaller than the full model, making them convenient to share, store, and load.

<div class="flex flex-col justify-center">
  <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/PEFT-hub-screenshot.png"/>
  <figcaption class="text-center">The adapter weights for an OPTForCausalLM model stored on the Hub are only ~6MB compared to the full size of the model weights, which can be ~700MB.</figcaption>
</div>
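
To make the size difference concrete, here is a rough back-of-the-envelope count for a LoRA-style adapter. LoRA learns two low-rank factors in place of a full update to a `d_out x d_in` weight, which adds `r * (d_in + d_out)` parameters per adapted layer. The hidden size, layer count, rank, and number of adapted projections below are illustrative assumptions, not values read from the checkpoint in the screenshot.

```python
def lora_param_count(d_in, d_out, r):
    """Parameters added by one LoRA pair: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


# Assumed, illustrative shapes (not read from any real checkpoint).
hidden_size = 1024       # width of each attention projection
num_layers = 24          # number of transformer blocks
adapted_per_layer = 2    # e.g. two attention projections per block
rank = 16                # LoRA rank r

# One dense projection versus its low-rank adapter.
dense_params = hidden_size * hidden_size
pair_params = lora_param_count(hidden_size, hidden_size, rank)

# Total adapter size, stored as 4-byte float32 values.
adapter_params = num_layers * adapted_per_layer * pair_params
adapter_megabytes = adapter_params * 4 / 1e6

print(f"dense projection: {dense_params:,} parameters")
print(f"LoRA pair:        {pair_params:,} parameters")
print(f"whole adapter:    {adapter_params:,} parameters, about {adapter_megabytes:.2f} MB")
```

Under these assumptions the adapter lands in the single-digit megabyte range, which is the same order of magnitude as the figure in the caption above.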

If you're interested in learning more about the 🤗 PEFT library, check out the [documentation](https://huggingface.co/docs/peft/index).

## Setup

Get started by installing 🤗 PEFT:

```bash
pip install peft
```

If you want to try out the brand new features, you might be interested in installing the library from source:

```bash
pip install git+https://github.com/huggingface/peft.git
```

## Supported PEFT models

🤗 Transformers natively supports some PEFT methods, meaning you can load adapter weights stored locally or on the Hub and easily run or train them with a few lines of code. The following methods are supported:

- [Low Rank Adapters](https://huggingface.co/docs/peft/conceptual_guides/lora)
- [IA3](https://huggingface.co/docs/peft/conceptual_guides/ia3)
- [AdaLoRA](https://arxiv.org/abs/2303.10512)

If you want to use other PEFT methods, such as prompt learning or prompt tuning, or want to learn about the 🤗 PEFT library in general, please refer to the [documentation](https://huggingface.co/docs/peft/index).

## Load a PEFT adapter

To load and use a PEFT adapter model from 🤗 Transformers, make sure the Hub repository or local directory contains an `adapter_config.json` file and the adapter weights, as shown in the example image above. Then you can load the PEFT adapter model using an `AutoModelFor` class. For example, to load a PEFT adapter model for causal language modeling:

1. Specify the PEFT model id.
2. Pass it to the [`AutoModelForCausalLM`] class.

```py
from transformers import AutoModelForCausalLM, AutoTokenizer

peft_model_id = "ybelkada/opt-350m-lora"
model = AutoModelForCausalLM.from_pretrained(peft_model_id)
```
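
For a local directory, a quick pre-flight check can confirm that both pieces mentioned above are present before loading. The sketch below is a hypothetical helper, not part of 🤗 Transformers or 🤗 PEFT. `adapter_config.json` comes from the text above, while the two weight file names are an assumption about what PEFT commonly writes.

```python
import tempfile
from pathlib import Path

# adapter_config.json is named in the guide; the weight file names are assumed.
CONFIG_NAME = "adapter_config.json"
WEIGHT_NAMES = ("adapter_model.safetensors", "adapter_model.bin")


def looks_like_peft_adapter(directory):
    """Return True if `directory` holds an adapter config and at least one weights file."""
    root = Path(directory)
    has_config = (root / CONFIG_NAME).is_file()
    has_weights = any((root / name).is_file() for name in WEIGHT_NAMES)
    return has_config and has_weights


# Demo on throwaway directories filled with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    complete = Path(tmp) / "complete"
    complete.mkdir()
    (complete / CONFIG_NAME).write_text("{}")
    (complete / WEIGHT_NAMES[0]).write_bytes(b"")

    config_only = Path(tmp) / "config_only"
    config_only.mkdir()
    (config_only / CONFIG_NAME).write_text("{}")

    complete_ok = looks_like_peft_adapter(complete)
    config_only_ok = looks_like_peft_adapter(config_only)

print("complete directory:", complete_ok)
print("config only:", config_only_ok)
```

The check only looks at file names. It does not validate the config contents or the weights themselves.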

<Tip>

You can load a PEFT adapter with either an `AutoModelFor` class or the base model class like `OPTForCausalLM` or `LlamaForCausalLM`.

</Tip>

You can also load a PEFT adapter by calling the `load_adapter` method:

```py
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/opt-350m"
peft_model_id = "ybelkada/opt-350m-lora"

model = AutoModelForCausalLM.from_pretrained(model_id)
model.load_adapter(peft_model_id)
```

## Load in 8bit or 4bit

The `bitsandbytes` integration supports 8bit and 4bit precision data types, which are useful for loading large models because they save memory (see the `bitsandbytes` integration [guide](./quantization#bitsandbytes-integration) to learn more). Add the `load_in_8bit` or `load_in_4bit` parameter to [`~PreTrainedModel.from_pretrained`] and set `device_map="auto"` to effectively distribute the model across your hardware:

```py
from transformers import AutoModelForCausalLM, AutoTokenizer

peft_model_id = "ybelkada/opt-350m-lora"
model = AutoModelForCausalLM.from_pretrained(peft_model_id, device_map="auto", load_in_8bit=True)
```
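
As a rough guide to why lower precision helps, the memory needed just to hold the weights scales with the number of bits per parameter. The sketch below is simple arithmetic under stated assumptions: the parameter count is an illustrative round number, and the estimate ignores activations, quantization metadata, and any layers that a quantization backend keeps in higher precision.

```python
# Bits used to store one parameter in each format.
BITS_PER_PARAM = {"float32": 32, "float16": 16, "8bit": 8, "4bit": 4}


def weight_memory_gb(num_params, dtype):
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BITS_PER_PARAM[dtype] / 8 / 1e9


# Assumed, illustrative model size (not measured from a real checkpoint).
num_params = 350_000_000

estimates = {dtype: weight_memory_gb(num_params, dtype) for dtype in BITS_PER_PARAM}
for dtype, gigabytes in estimates.items():
    print(f"{dtype:>8}: ~{gigabytes:.3f} GB")
```

Each halving of the bit width halves the weight footprint, so 8bit weights take roughly a quarter of the float32 size and 4bit weights roughly an eighth.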

## Add a new adapter

You can use [`~peft.PeftModel.add_adapter`] to add a new adapter to a model with an existing adapter as long as the new adapter is the same type as the current one. For example, if you have an existing LoRA adapter attached to a model:

```py
from transformers import AutoModelForCausalLM, OPTForCausalLM, AutoTokenizer
from peft import LoraConfig

model_id = "facebook/opt-350m"
model = AutoModelForCausalLM.from_pretrained(model_id)

lora_config = LoraConfig(
    target_modules=["q_proj", "k_proj"],
    init_lora_weights=False
)

model.add_adapter(lora_config, adapter_name="adapter_1")
```

To add a new adapter:

```py
# attach new adapter with same config
model.add_adapter(lora_config, adapter_name="adapter_2")
```

Now you can use [`~peft.PeftModel.set_adapter`] to set which adapter to use:

```py
# tokenize a prompt to run through each adapter
tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("Hello", return_tensors="pt")

# use adapter_1
model.set_adapter("adapter_1")
output_1 = model.generate(**inputs)
print(tokenizer.decode(output_1[0], skip_special_tokens=True))

# use adapter_2
model.set_adapter("adapter_2")
output_2 = model.generate(**inputs)
print(tokenizer.decode(output_2[0], skip_special_tokens=True))
```
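
The bookkeeping behind named adapters can be pictured with a small toy model. The class below is a hypothetical stand-in written for illustration only. It is not how 🤗 Transformers or 🤗 PEFT implement `add_adapter` and `set_adapter`, and details such as the error messages and making the first adapter active are choices made for this sketch. It only mirrors the two rules described above: a new adapter must match the type of the existing ones, and exactly one named adapter is active at a time.

```python
class ToyAdapterRegistry:
    """Toy bookkeeping for named adapters (illustrative, not the PEFT implementation)."""

    def __init__(self):
        self.adapters = {}  # adapter name -> adapter type, e.g. "LORA"
        self.active = None  # name of the adapter currently in use

    def add_adapter(self, adapter_type, adapter_name="default"):
        if adapter_name in self.adapters:
            raise ValueError(f"Adapter {adapter_name!r} already exists.")
        existing_types = set(self.adapters.values())
        if existing_types and adapter_type not in existing_types:
            raise ValueError("A new adapter must be the same type as the existing one.")
        self.adapters[adapter_name] = adapter_type
        if self.active is None:
            # Toy choice: the first adapter added becomes the active one.
            self.active = adapter_name

    def set_adapter(self, adapter_name):
        if adapter_name not in self.adapters:
            raise ValueError(f"Unknown adapter {adapter_name!r}.")
        self.active = adapter_name


registry = ToyAdapterRegistry()
registry.add_adapter("LORA", adapter_name="adapter_1")
registry.add_adapter("LORA", adapter_name="adapter_2")
registry.set_adapter("adapter_2")
print("active adapter:", registry.active)

# Mixing adapter types is rejected in this toy model.
try:
    registry.add_adapter("IA3", adapter_name="adapter_3")
    mixed_type_rejected = False
except ValueError:
    mixed_type_rejected = True
print("mixed type rejected:", mixed_type_rejected)
```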

## Enable and disable adapters

Once you've added an adapter to a model, you can enable or disable the adapter module. To enable the adapter module:

```py
from transformers import AutoModelForCausalLM, OPTForCausalLM, AutoTokenizer
from peft import PeftConfig

model_id = "facebook/opt-350m"
adapter_model_id = "ybelkada/opt-350m-lora"
tokenizer = AutoTokenizer.from_pretrained(model_id)
text = "Hello"
inputs = tokenizer(text, return_tensors="pt")

model = AutoModelForCausalLM.from_pretrained(model_id)
peft_config = PeftConfig.from_pretrained(adapter_model_id)

# initialize the adapter with random weights
peft_config.init_lora_weights = False

model.add_adapter(peft_config)
model.enable_adapters()
output = model.generate(**inputs)
```

To disable the adapter module:

```py
model.disable_adapters()
output = model.generate(**inputs)
```
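
The example above sets `init_lora_weights = False` for a reason that a toy layer can illustrate. A LoRA layer computes `W x + (lora_alpha / r) * B (A x)`. If `B` starts at zero, which is the assumption made here about the default initialization, the adapter contributes nothing until it is trained, so enabling and disabling it give the same output. The pure-Python class below is a hypothetical sketch of that behavior, not the PEFT implementation.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


class ToyLoraLinear:
    """Toy linear layer with one LoRA adapter (illustrative only)."""

    def __init__(self, weight, r, lora_alpha, init_lora_weights=True, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight
        self.scaling = lora_alpha / r
        self.lora_A = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(r)]
        if init_lora_weights:
            # Assumed default: B starts at zero, so the adapter is a no-op.
            self.lora_B = [[0.0] * r for _ in range(d_out)]
        else:
            # Random B, so the adapter changes the output right away.
            self.lora_B = [[rng.uniform(-1, 1) for _ in range(r)] for _ in range(d_out)]
        self.enabled = True

    def forward(self, x):
        base = matvec(self.weight, x)
        if not self.enabled:
            return base
        update = matvec(self.lora_B, matvec(self.lora_A, x))
        return [b + self.scaling * u for b, u in zip(base, update)]


identity = [[1.0, 0.0], [0.0, 1.0]]
x = [1.0, 2.0]

zero_init = ToyLoraLinear(identity, r=2, lora_alpha=4)
random_init = ToyLoraLinear(identity, r=2, lora_alpha=4, init_lora_weights=False)

zero_init_out = zero_init.forward(x)
random_enabled_out = random_init.forward(x)
random_init.enabled = False
random_disabled_out = random_init.forward(x)

print("zero-init, enabled:    ", zero_init_out)
print("random-init, enabled:  ", random_enabled_out)
print("random-init, disabled: ", random_disabled_out)
```

With the zero-initialized adapter the enabled output matches the base layer, while the randomly initialized adapter only matches the base layer once it is disabled.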

## Train a PEFT adapter

PEFT adapters are supported by the [`Trainer`] class so that you can train an adapter for your specific use case. It only requires adding a few more lines of code. For example, to train a LoRA adapter:

<Tip>

If you aren't familiar with fine-tuning a model with [`Trainer`], take a look at the [Fine-tune a pretrained model](training) tutorial.

</Tip>

1. Define your adapter configuration with the task type and hyperparameters (see [`~peft.LoraConfig`] for more details about what the hyperparameters do).

```py
from peft import LoraConfig

peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    bias="none",
    task_type="CAUSAL_LM",
)
```

2. Add the adapter to the model.

```py
model.add_adapter(peft_config)
```

3. Now you can pass the model to [`Trainer`]!

```py
trainer = Trainer(model=model, ...)
trainer.train()
```

To save your trained adapter and load it back:

```py
# save_dir is the directory the adapter files are written to
model.save_pretrained(save_dir)
model = AutoModelForCausalLM.from_pretrained(save_dir)
```
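
After saving, the directory is expected to contain an `adapter_config.json` describing the adapter. The sketch below writes a hand-made sample config to a temporary directory and reads it back. The helper name `summarize_adapter_config` is hypothetical, the field names are assumptions about what PEFT typically stores, and the values simply mirror the `LoraConfig` from step 1 rather than real saved output. The scaling factor shown is the standard LoRA ratio `lora_alpha / r`.

```python
import json
import tempfile
from pathlib import Path

# Hand-written sample mirroring the LoraConfig above (not real saved output).
sample_config = {
    "base_model_name_or_path": "facebook/opt-350m",  # assumed field name
    "peft_type": "LORA",                             # assumed field name
    "task_type": "CAUSAL_LM",
    "r": 64,
    "lora_alpha": 16,
    "lora_dropout": 0.1,
    "bias": "none",
}


def summarize_adapter_config(directory):
    """Read adapter_config.json from `directory` and return a short summary."""
    config = json.loads((Path(directory) / "adapter_config.json").read_text())
    return {
        "base_model": config.get("base_model_name_or_path"),
        "peft_type": config.get("peft_type"),
        "scaling": config["lora_alpha"] / config["r"],
    }


with tempfile.TemporaryDirectory() as save_dir:
    (Path(save_dir) / "adapter_config.json").write_text(json.dumps(sample_config, indent=2))
    summary = summarize_adapter_config(save_dir)

print(summary)
```

Recording the base model in the adapter config is what makes it plausible to reload from the adapter directory alone, since the loader can look up which base weights to fetch.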

<!--
TODO: (@younesbelkada @stevhliu)
- Link to PEFT docs for further details
- Trainer
- 8-bit / 4-bit examples ?
-->