Docs / Quantization: Replace all occurences of load_in_8bit with bnb config (#31136)

Replace all occurences of `load_in_8bit` with bnb config
This commit is contained in:
Younes Belkada
2024-05-30 16:47:35 +02:00
committed by GitHub
parent cda9c82a63
commit f5590deaa8
11 changed files with 37 additions and 37 deletions

View File

@@ -357,10 +357,10 @@ Int8混合精度行列分解は、行列乗算を2つのストリームに分割
必要なライブラリをインストールした後、ミックス 8 ビットモデルを読み込む方法は次の通りです:
```py
from transformers import AutoModelForCausalLM
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
model_name = "bigscience/bloom-2b5"
model_8bit = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", load_in_8bit=True)
model_8bit = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=BitsAndBytesConfig(load_in_8bit=True))
```
以下はシンプルな例です:
@@ -370,11 +370,11 @@ model_8bit = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto",
```py
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
model_name = "bigscience/bloom-2b5"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model_8bit = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", load_in_8bit=True)
model_8bit = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=BitsAndBytesConfig(load_in_8bit=True))
prompt = "Hello, my llama is cute"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
@@ -388,7 +388,7 @@ outputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
```py
model_name = "bigscience/bloom-2b5"
model_8bit = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", load_in_8bit=True)
model_8bit = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=BitsAndBytesConfig(load_in_8bit=True))
```
`accelerate`を使用して各GPUに割り当てるGPU RAMを制御する際には、以下のように`max_memory`引数を使用します: