From 3a8769f6a9672bfe6e499923a8ebb6fb25d8393b Mon Sep 17 00:00:00 2001
From: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Date: Fri, 22 Dec 2023 10:18:32 +0100
Subject: [PATCH] =?UTF-8?q?[`Docs`]=C2=A0Add=204-bit=20serialization=20doc?=
 =?UTF-8?q?s=20(#28182)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* add 4-bit serialization docs

* up

* up
---
 docs/source/en/quantization.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/source/en/quantization.md b/docs/source/en/quantization.md
index 1c3f383551..3a1c542c0b 100644
--- a/docs/source/en/quantization.md
+++ b/docs/source/en/quantization.md
@@ -345,7 +345,7 @@ model_4bit = AutoModelForCausalLM.from_pretrained("facebook/opt-350m", load_in_4
 model_4bit.model.decoder.layers[-1].final_layer_norm.weight.dtype
 ```
 
-Once a model is quantized to 4-bit, you can't push the quantized weights to the Hub.
+If you have `bitsandbytes>=0.41.3`, you can serialize 4-bit models and push them on Hugging Face Hub. Simply call `model.push_to_hub()` after loading it in 4-bit precision. You can also save the serialized 4-bit models locally with `model.save_pretrained()` command.
 
 
 
@@ -468,6 +468,7 @@ Try 4-bit quantization in this [notebook](https://colab.research.google.com/driv
 
 This section explores some of the specific features of 4-bit models, such as changing the compute data type, using the Normal Float 4 (NF4) data type, and using nested quantization.
 
+
 #### Compute data type
 
 To speedup computation, you can change the data type from float32 (the default value) to bf16 using the `bnb_4bit_compute_dtype` parameter in [`BitsAndBytesConfig`]: