From ca974aff0fba284d0f062956d8eecef527de067e Mon Sep 17 00:00:00 2001
From: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Date: Tue, 18 Jul 2023 13:39:08 +0200
Subject: [PATCH] [`Docs`] Clarify 4bit docs (#24878)

* clarify 4bit docs

* Apply suggestions from code review

Co-authored-by: lewtun

---------

Co-authored-by: lewtun
---
 docs/source/en/main_classes/quantization.md | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/docs/source/en/main_classes/quantization.md b/docs/source/en/main_classes/quantization.md
index 8984ac4787..c8547ab0c7 100644
--- a/docs/source/en/main_classes/quantization.md
+++ b/docs/source/en/main_classes/quantization.md
@@ -38,11 +38,21 @@ Make sure that you have installed the requirements below before running any of t
 - Latest `bitsandbytes` library
 `pip install bitsandbytes>=0.39.0`
 
-- Install latest `accelerate` from source
-`pip install git+https://github.com/huggingface/accelerate.git`
+- Install latest `accelerate`
+`pip install --upgrade accelerate`
 
 - Install latest `transformers` from source
-`pip install git+https://github.com/huggingface/transformers.git`
+`pip install --upgrade transformers`
+
+#### Tips and best practices
+
+- **Advanced usage:** Refer to [this Google Colab notebook](https://colab.research.google.com/drive/1ge2F1QSK8Q7h0hn3YKuBCOAS0bK8E0wf) for advanced usage of 4-bit quantization with all the possible options.
+
+- **Faster inference with `batch_size=1` :** Since the `0.40.0` release of bitsandbytes, for `batch_size=1` you can benefit from fast inference. Check out [these release notes](https://github.com/TimDettmers/bitsandbytes/releases/tag/0.40.0) and make sure to have a version that is greater than `0.40.0` to benefit from this feature out of the box.
+
+- **Training:** According to [QLoRA paper](https://arxiv.org/abs/2305.14314), for training 4-bit base models (e.g. using LoRA adapters) one should use `bnb_4bit_quant_type='nf4'`.
+
+- **Inference:** For inference, `bnb_4bit_quant_type` does not have a huge impact on the performance. However for consistency with the model's weights, make sure you use the same `bnb_4bit_compute_dtype` and `torch_dtype` arguments.
 
 #### Load a large model in 4bit
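
As a reviewer-side illustration of the tips added in this patch, the sketch below shows one way the version requirement and the matching-dtype advice might be applied together. It is not part of the patch. The helper names (`parse_version`, `has_fast_4bit_inference`, `load_4bit_model`) are hypothetical, the version bound is assumed to be inclusive of `0.40.0`, and the model id is left to the caller because the patch does not name one. The loader assumes `torch`, `transformers`, `accelerate` and `bitsandbytes` are installed.

```python
from importlib import metadata


def parse_version(text):
    """Turn '0.40.2' into (0, 40, 2); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def has_fast_4bit_inference(installed, minimum="0.40.0"):
    """True if `installed` is at least `minimum` (assumption: bound is inclusive)."""
    return parse_version(installed) >= parse_version(minimum)


def load_4bit_model(model_id, dtype_name="bfloat16"):
    """Load `model_id` in 4-bit NF4 with the same compute dtype and torch_dtype."""
    # Imported lazily so the version helpers above work without these packages.
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    dtype = getattr(torch, dtype_name)
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",     # type recommended in the Training tip
        bnb_4bit_compute_dtype=dtype,  # kept identical to torch_dtype below
    )
    return AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        torch_dtype=dtype,
        device_map="auto",  # relies on accelerate being installed
    )


if __name__ == "__main__":
    try:
        installed = metadata.version("bitsandbytes")
    except metadata.PackageNotFoundError:
        installed = None

    if installed is None:
        print("bitsandbytes is not installed")
    elif has_fast_4bit_inference(installed):
        print(f"bitsandbytes {installed}: fast batch_size=1 inference expected")
    else:
        print(f"bitsandbytes {installed}: upgrade to get fast batch_size=1 inference")
```

Calling `load_4bit_model("<your-model-id>")` would download the checkpoint and needs a supported GPU. The version helpers run with the standard library alone.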