# SpQR

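SpQR (Sparse-Quantized Representation) is a quantization algorithm that combines a 16x16 tiled, bi-level, group-wise 3-bit quantization structure with sparse outliers. Weights are quantized in small groups, the per-group quantization statistics are themselves quantized (the second level), and a small fraction of outlier weights is stored separately at higher precision in a sparse format.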
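The bi-level idea can be illustrated with a toy, pure-Python sketch. This is a simplified illustration under stated assumptions, not the SpQR reference implementation or the storage format Transformers loads. Weights are quantized to 3 bits in groups of 16 with min-max scaling. The per-group scales and zero points are quantized again to 3 bits in groups of 16 (the second level). Outliers are kept at full precision in a sparse dict. The outlier rule (magnitude above a multiple of the mean absolute weight) is a hypothetical stand-in for SpQR's sensitivity-based criterion, and all function names are illustrative.

```python
# Toy sketch of SpQR-style bi-level group quantization with sparse outliers.
# Illustrative only: function names are hypothetical, the outlier rule is a
# simple magnitude threshold, and statistics are returned already dequantized.


def minmax_params(values, bits):
    """Asymmetric min-max parameters (scale, zero) for `bits`-bit codes."""
    lo, hi = min(values), max(values)
    levels = 2**bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    return scale, lo


def quantize(values, scale, zero, bits):
    """Map values to integer codes in [0, 2**bits - 1], clamping out-of-range values."""
    levels = 2**bits - 1
    return [min(levels, max(0, round((v - zero) / scale))) for v in values]


def dequantize(codes, scale, zero):
    return [c * scale + zero for c in codes]


def quantize_stats(stats, bits, group_size):
    """Second level: quantize per-group statistics in groups of `group_size`."""
    out = []
    for start in range(0, len(stats), group_size):
        chunk = stats[start:start + group_size]
        scale, zero = minmax_params(chunk, bits)
        out.extend(dequantize(quantize(chunk, scale, zero, bits), scale, zero))
    return out


def spqr_like_quantize(weights, group_size=16, bits=3, stat_group_size=16,
                       stat_bits=3, outlier_threshold=3.0):
    # Hypothetical stand-in for SpQR's sensitivity-based outlier detection:
    # flag weights whose magnitude exceeds threshold * mean |w|.
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    outliers = {i: w for i, w in enumerate(weights)
                if abs(w) > outlier_threshold * mean_abs}

    # First level: min-max statistics per group, computed on inliers only.
    groups, scales, zeros = [], [], []
    for start in range(0, len(weights), group_size):
        idx = range(start, min(start + group_size, len(weights)))
        inliers = [weights[i] for i in idx if i not in outliers] or [0.0]
        scale, zero = minmax_params(inliers, bits)
        groups.append(idx)
        scales.append(scale)
        zeros.append(zero)

    # Second level: the statistics themselves are quantized to `stat_bits`.
    q_scales = quantize_stats(scales, stat_bits, stat_group_size)
    q_zeros = quantize_stats(zeros, stat_bits, stat_group_size)

    # Weights are encoded with the (lossy) quantized statistics.
    codes = [quantize([weights[i] for i in idx], s, z, bits)
             for idx, s, z in zip(groups, q_scales, q_zeros)]
    return codes, q_scales, q_zeros, outliers


def reconstruct(codes, q_scales, q_zeros, outliers):
    out = []
    for group_codes, scale, zero in zip(codes, q_scales, q_zeros):
        out.extend(dequantize(group_codes, scale, zero))
    for i, w in outliers.items():
        out[i] = w  # outliers are restored from the sparse full-precision store
    return out


weights = [((i * 37) % 17 - 8) / 10 for i in range(64)]
weights[5] = 10.0  # inject one large outlier
codes, q_scales, q_zeros, outliers = spqr_like_quantize(weights)
recon = reconstruct(codes, q_scales, q_zeros, outliers)
print("outliers:", outliers)
print("max inlier error:",
      max(abs(a - b) for i, (a, b) in enumerate(zip(weights, recon)) if i != 5))
```

Because the outlier is excluded from its group's min-max range, it does not stretch the 3-bit grid for the other weights in that group. This is the motivation for storing outliers separately rather than quantizing them with everything else.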

> [!TIP]
> To quantize a model with SpQR, refer to the [Vahe1994/SpQR](https://github.com/Vahe1994/SpQR) repository.

Load a SpQR-quantized model with `PreTrainedModel.from_pretrained`.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "elvircrn/Llama-2-7b-SPQR-3Bit-16x16-red_pajama-hf"

# The SpQR quantization settings are read from the checkpoint's config.
quantized_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.half,  # load non-quantized tensors in float16
    device_map="auto",       # place weights on the available devices automatically
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```