diff --git a/docs/source/en/quantization/overview.md b/docs/source/en/quantization/overview.md index 48e701f5fb..1c0f48988d 100644 --- a/docs/source/en/quantization/overview.md +++ b/docs/source/en/quantization/overview.md @@ -24,26 +24,26 @@ Use the Space below to help you pick a quantization method depending on your har | Quantization Method | On the fly quantization | CPU | CUDA GPU | ROCm GPU | Metal (Apple Silicon) | Intel GPU | Torch compile() | Bits | PEFT Fine Tuning | Serializable with πŸ€—Transformers | πŸ€—Transformers Support | Link to library | |-----------------------------------------------|----------------------|-----------------|----------|-----------|------------------------------------|-----------------|-----------------|---------------|------------------|-----------------------------|-------------------------|---------------------------------------------| -| [AQLM](./aqlm.md) | πŸ”΄ | 🟒 | 🟒 | πŸ”΄ | πŸ”΄ | πŸ”΄ | 🟒 | 1/2 | 🟒 | 🟒 | 🟒 | https://github.com/Vahe1994/AQLM | -| [AWQ](./awq.md) | πŸ”΄ | 🟒 | 🟒 | 🟒 | πŸ”΄ | 🟒 | ? | 4 | 🟒 | 🟒 | 🟒 | https://github.com/casper-hansen/AutoAWQ | -| [bitsandbytes](./bitsandbytes.md) | 🟒 | 🟑 | 🟒 | 🟑 | πŸ”΄ | 🟑 | πŸ”΄ | 4/8 | 🟒 | 🟒 | 🟒 | https://github.com/bitsandbytes-foundation/bitsandbytes | -| [compressed-tensors](./compressed_tensors.md) | πŸ”΄ | 🟒 | 🟒 | 🟒 | πŸ”΄ | πŸ”΄ | πŸ”΄ | 1/8 | 🟒 | 🟒 | 🟒 | https://github.com/neuralmagic/compressed-tensors | -| [EETQ](./eetq.md) | 🟒 | πŸ”΄ | 🟒 | πŸ”΄ | πŸ”΄ | πŸ”΄ | ? | 8 | 🟒 | 🟒 | 🟒 | https://github.com/NetEase-FuXi/EETQ | -| [GGUF / GGML (llama.cpp)](../gguf.md) | 🟒 | 🟒 | 🟒 | πŸ”΄ | 🟒 | πŸ”΄ | πŸ”΄ | 1/8 | πŸ”΄ | [See Notes](../gguf.md) | [See Notes](../gguf.md) | https://github.com/ggerganov/llama.cpp | -| [GPTQModel](./gptq.md) | πŸ”΄ | 🟒 | 🟒 | 🟒 | 🟒 | 🟒 | πŸ”΄ | 2/3/4/8 | 🟒 | 🟒 | 🟒 | https://github.com/ModelCloud/GPTQModel | -| [AutoGPTQ](./gptq.md) | πŸ”΄ | πŸ”΄ | 🟒 | 🟒 | πŸ”΄ | πŸ”΄ | πŸ”΄ | 2/3/4/8 | 🟒 | 🟒 | 🟒 | https://github.com/AutoGPTQ/AutoGPTQ | -| [HIGGS](./higgs.md) | 🟒 | πŸ”΄ | 🟒 | πŸ”΄ | πŸ”΄ | πŸ”΄ | 🟒 | 2/4 | πŸ”΄ | 🟒 | 🟒 | https://github.com/HanGuo97/flute | -| [HQQ](./hqq.md) | 🟒 | 🟒 | 🟒 | πŸ”΄ | πŸ”΄ | πŸ”΄ | 🟒 | 1/8 | 🟒 | πŸ”΄ | 🟒 | https://github.com/mobiusml/hqq/ | -| [optimum-quanto](./quanto.md) | 🟒 | 🟒 | 🟒 | πŸ”΄ | 🟒 | πŸ”΄ | 🟒 | 2/4/8 | πŸ”΄ | πŸ”΄ | 🟒 | https://github.com/huggingface/optimum-quanto | -| [FBGEMM_FP8](./fbgemm_fp8.md) | 🟒 | πŸ”΄ | 🟒 | πŸ”΄ | πŸ”΄ | πŸ”΄ | πŸ”΄ | 8 | πŸ”΄ | 🟒 | 🟒 | https://github.com/pytorch/FBGEMM | -| [torchao](./torchao.md) | 🟒 | 🟒 | 🟒 | πŸ”΄ | 🟑 | πŸ”΄ | | 4/8 | | πŸŸ’πŸ”΄ | 🟒 | https://github.com/pytorch/ao | -| [VPTQ](./vptq.md) | πŸ”΄ | πŸ”΄ | 🟒 | 🟑 | πŸ”΄ | πŸ”΄ | 🟒 | 1/8 | πŸ”΄ | 🟒 | 🟒 | https://github.com/microsoft/VPTQ | -| [FINEGRAINED_FP8](./finegrained_fp8.md) | 🟒 | πŸ”΄ | 🟒 | πŸ”΄ | πŸ”΄ | πŸ”΄ | πŸ”΄ | 8 | πŸ”΄ | 🟒 | 🟒 | | -| [SpQR](./spqr.md) | πŸ”΄ | πŸ”΄ | 🟒 | πŸ”΄ | πŸ”΄ | πŸ”΄ | 🟒 | 3 | πŸ”΄ | 🟒 | 🟒 | https://github.com/Vahe1994/SpQR/ | +| [AQLM](./aqlm) | πŸ”΄ | 🟒 | 🟒 | πŸ”΄ | πŸ”΄ | πŸ”΄ | 🟒 | 1/2 | 🟒 | 🟒 | 🟒 | https://github.com/Vahe1994/AQLM | +| [AWQ](./awq) | πŸ”΄ | 🟒 | 🟒 | 🟒 | πŸ”΄ | 🟒 | ? | 4 | 🟒 | 🟒 | 🟒 | https://github.com/casper-hansen/AutoAWQ | +| [bitsandbytes](./bitsandbytes) | 🟒 | 🟑 | 🟒 | 🟑 | πŸ”΄ | 🟑 | πŸ”΄ | 4/8 | 🟒 | 🟒 | 🟒 | https://github.com/bitsandbytes-foundation/bitsandbytes | +| [compressed-tensors](./compressed_tensors) | πŸ”΄ | 🟒 | 🟒 | 🟒 | πŸ”΄ | πŸ”΄ | πŸ”΄ | 1/8 | 🟒 | 🟒 | 🟒 | https://github.com/neuralmagic/compressed-tensors | +| [EETQ](./eetq) | 🟒 | πŸ”΄ | 🟒 | πŸ”΄ | πŸ”΄ | πŸ”΄ | ? | 8 | 🟒 | 🟒 | 🟒 | https://github.com/NetEase-FuXi/EETQ | +| [GGUF / GGML (llama.cpp)](../gguf) | 🟒 | 🟒 | 🟒 | πŸ”΄ | 🟒 | πŸ”΄ | πŸ”΄ | 1/8 | πŸ”΄ | [See Notes](../gguf) | [See Notes](../gguf) | https://github.com/ggerganov/llama.cpp | +| [GPTQModel](./gptq) | πŸ”΄ | 🟒 | 🟒 | 🟒 | 🟒 | 🟒 | πŸ”΄ | 2/3/4/8 | 🟒 | 🟒 | 🟒 | https://github.com/ModelCloud/GPTQModel | +| [AutoGPTQ](./gptq) | πŸ”΄ | πŸ”΄ | 🟒 | 🟒 | πŸ”΄ | πŸ”΄ | πŸ”΄ | 2/3/4/8 | 🟒 | 🟒 | 🟒 | https://github.com/AutoGPTQ/AutoGPTQ | +| [HIGGS](./higgs) | 🟒 | πŸ”΄ | 🟒 | πŸ”΄ | πŸ”΄ | πŸ”΄ | 🟒 | 2/4 | πŸ”΄ | 🟒 | 🟒 | https://github.com/HanGuo97/flute | +| [HQQ](./hqq) | 🟒 | 🟒 | 🟒 | πŸ”΄ | πŸ”΄ | πŸ”΄ | 🟒 | 1/8 | 🟒 | πŸ”΄ | 🟒 | https://github.com/mobiusml/hqq/ | +| [optimum-quanto](./quanto) | 🟒 | 🟒 | 🟒 | πŸ”΄ | 🟒 | πŸ”΄ | 🟒 | 2/4/8 | πŸ”΄ | πŸ”΄ | 🟒 | https://github.com/huggingface/optimum-quanto | +| [FBGEMM_FP8](./fbgemm_fp8) | 🟒 | πŸ”΄ | 🟒 | πŸ”΄ | πŸ”΄ | πŸ”΄ | πŸ”΄ | 8 | πŸ”΄ | 🟒 | 🟒 | https://github.com/pytorch/FBGEMM | +| [torchao](./torchao) | 🟒 | 🟒 | 🟒 | πŸ”΄ | 🟑 | πŸ”΄ | | 4/8 | | πŸŸ’πŸ”΄ | 🟒 | https://github.com/pytorch/ao | +| [VPTQ](./vptq) | πŸ”΄ | πŸ”΄ | 🟒 | 🟑 | πŸ”΄ | πŸ”΄ | 🟒 | 1/8 | πŸ”΄ | 🟒 | 🟒 | https://github.com/microsoft/VPTQ | +| [FINEGRAINED_FP8](./finegrained_fp8) | 🟒 | πŸ”΄ | 🟒 | πŸ”΄ | πŸ”΄ | πŸ”΄ | πŸ”΄ | 8 | πŸ”΄ | 🟒 | 🟒 | | +| [SpQR](./spqr) | πŸ”΄ | πŸ”΄ | 🟒 | πŸ”΄ | πŸ”΄ | πŸ”΄ | 🟒 | 3 | πŸ”΄ | 🟒 | 🟒 | https://github.com/Vahe1994/SpQR/ | ## Resources If you are new to quantization, we recommend checking out these beginner-friendly quantization courses in collaboration with DeepLearning.AI. * [Quantization Fundamentals with Hugging Face](https://www.deeplearning.ai/short-courses/quantization-fundamentals-with-hugging-face/) -* [Quantization in Depth](https://www.deeplearning.ai/short-courses/quantization-in-depth/) +* [Quantization in Depth](https://www.deeplearning.ai/short-courses/quantization-in-depth \ No newline at end of file