🚨 Support dequantization for most GGML types (#32625)

* use gguf internal dequantize
* add Q5_0 test
* add iq1 test
* add remained test
* remove duplicated test
* update docs
* add gguf version limit
* make style
* update gguf import catch
* revert vocab_size patch
* make style
* use GGUF_MIN_VERSION everywhere
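The bullets "add gguf version limit" and "use GGUF_MIN_VERSION everywhere" gate the feature on a minimum `gguf` release (0.10.0, per the docs change below). This is a minimal standard-library sketch of such a gate. It assumes the check only needs the numeric release of the installed distribution. `parse_release` and `gguf_dequantization_available` are hypothetical names used for illustration, not the transformers helpers.

```python
from importlib import metadata

# Assumption: mirrors the `gguf>=0.10.0` requirement stated in the docs note.
GGUF_MIN_VERSION = (0, 10, 0)


def parse_release(version: str) -> tuple:
    """Turn '0.10.0', '0.10' or '0.10.0.dev1' into a (major, minor, patch) tuple.

    Simplification: pre-release suffixes are ignored, so '0.10.0rc1'
    compares equal to '0.10.0'.
    """
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    parts += [0] * (3 - len(parts))  # pad '0.10' to (0, 10, 0)
    return tuple(parts)


def gguf_dequantization_available() -> bool:
    """Hypothetical gate: True if an installed `gguf` meets the minimum version."""
    try:
        installed = metadata.version("gguf")
    except metadata.PackageNotFoundError:
        return False
    return parse_release(installed) >= GGUF_MIN_VERSION
```

A real project would more likely compare versions with `packaging.version.Version`; the hand-rolled parser here just keeps the sketch dependency-free.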
2024-09-03 18:58:14 +08:00
parent 979f4774f6
commit edeca4387c
7 changed files with 169 additions and 356 deletions
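The first bullet replaces the hand-written, pygguf-derived decoders with the `gguf` package's own dequantization. To show what "dequantizing" means for one of the simplest newly listed types, Q8_0, here is a minimal pure-Python sketch. It assumes the ggml Q8_0 block layout: a little-endian float16 scale `d` followed by 32 signed 8-bit values, with each weight recovered as `d * q`. `quantize_q8_0` and `dequantize_q8_0` are hypothetical helpers for illustration, not functions from `gguf` or transformers, and the real code paths are vectorized.

```python
import struct

# Assumed ggml Q8_0 layout: 32 values per block, stored as a float16 scale
# followed by 32 int8 quants, i.e. 2 + 32 = 34 bytes per block.
QK8_0 = 32
BLOCK_BYTES = 2 + QK8_0


def quantize_q8_0(values: list) -> bytes:
    """Pack floats into Q8_0 blocks; len(values) must be a multiple of 32."""
    if len(values) % QK8_0:
        raise ValueError("length must be a multiple of 32")
    out = bytearray()
    for start in range(0, len(values), QK8_0):
        block = values[start:start + QK8_0]
        amax = max(abs(v) for v in block)
        d = amax / 127.0  # largest magnitude maps to +/-127
        inv = 1.0 / d if d else 0.0
        # Python's round() is half-to-even; ggml rounds half away from zero,
        # so exact ties may land one step apart.
        qs = [max(-127, min(127, round(v * inv))) for v in block]
        out += struct.pack("<e", d)  # scale stored as float16
        out += struct.pack(f"<{QK8_0}b", *qs)
    return bytes(out)


def dequantize_q8_0(raw: bytes) -> list:
    """Recover floats from Q8_0 blocks: weight = float16 scale * int8 quant."""
    if len(raw) % BLOCK_BYTES:
        raise ValueError("buffer is not a whole number of Q8_0 blocks")
    out = []
    for off in range(0, len(raw), BLOCK_BYTES):
        (d,) = struct.unpack_from("<e", raw, off)
        qs = struct.unpack_from(f"<{QK8_0}b", raw, off + 2)
        out.extend(d * q for q in qs)
    return out


if __name__ == "__main__":
    vals = [i / 10 - 1.6 for i in range(32)]
    restored = dequantize_q8_0(quantize_q8_0(vals))
    print("max abs error:", max(abs(a - b) for a, b in zip(vals, restored)))
```

The round-trip error per weight is bounded by roughly half a quantization step (`d / 2`) plus the float16 rounding of the scale. The K-quant and IQ types listed in the docs use more elaborate super-block layouts, which is why delegating to `gguf` is simpler than maintaining a decoder per type.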
--- a/docs/source/en/gguf.md
+++ b/docs/source/en/gguf.md
@@ -46,16 +46,30 @@ The initial supported quantization types are decided according to the popular qu
 on the Hub.

 - F32
+- F16
+- BF16
+- Q4_0
+- Q4_1
+- Q5_0
+- Q5_1
+- Q8_0
 - Q2_K
 - Q3_K
- Q4_0
 - Q4_K
 - Q5_K
 - Q6_K
- Q8_0
+- IQ1_S
+- IQ1_M
+- IQ2_XXS
+- IQ2_XS
+- IQ2_S
+- IQ3_XXS
+- IQ3_S
+- IQ4_XS
+- IQ4_NL

-We take example from the excellent [99991/pygguf](https://github.com/99991/pygguf) Python parser to dequantize the 
-weights.
+> [!NOTE]
+> Dequantizing GGUF weights requires `gguf>=0.10.0` to be installed.

 ### Supported model architectures