🚨 Support dequantization for most GGML types (#32625)

* use gguf internal dequantize

* add Q5_0 test

* add iq1 test

* add remained test

* remove duplicated test

* update docs

* add gguf version limit

* make style

* update gguf import catch

* revert vocab_size patch

* make style

* use GGUF_MIN_VERSION everywhere
This commit is contained in:
Isotr0py
2024-09-03 18:58:14 +08:00
committed by GitHub
parent 979f4774f6
commit edeca4387c
7 changed files with 169 additions and 356 deletions

View File

@@ -46,16 +46,30 @@ The initial supported quantization types are decided according to the popular qu
on the Hub.
- F32
- F16
- BF16
- Q4_0
- Q4_1
- Q5_0
- Q5_1
- Q8_0
- Q2_K
- Q3_K
- Q4_0
- Q4_K
- Q5_K
- Q6_K
- Q8_0
- IQ1_S
- IQ1_M
- IQ2_XXS
- IQ2_XS
- IQ2_S
- IQ3_XXS
- IQ3_S
- IQ4_XS
- IQ4_NL
We take example from the excellent [99991/pygguf](https://github.com/99991/pygguf) Python parser to dequantize the
weights.
> [!NOTE]
> To support gguf dequantization, `gguf>=0.10.0` installation is required.
### Supported model architectures