🚨 Support dequantization for most GGML types (#32625)
* use gguf internal dequantize * add Q5_0 test * add iq1 test * add remained test * remove duplicated test * update docs * add gguf version limit * make style * update gguf import catch * revert vocab_size patch * make style * use GGUF_MIN_VERSION everywhere
This commit is contained in:
@@ -46,16 +46,30 @@ The initial supported quantization types are decided according to the popular qu
|
||||
on the Hub.
|
||||
|
||||
- F32
|
||||
- F16
|
||||
- BF16
|
||||
- Q4_0
|
||||
- Q4_1
|
||||
- Q5_0
|
||||
- Q5_1
|
||||
- Q8_0
|
||||
- Q2_K
|
||||
- Q3_K
|
||||
- Q4_0
|
||||
- Q4_K
|
||||
- Q5_K
|
||||
- Q6_K
|
||||
- Q8_0
|
||||
- IQ1_S
|
||||
- IQ1_M
|
||||
- IQ2_XXS
|
||||
- IQ2_XS
|
||||
- IQ2_S
|
||||
- IQ3_XXS
|
||||
- IQ3_S
|
||||
- IQ4_XS
|
||||
- IQ4_NL
|
||||
|
||||
We take example from the excellent [99991/pygguf](https://github.com/99991/pygguf) Python parser to dequantize the
|
||||
weights.
|
||||
> [!NOTE]
|
||||
> To support gguf dequantization, `gguf>=0.10.0` installation is required.
|
||||
|
||||
### Supported model architectures
|
||||
|
||||
|
||||
Reference in New Issue
Block a user