Commit 448665a

llama.cpp: add IQ3_XXS quantization models (#8)
* doc: add iq3_xxs perf.

---------

Co-authored-by: ymcui <[email protected]>
1 parent ee7937e commit 448665a

2 files changed (+15 -15 lines)

README.md (+7 -7)
@@ -188,13 +188,13 @@ Mixtral is a sparse mixture-of-experts model. Compared with previous mainstream models such as LLaMA, this model

Under llama.cpp, the performance of the quantized version of the Chinese-Mixtral model was tested, as shown in the table below.

-| | F16 | Q8_0 | Q6_K | Q5_K | Q5_0 | Q4_K | Q4_0 | Q3_K | Q2_K | IQ2_XS | IQ2_XXS |
-| ------------ | ---: | -----: | -----: | -----: | -----: | -----: | -----: | -----: | -----: | -----: | ------: |
-| Size (GB) | 87.0 | 46.2 | 35.7 | 30.0 | 30.0 | 24.6 | 24.6 | 19.0 | 16.1 | 12.7 | 11.4 |
-| BPW | 16.0 | 8.50 | 6.57 | 5.69 | 5.52 | 4.87 | 4.53 | 3.86 | 2.96 | 2.34 | 2.10 |
-| PPL | - | 4.4076 | 4.4092 | 4.4192 | 4.4224 | 4.4488 | 4.4917 | 4.5545 | 5.1846 | 6.9784 | 8.5981 |
-| M3 Max Speed | - | - | 36.0 | 36.9 | 35.7 | 31.2 | 27.8 | 37.6 | 29.1 | - | - |
-| A100 Speed | - | - | 29.9 | 22.6 | 20.5 | 21.7 | 17.1 | 21.7 | 20.3 | 23.7 | 22.5 |
+| | F16 | Q8_0 | Q6_K | Q5_K | Q5_0 | Q4_K | Q4_0 | Q3_K | IQ3_XXS | Q2_K | IQ2_XS | IQ2_XXS |
+| ------------ | ---: | -----: | -----: | -----: | -----: | -----: | -----: | -----: | ------: | -----: | -----: | ------: |
+| Size (GB) | 87.0 | 46.2 | 35.7 | 30.0 | 30.0 | 24.6 | 24.6 | 19.0 | 17.1 | 16.1 | 12.7 | 11.4 |
+| BPW | 16.0 | 8.50 | 6.57 | 5.69 | 5.52 | 4.87 | 4.53 | 3.86 | 3.14 | 2.96 | 2.34 | 2.10 |
+| PPL | - | 4.4076 | 4.4092 | 4.4192 | 4.4224 | 4.4488 | 4.4917 | 4.5545 | 4.5990 | 5.1846 | 6.9784 | 8.5981 |
+| M3 Max Speed | - | - | 36.0 | 36.9 | 35.7 | 31.2 | 27.8 | 37.6 | - | 29.1 | - | - |
+| A100 Speed | - | - | 29.9 | 22.6 | 20.5 | 21.7 | 17.1 | 21.7 | 20.6 | 20.3 | 23.7 | 22.5 |

> [!NOTE]
>

README_EN.md (+8 -8)
@@ -184,17 +184,17 @@ To evaluate the effectiveness of the related models, this project conducted both
| Chinese-Alpaca-2-7B-64K | 44.7 | 28.1 | 14.4 | 39.0 | 44.6 | 5.0 | 29.3 |
| Chinese-LLaMA-2-7B-64K | 27.2 | 16.4 | 6.5 | 33.0 | 7.8 | 5.0 | 16.0 |

-### Quantitative Effect Evaluation
+### Quantization Effect Evaluation

Under llama.cpp, the performance of the quantized version of the Chinese-Mixtral model was tested, as shown in the table below.

-| | F16 | Q8_0 | Q6_K | Q5_K | Q5_0 | Q4_K | Q4_0 | Q3_K | Q2_K | IQ2_XS | IQ2_XXS |
-| ------------ | ---: | -----: | -----: | -----: | -----: | -----: | -----: | -----: | -----: | -----: | ------: |
-| Size (GB) | 87.0 | 46.2 | 35.7 | 30.0 | 30.0 | 24.6 | 24.6 | 19.0 | 16.1 | 12.7 | 11.4 |
-| BPW | 16.0 | 8.50 | 6.57 | 5.69 | 5.52 | 4.87 | 4.53 | 3.86 | 2.96 | 2.34 | 2.10 |
-| PPL | - | 4.4076 | 4.4092 | 4.4192 | 4.4224 | 4.4488 | 4.4917 | 4.5545 | 5.1846 | 6.9784 | 8.5981 |
-| M3 Max Speed | - | - | 36.0 | 36.9 | 35.7 | 31.2 | 27.8 | 37.6 | 29.1 | - | - |
-| A100 Speed | - | - | 29.9 | 22.6 | 20.5 | 21.7 | 17.1 | 21.7 | 20.3 | 23.7 | 22.5 |
+| | F16 | Q8_0 | Q6_K | Q5_K | Q5_0 | Q4_K | Q4_0 | Q3_K | IQ3_XXS | Q2_K | IQ2_XS | IQ2_XXS |
+| ------------ | ---: | -----: | -----: | -----: | -----: | -----: | -----: | -----: | ------: | -----: | -----: | ------: |
+| Size (GB) | 87.0 | 46.2 | 35.7 | 30.0 | 30.0 | 24.6 | 24.6 | 19.0 | 17.1 | 16.1 | 12.7 | 11.4 |
+| BPW | 16.0 | 8.50 | 6.57 | 5.69 | 5.52 | 4.87 | 4.53 | 3.86 | 3.14 | 2.96 | 2.34 | 2.10 |
+| PPL | - | 4.4076 | 4.4092 | 4.4192 | 4.4224 | 4.4488 | 4.4917 | 4.5545 | 4.5990 | 5.1846 | 6.9784 | 8.5981 |
+| M3 Max Speed | - | - | 36.0 | 36.9 | 35.7 | 31.2 | 27.8 | 37.6 | - | 29.1 | - | - |
+| A100 Speed | - | - | 29.9 | 22.6 | 20.5 | 21.7 | 17.1 | 21.7 | 20.6 | 20.3 | 23.7 | 22.5 |

> [!NOTE]
>
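
As a quick cross-check on the new IQ3_XXS column, note that the BPW row is simply the quantized file size expressed in bits per model weight. The sketch below is not part of the commit and rests on two assumptions the diff does not state: the Size row is in GiB, and Mixtral 8x7B has roughly 46.7 billion parameters.

```python
# Hypothetical sanity check, not from the repository:
# BPW ≈ file size in bits / total parameter count.
N_PARAMS = 46.7e9  # approximate Mixtral 8x7B parameter count (assumption)

sizes_gib = {"F16": 87.0, "IQ3_XXS": 17.1}  # Size (GB) row values, read as GiB

for name, gib in sizes_gib.items():
    bpw = gib * 1024**3 * 8 / N_PARAMS  # bits per weight
    print(f"{name}: ~{bpw:.2f} BPW")

# Prints about 16.00 and 3.15, in line with the table's 16.0 and 3.14;
# the small gap comes from the rounded GiB sizes.
```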
