@@ -184,17 +184,17 @@ To evaluate the effectiveness of the related models, this project conducted both
| Chinese-Alpaca-2-7B-64K | 44.7 | 28.1 | 14.4 | 39.0 | 44.6 | 5.0 | 29.3 |
| Chinese-LLaMA-2-7B-64K | 27.2 | 16.4 | 6.5 | 33.0 | 7.8 | 5.0 | 16.0 |

- ### Quantitative Effect Evaluation
+ ### Quantization Effect Evaluation

Under llama.cpp, the performance of the quantized versions of the Chinese-Mixtral model was tested, as shown in the table below.

- | | F16 | Q8_0 | Q6_K | Q5_K | Q5_0 | Q4_K | Q4_0 | Q3_K | Q2_K | IQ2_XS | IQ2_XXS |
- | ------------ | ---: | -----: | -----: | -----: | -----: | -----: | -----: | -----: | -----: | -----: | ------: |
- | Size (GB) | 87.0 | 46.2 | 35.7 | 30.0 | 30.0 | 24.6 | 24.6 | 19.0 | 16.1 | 12.7 | 11.4 |
- | BPW | 16.0 | 8.50 | 6.57 | 5.69 | 5.52 | 4.87 | 4.53 | 3.86 | 2.96 | 2.34 | 2.10 |
- | PPL | - | 4.4076 | 4.4092 | 4.4192 | 4.4224 | 4.4488 | 4.4917 | 4.5545 | 5.1846 | 6.9784 | 8.5981 |
- | M3 Max Speed | - | - | 36.0 | 36.9 | 35.7 | 31.2 | 27.8 | 37.6 | 29.1 | - | - |
- | A100 Speed | - | - | 29.9 | 22.6 | 20.5 | 21.7 | 17.1 | 21.7 | 20.3 | 23.7 | 22.5 |
+ | | F16 | Q8_0 | Q6_K | Q5_K | Q5_0 | Q4_K | Q4_0 | Q3_K | IQ3_XXS | Q2_K | IQ2_XS | IQ2_XXS |
+ | ------------ | ---: | -----: | -----: | -----: | -----: | -----: | -----: | -----: | ------: | -----: | -----: | ------: |
+ | Size (GB) | 87.0 | 46.2 | 35.7 | 30.0 | 30.0 | 24.6 | 24.6 | 19.0 | 17.1 | 16.1 | 12.7 | 11.4 |
+ | BPW | 16.0 | 8.50 | 6.57 | 5.69 | 5.52 | 4.87 | 4.53 | 3.86 | 3.14 | 2.96 | 2.34 | 2.10 |
+ | PPL | - | 4.4076 | 4.4092 | 4.4192 | 4.4224 | 4.4488 | 4.4917 | 4.5545 | 4.5990 | 5.1846 | 6.9784 | 8.5981 |
+ | M3 Max Speed | - | - | 36.0 | 36.9 | 35.7 | 31.2 | 27.8 | 37.6 | - | 29.1 | - | - |
+ | A100 Speed | - | - | 29.9 | 22.6 | 20.5 | 21.7 | 17.1 | 21.7 | 20.6 | 20.3 | 23.7 | 22.5 |
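
In the table above, BPW is bits per weight, and PPL is perplexity (the exponential of the average per-token negative log-likelihood, so lower is better). As a rough sanity check, BPW can be estimated directly from file size. A minimal sketch in Python, assuming the listed sizes are binary gigabytes (GiB) and that Mixtral-8x7B has roughly 46.7B total parameters (both are assumptions for illustration, not values stated in this section):

```python
# Back-of-envelope check of the BPW row above: bits per weight is just
# file size in bits divided by the number of weights.
# Assumptions (not stated in this section): "GB" in the table means GiB,
# and Mixtral-8x7B has ~46.7B total parameters.
N_PARAMS = 46.7e9

sizes_gib = {  # a few rows copied from the table
    "F16": 87.0,
    "Q8_0": 46.2,
    "Q6_K": 35.7,
    "Q2_K": 16.1,
    "IQ2_XXS": 11.4,
}

for name, gib in sizes_gib.items():
    bpw = gib * 2**30 * 8 / N_PARAMS  # file size in bits / weight count
    print(f"{name:8s} ~{bpw:.2f} BPW")
```

Under these assumptions F16 works out to ~16.0 and Q8_0 to ~8.50, in line with the table; some K-quant rows deviate from this naive estimate, likely because those formats quantize different tensors at different widths.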
> [!NOTE]
>