Last updated on Mar 4th, 2025.
This repo is a clone of llama.cpp at commit 06c2b1561d8b882bc018554591f8c35eb04ad30e. It is compatible with llama-cpp-python commit 710e19a81284e5af0d5db93cef7a9063b3e8534f.
The only difference from upstream is the added -DQK4_0 flag, which sets the quantization group size at CMake configure time:
cmake -B build_cpu_g128 -DQK4_0=128
cmake --build build_cpu_g128
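What the flag changes, roughly: QK4_0 is the macro that controls how many weights share one scale in a Q4_0 block (32 in upstream llama.cpp). Below is a minimal sketch of the upstream definition from ggml-common.h, assuming the -DQK4_0 CMake option is simply forwarded as a compile definition that overrides this macro:

#define QK4_0 32                 // upstream default; assumed to be overridden to 128 by -DQK4_0=128
typedef struct {
    ggml_half d;                 // one shared scale (delta) per block
    uint8_t   qs[QK4_0 / 2];     // 4-bit quants, two weights per byte
} block_q4_0;

Raising QK4_0 to 128 makes 128 weights share one scale and changes the size of every Q4_0 block written into the .gguf file, which is why the group size compiled into the binary and the group size of the quantized model must match.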
To quantize a model with the customized group size, run:
./build_cpu_g128/bin/llama-quantize <model_path.gguf> <quantization_type>
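For example, assuming a hypothetical FP16 model at models/model-f16.gguf and Q4_0 quantization (the type whose group size QK4_0 controls); the explicit output path is optional for llama-quantize but makes the next step easier to follow:
./build_cpu_g128/bin/llama-quantize models/model-f16.gguf models/model-q4_0-g128.gguf Q4_0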
To run the quantized model:
./build_cpu_g128/bin/llama-cli -m <quantized_model_path.gguf>
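For example, using the hypothetical output file from the quantization step above and a short prompt (-p is llama-cli's prompt flag):
./build_cpu_g128/bin/llama-cli -m models/model-q4_0-g128.gguf -p "Hello, my name is"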
Make sure the model you run was quantized with the same group size the binary was compiled with; otherwise you will get a runtime error when the model is loaded.