This repo is cloned from llama.cpp at commit 74d73dc85cc2057446bf63cc37ff649ae7cebd80 and is compatible with llama-cpp-python at commit 7ecdd944624cbd49e4af0a5ce1aa402607d58dcc.
The only change is an added QK4_0 CMake option that sets the Q4_0 quantization group size at configure time, for example:
cmake -B build_cpu_g128 -DQK4_0=128
cmake --build build_cpu_g128
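For context, QK4_0 is the group (block) size of ggml's Q4_0 format: each block stores one half-precision scale plus QK4_0 packed 4-bit quants, so the value chosen at compile time fixes the byte layout of every Q4_0 tensor. Below is a minimal, self-contained sketch of that idea; the struct mirrors block_q4_0 in ggml-common.h, but how exactly this fork forwards the -DQK4_0 CMake value to the compiler is an assumption here.

#include <stdint.h>
#include <stdio.h>

/* Assumed to be supplied by the build, e.g. cmake -DQK4_0=128 forwarded as a
 * compile definition; upstream ggml-common.h hard-codes QK4_0 to 32. */
#ifndef QK4_0
#define QK4_0 32
#endif

/* Mirrors ggml's block_q4_0: one fp16 scale plus QK4_0 4-bit quants,
 * packed two per byte (uint16_t stands in for ggml_half here). */
typedef struct {
    uint16_t d;              /* per-block scale (delta), fp16 bits */
    uint8_t  qs[QK4_0 / 2];  /* packed 4-bit quants */
} block_q4_0;

int main(void) {
    printf("group size: %d weights/block, %zu bytes/block\n",
           QK4_0, sizeof(block_q4_0));
    return 0;
}

Because the block size is baked in at compile time, binaries built with different QK4_0 values produce and expect differently sized tensor data.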
To quantize a model with the customized group size, run:
./build_cpu_g128/bin/llama-quantize <model_path.gguf> <quantization_type>
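For example, with an f16 GGUF model (paths here are just placeholders), a Q4_0 quantization using the compiled-in group size could look like:

./build_cpu_g128/bin/llama-quantize models/llama-7b-f16.gguf models/llama-7b-q4_0-g128.gguf Q4_0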
To run inference with the quantized model:
./build_cpu_g128/bin/llama-cli -m <quantized_model_path.gguf>
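The usual llama-cli options apply; for a quick sanity check, something like the following should work (model path and prompt are placeholders):

./build_cpu_g128/bin/llama-cli -m models/llama-7b-q4_0-g128.gguf -p "Hello" -n 64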
Make sure the model you run was quantized with the same group size that the binaries were compiled with; otherwise you will get a runtime error when the model is loaded, because the on-disk size of each Q4_0 tensor depends on QK4_0.