Conversation

@hermannklie commented Oct 24, 2025

ktransformers enables both small and large models to run more efficiently on consumer hardware within Text-Generation-WebUI.

It provides FP8 support, allowing FP8 safetensors models to run directly without further quantization, which increases usable context and inference speed.

I was able to run Qwen3-4B-Instruct-2507-FP8 on my laptop (8 GB VRAM, 64 GB RAM) with ktransformers as the loader in Text-Generation-WebUI via the GUI, using context length 4096, flash_attention_2, and FP8 cache, with no CPU offload (a bug appears otherwise).
For bigger models the hybrid offloading did not work, but that seems to be a problem of this version of Text-Generation-WebUI, since it happened with other loaders too: when I tried to offload to CPU, hybrid offload failed and only pure GPU or pure CPU worked.

If the Text-Generation-WebUI team fixes the bug with hybrid offloading to CPU and disk, big models such as DeepSeek 685B, Qwen3 235B, and others become reachable for $5k-10k local server builds. Models like Qwen3-Next-80B-A3B can then be used in FP8 on good consumer hardware, bringing midsize AI to the people :-) .

Implementation

Added new loader entry ktransformers in modules/models.py and modules/loaders.py.
Fully compatible with the existing one-click Conda environment (installer_files/env).
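
For context, here is a hedged sketch of what the modules/loaders.py side of such an entry can look like. The dict name loaders_and_params and the parameter keys are assumptions modeled on how other loaders are listed there, not a copy of this PR's diff:

# Hypothetical sketch of the loaders.py entry; names are assumptions.
loaders_and_params['ktransformers'] = [
    'model_dir',            # folder containing the FP8 safetensors checkpoint
    'ctx_size',             # e.g. 4096, as in the laptop test above
    'attn_implementation',  # e.g. flash_attention_2
    'cache_type',           # e.g. fp8
]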

1. Priority: ktransformers must be installed in the same environment as the one-click installation of textgenwebui, otherwise it will not be found. Open a terminal:

cd ~/text-generation-webui
./cmd_linux.sh -c 'echo "CONDA_PREFIX=$CONDA_PREFIX"; which python'

Should show:

CONDA_PREFIX=/home/<user>/text-generation-webui/installer_files/env
/home/<user>/text-generation-webui/installer_files/env/bin/python

To confirm you are in the "installer_files" Conda env of the WebUI:

python -c "import sys; print(sys.executable)"

2. Some build tools may be needed before installing:

./cmd_linux.sh
sudo apt-get update
sudo apt-get install -y build-essential cmake ninja-build patchelf

numpy is needed; if version conflicts arise, a modern LLM can assist you in resolving them:

pip install -U packaging ninja cpufeature numpy

Install a minimal CUDA compiler (12.4.1 or higher) into this Conda env:

conda install -y -c nvidia/label/cuda-12.4.1 cuda-nvcc
export CUDA_HOME="$CONDA_PREFIX"
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:$LD_LIBRARY_PATH"
nvcc -V
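
nvcc should now report the toolkit just installed, for example (the exact build number will differ):

nvcc: NVIDIA (R) Cuda compiler driver
Cuda compilation tools, release 12.4, V12.4.131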

3. Do not install ktransformers with pip: the PyPI releases are too old. Use git instead to get a recent version (cloning over HTTP/1.1 avoids transfer errors with this large repository).

In the WebUI Conda shell:

mkdir -p repositories && cd repositories
git -c http.version=HTTP/1.1 clone --depth 1 --recurse-submodules https://github.com/kvcache-ai/ktransformers.git
cd ktransformers
git -c http.version=HTTP/1.1 submodule update --init --recursive --depth 1 --recommend-shallow --jobs 1

Build without pip:

python setup.py build_ext --inplace
python - <<'PY'
# Write a .pth file into the env's site-packages so that
# "import ktransformers" resolves to this in-place build without a pip install.
import site, os
repo = os.path.abspath(".")
cands = site.getsitepackages() or [site.getusersitepackages()]
pth = os.path.join(cands[0], "ktransformers_local.pth")
with open(pth, "w") as f:
    f.write(repo + "\n")
print("Wrote:", pth, "->", repo)
PY
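
Still in the same shell, a quick check that the .pth file is picked up:

python -c "import ktransformers; print(ktransformers.__file__)"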

4. Sanity check from outside the one-click environment:

cd ~/text-generation-webui
./cmd_linux.sh -c 'python - <<PY
import sys, ktransformers
print("python:", sys.executable)
print("ktransformers:", getattr(ktransformers, "__version__", "git"), "from:", ktransformers.__file__)
PY'

Should show a path under: ~/text-generation-webui/repositories/ktransformers/...

hermannklie and others added 4 commits October 21, 2025 20:44
This PR adds native support for the KTransformers backend as a selectable loader in Text-Generation-WebUI.
It provides a reproducible installation and integration process compatible with the one-click installer (Conda environment).

The integration is not limited to small models: it is meant to be used with Qwen3-Next-80B-A3B-Instruct-FP8 and other larger FP8 architectures such as the DeepSeek FP8 models, together with FlashAttention-2.
Smaller models (e.g., Qwen3-4B-Instruct) now run efficiently, confirming broad coverage from laptop to workstation setups.
In def load_model(model_name, loader=None) we add the 'ktransformers' entry.

Before def unload_model(keep_model_name=False) we add def ktransformers_loader.
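
For orientation only, here is a minimal sketch of how such a loader function could be shaped. The textgenwebui-side names (modules.shared, shared.args.model_dir) follow the pattern of the existing loaders; the ktransformers call itself is a placeholder helper, since the exact entry point is not shown in this excerpt:

from pathlib import Path

def ktransformers_loader(model_name):
    # Resolve the checkpoint directory the same way the other loaders do.
    from modules import shared
    import ktransformers  # found via the ktransformers_local.pth written above

    path_to_model = Path(shared.args.model_dir) / model_name

    # Placeholder: the real PR builds the model through ktransformers here
    # (FP8 safetensors, flash_attention_2, fp8 cache, optional CPU offload).
    model = build_ktransformers_model(path_to_model)  # hypothetical helper
    return model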