Conversation

@hermannklie commented Oct 24, 2025

ktransformers enables both small and large models to run more efficiently on consumer hardware within Text-Generation-WebUI.

It provides FP8 support, allowing FP8 safetensors models to run directly without further quantization, which increases usable context and inference speed.

I was able to run Qwen3-4B-Instruct-2507-FP8 on my laptop (8 GB VRAM, 64 GB RAM) with ktransformers as the loader in Text-Generation-WebUI via the GUI, using context length 4096, flash_attention_2, and FP8 cache, with no CPU offload (a bug appears otherwise).
For bigger models the hybrid offloading did not work, but that seems to be a problem of this version of Text-Generation-WebUI, since it happened with other loaders too: when I tried to offload to CPU, hybrid offload failed and only pure GPU or pure CPU worked.

If the Text-Generation-WebUI team fixes the bug with hybrid offloading to CPU and disk, big models such as DeepSeek 685B, Qwen3 235B, and others become reachable for $5k-10k local server builds. Models like Qwen3-Next-80B-A3B can then be used in FP8 on good consumer hardware, bringing midsize AI to the people :-) .

Implementation

Added new loader entry ktransformers in modules/models.py and modules/loaders.py.
Fully compatible with the existing one-click Conda environment (installer_files/env).
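
For context, here is a hedged sketch of what the modules/loaders.py side of such an entry can look like. The dict name loaders_and_params and the parameter keys are assumptions modeled on how other loaders are listed there, not a copy of this PR's diff:

# Hypothetical sketch of the loaders.py entry; names are assumptions.
loaders_and_params['ktransformers'] = [
    'model_dir',            # folder containing the FP8 safetensors checkpoint
    'ctx_size',             # e.g. 4096, as in the laptop test above
    'attn_implementation',  # e.g. flash_attention_2
    'cache_type',           # e.g. fp8
]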

1. Priority: ktransformers must be installed in the same environment as the one-click installation of textgenwebui, otherwise it will not be found. Open a terminal:

cd ~/text-generation-webui
./cmd_linux.sh -c 'echo "CONDA_PREFIX=$CONDA_PREFIX"; which python'

Should show:

CONDA_PREFIX=/home/<user>/text-generation-webui/installer_files/env
/home/<user>/text-generation-webui/installer_files/env/bin/python

To confirm you are in the "installer_files" Conda env of the WebUI:

python -c "import sys; print(sys.executable)"

2. Some build tools may be needed before installing:

./cmd_linux.sh
sudo apt-get update
sudo apt-get install -y build-essential cmake ninja-build patchelf

numpy is needed; if version conflicts arise, a modern LLM can assist you in resolving them:

pip install -U packaging ninja cpufeature numpy

Install a minimal CUDA compiler (12.4.1 or higher) into this Conda env:

conda install -y -c nvidia/label/cuda-12.4.1 cuda-nvcc
export CUDA_HOME="$CONDA_PREFIX"
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:$LD_LIBRARY_PATH"
nvcc -V
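
nvcc should now report the toolkit just installed, for example (the exact build number will differ):

nvcc: NVIDIA (R) Cuda compiler driver
Cuda compilation tools, release 12.4, V12.4.131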

3. Do not install ktransformers with pip: the PyPI releases are too old. Use git instead to get a recent version (cloning over HTTP/1.1 avoids transfer errors with this large repository).

In the WebUI Conda shell:

mkdir -p repositories && cd repositories
git -c http.version=HTTP/1.1 clone --depth 1 --recurse-submodules https://github.com/kvcache-ai/ktransformers.git
cd ktransformers
git -c http.version=HTTP/1.1 submodule update --init --recursive --depth 1 --recommend-shallow --jobs 1

Build without pip:

python setup.py build_ext --inplace
python - <<'PY'
# Write a .pth file into the env's site-packages so that
# "import ktransformers" resolves to this in-place build without a pip install.
import site, os
repo = os.path.abspath(".")
cands = site.getsitepackages() or [site.getusersitepackages()]
pth = os.path.join(cands[0], "ktransformers_local.pth")
with open(pth, "w") as f:
    f.write(repo + "\n")
print("Wrote:", pth, "->", repo)
PY
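
Still in the same shell, a quick check that the .pth file is picked up:

python -c "import ktransformers; print(ktransformers.__file__)"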

4. Sanity check from outside the one-click environment:

cd ~/text-generation-webui
./cmd_linux.sh -c 'python - <<PY
import sys, ktransformers
print("python:", sys.executable)
print("ktransformers:", getattr(ktransformers, "__version__", "git"), "from:", ktransformers.__file__)
PY'

Should show a path under: ~/text-generation-webui/repositories/ktransformers/...

hermannklie and others added 4 commits October 21, 2025 20:44
This PR adds native support for the KTransformers backend as a selectable loader in Text-Generation-WebUI.
It provides a reproducible installation and integration process compatible with the one-click installer (Conda environment).

The integration is not limited to small models: it is meant to be used with Qwen3-Next-80B-A3B-Instruct-FP8 and other larger FP8 architectures such as the DeepSeek FP8 models, together with FlashAttention-2.
Smaller models (e.g., Qwen3-4B-Instruct) now run efficiently, confirming broad coverage from laptop to workstation setups.
In def load_model(model_name, loader=None) we add the 'ktransformers' entry.

Before def unload_model(keep_model_name=False) we add def ktransformers_loader.
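
For orientation only, here is a minimal sketch of how such a loader function could be shaped. The textgenwebui-side names (modules.shared, shared.args.model_dir) follow the pattern of the existing loaders; the ktransformers call itself is a placeholder helper, since the exact entry point is not shown in this excerpt:

from pathlib import Path

def ktransformers_loader(model_name):
    # Resolve the checkpoint directory the same way the other loaders do.
    from modules import shared
    import ktransformers  # found via the ktransformers_local.pth written above

    path_to_model = Path(shared.args.model_dir) / model_name

    # Placeholder: the real PR builds the model through ktransformers here
    # (FP8 safetensors, flash_attention_2, fp8 cache, optional CPU offload).
    model = build_ktransformers_model(path_to_model)  # hypothetical helper
    return model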