ktransformers enables both small and large models to run more efficiently on consumer hardware within Text-Generation-WebUI.
It provides FP8 support, allowing safetensors models to run directly without quantization, increasing usable context and inference speed.
I was able to run Qwen3-4B-Instruct-2507-FP8 through the GUI on my laptop (8 GB VRAM, 64 GB RAM) with ktransformers as the loader in textgenwebui, using a context length of 4096, flash_attention_2, FP8 cache, and no CPU offload (a bug appears otherwise).
For bigger models the hybrid offloading did not work, but that seems to be a problem of this version of textgenwebui, since it also happened with other loaders: when I tried to offload to CPU, hybrid offload failed and only GPU-only or CPU-only worked.
If the textgenwebui team fixes the bug with hybrid offloading to CPU and disk, big models such as DeepSeek 685B, Qwen3 235B and others become reachable for local server builds in the 5k-10k cost range. Models like Qwen3-Next 80B-A3B could then be used in FP8 on good consumer hardware, bringing midsize AI to the people :-) .
Implementation
Added a new loader entry, ktransformers, in modules/models.py and modules/loaders.py.
Fully compatible with the existing one-click Conda environment (installer_files/env).
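To illustrate the shape of the change, here is a hypothetical sketch of how a loader can be registered in modules/loaders.py; the parameter names below are illustrative assumptions, not the exact contents of this PR.

```python
# Hypothetical sketch of registering the new loader alongside the
# existing ones in modules/loaders.py. Parameter names are assumptions
# chosen to mirror the options mentioned above (context length, flash
# attention, FP8 cache), not the literal diff of this PR.
loaders_and_params = {
    "ktransformers": [
        "model_dir",    # path to the safetensors checkpoint
        "ctx_size",     # context length, e.g. 4096
        "flash_attn",   # enable flash_attention_2 when available
        "cache_type",   # KV-cache precision, e.g. "fp8"
    ],
}

print(sorted(loaders_and_params["ktransformers"]))
```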
1. Priority: ktransformers must be installed in the same environment as the one-click installation of textgenwebui, otherwise it will not be found. Open a terminal:
cd ~/text-generation-webui
./cmd_linux.sh -c 'echo "CONDA_PREFIX=$CONDA_PREFIX"; which python'
This should show:
CONDA_PREFIX=/home/<user>/text-generation-webui/installer_files/env
/home/<user>/text-generation-webui/installer_files/env/bin/python
You are now in the "installer_files" Conda env of the WebUI. To double-check the interpreter:
python -c "import sys; print(sys.executable)"
2. Some build tools may be needed before installing:
./cmd_linux.sh
sudo apt-get update
sudo apt-get install -y build-essential cmake ninja-build patchelf
numpy is needed; if version conflicts arise, a modern LLM can assist you with resolving them:
pip install -U packaging ninja cpufeature numpy
Install a minimal CUDA compiler in this Conda env (CUDA 12.4.1 or higher):
conda install -y -c nvidia/label/cuda-12.4.1 cuda-nvcc
export CUDA_HOME="$CONDA_PREFIX"
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:$LD_LIBRARY_PATH"
nvcc -V
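As a side note, if you want to check the reported toolkit version in a script rather than by eye, the last line of the `nvcc -V` output can be parsed. This tiny helper is just an illustration, not part of the PR:

```python
import re

def nvcc_release(output: str):
    """Extract the CUDA release (e.g. '12.4') from `nvcc -V` output."""
    m = re.search(r"release (\d+\.\d+)", output)
    return m.group(1) if m else None

# Typical last line printed by `nvcc -V`:
sample = "Cuda compilation tools, release 12.4, V12.4.131"
print(nvcc_release(sample))  # → 12.4
```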
3. Do not install ktransformers with pip; the packaged versions are too old. Use git instead to get a recent version (forcing HTTP/1.1 avoids clone errors on some connections). In the WebUI Conda shell:
mkdir -p repositories && cd repositories
git -c http.version=HTTP/1.1 clone --depth 1 --recurse-submodules https://github.com/kvcache-ai/ktransformers.git
cd ktransformers
git -c http.version=HTTP/1.1 submodule update --init --recursive --depth 1 --recommend-shallow --jobs 1
Build without pip:
python setup.py build_ext --inplace
python - <<'PY'
import os, site
repo = os.path.abspath(".")
# Register the repo via a .pth file in site-packages so it is importable
cands = site.getsitepackages() or [site.getusersitepackages()]
pth = os.path.join(cands[0], "ktransformers_local.pth")
with open(pth, "w") as f:
    f.write(repo + "\n")
print("Wrote:", pth, "->", repo)
PY
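The .pth trick above works because Python's `site` module reads every `.pth` file in site-packages at startup and appends each listed directory to `sys.path`. A minimal, self-contained demonstration of that mechanism (using a throwaway temp directory, not the real env):

```python
import os
import site
import sys
import tempfile

with tempfile.TemporaryDirectory() as base:
    sitedir = os.path.join(base, "site-packages")
    repo = os.path.join(base, "repo")
    os.makedirs(sitedir)
    os.makedirs(repo)  # the listed path must exist, or site skips it

    # Write a .pth file listing the repo path, one path per line
    with open(os.path.join(sitedir, "demo_local.pth"), "w") as f:
        f.write(repo + "\n")

    # site.addsitedir() processes .pth files the same way startup does
    site.addsitedir(sitedir)
    print(repo in sys.path)  # → True
```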
4. Sanity check from outside the one-click environment (note: `__version__` and `__file__` are dunder attributes; GitHub's markdown rendering can strip the underscores):
cd ~/text-generation-webui
./cmd_linux.sh -c 'python - <<PY
import sys, ktransformers
print("python:", sys.executable)
print("ktransformers:", getattr(ktransformers, "__version__", "git"), "from:", ktransformers.__file__)
PY'
It should print a path under ~/text-generation-webui/repositories/ktransformers/...
Checklist: