Releases: dogkeeper886/ollama37
v2.0.3 New Models, Bug Fixes & Benchmark Tool
What's New
New Model Support
- Gemma 4 — Parser, renderer, and model architecture port; added to OllamaEngineRequired (#53, #65)
- FunctionGemma — Template and tool-calling support (#42)
- Qwen3.5 Ollama Engine — Ported to ollama engine with DeltaNet recurrent state (#86)
Bug Fixes
- Qwen3.5 attn_gate shape mismatch — Fixed tensor shape for 27b variant (#66)
- Ghost GPU allocations — Fixed llama engine leaking GPU memory on model unload (#75)
- Vision reservation — Reduced image reservation for non-flash GPUs (K80) (#71)
- Ministral-3 — Fixed template function and YaRN RoPE parameters (#64)
- DeltaNet Go ops — Go wrappers added, then reverted as not yet ready (#83)
CI/CD
- Debug logging framework — Added logging guidelines, OLLAMA_DEBUG toggle in CI workflows, removed temporary INSTRUMENT logging (#89)
- GPU count validation — Model tests now verify expected GPU count (#79)
- GPU memory check — Added nvidia-smi validation for ministral-3 (#76)
- Memory profile parser — Parse nvidia-smi output in test framework (#68)
- Throughput benchmark — Standalone tok/s measurement tool and CI workflow (#92)
- num_predict guard — All model tests now set num_predict:200 to prevent infinite generation (#85)
- Development lifecycle — Formalized state machine for issue → PR → merge flow (#41)
- Test framework improvements — Test ID filtering, OLLAMA37_ROOT env var, model pull scheduling
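The memory profile parser (#68) and GPU memory check (#76) both consume nvidia-smi output. A minimal sketch of that kind of parsing, assuming the standard `--query-gpu` CSV interface; the function name and shape are illustrative, not the test framework's actual code:

```python
# Sketch: parse the output of
#   nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits
# into per-GPU (used, total) MiB pairs. Illustrative only.
def parse_memory_profile(output: str) -> list[tuple[int, int]]:
    profiles = []
    for line in output.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        profiles.append((used, total))
    return profiles

# One line per GPU, matching nvidia-smi's CSV layout
sample = "4785, 11441\n3089, 11441"
print(parse_memory_profile(sample))  # [(4785, 11441), (3089, 11441)]
```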
Maintenance
- Ministral-3 regression test — Added TC for template error (#56)
- Model test expansion — TC-MODELS-006 through 012 for broader coverage
- Removed duplicate commands/skills — Cleaned up .claude directory structure
Throughput (Tesla K80, 4x GPU)
| Model | Prompt tok/s | Gen tok/s | GPU% | VRAM (used/total) |
|---|---|---|---|---|
| gemma3:4b | 68.38 | 16.15 | 100% | 4785/11441 MiB |
| qwen3:4b | 58.68 | 16.07 | 100% | 3089/11441 MiB |
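Throughput figures like these are conventionally derived from the eval counters in Ollama's `/api/generate` response, where durations are reported in nanoseconds. A minimal sketch (the field names are Ollama's standard response fields; the standalone benchmark tool from #92 may compute this differently):

```python
# Sketch: prompt and generation throughput (tok/s) from an Ollama
# /api/generate response. All *_duration fields are in nanoseconds.
def throughput(resp: dict) -> tuple[float, float]:
    prompt_tps = resp["prompt_eval_count"] / (resp["prompt_eval_duration"] / 1e9)
    gen_tps = resp["eval_count"] / (resp["eval_duration"] / 1e9)
    return prompt_tps, gen_tps

# Illustrative numbers: 68 prompt tokens in 1 s, 161 generated in 10 s
resp = {
    "prompt_eval_count": 68, "prompt_eval_duration": 1_000_000_000,
    "eval_count": 161, "eval_duration": 10_000_000_000,
}
print(throughput(resp))  # (68.0, 16.1)
```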
Docker
docker pull dogkeeper886/ollama37:v2.0.3
v2.0.2 Qwen3.5 DeltaNet & CI Improvements
What's New
New Features
- Qwen3.5 DeltaNet support — Full architecture registration, graph builder, custom GGML ops (softplus, cumsum, tri, solve_tri, fill, diag), chat renderer/parser (#12, #13, #14, #15, #16, #18, #20, #21, #22)
- CUDA backends for DeltaNet ops — GPU-accelerated implementations for all 6 new GGML ops (#19, #26)
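For readers unfamiliar with the new GGML ops, softplus and cumsum have simple reference semantics. A hedged sketch of just those two, element-wise over flat lists (the real GGML/CUDA kernels operate on tensors and are considerably more involved):

```python
import math

# Reference semantics for two of the new DeltaNet GGML ops.
# Illustrative only; not the project's kernel code.

def softplus(xs):
    # softplus(x) = log(1 + exp(x)), a smooth approximation of ReLU
    return [math.log1p(math.exp(x)) for x in xs]

def cumsum(xs):
    # running sum along the sequence
    out, total = [], 0.0
    for x in xs:
        total += x
        out.append(total)
    return out

print(cumsum([0.0, 1.0, 2.0]))  # [0.0, 1.0, 3.0]
```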
CI/CD
- LLM judge improvements — Structured JSON prompt, 8K context model, Ollama JSON mode, token count logging, boolean coercion fix (#32, #33, #34, #35, #36, #37)
- Qwen3.5 model test — Added TC-MODELS-004 for qwen3.5:9b (#30, #31)
- Test management flow — Formalized User Story → TestLink → YAML workflow (#28, #29)
- Full test suite in CI — All 4 model tests now run in the pipeline
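The judge fixes above lean on Ollama's JSON mode, which is requested by setting `format` to `"json"` in the generate payload. A minimal sketch of such a request body; the model name and prompt here are assumptions for illustration, not the project's actual judge configuration:

```python
import json

# Sketch: an /api/generate request body using Ollama's JSON mode, as an
# LLM judge might. Model name and prompt are illustrative only.
payload = {
    "model": "qwen3:4b",
    "prompt": 'Judge the answer. Reply as JSON: {"pass": true|false, "reason": "..."}',
    "format": "json",                # forces the model to emit valid JSON
    "stream": False,
    "options": {"num_ctx": 8192},    # 8K context, matching the judge change
}
print(json.dumps(payload, indent=2))
```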
Bug Fixes
- GGML ops hardening — Added contiguity asserts for cumsum/fill, GGML_ASSERT for solve_tri (#23, #24)
- Per-layer n_head_kv — Fixed qwen3.5 attention head count
Docker
docker pull dogkeeper886/ollama37:v2.0.2
v1.4.0 GPT-OSS Support and Tesla K80 Stability
v1.4.0 (2025-08-10)
This release introduces GPT-OSS support and delivers critical stability improvements for Tesla K80 GPUs:
New Model Support:
- GPT-OSS: Open-source GPT implementation with optimized context management for smaller VRAM GPUs
Tesla K80 Improvements:
- Fixed VMM pool crashes through proper memory alignment granularity
- Resolved multi-GPU model switching deadlocks and silent failures
- Enhanced BF16 compatibility for Compute Capability 3.7 devices
- Optimized Docker build performance with parallel compilation
This release ensures reliable operation across single and multi-GPU Tesla K80 configurations while
expanding model support with the latest open-source innovations.
v1.3.0 Gemma 3n and Qwen2.5-VL Support
v1.3.0 (2025-07-01)
This release expands model support while maintaining full Tesla K80 compatibility:
New Model Support:
- Qwen2.5-VL: Multi-modal vision-language model for image understanding
- Gemma 3n: Efficient models designed for execution on everyday devices such as laptops, tablets or phones
v1.2.0 Qwen 3 Support
This release introduces support for Qwen3 models, marking a significant step in our commitment to keeping the Tesla K80 current with leading open-source language models. Testing includes successful execution of Gemma 3 12B, Phi-4 Reasoning 14B, and Qwen3 14B, ensuring compatibility with models expected to be widely used in May 2025.
v1.1.0 Gemma 3 Support
Adds support for the new Gemma 3 language model. This release builds upon existing support for Phi-4 and DeepSeek-R1. See the full release notes below for details.