Releases: dogkeeper886/ollama37

v2.0.3 New Models, Bug Fixes & Benchmark Tool

13 Apr 03:35

What's New

New Model Support

  • Gemma 4 — Parser, renderer, and model architecture port; added to OllamaEngineRequired (#53, #65)
  • FunctionGemma — Template and tool-calling support (#42)
  • Qwen3.5 Ollama Engine — Ported to ollama engine with DeltaNet recurrent state (#86)

Bug Fixes

  • Qwen3.5 attn_gate shape mismatch — Fixed tensor shape for 27b variant (#66)
  • Ghost GPU allocations — Fixed llama engine leaking GPU memory on model unload (#75)
  • Vision reservation — Reduced image memory reservation for GPUs without flash attention (e.g. K80) (#71)
  • Ministral-3 — Fixed template function and YaRN RoPE parameters (#64)
  • DeltaNet Go ops — Go wrappers added and subsequently reverted as not yet ready (#83)

CI/CD

  • Debug logging framework — Added logging guidelines, OLLAMA_DEBUG toggle in CI workflows, removed temporary INSTRUMENT logging (#89)
  • GPU count validation — Model tests now verify expected GPU count (#79)
  • GPU memory check — Added nvidia-smi validation for ministral-3 (#76)
  • Memory profile parser — Parse nvidia-smi output in test framework (#68)
  • Throughput benchmark — Standalone tok/s measurement tool and CI workflow (#92)
  • num_predict guard — All model tests now set num_predict:200 to prevent infinite generation (#85)
  • Development lifecycle — Formalized state machine for issue → PR → merge flow (#41)
  • Test framework improvements — Test ID filtering, OLLAMA37_ROOT env var, model pull scheduling
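The memory-profile parser above essentially splits nvidia-smi's CSV output. A minimal sketch of that parsing step, using a hard-coded sample line in place of a live query (the query flags in the comment follow nvidia-smi's standard `--query-gpu` interface; the framework's actual code may differ):

```shell
# In CI this line would come from:
#   nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits
line="4785, 11441"

used=$(printf '%s' "$line" | awk -F', ' '{print $1}')
total=$(printf '%s' "$line" | awk -F', ' '{print $2}')
echo "GPU memory: ${used}/${total} MiB"
```

A test can then assert `used` against an expected ceiling per model, which is how a GPU memory check like the ministral-3 one can be expressed.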

Maintenance

  • Ministral-3 regression test — Added TC for template error (#56)
  • Model test expansion — TC-MODELS-006 through 012 for broader coverage
  • Removed duplicate commands/skills — Cleaned up .claude directory structure

Throughput (Tesla K80, 4x GPU)

Model        Prompt tok/s   Gen tok/s   GPU%   VRAM
gemma3:4b    68.38          16.15       100%   4785/11441 MiB
qwen3:4b     58.68          16.07       100%   3089/11441 MiB
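For reference, figures like these can be derived from Ollama's /api/generate response, which reports eval_count (tokens generated) and eval_duration (nanoseconds): generation throughput is eval_count / eval_duration × 1e9. A sketch with illustrative values (this is not the benchmark tool itself):

```shell
# Sample fields from an /api/generate response (values are illustrative):
eval_count=200
eval_duration=12500000000   # nanoseconds

gen_toks=$(awk -v c="$eval_count" -v d="$eval_duration" \
  'BEGIN { printf "%.2f", c * 1e9 / d }')
echo "$gen_toks tok/s"   # 200 tokens over 12.5 s -> 16.00 tok/s
```

The same arithmetic with prompt_eval_count / prompt_eval_duration gives the prompt tok/s column.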

Docker

docker pull dogkeeper886/ollama37:v2.03
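To run the pulled image, an invocation along the lines of upstream Ollama's Docker instructions should work; the container name, volume name, and port mapping below are illustrative:

```shell
# Flags mirror upstream Ollama's Docker setup; adjust names/ports as needed.
run_cmd='docker run -d --gpus all -p 11434:11434 -v ollama:/root/.ollama --name ollama37 dogkeeper886/ollama37:v2.03'
echo "$run_cmd"
```

Once the container is up, the API answers on localhost:11434.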

Full Changelog

v2.02...v2.03

v2.02 Qwen3.5 DeltaNet & CI Improvements

09 Mar 02:26
9d61df4

What's New

New Features

  • Qwen3.5 DeltaNet support — Full architecture registration, graph builder, custom GGML ops (softplus, cumsum, tri, solve_tri, fill, diag), chat renderer/parser (#12, #13, #14, #15, #16, #18, #20, #21, #22)
  • CUDA backends for DeltaNet ops — GPU-accelerated implementations for all 6 new GGML ops (#19, #26)

Performance

  • Fixed ~4 min cold-start delay — Switched from PTX JIT compilation to native CUBIN for K80 (#25, #27)
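Background on the fix: PTX is portable intermediate code that the CUDA driver JIT-compiles on first load, which is where the cold-start delay came from; a CUBIN embeds native machine code for a specific architecture, so no JIT step runs. The -gencode specs below illustrate the distinction and are not the project's actual build flags:

```shell
# PTX only: the driver JIT-compiles for sm_37 at first model load.
ptx_spec='arch=compute_37,code=compute_37'
# Native CUBIN for Tesla K80 (sm_37): machine code baked in, no JIT.
cubin_spec='arch=compute_37,code=sm_37'
echo "nvcc -gencode $cubin_spec ..."
```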

CI/CD

  • LLM judge improvements — Structured JSON prompt, 8K context model, Ollama JSON mode, token count logging, boolean coercion fix (#32, #33, #34, #35, #36, #37)
  • Qwen3.5 model test — Added TC-MODELS-004 for qwen3.5:9b (#30, #31)
  • Test management flow — Formalized User Story → TestLink → YAML workflow (#28, #29)
  • Full test suite in CI — All 4 model tests run in pipeline
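The judge changes above build on standard Ollama request fields: "format": "json" constrains the model to emit valid JSON, and options.num_ctx raises the context window (the 8K context noted above). The model name and prompt in this sketch are illustrative:

```shell
# Hypothetical judge request body; "format" and "options.num_ctx" are
# standard Ollama /api/generate fields.
payload='{
  "model": "qwen3.5:9b",
  "prompt": "Judge the answer below. Return {\"pass\": true} or {\"pass\": false}.",
  "format": "json",
  "options": { "num_ctx": 8192 },
  "stream": false
}'
# sent with: curl http://localhost:11434/api/generate -d "$payload"
echo "$payload"
```

Constraining output to JSON is what makes the boolean coercion fix above meaningful: the judge's verdict can be parsed mechanically instead of scraped from free text.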

Bug Fixes

  • GGML ops hardening — Added contiguity asserts for cumsum/fill, GGML_ASSERT for solve_tri (#23, #24)
  • Per-layer n_head_kv — Fixed qwen3.5 attention head count

Docker

docker pull dogkeeper886/ollama37:v2.02

Full Changelog

v1.4.0...v2.02

v1.4.0 GPT-OSS Support and Tesla K80 Stability

10 Aug 00:10

v1.4.0 (2025-08-10)

This release introduces GPT-OSS support and delivers critical stability improvements for Tesla K80
GPUs:

New Model Support:

  • GPT-OSS: Open-source GPT implementation with optimized context management for smaller VRAM GPUs

Tesla K80 Improvements:

  • Fixed VMM pool crashes through proper memory alignment granularity
  • Resolved multi-GPU model switching deadlocks and silent failures
  • Enhanced BF16 compatibility for Compute Capability 3.7 devices
  • Optimized Docker build performance with parallel compilation

This release ensures reliable operation across single and multi-GPU Tesla K80 configurations while
expanding model support with the latest open-source innovations.

v1.3.0 Gemma 3n and Qwen2.5-VL Support

20 Jul 01:25

v1.3.0 (2025-07-01)

This release expands model support while maintaining full Tesla K80 compatibility:

New Model Support:

  • Qwen2.5-VL: Multi-modal vision-language model for image understanding
  • Gemma 3n: Efficient models designed for execution on everyday devices such as laptops, tablets or phones

v1.2.0 Qwen 3 Support

05 May 16:04
286f459

This release introduces support for Qwen3 models, marking a significant step in our commitment to keeping the Tesla K80 current with leading open-source language models. Testing includes successful execution of Gemma 3 12B, Phi-4 Reasoning 14B, and Qwen3 14B, ensuring compatibility with models expected to be widely used in May 2025.

v1.1.0 Gemma 3 Support

07 Apr 15:59

Adds support for the new Gemma 3 language model, building on existing support for Phi-4 and DeepSeek-R1.