Releases: dogkeeper886/ollama37
v2.0.3 New Models, Bug Fixes & Benchmark Tool
What's New
New Model Support
- Gemma 4 — Parser, renderer, and model architecture port; added to OllamaEngineRequired (#53, #65)
- FunctionGemma — Template and tool-calling support (#42)
- Qwen3.5 Ollama Engine — Ported to ollama engine with DeltaNet recurrent state (#86)
Bug Fixes
- Qwen3.5 attn_gate shape mismatch — Fixed tensor shape for 27b variant (#66)
- Ghost GPU allocations — Fixed llama engine leaking GPU memory on model unload (#75)
- Vision reservation — Reduced image reservation for non-flash GPUs (K80) (#71)
- Ministral-3 — Fixed template function and YaRN RoPE parameters (#64)
- DeltaNet Go ops — Go wrappers added, then reverted as not yet ready (#83)
CI/CD
- Debug logging framework — Added logging guidelines, OLLAMA_DEBUG toggle in CI workflows, removed temporary INSTRUMENT logging (#89)
- GPU count validation — Model tests now verify expected GPU count (#79)
- GPU memory check — Added nvidia-smi validation for ministral-3 (#76)
- Memory profile parser — Parse nvidia-smi output in test framework (#68)
- Throughput benchmark — Standalone tok/s measurement tool and CI workflow (#92)
- num_predict guard — All model tests now set num_predict:200 to prevent infinite generation (#85)
- Development lifecycle — Formalized state machine for issue → PR → merge flow (#41)
- Test framework improvements — Test ID filtering, OLLAMA37_ROOT env var, model pull scheduling
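The memory profile parser (#68) and GPU memory check (#76) both consume nvidia-smi output. A minimal sketch of that kind of parsing, assuming the standard `--query-gpu` CSV interface; the function name and shape are illustrative, not the test framework's actual code:

```python
# Sketch: parse the output of
#   nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits
# into per-GPU (used, total) MiB pairs. Illustrative only.
def parse_memory_profile(output: str) -> list[tuple[int, int]]:
    profiles = []
    for line in output.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        profiles.append((used, total))
    return profiles

# One line per GPU, matching nvidia-smi's CSV layout
sample = "4785, 11441\n3089, 11441"
print(parse_memory_profile(sample))  # [(4785, 11441), (3089, 11441)]
```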
Maintenance
- Ministral-3 regression test — Added TC for template error (#56)
- Model test expansion — TC-MODELS-006 through 012 for broader coverage
- Removed duplicate commands/skills — Cleaned up .claude directory structure
Throughput (Tesla K80, 4x GPU)
| Model | Prompt tok/s | Gen tok/s | GPU% | VRAM (used/total) |
|---|---|---|---|---|
| gemma3:4b | 68.38 | 16.15 | 100% | 4785/11441 MiB |
| qwen3:4b | 58.68 | 16.07 | 100% | 3089/11441 MiB |
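Throughput figures like these are conventionally derived from the eval counters in Ollama's `/api/generate` response, where durations are reported in nanoseconds. A minimal sketch (the field names are Ollama's standard response fields; the standalone benchmark tool from #92 may compute this differently):

```python
# Sketch: prompt and generation throughput (tok/s) from an Ollama
# /api/generate response. All *_duration fields are in nanoseconds.
def throughput(resp: dict) -> tuple[float, float]:
    prompt_tps = resp["prompt_eval_count"] / (resp["prompt_eval_duration"] / 1e9)
    gen_tps = resp["eval_count"] / (resp["eval_duration"] / 1e9)
    return prompt_tps, gen_tps

# Illustrative numbers: 68 prompt tokens in 1 s, 161 generated in 10 s
resp = {
    "prompt_eval_count": 68, "prompt_eval_duration": 1_000_000_000,
    "eval_count": 161, "eval_duration": 10_000_000_000,
}
print(throughput(resp))  # (68.0, 16.1)
```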
Docker
docker pull dogkeeper886/ollama37:v2.0.3
v2.0.2 Qwen3.5 DeltaNet & CI Improvements
What's New
New Features
- Qwen3.5 DeltaNet support — Full architecture registration, graph builder, custom GGML ops (softplus, cumsum, tri, solve_tri, fill, diag), chat renderer/parser (#12, #13, #14, #15, #16, #18, #20, #21, #22)
- CUDA backends for DeltaNet ops — GPU-accelerated implementations for all 6 new GGML ops (#19, #26)
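For readers unfamiliar with the new GGML ops, softplus and cumsum have simple reference semantics. A hedged sketch of just those two, element-wise over flat lists (the real GGML/CUDA kernels operate on tensors and are considerably more involved):

```python
import math

# Reference semantics for two of the new DeltaNet GGML ops.
# Illustrative only; not the project's kernel code.

def softplus(xs):
    # softplus(x) = log(1 + exp(x)), a smooth approximation of ReLU
    return [math.log1p(math.exp(x)) for x in xs]

def cumsum(xs):
    # running sum along the sequence
    out, total = [], 0.0
    for x in xs:
        total += x
        out.append(total)
    return out

print(cumsum([0.0, 1.0, 2.0]))  # [0.0, 1.0, 3.0]
```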
CI/CD
- LLM judge improvements — Structured JSON prompt, 8K context model, Ollama JSON mode, token count logging, boolean coercion fix (#32, #33, #34, #35, #36, #37)
- Qwen3.5 model test — Added TC-MODELS-004 for qwen3.5:9b (#30, #31)
- Test management flow — Formalized User Story → TestLink → YAML workflow (#28, #29)
- Full test suite in CI — All 4 model tests now run in the pipeline
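The judge fixes above lean on Ollama's JSON mode, which is requested by setting `format` to `"json"` in the generate payload. A minimal sketch of such a request body; the model name and prompt here are assumptions for illustration, not the project's actual judge configuration:

```python
import json

# Sketch: an /api/generate request body using Ollama's JSON mode, as an
# LLM judge might. Model name and prompt are illustrative only.
payload = {
    "model": "qwen3:4b",
    "prompt": 'Judge the answer. Reply as JSON: {"pass": true|false, "reason": "..."}',
    "format": "json",                # forces the model to emit valid JSON
    "stream": False,
    "options": {"num_ctx": 8192},    # 8K context, matching the judge change
}
print(json.dumps(payload, indent=2))
```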
Bug Fixes
- GGML ops hardening — Added contiguity asserts for cumsum/fill, GGML_ASSERT for solve_tri (#23, #24)
- Per-layer n_head_kv — Fixed qwen3.5 attention head count
Docker
docker pull dogkeeper886/ollama37:v2.0.2
v1.4.0 GPT-OSS Support and Tesla K80 Stability
v1.4.0 (2025-08-10)
This release introduces GPT-OSS support and delivers critical stability improvements for Tesla K80 GPUs:
New Model Support:
- GPT-OSS: Open-source GPT implementation with optimized context management for smaller VRAM GPUs
Tesla K80 Improvements:
- Fixed VMM pool crashes through proper memory alignment granularity
- Resolved multi-GPU model switching deadlocks and silent failures
- Enhanced BF16 compatibility for Compute Capability 3.7 devices
- Optimized Docker build performance with parallel compilation
This release ensures reliable operation across single and multi-GPU Tesla K80 configurations while
expanding model support with the latest open-source innovations.
v1.3.0 Gemma 3n and Qwen2.5-VL Support
v1.3.0 (2025-07-01)
This release expands model support while maintaining full Tesla K80 compatibility:
New Model Support:
- Qwen2.5-VL: Multi-modal vision-language model for image understanding
- Gemma 3n: Efficient models designed for execution on everyday devices such as laptops, tablets or phones
v1.2.0 Qwen 3 Support
This release introduces support for Qwen3 models, marking a significant step in our commitment to keeping the Tesla K80 current with leading open-source language models. Testing includes successful execution of Gemma 3 12B, Phi-4 Reasoning 14B, and Qwen3 14B, ensuring compatibility with models expected to be widely used in May 2025.
v1.1.0 Gemma 3 Support
Adds support for the new Gemma 3 language model. This release builds upon existing support for Phi-4 and DeepSeek-R1. See the full release notes below for details.