fix(flows): flow-16 unsatisfiable Ready gate; flow-04 first-inference window on local models by bussyjd · Pull Request #621 · ObolNetwork/obol-stack

bussyjd · 2026-06-11T03:10:41Z

Summary

Both release-gating FAILs from the rc14 release smoke on a local Ollama model (qwopus3.6-27b-v2-mtp:q5_k_m, 27B) reduce to flow bugs — the stack itself is clean. Diagnosed live against a reproducing cluster; full chain below.

flow-16 §2.2 — the Ready gate could never pass

The flow creates its offer with registration enabled and never runs obol sell register, so the controller keeps Ready=False / Registered=AwaitingExternalRegistration by design ("offer already serves paid traffic"). A Ready=True poll is unsatisfiable as written — flow-11 only polls Ready after actually registering.

It historically "passed" by accident: the gate grepped Ready=True against sell status output, which substring-matches PaymentGateReady=True whenever the condition ladder converged inside the window.

Fix: poll the serving condition set — UpstreamHealthy + PaymentGateReady + RoutePublished, anchored greps via obol kubectl jsonpath — over 300s. That set is exactly what §3's 402 probe then exercises.
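The substring accident and the anchored alternative can be reproduced without a cluster. The following is a minimal sketch, not the code from the flow itself: the status line is a hypothetical sample built from the condition names quoted in this PR, and `old_gate`, `cond_true`, `serving_gate`, and `poll_until` are illustrative helper names. In the real flow the status text would come from `obol kubectl` jsonpath output rather than a fixed string.

```shell
# Hypothetical status line; the condition names are the ones quoted in this PR.
status='UpstreamHealthy=True PaymentGateReady=True RoutePublished=True Registered=False Ready=False'

# Old-style gate: unanchored substring match. "Ready=True" also occurs inside
# "PaymentGateReady=True", so this succeeds even though Ready is False.
old_gate() {
  printf '%s\n' "$1" | grep 'Ready=True' >/dev/null
}

# Anchored check: put one condition per line, then require a whole-line (-x)
# fixed-string (-F) match, so "Ready" cannot match "PaymentGateReady".
cond_true() {
  printf '%s\n' "$1" | tr ' ' '\n' | grep -xF "$2=True" >/dev/null
}

# Serving gate: all three serving conditions must be True.
serving_gate() {
  cond_true "$1" UpstreamHealthy &&
    cond_true "$1" PaymentGateReady &&
    cond_true "$1" RoutePublished
}

# Generic poll helper: rerun a command until it succeeds or TIMEOUT seconds pass.
# Usage: poll_until TIMEOUT INTERVAL COMMAND [ARGS...]
poll_until() {
  local timeout=$1 interval=$2
  shift 2
  local deadline=$(( $(date +%s) + timeout ))
  until "$@"; do
    [ "$(date +%s)" -ge "$deadline" ] && return 1
    sleep "$interval"
  done
}
```

To poll a live offer, the status fetch has to run inside the polled command (for example, a small wrapper function that fetches the conditions and then calls `serving_gate`), so that each attempt sees fresh output; `poll_until 300 5 wrapper` would then match the 300s window described above.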

flow-04 step 12 — window too tight, diagnostics swallowed

curl -sf --max-time 120 failed reproducibly (twice, warm model) on the first inference ever routed through the Hermes agent pipeline: Hermes prepends a multi-thousand-token system prompt, and a local Ollama model pays full prompt processing before the KV cache warms (~150s observed; Hermes' internal client retries re-pay it until one attempt survives). The very next call answers in ~20s — which is why step 13 passed seconds later in every run. GPU-class endpoints converge in seconds and never see this.

-f also swallowed the response entirely, so the fail line was just "Agent inference failed —" with no status or body after it (which initially sent the investigation toward cold model loads and auth). Fix: 300s window, no -f, and the fail message carries HTTP status + body snippet.
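A probe along these lines keeps the diagnostics that `-f` discards. This is a sketch under stated assumptions, not the flow's actual step: `probe_inference` and `fail_line` are hypothetical names, the URL and JSON payload are supplied by the caller, and the exact wording of the fail line and the 200-character snippet length are illustrative choices.

```shell
# Build the failure line from the HTTP status and a truncated body snippet.
fail_line() {
  local status=$1 body=$2
  printf 'Agent inference failed: HTTP %s, body: %s\n' "$status" "${body:0:200}"
}

# Usage: probe_inference URL JSON_PAYLOAD
# Prints the response body on HTTP 200; otherwise prints a diagnostic to stderr.
probe_inference() {
  local url=$1 payload=$2 body_file status body
  body_file=$(mktemp)
  # No -f, so non-2xx response bodies are kept. -w prints the status code after
  # the transfer; curl reports 000 when no HTTP response arrived (e.g. timeout).
  status=$(curl -s --max-time 300 -o "$body_file" -w '%{http_code}' \
    -H 'Content-Type: application/json' -d "$payload" "$url") || true
  body=$(cat "$body_file")
  rm -f "$body_file"
  if [ "$status" = "200" ]; then
    printf '%s\n' "$body"
  else
    fail_line "$status" "$body" >&2
    return 1
  fi
}
```

With this shape, a timeout shows up as status 000 with an empty body and an upstream error shows up with its status and message, which separates "too slow" from "rejected" at a glance.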

Verification

Live cluster in the failing state: new flow-16 gate passes where the old one cannot (UpstreamHealthy=True PaymentGateReady=True RoutePublished=True Registered=False Ready=False); the exact flow-04 request returns 200 with correct content once the prompt cache is warm.
bash -n is clean on both flows; with these fixes the rc14 wopus smoke is 12 PASS / 2 SKIP (the SKIPs are waived registration-receipt sub-checks; the registrations themselves succeeded on-chain).

…ference probe

Both release-smoke failures on the rc14 wopus run reduce to these two flow bugs, with no stack defect (reproduced live, full chain diagnosed):

- flow-16 §2.2 polled Ready=True for an offer created WITH registration enabled and no `obol sell register` submitted, which the controller keeps Ready=False / AwaitingExternalRegistration by design ('offer already serves paid traffic'). The gate could never pass as written, and only ever matched historically because 'Ready=True' substring-matched 'PaymentGateReady=True' when the ladder converged in time. Gate now polls the serving condition set (UpstreamHealthy + PaymentGateReady + RoutePublished, anchored greps) over 300s, which is exactly what §3's 402 probe exercises.
- flow-04 step 12 used `curl -sf --max-time 120`: too tight for the FIRST inference ever routed through the Hermes agent pipeline on a local Ollama model (the multi-thousand-token system prompt pays full prompt processing before the KV cache warms; ~150s observed for a 27B on an M-series host), and -f swallowed every diagnostic so the fail message was empty. Now 300s, no -f, and the fail message carries the HTTP status + body snippet.

Verified against a live cluster in the failing state: the new flow-16 gate passes where the old one cannot; the flow-04 call returns 200 with correct content once warm.

bussyjd force-pushed the fix/flow-gates-local-llm branch from b62dfe8 to 8f5c6bb on June 11, 2026 03:47

bussyjd mentioned this pull request Jun 11, 2026

release: v0.10.0-rc15 train — auto-repin CI + sell resume + flow gates + paid-MCP smoke #623

Closed

OisinKyne closed this Jun 11, 2026

bussyjd wants to merge 1 commit into main from fix/flow-gates-local-llm
