Release v2026.4.29: cron sidecar + /api/system cold-path hardening#27
Conversation
OpenClaw v2026.4.20 split cron runtime state out of cron/jobs.json into cron/jobs-state.json. The dashboard collector still read only jobs.json, so the Cron Jobs table rendered blank Last run / Next run and "none" status against newer OpenClaw versions (issue #25). CollectCrons now reads the sidecar at the same directory and merges state by job.id, with sidecar entries taking precedence over inline state. Inline state is preserved as a fallback for pre-v2026.4.20 installs and for jobs the sidecar does not list. Sidecar files that are absent or malformed are silently ignored — collection never fails on sidecar issues. Tests cover six split-store / legacy / fallback scenarios. Refs #25
…lectors A cold /api/system request could block ~10–12s when the gateway was unresponsive (versions collected serially before the runtime probes, each with a 5s timeout). The frontend Gateway Runtime card has no fetch timeout, so it sat on "Loading…" indefinitely (issue #26). - New SystemConfig.ColdPathTimeoutMs (default 4000, validated in [200, 15000]) bounds the worst-case wall time of a single refresh. - Versions, OpenClaw runtime, disk, and CPU/RAM/Swap now all collect in parallel under one context.WithTimeout. CollectOpenclawRuntime no longer needs versions up front — Status.{Current,Latest}Version are patched after wg.Wait() from the freshly collected versions. - getVersionsCached only persists results when ctx finished cleanly, so a partial cold-path collection can never poison the warm cache. - On deadline, response carries Degraded=true and "cold path: deadline exceeded" in errors so the frontend can paint a terminal state. cold_path_test.go covers all four properties: deadline bound, degraded flag, host metrics always shipped, no cache poisoning. Tests use a hanging httptest server that honours r.Context() so the suite runs in <5s with -race.
Three small frontend changes that, together with the backend cold-path
deadline, keep the dashboard cards usable when the gateway is slow or
offline:
* SystemBar.fetch now wraps fetch('/api/system') in an AbortController
with a 6s client timeout (just over the backend ~4s cold-path budget).
HTTP errors and aborts call a new renderGatewayDegraded(reason) that
paints the Gateway Runtime card with State=Unavailable + the reason
("timeout", "network error", "http 5xx") instead of leaving it stuck
on "Loading…".
* Promote the local kv() row helper to SystemBar._kv. The previous
references to kv() in the gateway runtime block were undefined at
call time (kv was a const inside Renderer.render), which silently
threw a ReferenceError and contributed to the empty card reported
in #26. The new method is reused by renderGatewayDegraded.
* Skills grid now falls back to "No skills configured" when D.skills
is null/empty, matching the existing Git Log empty-state pattern at
the same render block.
No backend changes; all unit tests still pass with -race.
The sidecar override semantics (jobs-state.json wins wholesale over jobs.json inline state when present) were implemented but only spot- covered by tests. This commit: - Expands the comment in CollectCrons to spell out *why* we replace rather than field-merge: in OpenClaw v2026.4.20+ inline state is pre-migration leftover, and a field-level merge could surface stale lastRun/nextRun values. - Adds focused regression tests for the merge contract so future refactors keep the sidecar precedence and the legacy fallback path.
Replace the package-level fetchLatestVersion var with a per-instance field on SystemService. The shared var raced with the getLatestVersionCached background goroutine when tests overrode it during cleanup; per-instance state isolates each test fully and removes the cleanup race entirely. Also expand the inline doc on the cold-path goroutine that drives CollectOpenclawRuntime to spell out why versions and runtime probes run in parallel (both probe the gateway; serializing would double cold-path wall time) and where the openclaw status fields are patched in after wg.Wait(). Tests updated to construct fetchLatest via the constructor field rather than mutating a package-level seam.
Two related correctness fixes in the systemd service backend: - Generate Environment= directives for OPENCLAW_HOME and PATH so the unit can locate the openclaw binary and OpenClaw runtime at activation time. Previously the unit relied on whatever environment systemd-user inherited, which was missing both on fresh machines. - Switch the install path from systemctl start to systemctl restart so reinstalls with changed --bind/--port/Environment actually pick up the new unit content. systemctl restart also starts a non-running unit, so first-installs still work. Helpers systemdOpenclawHome() and systemdPathEnv() compute the values deterministically (OPENCLAW_HOME env override -> ~/.openclaw fallback; PATH dedup with system bins appended) so the generated unit is stable across reinstalls when nothing changed.
Default() previously seeded SystemConfig.GatewayPort with 18789, which defeated the inheritance path: when config.json omitted system.gatewayPort, Load() would see a non-zero value and skip the "copy from ai.gatewayPort" fallback. Leave SystemConfig.GatewayPort at the zero value in Default() and let Load() inherit AI.GatewayPort when system.gatewayPort is missing. This restores the documented behavior described in docs/CONFIGURATION.md and matches user expectations: configure the gateway port once on the ai block, system probes follow. Comment on the SystemConfig zero-default locks in the invariant so a future field-alignment cleanup does not regress it.
Three small polish items consolidated: - main.go: normalize the version banner via strings.TrimPrefix so a VERSION file that includes a leading "v" no longer renders as "vv2026.4.x" in startup output. Applied at both the service-subcommand and main flow assignment sites for parity with the --version flag. - web/index.html: add rel="noopener noreferrer" to the GitHub link in the header. target="_blank" without rel leaks window.opener to the external site and exposes a tabnabbing surface that browsers technically mitigate but auditors still flag. - internal/appserver/server_routes.go: encode the loopback-reflection CORS invariants directly above setCORSHeaders so the next reader understands why we accept arbitrary localhost ports. The reasoning (loopback bind + no Allow-Credentials + server-side gateway token + rate limit on /api/chat) lived in commit history and review threads but not in the code itself.
Three doc updates that tracked recent code changes: - README.md: add the system.coldPathTimeoutMs and system.gatewayPort rows to the configuration reference table. - docs/CONFIGURATION.md: document coldPathTimeoutMs and surface the gatewayPort inheritance behavior (defaults to ai.gatewayPort when omitted) in the table. - examples/config.full.json: add the full system block which was missing entirely; now mirrors the documented defaults so the example is copy-pasteable. No behavior changes; pure documentation alignment.
- Bump VERSION to v2026.4.29. - Add CHANGELOG entry covering #25 (cron sidecar merge), #26 (cold-path deadline + degraded UI fallback), and the release-hardening polish: gatewayPort inheritance, systemd Environment + restart, per-instance fetcher, version banner normalization, CORS doc, rel=noopener, docs. - Move TODO.md released entries; archive the issue #25/#26 fix plan under docs/plans/ alongside the existing planning history.
The Linux collector inlines ramFromMeminfo/swapFromMeminfo inside collectCPURAMSwapParallel to share a single /proc/meminfo read between RAM and Swap. The standalone collectRAM/collectSwap entry points have no remaining callers on Linux (darwin and unsupported still use them for their own platform-specific paths). Caught by golangci-lint unused on the Linux runner — local darwin go vet doesn't compile linux-build-tagged files, so this slipped past.
There was a problem hiding this comment.
Pull request overview
Release hardening and bug fixes for OpenClaw Dashboard v2026.4.29, addressing cron runtime state changes introduced in OpenClaw v2026.4.20+ and bounding /api/system cold-path latency to prevent the UI from getting stuck in “Loading…”.
Changes:
- Merge
jobs-state.jsonsidecar intoCollectCronsfor OpenClaw v2026.4.20+ cron runtime state compatibility. - Add
/api/systemcold-path bounding (system.coldPathTimeoutMs) + parallel collectors; improve frontend/api/systemfetch timeout + terminal “Unavailable” rendering. - Release hardening: fix
system.gatewayPortinheritance, systemd unitEnvironment=+restarton reinstall, per-instance latest-version fetcher, version banner normalization, and external-link noopener.
Reviewed changes
Copilot reviewed 25 out of 26 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| web/index.html | Adds rel="noopener noreferrer", Skills empty-state fallback, and /api/system fetch timeout + degraded Gateway Runtime rendering. |
| main.go | Normalizes version string by trimming a leading v to prevent double-v banner output. |
| internal/appsystem/system_service.go | Introduces cold-path timeout bounding and parallelizes collectors; makes latest-version fetcher per-instance (test-race fix). |
| internal/appsystem/system_service_test.go | Updates tests to override per-instance latest-version fetcher. |
| internal/appsystem/latest_version_test.go | Removes global fetch override and uses per-instance injection to eliminate races. |
| internal/appsystem/cold_path_test.go | Adds regression tests for cold-path deadline behavior, degraded flagging, and caching contracts. |
| internal/appsystem/bench_test.go | Adjusts config literal formatting to match updated struct fields. |
| internal/appservice/systemd.go | Adds Environment= lines for OPENCLAW_HOME and PATH; uses systemctl restart on install. |
| internal/appservice/systemd_test.go | Extends unit-file assertions for Environment= and verifies restart call. |
| internal/appserver/server_routes.go | Adds in-code documentation for the loopback CORS reflection logic. |
| internal/apprefresh/refresh.go | Loads cron state sidecar (jobs-state.json) and applies sidecar-overrides contract with legacy fallback. |
| internal/apprefresh/cron_state_test.go | Adds comprehensive cron sidecar regression coverage. |
| internal/apprefresh/testdata/cron/* | Adds fixtures for split-store cron definitions and sidecar state variants. |
| internal/appconfig/config.go | Adds system.coldPathTimeoutMs, fixes system.gatewayPort inheritance behavior, and validates new timeout. |
| internal/appconfig/config_test.go | Adds tests locking in gatewayPort inheritance/override behaviors. |
| examples/config.full.json | Adds full system block including coldPathTimeoutMs. |
| docs/CONFIGURATION.md | Documents system.coldPathTimeoutMs. |
| README.md | Documents system.coldPathTimeoutMs and clarifies system.gatewayPort behavior. |
| CHANGELOG.md | Adds v2026.4.29 release notes covering the fixes and hardening. |
| VERSION | Bumps version to v2026.4.29. |
| TODO.md | Records v2026.4.29 as released with included fixes. |
| docs/plans/2026-04-29-issue-25-26-fix-plan.md | Adds the validated plan document for issues #25 and #26. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Code Review
This pull request introduces several fixes and enhancements to the OpenClaw dashboard, including merging cron state from a sidecar file to fix empty tables and implementing a parallelized refresh process with a configurable cold-path timeout to reduce latency. It also improves the systemd service installation by adding necessary environment variables and switching to a restart-based update mechanism, while the frontend is updated to handle degraded states and fetch timeouts. Feedback suggests avoiding magic numbers for default timeouts by defining them as constants.
| coldPath := time.Duration(s.cfg.ColdPathTimeoutMs) * time.Millisecond | ||
| if coldPath <= 0 { | ||
| coldPath = 4 * time.Second | ||
| } |
Pull Request
Type
fix— bug fixSummary
Fix two GitHub-reported regressions and bundle release-hardening polish into v2026.4.29. #25: cron table rendered empty against OpenClaw v2026.4.20+ because runtime state moved to a new
~/.openclaw/cron/jobs-state.jsonsidecar that the dashboard wasn't reading. #26:/api/systemcold path could hang 10–12s because version probes ran serially before the parallel host-metrics group, and the frontend Gateway Runtime card had no fetch timeout so it stayed onLoading…indefinitely on slow gateways. Bundled fixes:system.gatewayPortinheritance fromai.gatewayPort, systemdEnvironment=directives +restarton reinstall, per-instance latest-version fetcher (race fix), startup banner double-vprefix,rel="noopener noreferrer"on the GitHub header link.Closes #25, closes #26.
What Changed
internal/apprefresh/refresh.goCollectCronsnow merges~/.openclaw/cron/jobs-state.jsonsidecar byjob.id; sidecar wins wholesale, inline state preserved as legacy fallbackinternal/apprefresh/cron_state_test.golastRunStatusfallbackinternal/appsystem/system_service.gorefresh()incontext.WithTimeout(ColdPathTimeoutMs); runs versions in parallel with runtime + host-metrics; setsdegraded:trueon partial collection; per-instancefetchLatestfield replaces package varinternal/appsystem/cold_path_test.godegraded:trueon partial, host metrics ship when gateway hangs, cancelled collection cannot poison version cacheinternal/appconfig/config.goColdPathTimeoutMs(default 4000, validated [200,15000]); removedGatewayPort: 18789fromDefault()soLoad()inheritance fromai.gatewayPortactivatesinternal/appservice/systemd.goEnvironment=OPENCLAW_HOME=+Environment=PATH=; install path usesrestartinstead ofstartinternal/appserver/server_routes.gosetCORSHeadersenumerating loopback-reflection invariantsweb/index.htmlSys.fetch()usesAbortController(6000ms ceiling); newrenderGatewayDegraded(reason)paintsState=Unavailableon timeout/error; Skills empty-state fallback;rel="noopener noreferrer"on GitHub linkmain.gostrings.TrimPrefix(version, "v")at bothBuildVersion/detectVersionsites — fixesvv2026.4.xbannerVERSION,CHANGELOG.md,TODO.mdv2026.4.29; added changelog entry; marked releasedREADME.md,docs/CONFIGURATION.md,examples/config.full.jsonsystem.coldPathTimeoutMsandsystem.gatewayPortinheritancedocs/plans/2026-04-29-issue-25-26-fix-plan.mdTest Evidence
Test output
Checklist
Code quality
$,esc,safeColor,relTime)esc()var(--accent), etc.)import, no CDN<script>)go.modstays stdlib-only)Tests
go test -race ./...Manual verification
Documentation
CHANGELOG.mdupdated under the correct version headingREADME.mdupdated if a new panel or config key was addedScreenshots / Recordings
Backend + frontend-recovery PR. Gateway Runtime card behaviour change is verified via integration: a slow gateway now produces
State=Unavailableinstead of stuckLoading….Breaking Changes
None. New
system.coldPathTimeoutMsis optional with a 4000ms default. Thesystem.gatewayPortchange only activates inheritance when the value is omitted; user-supplied values still win. Cron sidecar merge is additive — pre-v2026.4.20 OpenClaw layouts continue to work via the legacy inline fallback.Agent Review Notes
The cron sidecar merge contract is intentionally wholesale-replace (not field-merge) because OpenClaw v2026.4.20+ stops writing runtime state to
jobs.json. See the doc comment inCollectCronsfor the rationale. The cold-path deadline has a critical invariant: the version cache must only update on full success, otherwise a deadline-cancelled collection could persist a partial/empty version pair —cold_path_test.goexercises this directly.