Release v2026.4.29: cron sidecar + /api/system cold-path hardening #27

Merged
mudrii merged 12 commits into main from fix/cron-sidecar-and-system-coldpath on Apr 29, 2026

@mudrii mudrii commented Apr 29, 2026

Pull Request

Type

  • fix — bug fix

Summary

Fix two GitHub-reported regressions and bundle release-hardening polish into v2026.4.29. #25: cron table rendered empty against OpenClaw v2026.4.20+ because runtime state moved to a new ~/.openclaw/cron/jobs-state.json sidecar that the dashboard wasn't reading. #26: /api/system cold path could hang 10–12s because version probes ran serially before the parallel host-metrics group, and the frontend Gateway Runtime card had no fetch timeout so it stayed on Loading… indefinitely on slow gateways. Bundled fixes: system.gatewayPort inheritance from ai.gatewayPort, systemd Environment= directives + restart on reinstall, per-instance latest-version fetcher (race fix), startup banner double-v prefix, rel="noopener noreferrer" on the GitHub header link.

Closes #25, closes #26.

What Changed

| File | What changed |
| --- | --- |
| internal/apprefresh/refresh.go | CollectCrons now merges ~/.openclaw/cron/jobs-state.json sidecar by job.id; sidecar wins wholesale, inline state preserved as legacy fallback |
| internal/apprefresh/cron_state_test.go | New regression coverage: sidecar-only / legacy-only / sidecar-missing-id / malformed-sidecar / both-present / lastRunStatus fallback |
| internal/appsystem/system_service.go | Wraps cold refresh() in context.WithTimeout(ColdPathTimeoutMs); runs versions in parallel with runtime + host-metrics; sets degraded:true on partial collection; per-instance fetchLatest field replaces package var |
| internal/appsystem/cold_path_test.go | New regression coverage: deadline honoured, degraded:true on partial, host metrics ship when gateway hangs, cancelled collection cannot poison version cache |
| internal/appconfig/config.go | Added ColdPathTimeoutMs (default 4000, validated [200,15000]); removed GatewayPort: 18789 from Default() so Load() inheritance from ai.gatewayPort activates |
| internal/appservice/systemd.go | Unit template emits Environment=OPENCLAW_HOME= + Environment=PATH=; install path uses restart instead of start |
| internal/appserver/server_routes.go | Added doc comment above setCORSHeaders enumerating loopback-reflection invariants |
| web/index.html | Sys.fetch() uses AbortController (6000ms ceiling); new renderGatewayDegraded(reason) paints State=Unavailable on timeout/error; Skills empty-state fallback; rel="noopener noreferrer" on GitHub link |
| main.go | strings.TrimPrefix(version, "v") at both BuildVersion/detectVersion sites; fixes the vv2026.4.x banner |
| VERSION, CHANGELOG.md, TODO.md | Bumped to v2026.4.29; added changelog entry; marked released |
| README.md, docs/CONFIGURATION.md, examples/config.full.json | Documented system.coldPathTimeoutMs and system.gatewayPort inheritance |
| docs/plans/2026-04-29-issue-25-26-fix-plan.md | Planning doc preserved alongside other historical plans |

Test Evidence

go test -race -v ./...
Test output
ok  	github.com/mudrii/openclaw-dashboard	43.316s
ok  	github.com/mudrii/openclaw-dashboard/internal/appchat	4.822s
ok  	github.com/mudrii/openclaw-dashboard/internal/appconfig	6.658s
ok  	github.com/mudrii/openclaw-dashboard/internal/apprefresh	4.265s
ok  	github.com/mudrii/openclaw-dashboard/internal/appruntime	6.052s
ok  	github.com/mudrii/openclaw-dashboard/internal/appserver	2.263s
ok  	github.com/mudrii/openclaw-dashboard/internal/appservice	4.743s
ok  	github.com/mudrii/openclaw-dashboard/internal/appsystem	7.785s

gofmt -l .   # (no output)
go vet ./... # (no output)
GOOS=linux GOARCH=amd64 go build ./...  # (no output, cross-compile clean)

Checklist

Code quality

  • No new globals outside the 7 module objects + 4 utilities ($, esc, safeColor, relTime)
  • Every dynamic value inserted into the DOM goes through esc()
  • No hardcoded hex colors — CSS variables only (var(--accent), etc.)
  • No new frontend dependencies (no import, no CDN <script>)
  • No new Go module dependencies (go.mod stays stdlib-only)

Tests

  • All existing tests pass: go test -race ./...
  • New behaviour has at least one test

Manual verification

  • Tested in at least one dark theme and one light theme
  • Tested on desktop and mobile viewport (< 768px)
  • If chart code changed: verified both 7d and 30d views
  • If session/cron table changed: verified scroll position preserved after refresh

Documentation

  • CHANGELOG.md updated under the correct version heading
  • README.md updated if a new panel or config key was added

Screenshots / Recordings

Backend + frontend-recovery PR. Gateway Runtime card behaviour change is verified via integration: a slow gateway now produces State=Unavailable instead of stuck Loading….

Breaking Changes

None. New system.coldPathTimeoutMs is optional with a 4000ms default. The system.gatewayPort change only activates inheritance when the value is omitted; user-supplied values still win. Cron sidecar merge is additive — pre-v2026.4.20 OpenClaw layouts continue to work via the legacy inline fallback.

Agent Review Notes

The cron sidecar merge contract is intentionally wholesale-replace (not field-merge) because OpenClaw v2026.4.20+ stops writing runtime state to jobs.json. See the doc comment in CollectCrons for the rationale. The cold-path deadline has a critical invariant: the version cache must only update on full success, otherwise a deadline-cancelled collection could persist a partial/empty version pair — cold_path_test.go exercises this directly.
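
For reviewers who want the cache invariant in concrete form, here is a minimal sketch, not the code in system_service.go. The names versionCache, getCached, Versions and demo are hypothetical; the only behaviour taken from this PR is that a result is persisted only when collection succeeded and the context was not cancelled.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// Versions is a hypothetical current/latest pair.
type Versions struct{ Current, Latest string }

// versionCache is a hypothetical warm cache guarded by a mutex.
type versionCache struct {
	mu    sync.Mutex
	value Versions
	valid bool
}

// getCached returns the cached pair when warm; otherwise it collects and
// persists the result only if collection succeeded and ctx is still live.
func (c *versionCache) getCached(ctx context.Context, collect func(context.Context) (Versions, error)) Versions {
	c.mu.Lock()
	if c.valid {
		v := c.value
		c.mu.Unlock()
		return v
	}
	c.mu.Unlock()

	v, err := collect(ctx)
	if err != nil || ctx.Err() != nil {
		return v // partial result goes to the caller but is never cached
	}
	c.mu.Lock()
	c.value, c.valid = v, true
	c.mu.Unlock()
	return v
}

// demo reports whether the cache is warm after a cancelled collection
// and then after a clean one.
func demo() (afterCancel, afterClean bool) {
	c := &versionCache{}
	partial := func(ctx context.Context) (Versions, error) { return Versions{}, ctx.Err() }
	full := func(ctx context.Context) (Versions, error) { return Versions{"2026.4.29", "2026.4.29"}, nil }

	cancelled, cancel := context.WithCancel(context.Background())
	cancel() // simulate a collection whose deadline already fired
	c.getCached(cancelled, partial)
	afterCancel = c.valid

	c.getCached(context.Background(), full)
	afterClean = c.valid
	return
}

func main() {
	fmt.Println(demo())
}
```

The point of the guard is that an empty pair returned under a cancelled context never becomes the warm value that later requests read.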

mudrii added 11 commits April 29, 2026 18:55
OpenClaw v2026.4.20 split cron runtime state out of cron/jobs.json into
cron/jobs-state.json. The dashboard collector still read only jobs.json,
so the Cron Jobs table rendered blank Last run / Next run and "none"
status against newer OpenClaw versions (issue #25).

CollectCrons now reads the sidecar at the same directory and merges
state by job.id, with sidecar entries taking precedence over inline
state. Inline state is preserved as a fallback for pre-v2026.4.20
installs and for jobs the sidecar does not list. Sidecar files that are
absent or malformed are silently ignored — collection never fails on
sidecar issues.

Tests cover six split-store / legacy / fallback scenarios.

Refs #25
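
The merge rule above can be sketched roughly as follows. This is an illustration under assumptions, not the CollectCrons implementation: the CronJob and CronState types, the mergeSidecar helper, and a sidecar shaped as a JSON object keyed by job ID are all hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CronState is a hypothetical subset of cron runtime state.
type CronState struct {
	LastRun    string `json:"lastRun,omitempty"`
	NextRun    string `json:"nextRun,omitempty"`
	LastStatus string `json:"lastStatus,omitempty"`
}

// CronJob is a hypothetical job definition with optional legacy inline state.
type CronJob struct {
	ID    string     `json:"id"`
	Name  string     `json:"name"`
	State *CronState `json:"state,omitempty"`
}

// mergeSidecar replaces a job's state wholesale when the sidecar lists its ID.
// Absent or malformed sidecar bytes are ignored, so the inline state survives.
func mergeSidecar(jobs []CronJob, sidecarJSON []byte) []CronJob {
	var sidecar map[string]CronState // assumed shape: job ID -> state
	if err := json.Unmarshal(sidecarJSON, &sidecar); err != nil {
		return jobs // never fail collection on sidecar issues
	}
	out := make([]CronJob, len(jobs))
	copy(out, jobs)
	for i := range out {
		if st, ok := sidecar[out[i].ID]; ok {
			s := st
			out[i].State = &s // sidecar wins; no field-level merge
		}
	}
	return out
}

func main() {
	jobs := []CronJob{
		{ID: "a", Name: "backup", State: &CronState{LastRun: "stale", NextRun: "stale-next"}},
		{ID: "b", Name: "report", State: &CronState{LastRun: "legacy"}},
	}
	merged := mergeSidecar(jobs, []byte(`{"a":{"lastRun":"fresh"}}`))
	fmt.Println(merged[0].State.LastRun, merged[0].State.NextRun == "", merged[1].State.LastRun)
}
```

Because job "a" is replaced as a whole, its stale inline NextRun does not leak through, while job "b", which the sidecar does not list, keeps its inline state.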
…lectors

A cold /api/system request could block ~10–12s when the gateway was unresponsive
(versions collected serially before the runtime probes, each with a 5s timeout).
The frontend Gateway Runtime card had no fetch timeout, so it sat on "Loading…"
indefinitely (issue #26).

- New SystemConfig.ColdPathTimeoutMs (default 4000, validated in [200, 15000])
  bounds the worst-case wall time of a single refresh.
- Versions, OpenClaw runtime, disk, and CPU/RAM/Swap now all collect in
  parallel under one context.WithTimeout. CollectOpenclawRuntime no longer
  needs versions up front — Status.{Current,Latest}Version are patched after
  wg.Wait() from the freshly collected versions.
- getVersionsCached only persists results when ctx finished cleanly, so a
  partial cold-path collection can never poison the warm cache.
- On deadline, response carries Degraded=true and "cold path: deadline
  exceeded" in errors so the frontend can paint a terminal state.

cold_path_test.go covers all four properties: deadline bound, degraded flag,
host metrics always shipped, no cache poisoning. Tests use a hanging httptest
server that honours r.Context() so the suite runs in <5s with -race.
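
A rough sketch of the pattern this commit describes: every collector runs in parallel under one context.WithTimeout, and the result is marked degraded when the deadline fires. The Result type, the collector signature, and the refresh, fast, hang and demo helpers are hypothetical simplifications; the real collectors return richer types.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// Result is a hypothetical stand-in for the /api/system response body.
type Result struct {
	Parts    map[string]string
	Degraded bool
	Errors   []string
}

// collector is a hypothetical probe signature.
type collector func(ctx context.Context) (string, error)

// refresh runs all collectors in parallel under a single deadline and flags
// the result as degraded when that deadline is exceeded.
func refresh(timeout time.Duration, cs map[string]collector) Result {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	var (
		mu  sync.Mutex
		wg  sync.WaitGroup
		res = Result{Parts: map[string]string{}}
	)
	for name, c := range cs {
		wg.Add(1)
		go func(name string, c collector) {
			defer wg.Done()
			v, err := c(ctx)
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				res.Errors = append(res.Errors, name+": "+err.Error())
				return
			}
			res.Parts[name] = v
		}(name, c)
	}
	wg.Wait() // returns promptly only if collectors honour ctx

	if errors.Is(ctx.Err(), context.DeadlineExceeded) {
		res.Degraded = true
		res.Errors = append(res.Errors, "cold path: deadline exceeded")
	}
	return res
}

// fast simulates a host-metrics probe that returns immediately.
func fast(ctx context.Context) (string, error) { return "ok", nil }

// hang simulates an unresponsive gateway that honours cancellation.
func hang(ctx context.Context) (string, error) {
	<-ctx.Done()
	return "", ctx.Err()
}

// demo runs one refresh with a 50ms budget, optionally with the hanging probe.
func demo(withHang bool) Result {
	cs := map[string]collector{"disk": fast}
	if withHang {
		cs["versions"] = hang
	}
	return refresh(50*time.Millisecond, cs)
}

func main() {
	r := demo(true)
	fmt.Println(r.Parts["disk"], r.Degraded, r.Errors)
}
```

The sketch bounds wall time only when each collector honours its context, which is the same reason the test server described above honours r.Context().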
Three small frontend changes that, together with the backend cold-path
deadline, keep the dashboard cards usable when the gateway is slow or
offline:

* SystemBar.fetch now wraps fetch('/api/system') in an AbortController
  with a 6s client timeout (just over the backend ~4s cold-path budget).
  HTTP errors and aborts call a new renderGatewayDegraded(reason) that
  paints the Gateway Runtime card with State=Unavailable + the reason
  ("timeout", "network error", "http 5xx") instead of leaving it stuck
  on "Loading…".

* Promote the local kv() row helper to SystemBar._kv. The previous
  references to kv() in the gateway runtime block were undefined at
  call time (kv was a const inside Renderer.render), which silently
  threw a ReferenceError and contributed to the empty card reported
  in #26. The new method is reused by renderGatewayDegraded.

* Skills grid now falls back to "No skills configured" when D.skills
  is null/empty, matching the existing Git Log empty-state pattern at
  the same render block.

No backend changes; all unit tests still pass with -race.

The sidecar override semantics (jobs-state.json wins wholesale over
jobs.json inline state when present) were implemented but only
spot-covered by tests. This commit:

- Expands the comment in CollectCrons to spell out *why* we replace
  rather than field-merge: in OpenClaw v2026.4.20+ inline state is
  pre-migration leftover, and a field-level merge could surface stale
  lastRun/nextRun values.
- Adds focused regression tests for the merge contract so future
  refactors keep the sidecar precedence and the legacy fallback path.

Replace the package-level fetchLatestVersion var with a per-instance
field on SystemService. The shared var raced with the
getLatestVersionCached background goroutine when tests overrode it
during cleanup; per-instance state isolates each test fully and removes
the cleanup race entirely.

Also expand the inline doc on the cold-path goroutine that drives
CollectOpenclawRuntime to spell out why versions and runtime probes run
in parallel (both probe the gateway; serializing would double cold-path
wall time) and where the openclaw status fields are patched in after
wg.Wait().

Tests updated to construct fetchLatest via the constructor field rather
than mutating a package-level seam.
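
To illustrate the shape of this change, a minimal sketch follows. The reduced SystemService struct, the NewSystemService constructor signature, and defaultFetchLatest are assumptions for illustration; only the idea of a per-instance fetchLatest field comes from this commit.

```go
package main

import (
	"errors"
	"fmt"
)

// SystemService is a reduced, hypothetical shape; only the injected fetcher matters here.
type SystemService struct {
	fetchLatest func() (string, error)
}

// defaultFetchLatest is a placeholder for the real network call.
func defaultFetchLatest() (string, error) {
	return "", errors.New("network disabled in sketch")
}

// NewSystemService stores the fetcher on the instance, so overriding it in one
// test cannot race with a background goroutine owned by another instance.
func NewSystemService(fetch func() (string, error)) *SystemService {
	if fetch == nil {
		fetch = defaultFetchLatest
	}
	return &SystemService{fetchLatest: fetch}
}

// LatestVersion calls the instance's own fetcher and hides errors behind "unknown".
func (s *SystemService) LatestVersion() string {
	v, err := s.fetchLatest()
	if err != nil {
		return "unknown"
	}
	return v
}

func main() {
	a := NewSystemService(func() (string, error) { return "2026.4.29", nil })
	b := NewSystemService(nil)
	fmt.Println(a.LatestVersion(), b.LatestVersion())
}
```

With a package-level variable, a test restoring the original fetcher during cleanup writes to memory that a still-running goroutine may read; with a field, each test owns its own copy.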
Two related correctness fixes in the systemd service backend:

- Generate Environment= directives for OPENCLAW_HOME and PATH so the
  unit can locate the openclaw binary and OpenClaw runtime at activation
  time. Previously the unit relied on whatever environment systemd-user
  inherited, which was missing both on fresh machines.
- Switch the install path from systemctl start to systemctl restart so
  reinstalls with changed --bind/--port/Environment actually pick up
  the new unit content. systemctl restart also starts a non-running
  unit, so first-installs still work.

Helpers systemdOpenclawHome() and systemdPathEnv() compute the values
deterministically (OPENCLAW_HOME env override -> ~/.openclaw fallback;
PATH dedup with system bins appended) so the generated unit is stable
across reinstalls when nothing changed.
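
A sketch of how such helpers could behave, under assumptions: the function names pathEnv, openclawHome and unitEnvLines, the systemBins list, and the example paths are hypothetical and are not the contents of systemd.go. Only the rules (env override then ~/.openclaw fallback, PATH dedup with system bins appended, deterministic output) come from the description above.

```go
package main

import (
	"fmt"
	"strings"
)

// systemBins is an assumed list of fallback directories appended to PATH.
var systemBins = []string{"/usr/local/bin", "/usr/bin", "/bin"}

// pathEnv deduplicates PATH entries in first-seen order and appends any
// missing system bin directories, so equal inputs give equal output.
func pathEnv(current string) string {
	seen := map[string]bool{}
	var out []string
	add := func(dir string) {
		if dir == "" || seen[dir] {
			return
		}
		seen[dir] = true
		out = append(out, dir)
	}
	for _, dir := range strings.Split(current, ":") {
		add(dir)
	}
	for _, dir := range systemBins {
		add(dir)
	}
	return strings.Join(out, ":")
}

// openclawHome prefers the OPENCLAW_HOME override and falls back to ~/.openclaw.
func openclawHome(envValue, userHome string) string {
	if envValue != "" {
		return envValue
	}
	return userHome + "/.openclaw"
}

// unitEnvLines renders the two Environment= directives for the unit file.
func unitEnvLines(envHome, userHome, path string) string {
	return fmt.Sprintf("Environment=OPENCLAW_HOME=%s\nEnvironment=PATH=%s\n",
		openclawHome(envHome, userHome), pathEnv(path))
}

func main() {
	fmt.Print(unitEnvLines("", "/home/demo", "/opt/openclaw/bin:/usr/bin:/opt/openclaw/bin"))
}
```

Keeping first-seen order instead of sorting preserves the user's PATH precedence while still producing byte-identical unit content across reinstalls.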
Default() previously seeded SystemConfig.GatewayPort with 18789, which
defeated the inheritance path: when config.json omitted
system.gatewayPort, Load() would see a non-zero value and skip the
"copy from ai.gatewayPort" fallback.

Leave SystemConfig.GatewayPort at the zero value in Default() and let
Load() inherit AI.GatewayPort when system.gatewayPort is missing. This
restores the documented behavior described in docs/CONFIGURATION.md
and matches user expectations: configure the gateway port once on the
ai block, system probes follow.

Comment on the SystemConfig zero-default locks in the invariant so a
future field-alignment cleanup does not regress it.
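
The inheritance rule can be sketched as below. The struct layouts are reduced to the two fields involved, and the 18789 default on the ai block is an assumption made for the example; this commit only states that 18789 was removed from the system default.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// AIConfig and SystemConfig are reduced, hypothetical versions of the real structs.
type AIConfig struct {
	GatewayPort int `json:"gatewayPort"`
}

type SystemConfig struct {
	// GatewayPort stays zero in Default() so Load() can tell "omitted" from "set".
	GatewayPort int `json:"gatewayPort"`
}

type Config struct {
	AI     AIConfig     `json:"ai"`
	System SystemConfig `json:"system"`
}

// Default seeds only the ai block; the 18789 value here is assumed for illustration.
func Default() Config {
	return Config{AI: AIConfig{GatewayPort: 18789}}
}

// Load overlays the file on the defaults, then inherits the system port from
// the ai block when the file did not set system.gatewayPort.
func Load(raw []byte) (Config, error) {
	cfg := Default()
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return Config{}, err
	}
	if cfg.System.GatewayPort == 0 {
		cfg.System.GatewayPort = cfg.AI.GatewayPort
	}
	return cfg, nil
}

func main() {
	inherited, _ := Load([]byte(`{"ai":{"gatewayPort":9000}}`))
	explicit, _ := Load([]byte(`{"ai":{"gatewayPort":9000},"system":{"gatewayPort":9100}}`))
	fmt.Println(inherited.System.GatewayPort, explicit.System.GatewayPort)
}
```

If Default() seeded the system port with a non-zero value, the zero check in Load() would never fire, which is the regression this commit removes.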
Three small polish items consolidated:

- main.go: normalize the version banner via strings.TrimPrefix so a
  VERSION file that includes a leading "v" no longer renders as
  "vv2026.4.x" in startup output. Applied at both the service-subcommand
  and main flow assignment sites for parity with the --version flag.
- web/index.html: add rel="noopener noreferrer" to the GitHub link in
  the header. target="_blank" without rel leaks window.opener to the
  external site and exposes a tabnabbing surface that browsers
  technically mitigate but auditors still flag.
- internal/appserver/server_routes.go: encode the loopback-reflection
  CORS invariants directly above setCORSHeaders so the next reader
  understands why we accept arbitrary localhost ports. The reasoning
  (loopback bind + no Allow-Credentials + server-side gateway token +
  rate limit on /api/chat) lived in commit history and review threads
  but not in the code itself.
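
The banner fix in the first item amounts to the following, shown as a small sketch. The banner helper and its "openclaw-dashboard v" prefix are hypothetical; strings.TrimPrefix is the only part taken from the commit.

```go
package main

import (
	"fmt"
	"strings"
)

// banner formats startup output; trimming first means a VERSION file with or
// without a leading "v" renders the same way.
func banner(version string) string {
	return "openclaw-dashboard v" + strings.TrimPrefix(version, "v")
}

func main() {
	fmt.Println(banner("v2026.4.29"))
	fmt.Println(banner("2026.4.29"))
}
```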
Three doc updates that tracked recent code changes:

- README.md: add the system.coldPathTimeoutMs and system.gatewayPort
  rows to the configuration reference table.
- docs/CONFIGURATION.md: document coldPathTimeoutMs and surface the
  gatewayPort inheritance behavior (defaults to ai.gatewayPort when
  omitted) in the table.
- examples/config.full.json: add the full system block which was
  missing entirely; now mirrors the documented defaults so the example
  is copy-pasteable.

No behavior changes; pure documentation alignment.

- Bump VERSION to v2026.4.29.
- Add CHANGELOG entry covering #25 (cron sidecar merge), #26 (cold-path
  deadline + degraded UI fallback), and the release-hardening polish:
  gatewayPort inheritance, systemd Environment + restart, per-instance
  fetcher, version banner normalization, CORS doc, rel=noopener, docs.
- Move TODO.md released entries; archive the issue #25/#26 fix plan
  under docs/plans/ alongside the existing planning history.

Copilot AI review requested due to automatic review settings April 29, 2026 11:03

The Linux collector inlines ramFromMeminfo/swapFromMeminfo inside
collectCPURAMSwapParallel to share a single /proc/meminfo read between
RAM and Swap. The standalone collectRAM/collectSwap entry points have
no remaining callers on Linux (darwin and unsupported still use them
for their own platform-specific paths).

Caught by golangci-lint unused on the Linux runner — local darwin
go vet doesn't compile linux-build-tagged files, so this slipped past.

@mudrii mudrii merged commit 1842a70 into main Apr 29, 2026
4 checks passed
@mudrii mudrii deleted the fix/cron-sidecar-and-system-coldpath branch April 29, 2026 11:07

Copilot AI left a comment

Pull request overview

Release hardening and bug fixes for OpenClaw Dashboard v2026.4.29, addressing cron runtime state changes introduced in OpenClaw v2026.4.20+ and bounding /api/system cold-path latency to prevent the UI from getting stuck in “Loading…”.

Changes:

  • Merge jobs-state.json sidecar into CollectCrons for OpenClaw v2026.4.20+ cron runtime state compatibility.
  • Add /api/system cold-path bounding (system.coldPathTimeoutMs) + parallel collectors; improve frontend /api/system fetch timeout + terminal “Unavailable” rendering.
  • Release hardening: fix system.gatewayPort inheritance, systemd unit Environment= + restart on reinstall, per-instance latest-version fetcher, version banner normalization, and external-link noopener.

Reviewed changes

Copilot reviewed 25 out of 26 changed files in this pull request and generated no comments.

Show a summary per file
| File | Description |
| --- | --- |
| web/index.html | Adds rel="noopener noreferrer", Skills empty-state fallback, and /api/system fetch timeout + degraded Gateway Runtime rendering. |
| main.go | Normalizes version string by trimming a leading v to prevent double-v banner output. |
| internal/appsystem/system_service.go | Introduces cold-path timeout bounding and parallelizes collectors; makes latest-version fetcher per-instance (test-race fix). |
| internal/appsystem/system_service_test.go | Updates tests to override per-instance latest-version fetcher. |
| internal/appsystem/latest_version_test.go | Removes global fetch override and uses per-instance injection to eliminate races. |
| internal/appsystem/cold_path_test.go | Adds regression tests for cold-path deadline behavior, degraded flagging, and caching contracts. |
| internal/appsystem/bench_test.go | Adjusts config literal formatting to match updated struct fields. |
| internal/appservice/systemd.go | Adds Environment= lines for OPENCLAW_HOME and PATH; uses systemctl restart on install. |
| internal/appservice/systemd_test.go | Extends unit-file assertions for Environment= and verifies restart call. |
| internal/appserver/server_routes.go | Adds in-code documentation for the loopback CORS reflection logic. |
| internal/apprefresh/refresh.go | Loads cron state sidecar (jobs-state.json) and applies sidecar-overrides contract with legacy fallback. |
| internal/apprefresh/cron_state_test.go | Adds comprehensive cron sidecar regression coverage. |
| internal/apprefresh/testdata/cron/* | Adds fixtures for split-store cron definitions and sidecar state variants. |
| internal/appconfig/config.go | Adds system.coldPathTimeoutMs, fixes system.gatewayPort inheritance behavior, and validates new timeout. |
| internal/appconfig/config_test.go | Adds tests locking in gatewayPort inheritance/override behaviors. |
| examples/config.full.json | Adds full system block including coldPathTimeoutMs. |
| docs/CONFIGURATION.md | Documents system.coldPathTimeoutMs. |
| README.md | Documents system.coldPathTimeoutMs and clarifies system.gatewayPort behavior. |
| CHANGELOG.md | Adds v2026.4.29 release notes covering the fixes and hardening. |
| VERSION | Bumps version to v2026.4.29. |
| TODO.md | Records v2026.4.29 as released with included fixes. |
| docs/plans/2026-04-29-issue-25-26-fix-plan.md | Adds the validated plan document for issues #25 and #26. |

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces several fixes and enhancements to the OpenClaw dashboard, including merging cron state from a sidecar file to fix empty tables and implementing a parallelized refresh process with a configurable cold-path timeout to reduce latency. It also improves the systemd service installation by adding necessary environment variables and switching to a restart-based update mechanism, while the frontend is updated to handle degraded states and fetch timeouts. Feedback suggests avoiding magic numbers for default timeouts by defining them as constants.

Comment on lines +141 to 144
coldPath := time.Duration(s.cfg.ColdPathTimeoutMs) * time.Millisecond
if coldPath <= 0 {
	coldPath = 4 * time.Second
}

medium

The hardcoded default of 4 seconds for the cold path timeout should be defined as a constant or derived from a configuration default to avoid magic numbers.
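
One possible way to address this, sketched under assumptions: the constant names and the coldPathTimeout and validateColdPathTimeoutMs helpers are hypothetical and do not exist in the PR. The values 4000, 200 and 15000 are the default and validation bounds stated in the PR description.

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical named constants for the values quoted in the PR description.
const (
	DefaultColdPathTimeoutMs = 4000
	MinColdPathTimeoutMs     = 200
	MaxColdPathTimeoutMs     = 15000
)

// validateColdPathTimeoutMs rejects values outside the documented range.
func validateColdPathTimeoutMs(ms int) error {
	if ms < MinColdPathTimeoutMs || ms > MaxColdPathTimeoutMs {
		return fmt.Errorf("coldPathTimeoutMs %d outside [%d, %d]",
			ms, MinColdPathTimeoutMs, MaxColdPathTimeoutMs)
	}
	return nil
}

// coldPathTimeout converts the configured value to a duration and falls back
// to the named default instead of a literal when the value is unset.
func coldPathTimeout(ms int) time.Duration {
	if ms <= 0 {
		ms = DefaultColdPathTimeoutMs
	}
	return time.Duration(ms) * time.Millisecond
}

func main() {
	fmt.Println(coldPathTimeout(0), coldPathTimeout(250), validateColdPathTimeoutMs(100))
}
```

Sharing one constant between the config default and the runtime fallback keeps the two from drifting apart if the default is tuned later.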
