fix(daemon): stop restarting config-error gateways #1225

Open: ymote wants to merge 1 commit into main from fix/daemon-config-error-no-restart-907

ymote commented May 24, 2026

Contributor

Summary

  • Classifies missing-env startup failures as non-retryable configuration_error states keyed by profile.
  • Skips fallback and watchdog auto-restarts while a profile is in configuration error, and suppresses profile-down health alerts for that state.
  • Surfaces configuration_error, error text, and error timestamp through process status responses and dashboard profile/gateway badges.
  • Refreshed from current origin/main with only the daemon/dashboard files for #907 (Daemon: auto-restart loops forever on env-var-missing config failure); no generated artifacts are included.
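
The classification step in the summary above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the `StartupFailure` enum, the `classify_startup_output` and `should_auto_restart` functions, and especially the `environment variable not set: ` marker text are all assumptions made for the example. The real daemon may match different startup output.

```rust
/// Outcome of inspecting a gateway's startup output (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum StartupFailure {
    /// Non-retryable: a required environment variable is missing.
    ConfigurationError { missing_env: String },
    /// Anything else is treated as transient and stays eligible for restart.
    Transient,
}

/// Hypothetical classifier. The marker wording is an assumption for this
/// sketch; it is not taken from the daemon's real error messages.
pub fn classify_startup_output(output: &str) -> StartupFailure {
    const MARKER: &str = "environment variable not set: "; // assumed wording
    for line in output.lines() {
        if let Some(idx) = line.find(MARKER) {
            // Take the identifier that follows the marker (letters, digits, '_').
            let name: String = line[idx + MARKER.len()..]
                .chars()
                .take_while(|c| c.is_ascii_alphanumeric() || *c == '_')
                .collect();
            if !name.is_empty() {
                return StartupFailure::ConfigurationError { missing_env: name };
            }
        }
    }
    StartupFailure::Transient
}

/// Only transient failures should feed the fallback/watchdog restart paths.
pub fn should_auto_restart(failure: &StartupFailure) -> bool {
    matches!(failure, StartupFailure::Transient)
}
```

Under these assumptions, a startup log containing the marker followed by a variable name yields `ConfigurationError` and `should_auto_restart` returns `false`, while unrelated output such as a refused connection stays `Transient` and restartable.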

Closes #907

Validation

  • Base: origin/main at 1df02bd3975a66b232ae1a5c62ddf48297f88317
  • Head: b81c230def16f8ac89667afb8a7f2902c3b0d1d5
  • git diff --check --cached passed before commit
  • rustfmt --check crates/octos-cli/src/api/auth_handlers.rs crates/octos-cli/src/monitor.rs crates/octos-cli/src/process_manager.rs passed
  • cargo fmt --all -- --check passed
  • git ls-files --others --exclude-standard was empty
  • CARGO_TARGET_DIR=/private/tmp/octos-1225-refresh-target CARGO_INCREMENTAL=0 CARGO_PROFILE_DEV_DEBUG=0 cargo test -p octos-cli --features api configuration_error -- --nocapture passed: 4 passed
  • CARGO_TARGET_DIR=/private/tmp/octos-1225-refresh-target CARGO_INCREMENTAL=0 CARGO_PROFILE_DEV_DEBUG=0 cargo test -p octos-cli --features api process_manager::tests:: -- --nocapture passed: 39 passed
  • CARGO_TARGET_DIR=/private/tmp/octos-1225-refresh-target CARGO_INCREMENTAL=0 CARGO_PROFILE_DEV_DEBUG=0 cargo clippy --workspace --all-targets -- -D warnings passed
  • NPM_CONFIG_CACHE=/private/tmp/octos-npm-cache-1225-refresh npm --prefix dashboard ci passed; npm audit reports existing 3 vulnerabilities
  • npm --prefix dashboard run typecheck passed
  • npm --prefix dashboard run test passed: 6 files, 32 tests
  • VITE_OUT_DIR=/private/tmp/octos-1225-dashboard-dist npm --prefix dashboard run build passed
  • GitHub CI passed: dashboard, reject blocked author/committer emails, swarm-app, typos, check, test-octos-agent (lib), test-octos-agent (integration), test-octos-cli, check-matrix

Merge Status

  • MERGEABLE / BLOCKED with REVIEW_REQUIRED.

ymote requested a review from bobdingAI May 28, 2026 11:34

ymote commented May 29, 2026

Contributor Author

Validation evidence for #1225 against current main.

Scope checked:

  • Missing-env startup output is classified as a non-transient configuration error and extracts WISEMODEL_API_KEY.
  • Stored configuration errors include a timestamp and surface through status() / all_statuses() as status: "configuration_error" with the error message.
  • Auto-restart and monitor paths check stored configuration errors and skip restart loops until the profile is started/reloaded again.
  • Dashboard ProcessStatus typing, StatusBadge, ProfileCard, GatewayControls, and profile home display/handle the configuration-error state and message.
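
The stored-error behavior checked above (timestamped errors keyed by profile, restart paths that consult them, and clearing on an explicit start or reload) might look something like this. It is a sketch under stated assumptions: `ConfigErrorRegistry`, `ConfigError`, `StatusView`, their field names, and the `"running"` / `"stopped"` strings are hypothetical. Only the `"configuration_error"` status value comes from the PR description.

```rust
use std::collections::HashMap;
use std::time::{SystemTime, UNIX_EPOCH};

/// A stored configuration error for one profile (hypothetical shape).
#[derive(Debug, Clone)]
pub struct ConfigError {
    pub message: String,
    /// Seconds since the Unix epoch when the error was recorded.
    pub recorded_at_secs: u64,
}

/// Status view returned to callers; field names are assumptions.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct StatusView {
    pub status: &'static str,
    pub error: Option<String>,
    pub error_at_secs: Option<u64>,
}

/// Configuration errors keyed by profile id.
#[derive(Default)]
pub struct ConfigErrorRegistry {
    errors: HashMap<String, ConfigError>,
}

impl ConfigErrorRegistry {
    /// Record a configuration error together with the current timestamp.
    pub fn record(&mut self, profile: &str, message: &str) {
        let now = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_secs())
            .unwrap_or(0);
        let entry = ConfigError { message: message.to_string(), recorded_at_secs: now };
        self.errors.insert(profile.to_string(), entry);
    }

    /// An explicit start or reload clears the stored error for that profile.
    pub fn clear_on_start(&mut self, profile: &str) {
        self.errors.remove(profile);
    }

    /// Restart paths skip any profile that has a stored configuration error.
    pub fn should_auto_restart(&self, profile: &str) -> bool {
        !self.errors.contains_key(profile)
    }

    /// A stored error takes precedence over the plain running/stopped state.
    pub fn status(&self, profile: &str, running: bool) -> StatusView {
        match self.errors.get(profile) {
            Some(e) => StatusView {
                status: "configuration_error",
                error: Some(e.message.clone()),
                error_at_secs: Some(e.recorded_at_secs),
            },
            None => StatusView {
                status: if running { "running" } else { "stopped" },
                error: None,
                error_at_secs: None,
            },
        }
    }
}
```

Keying the map by profile means one misconfigured profile does not affect restart decisions for any other profile. Clearing only on an explicit start or reload is what breaks the restart loop without hiding the error from status responses.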

SHAs:

  • current main: 1fb4c88e020242d82cee245b331798ed6dc551e7
  • PR head: bb78f511379f2a45bf7d98fb8cf9489f9e459c1a
  • local rehearsal merge: 0da8429f38880a6a49b4ab5bae16f9e3dcba85cb

Environment:

  • rustc 1.95.0 (59807616e 2026-04-14)
  • cargo 1.95.0 (f2d3ce0bd 2026-03-21)
  • node v24.10.0
  • npm 11.6.0
  • Darwin 25.5.0 arm64

Commands run:

  • git diff --check origin/main...HEAD passed.
  • rustfmt --check crates/octos-cli/src/api/auth_handlers.rs crates/octos-cli/src/monitor.rs crates/octos-cli/src/process_manager.rs passed.
  • CARGO_TARGET_DIR=/private/tmp/octos-1225-current-1fb4-test-target CARGO_INCREMENTAL=0 cargo test -p octos-cli --features api configuration_error -- --nocapture passed: 4 passed.
  • CARGO_TARGET_DIR=/private/tmp/octos-1225-current-1fb4-test-target CARGO_INCREMENTAL=0 cargo test -p octos-cli --features api process_manager::tests:: -- --nocapture passed: 39 passed.
  • CARGO_TARGET_DIR=/private/tmp/octos-1225-current-1fb4-test-target CARGO_INCREMENTAL=0 cargo check -p octos-cli --features api passed.
  • CARGO_TARGET_DIR=/private/tmp/octos-1225-current-1fb4-test-target CARGO_INCREMENTAL=0 cargo clippy -p octos-cli --features api --lib --no-deps exited 0; current warning-level clippy output remains non-fatal.
  • npm --prefix dashboard ci passed; npm audit reported 20 vulnerabilities.
  • npm --prefix dashboard run typecheck passed.
  • VITE_OUT_DIR=/private/tmp/octos-1225-dashboard-dist npm --prefix dashboard run build passed.
  • git status --short --branch in the rehearsal clone was clean apart from ## main...origin/main [ahead 2]; no untracked files.

Artifacts:

  • Cargo target dir: /private/tmp/octos-1225-current-1fb4-test-target
  • Dashboard build dir: /private/tmp/octos-1225-dashboard-dist
  • No generated artifacts were left untracked in the clone.

Remaining gaps / gates:

  • cargo fmt --all -- --check still fails on current-main files outside this PR diff: crates/octos-agent/src/agent/loop_runner.rs, crates/octos-agent/src/loop_detect.rs, and crates/octos-cli/src/api/ui_protocol_ledger.rs.
  • I found no dedicated dashboard test matching configuration_error / config-error UI; dashboard coverage here is typecheck, production build, and code inspection.
  • GitHub still reports mergeable: MERGEABLE, mergeStateStatus: BLOCKED, reviewDecision: REVIEW_REQUIRED; merge is branch-protection blocked until required review lands.

ymote commented May 30, 2026

Contributor Author

Refreshed this #907 closure PR onto current origin/main.

Evidence:

  • base: 4b6d7218b14e91ece68ecde1ca10216975baea58
  • head: d84e99501c1934a6da9f630da4b6721134f371e3
  • isolated checkout: /Users/yuechen/home/octos/target/octos-1225-refresh-current-xTdhIt/octos
  • dashboard build output: /private/tmp/octos-1225-dashboard-dist
  • diff shape: daemon configuration-error classification/status surfacing, dashboard status badges/typing, and shared current-main fmt/clippy cleanup; no generated artifacts
  • local validation passed:
    • git diff --check origin/main...HEAD
    • cargo fmt --all -- --check
    • git ls-files --others --exclude-standard (empty)
    • CARGO_TARGET_DIR=/private/tmp/octos-1225-current-target CARGO_INCREMENTAL=0 CARGO_PROFILE_DEV_DEBUG=0 cargo test -p octos-cli --features api configuration_error -- --nocapture (4 passed)
    • CARGO_TARGET_DIR=/private/tmp/octos-1225-current-target CARGO_INCREMENTAL=0 CARGO_PROFILE_DEV_DEBUG=0 cargo test -p octos-cli --features api process_manager::tests:: -- --nocapture (39 passed)
    • CARGO_TARGET_DIR=/private/tmp/octos-1225-current-target CARGO_INCREMENTAL=0 CARGO_PROFILE_DEV_DEBUG=0 cargo clippy --workspace --all-targets -- -D warnings
    • npm --prefix dashboard ci (passed; npm audit reports existing 20 vulnerabilities)
    • npm --prefix dashboard run typecheck
    • npm --prefix dashboard run test (3 files, 19 tests)
    • VITE_OUT_DIR=/private/tmp/octos-1225-dashboard-dist npm --prefix dashboard run build

The PR still carries Closes #907; closure should happen through merge after required review and refreshed CI.

ymote commented May 30, 2026

Contributor Author

Refreshed CI is green at head d84e99501c1934a6da9f630da4b6721134f371e3 on base 4b6d7218b14e91ece68ecde1ca10216975baea58.

Required jobs succeeded: dashboard, reject blocked author/committer emails, swarm-app, typos, check, test-octos-agent (lib), test-octos-agent (integration), test-octos-cli, and check-matrix. Conditional platform/package/e2e/security jobs were skipped by workflow conditions.

Merge remains blocked only by required review (mergeable=MERGEABLE, mergeStateStatus=BLOCKED, reviewDecision=REVIEW_REQUIRED). The PR still carries Closes #907, so #907 should close automatically after review and merge.

ymote force-pushed the fix/daemon-config-error-no-restart-907 branch from d84e995 to b81c230 on June 1, 2026 09:44
Development

Successfully merging this pull request may close these issues.

Daemon: auto-restart loops forever on env-var-missing config failure