Fix #41: keep openclaw binary in sync + verify gateway is actually alive #42
Open
obaid wants to merge 1 commit into
The agent update script restarted openclaw-gateway without checking that
the on-disk binary still matches the version Provision had written config
for. If a previous run advanced the on-disk config (e.g. via
`openclaw gateway install --force` from a newer CLI invocation) while the
installed binary stayed older, restart exited with EX_CONFIG (78) and the
gateway stayed dead. The script's health check used `openclaw health`,
which still reported success, so the callback reported `status=updated`
and the agent stayed labeled `active` even though no messages could be
processed for any agent on the server.
- Sync the openclaw binary to config('provision.openclaw_version') with
`npm install -g openclaw@<pinned>` before `systemctl restart`.
- Replace the weak `openclaw health` check with the same
`openclaw gateway call health --timeout 5000` + `systemctl is-active`
retry loop that ServerSetupScriptService already uses on first install.
- Report `status=error` (with an explicit error_message) instead of
`status=updated&warning=health_check_failed` when the gateway never
comes back, so the existing webhook handler logs an error and does not
promote a Deploying agent to Active on a dead server.
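The three steps above can be sketched roughly as follows. This is a minimal illustration, not the script that `AgentUpdateScriptService` actually renders: the variable names (`PINNED_VERSION`, `MAX_ATTEMPTS`, `SLEEP_SECONDS`), the function names, the retry count and delay, and the `error_message` value are all assumptions made for the example. The `--user` flag follows the PR summary and may not apply to every install.

```shell
#!/bin/sh
# Sketch only. All names and defaults below are hypothetical; they are not
# taken from AgentUpdateScriptService or ServerSetupScriptService.

PINNED_VERSION="${PINNED_VERSION:-}"    # would be rendered from config('provision.openclaw_version')
MAX_ATTEMPTS="${MAX_ATTEMPTS:-10}"      # assumed retry budget
SLEEP_SECONDS="${SLEEP_SECONDS:-3}"     # assumed delay between probes

# Step 1: bring the installed binary back to the pinned version before restarting.
sync_binary() {
    npm install -g "openclaw@${PINNED_VERSION}"
}

# Single probe: the unit must be active AND the gateway must answer a health call.
gateway_alive() {
    systemctl --user is-active --quiet openclaw-gateway \
        && openclaw gateway call health --timeout 5000 >/dev/null 2>&1
}

# Step 2: retry the probe; return 0 on the first success, 1 if every attempt fails.
wait_for_gateway() {
    attempt=1
    while [ "$attempt" -le "$MAX_ATTEMPTS" ]; do
        if gateway_alive; then
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "$SLEEP_SECONDS"
    done
    return 1
}

# Step 3: map the probe result to the callback's query string.
# The error_message value is a placeholder, not the one used in the PR.
callback_query() {
    if wait_for_gateway; then
        echo "status=updated"
    else
        echo "status=error&error_message=gateway_not_healthy"
    fi
}
```

In this sketch the update script would call `sync_binary`, restart the gateway unit, and then send `callback_query`'s output to the webhook. Keeping the probe in its own function means the same check can be reused wherever the gateway is restarted.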
Summary
Closes #41.
- Sync the openclaw binary to `config('provision.openclaw_version')` with `npm install -g openclaw@<pinned>` before `systemctl restart` in `AgentUpdateScriptService`, so the version-downgrade safety check in OpenClaw can't permanently fail the gateway after an upgrade run.
- Replace the weak `openclaw health` check (which returned success even when the gateway service was failed) with the same `openclaw gateway call health --timeout 5000` + `systemctl --user is-active` retry loop that `ServerSetupScriptService` already uses on first install.
- Report `status=error` (with an explicit `error_message`) instead of `status=updated&warning=health_check_failed` when the gateway never comes back, so the existing webhook handler logs at error level and does not promote a `Deploying` agent to `Active` on a dead server.
Verified manually on prod by reproducing the failure mode (#41 comment thread), then manually recovering with the same `npm install -g openclaw@<pinned>` + restart that this PR now bakes into every agent update.
Test plan
- `php artisan test --compact --filter=AgentUpdateScriptTest` (17 passed)
- `php artisan test --compact tests/Feature/Api/` (53 passed)
- `vendor/bin/pint --dirty --format agent` clean
- … `status=updated`.