feat: hot-reload trigger config without restart #285
Nathan Schram (nathanschram) merged 2 commits into dev
Introduce TriggerManager, a mutable holder for cron and webhook configuration that the cron scheduler and webhook server reference at runtime. On config file change, handle_reload() re-parses the [triggers] TOML section and calls manager.update(), so added, removed, or modified crons and webhooks take effect immediately without killing active runs.

Key changes:
- triggers/manager.py: new TriggerManager class with atomic swap, change logging, and no-auth webhook warnings
- triggers/cron.py: run_cron_scheduler() reads manager.crons each tick; last_fired dict preserved across reloads
- triggers/server.py: build_webhook_app() accepts optional manager for dynamic webhook lookups per request (backwards-compatible)
- telegram/loop.py: creates TriggerManager at startup, passes it to cron/server; handle_reload() updates it on config change

13 new tests covering manager lifecycle, webhook server hot-reload (add/remove/update routes, secret changes, health count), and cron schedule swapping.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
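The swap-on-update idea and the per-tick read described in this commit can be sketched as follows. This is a minimal illustration, not the code in triggers/manager.py: the Cron and Webhook dataclasses, the tick() helper, and its due/now_key parameters are hypothetical stand-ins, and only the names TriggerManager, crons, webhook_for_path, update, and last_fired come from the PR text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cron:
    """Hypothetical stand-in for the project's cron config model."""
    id: str
    schedule: str


@dataclass(frozen=True)
class Webhook:
    """Hypothetical stand-in for the project's webhook config model."""
    id: str
    path: str


class TriggerManager:
    """Mutable holder whose contents are replaced wholesale on reload."""

    def __init__(self, crons=(), webhooks=()):
        self._crons = []
        self._webhooks_by_path = {}
        self.update(crons, webhooks)

    @property
    def crons(self):
        return self._crons

    def webhook_for_path(self, path):
        # Looked up per request, so a reload is visible to the next request.
        return self._webhooks_by_path.get(path)

    def update(self, crons, webhooks):
        # Build fresh objects first, then rebind the attributes. Readers
        # still iterating the previous list keep a consistent snapshot.
        new_crons = list(crons)
        new_webhooks = {w.path: w for w in webhooks}
        self._crons, self._webhooks_by_path = new_crons, new_webhooks


def tick(manager, last_fired, due, now_key):
    """One scheduler tick (assumed shape): re-read manager.crons and skip
    any cron already recorded in last_fired for this time slot."""
    fired = []
    for cron in manager.crons:
        if due(cron) and last_fired.get(cron.id) != now_key:
            last_fired[cron.id] = now_key
            fired.append(cron.id)
    return fired
```

Because last_fired lives outside the manager, a reload that lands in the middle of a minute does not cause a cron that already fired in that minute to fire again.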
- CLAUDE.md: add hot-reload feature bullet, triggers/manager.py to key files table, test_trigger_manager.py to test list, update test count to 2038
- README.md: enhance "Scheduled tasks" feature line to mention hot-reload
- docs/reference/triggers/triggers.md: add "Hot-reload" section with reload-vs-restart tables and TriggerManager explanation
- docs/reference/config.md: add hot-reload tip to [triggers] section
- docs/how-to/webhooks-and-cron.md: add "Hot-reload configuration" section with example

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ibility, restart Tier 1 (#271, #286, #287, #288)

Bundles four rc4 features plus a CHANGELOG entry for #283 (diff_preview gate, already on dev as 8c04904). Full details in CHANGELOG.md.

#269/#285 hot-reload triggers: merged separately as PR #285 (squash-merged to dev); this commit extends TriggerManager with rc4 helpers (remove_cron, crons_for_chat, webhooks_for_chat, cron_ids, webhook_ids) for Features 4b and 5 below.

#288: /at command and run_once cron flag:
- new telegram/at_scheduler.py: module-level task-group + run_job holder; schedule_delayed_run(), cancel_pending_for_chat(), active_count(); per-chat cap of 20 pending delays
- new telegram/commands/at.py: AtCommand backend, /at <duration> <prompt> with Ns/Nm/Nh suffixes, 60s-24h range
- /cancel integration via cancel_pending_for_chat()
- drain integration via active_count() in _drain_and_exit
- entry-point at = untether.telegram.commands.at:BACKEND
- CronConfig.run_once: bool = False; scheduler removes cron after fire if run_once=True; re-enters on reload/restart

#286: unfreeze TelegramBridgeConfig:
- drop frozen=True (slots preserved); add update_from(settings) method
- route_update() reads cfg.allowed_user_ids live; handle_reload() calls update_from() and refreshes state.forward_coalesce_s / media_group_debounce_s
- restart-only keys still warn (bot_token, chat_id, session_mode, topics, message_overflow); others hot-reload

#271: trigger visibility Tier 1:
- new triggers/describe.py: describe_cron(schedule, timezone) utility
- /ping shows per-chat trigger indicator when triggers target the chat
- RunContext.trigger_source field; dispatcher sets it to cron:<id>/webhook:<id>; runner_bridge seeds progress_tracker.meta['trigger'] with icon + source; ProgressTracker.note_event merges engine meta over dispatcher meta
- format_meta_line() appends 'trigger' to footer parts
- CommandContext gains trigger_manager, default_chat_id fields (default None); populated by telegram/commands/dispatch.py from cfg

#287: graceful restart Tier 1:
- new sdnotify.py: stdlib sd_notify client (READY=1 / STOPPING=1); poll_updates sends READY=1 after _send_startup succeeds; _drain_and_exit sends STOPPING=1 at drain start
- new telegram/offset_persistence.py: DebouncedOffsetWriter; loads saved update_id on startup, persists via on_offset_advanced callback in poll_incoming; flushes in poll_updates finally block
- contrib/untether.service: Type=notify, NotifyAccess=main, RestartSec=2

Tests: +224 tests added across 6 new test files and 6 extended files; 2164 total tests pass with 81.55% coverage.

Context files (CLAUDE.md, .claude/rules/*) and human docs (README, triggers reference, dev-instance, integration-testing, webhooks-and-cron how-to, commands-and-directives) updated. rc4 integration test scenarios R1-R10 added to integration-testing.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
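The /at argument format in this commit (a duration with an Ns/Nm/Nh suffix, limited to 60s-24h, followed by a prompt) could be parsed roughly as below. This is a hedged sketch, not the contents of telegram/commands/at.py: the function names parse_at_duration and parse_at_args are hypothetical, and treating the range as inclusive and the suffix as case-insensitive are assumptions.

```python
import re

MIN_DELAY_S = 60            # lower bound from the commit message (60s)
MAX_DELAY_S = 24 * 60 * 60  # upper bound from the commit message (24h)
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}
_DURATION_RE = re.compile(r"(\d+)([smh])")


def parse_at_duration(text: str) -> int:
    """Convert '90s', '5m', or '2h' to seconds (hypothetical helper)."""
    match = _DURATION_RE.fullmatch(text.strip().lower())
    if match is None:
        raise ValueError(f"expected Ns, Nm, or Nh, got {text!r}")
    seconds = int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
    # Assumption: both ends of the 60s-24h range are allowed.
    if not MIN_DELAY_S <= seconds <= MAX_DELAY_S:
        raise ValueError(f"delay must be between 60s and 24h, got {text!r}")
    return seconds


def parse_at_args(args: str) -> tuple[int, str]:
    """Split '<duration> <prompt>' into (delay_seconds, prompt)."""
    duration, _, prompt = args.strip().partition(" ")
    prompt = prompt.strip()
    if not prompt:
        raise ValueError("usage: /at <duration> <prompt>")
    return parse_at_duration(duration), prompt
```

For example, parse_at_args("10m run the test suite") yields a 600-second delay and the prompt text, while "30s ..." and "25h ..." are rejected for falling outside the range.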
v0.35.1rc4 Status: Merged + Extended
Squash-merged to dev. PR #292 (rc4 feature branch) extends the TriggerManager with additional helpers: remove_cron, crons_for_chat, webhooks_for_chat, cron_ids, webhook_ids.
All original PR #285 functionality preserved. See #269 comment for the full hot-reload scope including #286 (unfrozen bridge config).
…t Tier 1 (#271, #286, #287, #288)

* fix: stop Untether being the preferred OOM victim (#275)

  systemd user services inherit OOMScoreAdjust=200 + OOMPolicy=stop defaults, which made Untether's engine subprocesses preferred earlyoom/kernel OOM killer targets ahead of CLI claude (oom_score_adj=0) and orphaned grandchildren actually consuming the RAM. When lba-1 ran low on RAM, live Telegram chats died with rc=143 (SIGTERM) while the processes actually eating the RAM survived.

  Updates contrib/untether.service with:
  - OOMScoreAdjust=-100: documents intent; kernel clamps to the parent baseline for unprivileged users (typically 100), but takes effect if the parent user@UID.service is ever overridden lower
  - OOMPolicy=continue: a single OOM-killed child no longer tears down the whole unit cgroup; previously every live chat died at once

  Also updates docs/reference/dev-instance.md with a new OOM section covering the asymmetry, the clamping caveat, and the optional sudo systemctl edit user@UID.service override for operators who want Untether's children to live longer than CLI processes.

  Existing installs need to copy the unit file and run `systemctl --user daemon-reload`; staging picks up the change on the next `scripts/staging.sh install` cycle.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: v0.35.1rc4 - /at command, hot-reload bridge config, trigger visibility, restart Tier 1 (#271, #286, #287, #288)

  Bundles four rc4 features plus a CHANGELOG entry for #283 (diff_preview gate, already on dev as 8c04904). Full details in CHANGELOG.md.

  #269/#285 hot-reload triggers: merged separately as PR #285 (squash-merged to dev); this commit extends TriggerManager with rc4 helpers (remove_cron, crons_for_chat, webhooks_for_chat, cron_ids, webhook_ids) for Features 4b and 5 below.

  #288: /at command and run_once cron flag:
  - new telegram/at_scheduler.py: module-level task-group + run_job holder; schedule_delayed_run(), cancel_pending_for_chat(), active_count(); per-chat cap of 20 pending delays
  - new telegram/commands/at.py: AtCommand backend, /at <duration> <prompt> with Ns/Nm/Nh suffixes, 60s-24h range
  - /cancel integration via cancel_pending_for_chat()
  - drain integration via active_count() in _drain_and_exit
  - entry-point at = untether.telegram.commands.at:BACKEND
  - CronConfig.run_once: bool = False; scheduler removes cron after fire if run_once=True; re-enters on reload/restart

  #286: unfreeze TelegramBridgeConfig:
  - drop frozen=True (slots preserved); add update_from(settings) method
  - route_update() reads cfg.allowed_user_ids live; handle_reload() calls update_from() and refreshes state.forward_coalesce_s / media_group_debounce_s
  - restart-only keys still warn (bot_token, chat_id, session_mode, topics, message_overflow); others hot-reload

  #271: trigger visibility Tier 1:
  - new triggers/describe.py: describe_cron(schedule, timezone) utility
  - /ping shows per-chat trigger indicator when triggers target the chat
  - RunContext.trigger_source field; dispatcher sets it to cron:<id>/webhook:<id>; runner_bridge seeds progress_tracker.meta['trigger'] with icon + source; ProgressTracker.note_event merges engine meta over dispatcher meta
  - format_meta_line() appends 'trigger' to footer parts
  - CommandContext gains trigger_manager, default_chat_id fields (default None); populated by telegram/commands/dispatch.py from cfg

  #287: graceful restart Tier 1:
  - new sdnotify.py: stdlib sd_notify client (READY=1 / STOPPING=1); poll_updates sends READY=1 after _send_startup succeeds; _drain_and_exit sends STOPPING=1 at drain start
  - new telegram/offset_persistence.py: DebouncedOffsetWriter; loads saved update_id on startup, persists via on_offset_advanced callback in poll_incoming; flushes in poll_updates finally block
  - contrib/untether.service: Type=notify, NotifyAccess=main, RestartSec=2

  Tests: +224 tests added across 6 new test files and 6 extended files; 2164 total tests pass with 81.55% coverage.

  Context files (CLAUDE.md, .claude/rules/*) and human docs (README, triggers reference, dev-instance, integration-testing, webhooks-and-cron how-to, commands-and-directives) updated. rc4 integration test scenarios R1-R10 added to integration-testing.md.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: prevent /at timers from firing after /cancel (CancelScope race)

  anyio.CancelScope.__exit__ swallows the Cancelled exception when the scope itself caused the cancellation. The fire/dispatch code outside the scope continued regardless. Added a cancelled_caught check after the scope exits to prevent stale timers from dispatching.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add v0.35.1rc4 integration test plan

  52-test plan covering all rc4 features: /at command, run_once, hot-reload (triggers + bridge config), trigger visibility, graceful restart Tier 1, plus standard Tier 1/6/7 regression. Includes correct dev bot chat IDs (Bot API + Telethon MCP mapping), pre-test trigger config, results template, and known caveats.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update integration test chat IDs to current dev bot

  The old ut-dev-hf: chat IDs (5171122044 etc.) belong to a different bot (ID 8485467124). Updated both docs to the current @untether_dev_bot chats with both Telethon and Bot API ID forms. Added note about Telegram MCP PeerUser fallback for channel/supergroup IDs.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: comprehensive v0.35.1 documentation updates

  HIGH priority:
  - config.md: add run_once to cron table, fix watch_config description to list hot-reloadable vs restart-only settings
  - operations.md: fix hot-reload section (transport settings ARE now partially hot-reloadable), add /ping trigger format, update_id persistence, systemd section with Type=notify/OOM notes
  - schedule-tasks.md: add /at command section with examples, run_once mention

  MEDIUM priority:
  - triggers.md: remove duplicate hot-reload section, keep authoritative version with watch_config requirement and last_fired note
  - CLAUDE.md: add diff_preview plan bypass (#283) to features list
  - troubleshooting.md: add entries for config hot-reload issues and /at delay not firing

  LOW priority:
  - security.md: document untrusted-payload prefix for webhooks/cron
  - voice-notes.md: note that voice settings hot-reload
  - specification.md: bump version to v0.35.1
  - tutorials: update version numbers from 0.35.0 to 0.35.1

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
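The stdlib sd_notify client mentioned under #287 can be approximated in a few lines. The sketch below follows the systemd notification protocol (a datagram sent to the Unix socket named by NOTIFY_SOCKET); it is an illustration under that assumption and not the project's sdnotify.py, and the function name and boolean return value are choices made here.

```python
import os
import socket


def sd_notify(state: str) -> bool:
    """Send a state string such as 'READY=1' or 'STOPPING=1' to systemd.

    Returns False when NOTIFY_SOCKET is unset (not running under a
    Type=notify unit) or when the datagram cannot be delivered.
    """
    address = os.environ.get("NOTIFY_SOCKET")
    if not address:
        return False
    if address.startswith("@"):
        # Abstract-namespace socket: the leading "@" stands for a NUL byte.
        address = "\0" + address[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        try:
            sock.sendto(state.encode(), address)
        except OSError:
            return False
    return True
```

With Type=notify in the unit file, systemd treats the service as started only once READY=1 arrives, which matches the commit's choice to send it after _send_startup succeeds.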
Summary
Implements hot-reload for trigger configuration (crons and webhooks) so that editing untether.toml applies changes immediately without restarting Untether or killing active runs. Closes #269.

- TriggerManager class (triggers/manager.py): mutable holder that both the cron scheduler and webhook server reference at runtime
- Cron scheduler reads manager.crons each tick: new crons become eligible on the next tick, removed crons stop matching, last_fired dict preserved to prevent double-firing
- Webhook server calls manager.webhook_for_path() per request: new webhooks accessible immediately, removed webhooks return 404, auth/secret changes take effect on next request
- Reload handler (handle_reload()) re-reads the raw TOML [triggers] section and calls manager.update() on every config file change

What's now live-reloadable (no restart needed)

- Added, removed, or modified crons and webhooks, including webhook auth and secret changes

What still requires a restart

- triggers.enabled toggle (off→on)
- session_mode (stateless↔chat)
- topics.enabled toggle
- values resolved with shutil.which() at startup
- allowed_user_ids / chat_ids: TelegramBridgeConfig (future work)
- voice, files, and timing settings: TelegramBridgeConfig (future work)

Future work: unfreezing TelegramBridgeConfig
The TelegramBridgeConfig dataclass is frozen=True, which blocks hot-reload for voice, files, chat_ids, and timing settings. These fields are already read per-message (the correct pattern), so unfreezing the config or introducing a config wrapper would make them reloadable with minimal logic changes. Tracked separately from this PR.

Design decisions
- build_webhook_app() accepts manager=None (default), so all existing tests pass without changes
- TriggerManager.update() creates new list/dict objects, so iterations over the old cron list are unaffected (Python's for loop grabs the iterator at the start)

Test plan

- test_trigger_manager.py: 13 new tests (manager lifecycle, webhook server hot-reload, cron schedule swapping)
- Lint (ruff check, ruff format --check)
- @untether_dev_bot: edit TOML, add cron, verify it fires without restart

🤖 Generated with Claude Code