
feat: hot-reload trigger config without restart #285

Merged
Nathan Schram (nathanschram) merged 2 commits into dev from feature/trigger-hot-reload
Apr 14, 2026

Conversation

@nathanschram
Member

Summary

Implements hot-reload for trigger configuration (crons and webhooks) so that editing untether.toml applies changes immediately without restarting Untether or killing active runs. Closes #269.

  • New TriggerManager class (triggers/manager.py) — mutable holder that both the cron scheduler and webhook server reference at runtime
  • Cron scheduler reads manager.crons each tick — new crons fire immediately, removed crons stop matching, last_fired dict preserved to prevent double-firing
  • Webhook server uses manager.webhook_for_path() per request — new webhooks accessible immediately, removed webhooks return 404, auth/secret changes take effect on next request
  • Config watcher (handle_reload()) re-reads raw TOML [triggers] section and calls manager.update() on every config file change
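The holder pattern described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the code in triggers/manager.py: the `Webhook` dataclass and its fields are hypothetical, and only the names `TriggerManager`, `crons`, `webhook_for_path()` and `update()` come from the PR description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Webhook:
    # Hypothetical minimal shape; the real config model likely has more fields.
    id: str
    path: str
    secret: Optional[str] = None


class TriggerManager:
    """Mutable holder shared by the cron scheduler and webhook server (sketch)."""

    def __init__(self, crons=(), webhooks=()):
        self._crons = list(crons)
        self._webhooks_by_path = {w.path: w for w in webhooks}

    @property
    def crons(self):
        # The scheduler reads this on every tick.
        return self._crons

    def webhook_for_path(self, path):
        # The webhook server calls this per request; None maps to a 404.
        return self._webhooks_by_path.get(path)

    def update(self, crons, webhooks):
        # Build new containers first, then rebind; never mutate in place.
        new_crons = list(crons)
        new_webhooks = {w.path: w for w in webhooks}
        self._crons, self._webhooks_by_path = new_crons, new_webhooks
```

Because lookups go through the manager on every request, a changed secret or a removed route is visible to the very next caller without touching the server object.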

What's now live-reloadable (no restart needed)

| Setting | Mechanism |
| --- | --- |
| Cron schedules (add/remove/modify) | TriggerManager: scheduler reads each tick |
| Webhook routes (add/remove/modify) | TriggerManager: per-request lookup |
| Webhook auth/secrets | TriggerManager: per-request lookup |
| Webhook actions (agent_run, file_write, etc.) | TriggerManager: per-request lookup |
| Multipart/file upload settings | TriggerManager: per-request lookup |
| Cron fetch config | TriggerManager: read at fire time |
| Cron timezone (per-cron and default) | TriggerManager: read each tick |
| Projects config / chat routing | Already reloadable via TransportRuntime |
| Engine defaults | Already reloadable via TransportRuntime |
| Watchdog, preamble, footer, cost settings | Already per-run via load_settings_if_exists() |
| Command menu, topic scope | Already refreshed in handle_reload() |
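The "scheduler reads each tick" mechanism, together with the preserved last_fired dict, might look something like the sketch below. It assumes crons are plain dicts with `id` and `schedule` keys, that de-duplication is per minute, and that `matches` is a hypothetical cron-expression matcher passed in by the caller; none of these details are confirmed by the PR.

```python
from datetime import datetime


def tick(manager_crons, last_fired, now, matches):
    """Run one scheduler tick and return the ids of the crons that fired.

    `manager_crons` is re-read from the manager on every call, while
    `last_fired` outlives reloads so a cron cannot fire twice in one minute.
    """
    minute = now.replace(second=0, microsecond=0)
    fired = []
    for cron in manager_crons:
        if not matches(cron["schedule"], now):
            continue
        if last_fired.get(cron["id"]) == minute:
            continue  # already fired this minute, even if config was reloaded
        last_fired[cron["id"]] = minute
        fired.append(cron["id"])
    return fired
```

Keeping `last_fired` outside the reloadable state is the point of the design: a reload that lands mid-minute adds or removes crons but does not reset the record of what has already run.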

What still requires a restart

| Setting | Why |
| --- | --- |
| Bot token / chat ID | Baked into Telegram client at startup |
| Webhook server host/port | aiohttp binds once; changing port needs new listener |
| Transport type | Architectural: entire transport stack |
| triggers.enabled toggle (off→on) | Webhook server and cron scheduler must be started |
| session_mode (stateless↔chat) | Requires state store initialisation |
| topics.enabled toggle | Requires topic state store initialisation |
| New engine binaries | Resolved via shutil.which() at startup |
| allowed_user_ids / chat_ids | Frozen in TelegramBridgeConfig (future work) |
| Voice transcription settings | Frozen in TelegramBridgeConfig (future work) |
| Files settings | Frozen in TelegramBridgeConfig (future work) |

Future work: unfreezing TelegramBridgeConfig

The TelegramBridgeConfig dataclass is frozen=True, which blocks hot-reload for voice, files, chat_ids, and timing settings. These fields are already read per-message (correct pattern), so unfreezing the config or introducing a config wrapper would make them reloadable with minimal logic changes. Tracked separately from this PR.

Design decisions

  • Backwards-compatible: build_webhook_app() accepts manager=None (default), so all existing tests pass without changes
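  • In-flight safety: TriggerManager.update() builds new list/dict objects and rebinds the manager's attributes to them rather than mutating in place, so a loop already iterating over the old cron list is unaffected (a Python for loop holds an iterator over the list object it started with)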
  • Webhook server always starts when triggers are enabled — even with no initial webhooks — so new webhooks added via reload are immediately accessible
  • Cron scheduler always starts when triggers are enabled — idles when cron list is empty, picks up new crons on next tick
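The in-flight safety point can be demonstrated with a small stand-alone example. `Holder` here is a hypothetical stand-in for TriggerManager, reduced to the one behaviour that matters: `update()` rebinds the attribute to a new list.

```python
class Holder:
    def __init__(self, crons):
        self.crons = list(crons)

    def update(self, crons):
        self.crons = list(crons)  # rebind to a new list, old list untouched


def iterate_with_reload(holder, new_crons):
    seen = []
    for cron in holder.crons:         # iterator is bound to the old list
        if not seen:
            holder.update(new_crons)  # simulated reload mid-iteration
        seen.append(cron)
    return seen


holder = Holder(["a", "b", "c"])
seen = iterate_with_reload(holder, ["x"])
print(seen)          # ['a', 'b', 'c']
print(holder.crons)  # ['x']
```

Had `update()` cleared and refilled the same list instead, the running loop would have observed the change partway through.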

Test plan

  • 13 new tests in test_trigger_manager.py:
    • Manager init, update, clear, timezone, in-flight iteration safety
    • Webhook server: add/remove/update routes, secret changes, health count
    • Cron schedule swapping, timezone updates
  • All 2038 existing tests pass (81.55% coverage)
  • Lint and format clean (ruff check, ruff format --check)
  • Integration test via @untether_dev_bot: edit TOML, add cron, verify fires without restart
  • Integration test: add webhook route, verify accessible without restart

🤖 Generated with Claude Code

Introduce TriggerManager — a mutable holder for cron and webhook
configuration that the cron scheduler and webhook server reference
at runtime.  On config file change, handle_reload() re-parses the
[triggers] TOML section and calls manager.update(), making added,
removed, or modified crons and webhooks take effect immediately
without killing active runs.

Key changes:
- triggers/manager.py: new TriggerManager class with atomic swap,
  change logging, and no-auth webhook warnings
- triggers/cron.py: run_cron_scheduler() reads manager.crons each
  tick; last_fired dict preserved across reloads
- triggers/server.py: build_webhook_app() accepts optional manager
  for dynamic webhook lookups per request (backwards-compatible)
- telegram/loop.py: creates TriggerManager at startup, passes to
  cron/server, handle_reload() updates on config change

13 new tests covering manager lifecycle, webhook server hot-reload
(add/remove/update routes, secret changes, health count), and cron
schedule swapping.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai bot commented Apr 13, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1e6bdba8-4113-4281-aed2-0c96a95c9437


- CLAUDE.md: add hot-reload feature bullet, triggers/manager.py to key
  files table, test_trigger_manager.py to test list, update test count
  to 2038
- README.md: enhance "Scheduled tasks" feature line to mention hot-reload
- docs/reference/triggers/triggers.md: add "Hot-reload" section with
  reload-vs-restart tables and TriggerManager explanation
- docs/reference/config.md: add hot-reload tip to [triggers] section
- docs/how-to/webhooks-and-cron.md: add "Hot-reload configuration"
  section with example

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@nathanschram Nathan Schram (nathanschram) merged commit 74ea778 into dev Apr 14, 2026
21 checks passed
@nathanschram Nathan Schram (nathanschram) deleted the feature/trigger-hot-reload branch April 14, 2026 04:33
Nathan Schram (nathanschram) added a commit that referenced this pull request Apr 14, 2026
…ibility, restart Tier 1 (#271, #286, #287, #288)

Bundles four rc4 features plus a CHANGELOG entry for #283 (diff_preview gate,
already on dev as 8c04904). Full details in CHANGELOG.md.

#269/#285 hot-reload triggers: merged separately as PR #285 (squash-merged to
dev); this commit extends TriggerManager with rc4 helpers (remove_cron,
crons_for_chat, webhooks_for_chat, cron_ids, webhook_ids) for Features 4b and
5 below.

#288 — /at command and run_once cron flag:
- new telegram/at_scheduler.py — module-level task-group + run_job holder;
  schedule_delayed_run(), cancel_pending_for_chat(), active_count();
  per-chat cap of 20 pending delays
- new telegram/commands/at.py — AtCommand backend, /at <duration> <prompt>
  with Ns/Nm/Nh suffixes, 60s-24h range
- /cancel integration via cancel_pending_for_chat()
- drain integration via active_count() in _drain_and_exit
- entry-point at = untether.telegram.commands.at:BACKEND
- CronConfig.run_once: bool = False; scheduler removes cron after fire
  if run_once=True; re-enters on reload/restart

#286 — unfreeze TelegramBridgeConfig:
- drop frozen=True (slots preserved); add update_from(settings) method
- route_update() reads cfg.allowed_user_ids live; handle_reload() calls
  update_from() and refreshes state.forward_coalesce_s / media_group_debounce_s
- restart-only keys still warn (bot_token, chat_id, session_mode, topics,
  message_overflow); others hot-reload

#271 — trigger visibility Tier 1:
- new triggers/describe.py — describe_cron(schedule, timezone) utility
- /ping shows per-chat trigger indicator when triggers target the chat
- RunContext.trigger_source field; dispatcher sets it to cron:<id>/webhook:<id>;
  runner_bridge seeds progress_tracker.meta['trigger'] with icon + source;
  ProgressTracker.note_event merges engine meta over dispatcher meta
- format_meta_line() appends 'trigger' to footer parts
- CommandContext gains trigger_manager, default_chat_id fields (default None);
  populated by telegram/commands/dispatch.py from cfg

#287 — graceful restart Tier 1:
- new sdnotify.py — stdlib sd_notify client (READY=1 / STOPPING=1);
  poll_updates sends READY=1 after _send_startup succeeds;
  _drain_and_exit sends STOPPING=1 at drain start
- new telegram/offset_persistence.py — DebouncedOffsetWriter; loads saved
  update_id on startup, persists via on_offset_advanced callback in
  poll_incoming; flushes in poll_updates finally block
- contrib/untether.service: Type=notify, NotifyAccess=main, RestartSec=2

Tests: +224 tests added across 6 new test files and 6 extended files;
2164 total tests pass with 81.55% coverage.

Context files (CLAUDE.md, .claude/rules/*) and human docs (README, triggers
reference, dev-instance, integration-testing, webhooks-and-cron how-to,
commands-and-directives) updated. rc4 integration test scenarios R1-R10
added to integration-testing.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@nathanschram
Member Author

v0.35.1rc4 Status — Merged + Extended

Squash-merged to dev as commit 74ea778 on 2026-04-14. Closes #269.

PR #292 (rc4 feature branch) extends the TriggerManager with additional helpers: remove_cron, crons_for_chat, webhooks_for_chat, cron_ids, and webhook_ids.

All original PR #285 functionality preserved. See #269 comment for the full hot-reload scope including #286 (unfrozen bridge config).

Nathan Schram (nathanschram) added a commit that referenced this pull request Apr 14, 2026
…t Tier 1 (#271, #286, #287, #288)

* fix: stop Untether being the preferred OOM victim (#275)

systemd user services inherit OOMScoreAdjust=200 + OOMPolicy=stop
defaults, which made Untether's engine subprocesses preferred
earlyoom/kernel OOM killer targets ahead of CLI claude
(oom_score_adj=0) and orphaned grandchildren actually consuming the
RAM. When lba-1 ran low on RAM, live Telegram chats died with rc=143
(SIGTERM) while the processes actually eating the RAM survived.

Updates contrib/untether.service with:

- OOMScoreAdjust=-100 — documents intent; kernel clamps to the parent
  baseline for unprivileged users (typically 100), but takes effect
  if the parent user@UID.service is ever overridden lower
- OOMPolicy=continue — a single OOM-killed child no longer tears
  down the whole unit cgroup; previously every live chat died at once

Also updates docs/reference/dev-instance.md with a new OOM section
covering the asymmetry, the clamping caveat, and the optional
sudo systemctl edit user@UID.service override for operators who
want Untether's children to live longer than CLI processes.

Existing installs need to copy the unit file and
`systemctl --user daemon-reload`; staging picks up the change on
the next `scripts/staging.sh install` cycle.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: v0.35.1rc4 — /at command, hot-reload bridge config, trigger visibility, restart Tier 1 (#271, #286, #287, #288)


* fix: prevent /at timers from firing after /cancel (CancelScope race)

anyio.CancelScope.__exit__ swallows the Cancelled exception when the
scope itself caused the cancellation. The fire/dispatch code outside
the scope continued regardless. Added cancelled_caught check after
the scope exits to prevent stale timers from dispatching.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add v0.35.1rc4 integration test plan

52-test plan covering all rc4 features: /at command, run_once,
hot-reload (triggers + bridge config), trigger visibility,
graceful restart Tier 1, plus standard Tier 1/6/7 regression.

Includes correct dev bot chat IDs (Bot API + Telethon MCP mapping),
pre-test trigger config, results template, and known caveats.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update integration test chat IDs to current dev bot

The old ut-dev-hf: chat IDs (5171122044 etc.) belong to a different
bot (ID 8485467124). Updated both docs to the current @untether_dev_bot
chats with both Telethon and Bot API ID forms. Added note about
Telegram MCP PeerUser fallback for channel/supergroup IDs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: comprehensive v0.35.1 documentation updates

HIGH priority:
- config.md: add run_once to cron table, fix watch_config description
  to list hot-reloadable vs restart-only settings
- operations.md: fix hot-reload section (transport settings ARE now
  partially hot-reloadable), add /ping trigger format, update_id
  persistence, systemd section with Type=notify/OOM notes
- schedule-tasks.md: add /at command section with examples, run_once
  mention

MEDIUM priority:
- triggers.md: remove duplicate hot-reload section, keep authoritative
  version with watch_config requirement and last_fired note
- CLAUDE.md: add diff_preview plan bypass (#283) to features list
- troubleshooting.md: add entries for config hot-reload issues and
  /at delay not firing

LOW priority:
- security.md: document untrusted-payload prefix for webhooks/cron
- voice-notes.md: note that voice settings hot-reload
- specification.md: bump version to v0.35.1
- tutorials: update version numbers from 0.35.0 to 0.35.1

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>