fix(engine,sdk): improve trigger registration error visibility#1644
fix(engine,sdk): improve trigger registration error visibility#1644guibeira wants to merge 15 commits into
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughEngine now produces structured trigger registration errors and forwards failures to originator workers; Node, Python, and Rust SDKs consume TriggerRegistrationResult.error and log formatted diagnostics. Documentation and tests added or updated across engine and SDKs. ChangesTrigger Registration Error Reporting
Sequence DiagramsequenceDiagram
participant Worker
participant Engine
participant TriggerRegistry
participant OriginatorWorker
participant SDK
Worker->>Engine: RegisterTrigger
Engine->>TriggerRegistry: register_trigger()
alt Registry returns Err(RegisterTriggerError)
TriggerRegistry-->>Engine: Err(RegisterTriggerError)
Engine->>Engine: Map error -> ErrorBody
Engine-->>Worker: TriggerRegistrationResult (error)
Worker->>SDK: SDK receives TriggerRegistrationResult
SDK->>SDK: Log error (console.error / logging / tracing::error!)
alt trigger had originator
Engine-->>OriginatorWorker: Forward TriggerRegistrationResult (error)
OriginatorWorker->>SDK: SDK receives TriggerRegistrationResult
OriginatorWorker->>OriginatorWorker: Log error
end
else Registry returns Ok()
TriggerRegistry-->>Engine: Ok()
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Preview deployment for your docs. Learn more about Mintlify Previews.
💡 Tip: Enable Workflows to automatically generate PRs for you. |
Replace anyhow::Error in TriggerRegistry::register_trigger with a structured RegisterTriggerError enum (UnknownBuiltin / Unknown / Other). Engine no longer swallows the registry error at the RegisterTrigger router arm — on failure it sends a TriggerRegistrationResult with an ErrorBody back to the worker that initiated the request. Built-in trigger types (http, cron, ...) include the "iii worker add <name>" install hint in the message.
The TriggerRegistrationResult router arm was a no-op, dropping errors that registrator workers (iii-http, iii-cron, ...) reported when they rejected a trigger. Engine now looks up the originating worker via Trigger.worker_id and forwards the result. Failed triggers are also removed from the registry so they don't accumulate. Successful results are not forwarded — they would flood the user worker with chatter and are not actionable.
Python SDK ignored inbound TRIGGER_REGISTRATION_RESULT messages. Add a handler in _handle_message that logs the engine's error body via the iii logger at ERROR level when error is present; no-op on success.
Rust SDK ignored inbound Message::TriggerRegistrationResult. Add an arm in handle_message that logs via tracing::error! when error is populated, no-op on success. Pulls in tracing-test for log capture in unit tests.
Node SDK ignored inbound TriggerRegistrationResult. Add an onMessage branch that routes to a new handler logging via console.error when error is populated, no-op on success. Tightens TriggerRegistrationResultMessage.error to ErrorBody and drops the unused result field.
a823df4 to
ace87fe
Compare
skill-check — docs1 verified, 277 skipped.
Three for three. Nicely done. |
…ger types When a registered trigger type is neither built-in nor active, the error message now points users to https://workers.iii.dev/ to find a worker that provides it. Built-in types keep their `iii worker add <name>` hint.
- Node: ISdk lives in ./types, not ./iii-types — fix test import so `tsc --noEmit` passes. - Rust: collapse `if let Some(err) = error` into outer match's `Some(err)` binding per `clippy::collapsible_match`.
There was a problem hiding this comment.
Actionable comments posted: 5
🧹 Nitpick comments (2)
sdk/packages/node/iii/tests/trigger-registration-error.test.ts (1)
12-21: ⚡ Quick winReplace fixed sleeps with event-driven waits to reduce test flakiness.
The
50ms/20mssleeps make these tests timing-sensitive on slower CI runners. Prefer waiting on explicit connection/message conditions.Refactor sketch
describe('trigger registration error surfacing', () => { let wss: WebSocketServer let url: string let sdk: ISdk | undefined let serverSocket: WebSocket | undefined + let connected: Promise<void> beforeEach(async () => { wss = new WebSocketServer({ port: 0 }) await new Promise<void>((resolve) => wss.once('listening', () => resolve())) const address = wss.address() as { port: number } url = `ws://127.0.0.1:${address.port}` serverSocket = undefined - wss.on('connection', (ws) => { - serverSocket = ws - ws.send(JSON.stringify({ type: 'workerregistered', worker_id: 'test-worker' })) - }) + connected = new Promise<void>((resolve) => { + wss.on('connection', (ws) => { + serverSocket = ws + ws.send(JSON.stringify({ type: 'workerregistered', worker_id: 'test-worker' })) + resolve() + }) + }) }) it('logs to console.error on TriggerRegistrationResult with error', async () => { const spy = vi.spyOn(console, 'error').mockImplementation(() => {}) sdk = registerWorker(url) - await new Promise((r) => setTimeout(r, 50)) + await connected @@ - await new Promise((r) => setTimeout(r, 20)) + await new Promise<void>((r) => setImmediate(r)) @@ it('does not log on TriggerRegistrationResult success (no error field)', async () => { @@ - await new Promise((r) => setTimeout(r, 50)) + await connected @@ - await new Promise((r) => setTimeout(r, 20)) + await new Promise<void>((r) => setImmediate(r))Also applies to: 32-33, 48-49, 60-61, 71-75
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@sdk/packages/node/iii/tests/trigger-registration-error.test.ts` around lines 12 - 21, The test uses fixed sleeps (50ms/20ms) causing flakiness; replace them with event-driven waits by awaiting connection/message events instead: in the beforeEach and other tests that reference wss, serverSocket, and ws.send, remove setTimeout/sleep calls and instead wait on the WebSocketServer 'connection' event or a Promise that resolves when serverSocket receives the expected message (use wss.once('connection', ...) or serverSocket.once('message', ...) as appropriate), and update assertions to run after those Promises resolve so tests proceed only after the actual connection/message is observed.engine/src/engine/mod.rs (1)
653-694: ⚡ Quick winDelete the duplicated
TriggerRegistrationResultblock.This second half is a copy of the first half. After the first half, success has already returned, and the error path has already removed the trigger, so this copy only adds redundant work and can emit a misleading
"unknown trigger"debug line after a legitimate forwarded failure.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@engine/src/engine/mod.rs` around lines 653 - 694, Remove the duplicated second block that repeats the TriggerRegistrationResult handling; after the first block already returns on success or removes the trigger on error, the duplicate code (the repeated use of trigger_registry.triggers.get(id), originator_id extraction, worker_registry.get_worker lookup, building Message::TriggerRegistrationResult and calling self.send_msg) should be deleted so only the original handling remains and no misleading "unknown trigger" debug path is executed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/0-11-0/architecture/trigger-types.mdx`:
- Line 55: Update the built-in trigger types line to align names with documented
types: remove or explicitly mark "subscribe" as an alias for
"durable:subscriber" if the engine supports that alias (mention both "subscribe"
and "durable:subscriber" and state the alias relationship in the text), clarify
whether error messages use the generic "stream" or the specific
"stream:join"/"stream:leave" (add a parenthetical note like "engine reports
'stream' (see stream:join/stream:leave)" or state the engine reports the
specific subtypes), and either add a pointer to the existing documentation for
"state" or add a brief "state" entry in Core Trigger Types so the list matches
the rest of the doc; update the sentence referencing `http`, `cron`,
`subscribe`, `state`, `durable:subscriber`, `stream`, `log` accordingly.
In `@engine/src/engine/mod.rs`:
- Around line 610-651: The code handling Message::TriggerRegistrationResult
should validate that the sender is the registrator of the stored trigger before
removing or forwarding: look up the stored trigger in
self.trigger_registry.triggers (the existing let Some(trigger_entry) binding)
and compare trigger_entry.worker_id to the originator/sender id (the worker.id
from the incoming message context) — only if they match should you remove the
trigger and forward a result; otherwise ignore/return. When forwarding, build
the Message::TriggerRegistrationResult using the canonical values from the
stored trigger_entry (trigger_entry.trigger_type, trigger_entry.function_id, and
any stored error state) instead of using trigger_type/function_id/error from the
inbound payload. Ensure you still fetch the originator via
self.worker_registry.get_worker(&originator_id) and call
self.send_msg(&originator, forward).await only after these validations.
- Around line 859-897: The metric is incremented regardless of register_trigger
outcome; move the
crate::workers::telemetry::collector::track_trigger_registered() call into the
Ok(()) branch so it only runs on successful registrations (i.e., after the
register_trigger(Trigger { ... }).await returns Ok(())). Keep the Err block
behavior unchanged (building error_body, sending
Message::TriggerRegistrationResult via send_msg) and ensure
track_trigger_registered() is not invoked after the Err arm.
In `@sdk/packages/node/iii/tests/trigger-registration-error.test.ts`:
- Around line 24-27: Teardown is not awaiting the async SDK shutdown which can
race with websocket server close; update the afterEach to await sdk.shutdown()
before closing the wss by calling and awaiting sdk?.shutdown?.() (or await
sdk.shutdown() if non-null) prior to awaiting new Promise that wraps
wss.close(), ensuring the async function sdk.shutdown is awaited to complete
cleanup.
In `@sdk/packages/python/iii/src/iii/iii.py`:
- Line 717: The fallback to data.get("type") is misleading; update the
assignment for trigger_type so it only reads data.get("trigger_type") with a
safe empty-string fallback (remove the data.get("type") alternative) — locate
the trigger_type assignment in iii.py (the line setting trigger_type) and change
it to use only the trigger_type key with an empty-string default.
---
Nitpick comments:
In `@engine/src/engine/mod.rs`:
- Around line 653-694: Remove the duplicated second block that repeats the
TriggerRegistrationResult handling; after the first block already returns on
success or removes the trigger on error, the duplicate code (the repeated use of
trigger_registry.triggers.get(id), originator_id extraction,
worker_registry.get_worker lookup, building Message::TriggerRegistrationResult
and calling self.send_msg) should be deleted so only the original handling
remains and no misleading "unknown trigger" debug path is executed.
In `@sdk/packages/node/iii/tests/trigger-registration-error.test.ts`:
- Around line 12-21: The test uses fixed sleeps (50ms/20ms) causing flakiness;
replace them with event-driven waits by awaiting connection/message events
instead: in the beforeEach and other tests that reference wss, serverSocket, and
ws.send, remove setTimeout/sleep calls and instead wait on the WebSocketServer
'connection' event or a Promise that resolves when serverSocket receives the
expected message (use wss.once('connection', ...) or
serverSocket.once('message', ...) as appropriate), and update assertions to run
after those Promises resolve so tests proceed only after the actual
connection/message is observed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: fb0b4cff-ba66-4e9e-b9b9-e55a176889dd
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (10)
docs/0-11-0/architecture/trigger-types.mdxengine/src/engine/mod.rsengine/src/trigger.rssdk/packages/node/iii/src/iii-types.tssdk/packages/node/iii/src/iii.tssdk/packages/node/iii/tests/trigger-registration-error.test.tssdk/packages/python/iii/src/iii/iii.pysdk/packages/python/iii/tests/test_trigger_registration_error.pysdk/packages/rust/iii/Cargo.tomlsdk/packages/rust/iii/src/iii.rs
- engine/path B: validate that TriggerRegistrationResult is sent by the registrator worker that owns the trigger_type. Reject and ignore reports from non-registrators so a buggy/compromised worker cannot spoof failures and tear other workers' triggers out of the registry. Forwarded message is now built from the canonical stored trigger (id, trigger_type, function_id) rather than the inbound payload. - engine/path A: only increment `track_trigger_registered` on successful registration; failed registrations were skewing the metric. - docs: align built-in trigger type list with the engine's BUILTIN_TRIGGER_TYPES const (adds stream:join / stream:leave and notes that the engine reports the registered name verbatim). - node test: await `sdk.shutdown()` and restore mocks in teardown so the WebSocket server close does not race with pending SDK work.
…istration error log
Use data.get("trigger_type", "") instead of falling back to data.get("type"),
which returned the WS message type ("triggerregistrationresult") and produced
confusing error logs when trigger_type was absent.
Summary
When a worker registers a trigger that fails — either the
trigger_typeis unknown to the engine, or the registrator worker (e.g.iii-http) rejects it — the error now reaches the originating SDK and is logged atERRORlevel. Before this change both paths were silent: the engine swallowed the registry error and ignored the registrator'sTriggerRegistrationResult, so the user only saw "connection lost" or 404s with no hint that a worker was missing.register_triggernow returns a structuredRegisterTriggerError(UnknownBuiltin/Unknown/Other). TheMessage::RegisterTriggerrouter arm sends aTriggerRegistrationResultwith anErrorBodyback to the originating worker on failure. Built-in trigger types (http,cron,subscribe,state,durable:subscriber,stream,log) include theiii worker add <name>install hint.Message::TriggerRegistrationResultrouter arm forwards the result to the originating worker viaTrigger.worker_idlookup. Failed triggers are also removed from the registry. Successful results are not forwarded (chatter, not actionable).TriggerRegistrationResultwitherrortoconsole.error/log.error/tracing::error!. No new public API.docs/architecture/trigger-types.mdx.Demo
Engine started with a worker registry that only includes
iii-state(noiii-http, no custom workers).Built-in worker missing (
iii-http)Registering
type: 'http'— engine reports the missing built-in worker and the install command:Unknown non-built-in trigger type (
my-custom-trigger)Registering a made-up
type: 'my-custom-trigger'— engine points to the workers directory:Test Plan
cargo test -p iii --lib— engine: 1565 passed (includes new path A and path B coverage; replaced the existing no-opTriggerRegistrationResulttest with three forwarding tests).cargo test -p iii-sdk --lib— Rust SDK: 78 passed (includes new logging tests;tracing-testadded as a dev-dep for log capture).pnpm --filter iii-sdk test tests/trigger-registration-error.test.ts— Node SDK: 2 passed (console.erroron error, no log on success).iii-httpto confirm the user-facing log line — optional, unit tests cover the behavior.Notes
The plan document for this change is at
docs/superpowers/plans/2026-05-15-improve-trigger-registration-errors.md(untracked locally per user request).Summary by CodeRabbit
New Features
User-facing Behavior
Documentation
Tests