fix(engine,sdk): improve trigger registration error visibility #1644

Open
guibeira wants to merge 15 commits into main from feat/improve-erros-if-trigger-does-not-exists

@guibeira guibeira commented May 15, 2026

Summary

When a worker registers a trigger that fails — either the trigger_type is unknown to the engine, or the registrator worker (e.g. iii-http) rejects it — the error now reaches the originating SDK and is logged at ERROR level. Before this change both paths were silent: the engine swallowed the registry error and ignored the registrator's TriggerRegistrationResult, so the user only saw "connection lost" or 404s with no hint that a worker was missing.

  • engine (path A): register_trigger now returns a structured RegisterTriggerError (UnknownBuiltin / Unknown / Other). The Message::RegisterTrigger router arm sends a TriggerRegistrationResult with an ErrorBody back to the originating worker on failure. Built-in trigger types (http, cron, subscribe, state, durable:subscriber, stream, log) include the iii worker add <name> install hint.
  • engine (path B): the Message::TriggerRegistrationResult router arm forwards the result to the originating worker via Trigger.worker_id lookup. Failed triggers are also removed from the registry. Successful results are not forwarded (chatter, not actionable).
  • SDKs (Node / Python / Rust): each routes inbound TriggerRegistrationResult with error to console.error / log.error / tracing::error!. No new public API.
  • docs: new "Registration errors" section in docs/architecture/trigger-types.mdx.
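
The path A classification described above can be sketched roughly as follows. This is an illustrative Python model, not the engine's Rust code: `RegistrationError`, `classify`, and `BUILTIN_WORKERS` are hypothetical names, and the type-to-worker mapping is an assumption that lists only the pairings the PR text itself suggests (`http` to `iii-http`, `cron` to `iii-cron`).

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Set


@dataclass(frozen=True)
class RegistrationError:
    """Illustrative stand-in for the engine's RegisterTriggerError variants."""
    kind: str            # "unknown_builtin" or "unknown"
    trigger_type: str
    hint: str            # install command or directory URL shown to the user


# Assumed mapping of built-in trigger types to the worker that provides them.
# Only these two pairings are suggested by the PR text; others are left out.
BUILTIN_WORKERS: Mapping[str, str] = {"http": "iii-http", "cron": "iii-cron"}


def classify(trigger_type: str, active_types: Set[str]) -> Optional[RegistrationError]:
    """Return None when the type has an active registrator, else a structured error."""
    if trigger_type in active_types:
        return None
    worker = BUILTIN_WORKERS.get(trigger_type)
    if worker is not None:
        # Built-in type whose worker is missing: point at the install command.
        return RegistrationError("unknown_builtin", trigger_type, f"iii worker add {worker}")
    # Anything else: point at the workers directory.
    return RegistrationError("unknown", trigger_type, "https://workers.iii.dev/")
```

With only `state` active, as in the demo below, `classify("http", {"state"})` yields the install hint and `classify("my-custom-trigger", {"state"})` yields the directory URL.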

Demo

Engine started with a worker registry that only includes iii-state (no iii-http, no custom workers).

Built-in worker missing (iii-http)

Registering type: 'http' — engine reports the missing built-in worker and the install command:

[Screenshot: trigger-http-error]

[iii] Trigger registration failed for "…" (http): Trigger type "http" not found — worker iii-http is missing. Run: iii worker add iii-http

Unknown non-built-in trigger type (my-custom-trigger)

Registering a made-up type: 'my-custom-trigger' — engine points to the workers directory:

[Screenshot: trigger-unknown-error]

[iii] Trigger registration failed for "…" (my-custom-trigger): Trigger type "my-custom-trigger" not found. Search for a worker that provides this trigger type at https://workers.iii.dev/
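
A minimal sketch of how an SDK might turn an inbound result into the log lines shown above, assuming the payload carries `id`, `trigger_type`, and an `error` object with a `message` field. The key names and the function name are assumptions for illustration; the real SDK handlers may differ.

```python
import logging
from typing import Any, Mapping, Optional

log = logging.getLogger("iii")


def handle_trigger_registration_result(data: Mapping[str, Any]) -> Optional[str]:
    """Log at ERROR level when the result carries an error; stay silent on success."""
    error = data.get("error")
    if not error:
        return None  # successful registration: nothing to log
    line = '[iii] Trigger registration failed for "{}" ({}): {}'.format(
        data.get("id", ""),            # assumed key for the trigger id
        data.get("trigger_type", ""),  # assumed key for the trigger type
        error.get("message", ""),      # assumed key inside the error body
    )
    log.error(line)
    return line
```

Returning the formatted line is only a convenience for testing the sketch; a real handler would most likely just log.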

Test Plan

  • cargo test -p iii --lib — engine: 1565 passed (includes new path A and path B coverage; replaced the existing no-op TriggerRegistrationResult test with three forwarding tests).
  • cargo test -p iii-sdk --lib — Rust SDK: 78 passed (includes new logging tests; tracing-test added as a dev-dep for log capture).
  • pnpm --filter iii-sdk test tests/trigger-registration-error.test.ts — Node SDK: 2 passed (console.error on error, no log on success).
  • Python SDK unit tests for the new branch: 2 passed; full suite still green for non-integration tests.
  • Manual smoke against an engine started without iii-http to confirm the user-facing log line — optional, unit tests cover the behavior.

Notes

The plan document for this change is at docs/superpowers/plans/2026-05-15-improve-trigger-registration-errors.md (untracked locally per user request).

Summary by CodeRabbit

  • New Features

    • Trigger registration failures are now returned to the originating worker as structured error results.
  • User-facing Behavior

    • Clients log actionable registration error messages including trigger id, trigger type, error code/message, and suggested remediation (e.g., install hint). Successful registrations do not emit these error logs; unrelated or spoofed results are ignored.
  • Documentation

    • Added "Registration errors" section with examples and SDK logging guidance.
  • Tests

    • Added tests for error forwarding, suppression on success, and client logging.

vercel Bot commented May 15, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
|---|---|---|---|
| iii-website | Ready | Preview, Comment | May 22, 2026 5:39pm |

coderabbitai Bot commented May 15, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: e4d06495-2cf2-4983-a4b4-306c90ad9e8a

📥 Commits

Reviewing files that changed from the base of the PR and between a240195 and f140f1d.

📒 Files selected for processing (1)
  • sdk/packages/python/iii/src/iii/iii.py

Walkthrough

Engine now produces structured trigger registration errors and forwards failures to originator workers; Node, Python, and Rust SDKs consume TriggerRegistrationResult.error and log formatted diagnostics. Documentation and tests added or updated across engine and SDKs.

Changes

Trigger Registration Error Reporting

| Layer | File(s) | Summary |
|---|---|---|
| Engine error type contract | engine/src/trigger.rs | RegisterTriggerError enum distinguishes unknown built-in triggers (with installation hints), fully unknown triggers, and other failures; TriggerRegistry::register_trigger now returns this typed error and tests updated. |
| Engine registration and result handling | engine/src/engine/mod.rs (logic & tests) | RegisterTrigger maps typed errors to structured ErrorBody and sends TriggerRegistrationResult to the registering worker on failure; TriggerRegistrationResult handling validates sender, forwards error results to originator workers, removes failed triggers, and tests cover forwarding, cleanup, spoofing, noops, and error message content. |
| Node SDK error handling | sdk/packages/node/iii/src/iii-types.ts, sdk/packages/node/iii/src/iii.ts, sdk/packages/node/iii/tests/trigger-registration-error.test.ts | Adds ErrorBody type, updates TriggerRegistrationResultMessage, implements onTriggerRegistrationResult to log errors to console.error, and tests both error and success scenarios. |
| Python SDK error handling | sdk/packages/python/iii/src/iii/iii.py, sdk/packages/python/iii/tests/test_trigger_registration_error.py | Routes TRIGGER_REGISTRATION_RESULT to _handle_trigger_registration_result, logs ERROR with trigger id/type and message on failures, and adds tests asserting logged output and silent success. |
| Rust SDK error handling | sdk/packages/rust/iii/Cargo.toml, sdk/packages/rust/iii/src/iii.rs | Adds tracing-test dev-dependency, matches Message::TriggerRegistrationResult to log via tracing::error! when error present, and adds traced tests asserting logging behavior. |
| Documentation | docs/0-11-0/architecture/trigger-types.mdx | New “Registration errors” subsection describing TriggerRegistrationResult.error, when errors are emitted, example log lines for builtin vs unknown types, and per-SDK logging targets. |

Sequence Diagram

sequenceDiagram
  participant Worker
  participant Engine
  participant TriggerRegistry
  participant OriginatorWorker
  participant SDK
  Worker->>Engine: RegisterTrigger
  Engine->>TriggerRegistry: register_trigger()
  alt Registry returns Err(RegisterTriggerError)
    TriggerRegistry-->>Engine: Err(RegisterTriggerError)
    Engine->>Engine: Map error -> ErrorBody
    Engine-->>Worker: TriggerRegistrationResult (error)
    Worker->>SDK: SDK receives TriggerRegistrationResult
    SDK->>SDK: Log error (console.error / logging / tracing::error!)
    alt trigger had originator
      Engine-->>OriginatorWorker: Forward TriggerRegistrationResult (error)
      OriginatorWorker->>SDK: SDK receives TriggerRegistrationResult
      OriginatorWorker->>OriginatorWorker: Log error
    end
  else Registry returns Ok()
    TriggerRegistry-->>Engine: Ok()
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • iii-hq/iii#107: Both PRs modify TriggerRegistry::register_trigger error handling.
  • iii-hq/iii#1454: Prior work adding built-in trigger install-hint errors referenced by this PR.

Suggested reviewers

  • sergiofilhowz
  • anthonyiscoding
  • andersonleal

Poem

🐰 I nudged a trigger, heard a ping,

The engine whispered why it failed,
A hint to add the missing thing,
SDKs logged the tale that trailed,
Now errors hop, clearly unveiled.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 58.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'fix(engine,sdk): improve trigger registration error visibility' directly summarizes the main change: improving visibility of trigger registration errors across engine and SDKs. |
| Description check | ✅ Passed | The description comprehensively covers all required template sections: What (error visibility improvements with structured error handling), Why (errors were previously silent), and Notes (breaking changes, test coverage, optional manual verification). |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

mintlify Bot commented May 15, 2026

Preview deployment for your docs. Learn more about Mintlify Previews.

| Project | Status | Preview | Updated (UTC) |
|---|---|---|---|
| iii | 🟢 Ready | View Preview | May 15, 2026, 2:38 PM |

guibeira added 6 commits May 15, 2026 16:26
Replace anyhow::Error in TriggerRegistry::register_trigger with a structured
RegisterTriggerError enum (UnknownBuiltin / Unknown / Other). Engine no longer
swallows the registry error at the RegisterTrigger router arm; on failure it
sends a TriggerRegistrationResult with an ErrorBody back to the worker that
initiated the request. Built-in trigger types (http, cron, ...) include the
"iii worker add <name>" install hint in the message.

The TriggerRegistrationResult router arm was a no-op, dropping errors that
registrator workers (iii-http, iii-cron, ...) reported when they rejected a
trigger. Engine now looks up the originating worker via Trigger.worker_id
and forwards the result. Failed triggers are also removed from the registry
so they don't accumulate. Successful results are not forwarded; they would
flood the user worker with chatter and are not actionable.

Python SDK ignored inbound TRIGGER_REGISTRATION_RESULT messages. Add a
handler in _handle_message that logs the engine's error body via the iii
logger at ERROR level when error is present; no-op on success.

Rust SDK ignored inbound Message::TriggerRegistrationResult. Add an arm in
handle_message that logs via tracing::error! when error is populated, no-op
on success. Pulls in tracing-test for log capture in unit tests.

Node SDK ignored inbound TriggerRegistrationResult. Add an onMessage branch
that routes to a new handler logging via console.error when error is
populated, no-op on success. Tightens TriggerRegistrationResultMessage.error
to ErrorBody and drops the unused result field.

@guibeira guibeira force-pushed the feat/improve-erros-if-trigger-does-not-exists branch from a823df4 to ace87fe Compare May 15, 2026 19:26

github-actions Bot commented May 15, 2026

skill-check — docs

1 verified, 277 skipped.

Layer Result
structure
vale
ai

Three for three. Nicely done.

…ger types

When a registered trigger type is neither built-in nor active, the error
message now points users to https://workers.iii.dev/ to find a worker that
provides it. Built-in types keep their `iii worker add <name>` hint.

- Node: ISdk lives in ./types, not ./iii-types; fix test import so `tsc --noEmit` passes.
- Rust: collapse `if let Some(err) = error` into outer match's `Some(err)` binding per `clippy::collapsible_match`.

@guibeira guibeira marked this pull request as ready for review May 18, 2026 12:17

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 5

🧹 Nitpick comments (2)
sdk/packages/node/iii/tests/trigger-registration-error.test.ts (1)

12-21: ⚡ Quick win

Replace fixed sleeps with event-driven waits to reduce test flakiness.

The 50ms/20ms sleeps make these tests timing-sensitive on slower CI runners. Prefer waiting on explicit connection/message conditions.

Refactor sketch
 describe('trigger registration error surfacing', () => {
   let wss: WebSocketServer
   let url: string
   let sdk: ISdk | undefined
   let serverSocket: WebSocket | undefined
+  let connected: Promise<void>

   beforeEach(async () => {
     wss = new WebSocketServer({ port: 0 })
     await new Promise<void>((resolve) => wss.once('listening', () => resolve()))
     const address = wss.address() as { port: number }
     url = `ws://127.0.0.1:${address.port}`
     serverSocket = undefined
-    wss.on('connection', (ws) => {
-      serverSocket = ws
-      ws.send(JSON.stringify({ type: 'workerregistered', worker_id: 'test-worker' }))
-    })
+    connected = new Promise<void>((resolve) => {
+      wss.on('connection', (ws) => {
+        serverSocket = ws
+        ws.send(JSON.stringify({ type: 'workerregistered', worker_id: 'test-worker' }))
+        resolve()
+      })
+    })
   })

   it('logs to console.error on TriggerRegistrationResult with error', async () => {
     const spy = vi.spyOn(console, 'error').mockImplementation(() => {})
     sdk = registerWorker(url)
-    await new Promise((r) => setTimeout(r, 50))
+    await connected
@@
-    await new Promise((r) => setTimeout(r, 20))
+    await new Promise<void>((r) => setImmediate(r))
@@
   it('does not log on TriggerRegistrationResult success (no error field)', async () => {
@@
-    await new Promise((r) => setTimeout(r, 50))
+    await connected
@@
-    await new Promise((r) => setTimeout(r, 20))
+    await new Promise<void>((r) => setImmediate(r))

Also applies to: 32-33, 48-49, 60-61, 71-75

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@sdk/packages/node/iii/tests/trigger-registration-error.test.ts` around lines
12 - 21, The test uses fixed sleeps (50ms/20ms) causing flakiness; replace them
with event-driven waits by awaiting connection/message events instead: in the
beforeEach and other tests that reference wss, serverSocket, and ws.send, remove
setTimeout/sleep calls and instead wait on the WebSocketServer 'connection'
event or a Promise that resolves when serverSocket receives the expected message
(use wss.once('connection', ...) or serverSocket.once('message', ...) as
appropriate), and update assertions to run after those Promises resolve so tests
proceed only after the actual connection/message is observed.
engine/src/engine/mod.rs (1)

653-694: ⚡ Quick win

Delete the duplicated TriggerRegistrationResult block.

This second half is a copy of the first half. After the first half, success has already returned, and the error path has already removed the trigger, so this copy only adds redundant work and can emit a misleading "unknown trigger" debug line after a legitimate forwarded failure.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@engine/src/engine/mod.rs` around lines 653 - 694, Remove the duplicated
second block that repeats the TriggerRegistrationResult handling; after the
first block already returns on success or removes the trigger on error, the
duplicate code (the repeated use of trigger_registry.triggers.get(id),
originator_id extraction, worker_registry.get_worker lookup, building
Message::TriggerRegistrationResult and calling self.send_msg) should be deleted
so only the original handling remains and no misleading "unknown trigger" debug
path is executed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/0-11-0/architecture/trigger-types.mdx`:
- Line 55: Update the built-in trigger types line to align names with documented
types: remove or explicitly mark "subscribe" as an alias for
"durable:subscriber" if the engine supports that alias (mention both "subscribe"
and "durable:subscriber" and state the alias relationship in the text), clarify
whether error messages use the generic "stream" or the specific
"stream:join"/"stream:leave" (add a parenthetical note like "engine reports
'stream' (see stream:join/stream:leave)" or state the engine reports the
specific subtypes), and either add a pointer to the existing documentation for
"state" or add a brief "state" entry in Core Trigger Types so the list matches
the rest of the doc; update the sentence referencing `http`, `cron`,
`subscribe`, `state`, `durable:subscriber`, `stream`, `log` accordingly.

In `@engine/src/engine/mod.rs`:
- Around line 610-651: The code handling Message::TriggerRegistrationResult
should validate that the sender is the registrator of the stored trigger before
removing or forwarding: look up the stored trigger in
self.trigger_registry.triggers (the existing let Some(trigger_entry) binding)
and compare trigger_entry.worker_id to the originator/sender id (the worker.id
from the incoming message context) — only if they match should you remove the
trigger and forward a result; otherwise ignore/return. When forwarding, build
the Message::TriggerRegistrationResult using the canonical values from the
stored trigger_entry (trigger_entry.trigger_type, trigger_entry.function_id, and
any stored error state) instead of using trigger_type/function_id/error from the
inbound payload. Ensure you still fetch the originator via
self.worker_registry.get_worker(&originator_id) and call
self.send_msg(&originator, forward).await only after these validations.
- Around line 859-897: The metric is incremented regardless of register_trigger
outcome; move the
crate::workers::telemetry::collector::track_trigger_registered() call into the
Ok(()) branch so it only runs on successful registrations (i.e., after the
register_trigger(Trigger { ... }).await returns Ok(())). Keep the Err block
behavior unchanged (building error_body, sending
Message::TriggerRegistrationResult via send_msg) and ensure
track_trigger_registered() is not invoked after the Err arm.

In `@sdk/packages/node/iii/tests/trigger-registration-error.test.ts`:
- Around line 24-27: Teardown is not awaiting the async SDK shutdown which can
race with websocket server close; update the afterEach to await sdk.shutdown()
before closing the wss by calling and awaiting sdk?.shutdown?.() (or await
sdk.shutdown() if non-null) prior to awaiting new Promise that wraps
wss.close(), ensuring the async function sdk.shutdown is awaited to complete
cleanup.

In `@sdk/packages/python/iii/src/iii/iii.py`:
- Line 717: The fallback to data.get("type") is misleading; update the
assignment for trigger_type so it only reads data.get("trigger_type") with a
safe empty-string fallback (remove the data.get("type") alternative) — locate
the trigger_type assignment in iii.py (the line setting trigger_type) and change
it to use only the trigger_type key with an empty-string default.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: fb0b4cff-ba66-4e9e-b9b9-e55a176889dd

📥 Commits

Reviewing files that changed from the base of the PR and between 2cae066 and 0ee2dc1.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (10)
  • docs/0-11-0/architecture/trigger-types.mdx
  • engine/src/engine/mod.rs
  • engine/src/trigger.rs
  • sdk/packages/node/iii/src/iii-types.ts
  • sdk/packages/node/iii/src/iii.ts
  • sdk/packages/node/iii/tests/trigger-registration-error.test.ts
  • sdk/packages/python/iii/src/iii/iii.py
  • sdk/packages/python/iii/tests/test_trigger_registration_error.py
  • sdk/packages/rust/iii/Cargo.toml
  • sdk/packages/rust/iii/src/iii.rs

Comment thread docs/0-11-0/architecture/trigger-types.mdx Outdated
Comment thread engine/src/engine/mod.rs Outdated
Comment thread engine/src/engine/mod.rs Outdated
Comment thread sdk/packages/node/iii/tests/trigger-registration-error.test.ts
Comment thread sdk/packages/python/iii/src/iii/iii.py Outdated
- engine/path B: validate that TriggerRegistrationResult is sent by the
  registrator worker that owns the trigger_type. Reject and ignore reports
  from non-registrators so a buggy/compromised worker cannot spoof failures
  and tear other workers' triggers out of the registry. Forwarded message
  is now built from the canonical stored trigger (id, trigger_type,
  function_id) rather than the inbound payload.
- engine/path A: only increment `track_trigger_registered` on successful
  registration; failed registrations were skewing the metric.
- docs: align built-in trigger type list with the engine's
  BUILTIN_TRIGGER_TYPES const (adds stream:join / stream:leave and notes
  that the engine reports the registered name verbatim).
- node test: await `sdk.shutdown()` and restore mocks in teardown so the
  WebSocket server close does not race with pending SDK work.
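
The path B rules in this commit message (ignore reports from non-registrators, forward only failures, build the forwarded message from the stored trigger, drop failed triggers) could look roughly like the sketch below. It is a simplified Python model of the described behavior, not the engine's Rust implementation; the `Trigger` fields mirror the names in the text, while `registrators`, `handle_registration_result`, and the returned dict shape are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Trigger:
    id: str
    trigger_type: str
    function_id: str
    worker_id: str  # worker that originally asked for the trigger


def handle_registration_result(
    triggers: Dict[str, Trigger],
    registrators: Dict[str, str],  # assumed shape: trigger_type -> registrator worker id
    sender_id: str,
    trigger_id: str,
    error: Optional[dict],
) -> Optional[dict]:
    """Return the message to forward to the originator, or None when nothing is sent."""
    trigger = triggers.get(trigger_id)
    if trigger is None:
        return None  # unknown trigger id: ignore
    if registrators.get(trigger.trigger_type) != sender_id:
        return None  # sender does not own this trigger type: treat as spoofed
    if error is None:
        return None  # success is not forwarded
    del triggers[trigger_id]  # failed triggers do not stay in the registry
    # Forward canonical stored values, never the inbound payload's copies.
    return {
        "to": trigger.worker_id,
        "id": trigger.id,
        "trigger_type": trigger.trigger_type,
        "function_id": trigger.function_id,
        "error": error,
    }
```

Checking the sender before the error branch means a spoofed failure report can neither remove a trigger nor reach the originating worker.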
…istration error log

Use data.get("trigger_type", "") instead of falling back to data.get("type"),
which returned the WS message type ("triggerregistrationresult") and produced
confusing error logs when trigger_type was absent.
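
To illustrate why the fallback was misleading, here is a small sketch comparing the two lookups on a message that lacks `trigger_type`. The exact form of the earlier expression is an assumption reconstructed from the review comment, and both function names are hypothetical.

```python
from typing import Any, Mapping


def trigger_type_with_fallback(data: Mapping[str, Any]) -> str:
    # Assumed shape of the earlier code: falls back to the message's own "type" key.
    return data.get("trigger_type") or data.get("type", "")


def trigger_type_strict(data: Mapping[str, Any]) -> str:
    # Behavior described in the fix: only read trigger_type, default to "".
    return data.get("trigger_type", "")


# A result message with no trigger_type field; "type" is the WS message type.
message = {"type": "triggerregistrationresult", "id": "t1", "error": {"message": "boom"}}

fallback_value = trigger_type_with_fallback(message)
strict_value = trigger_type_strict(message)
```

Here `fallback_value` is the WebSocket message type rather than a trigger type, which is what ended up in the log line, while `strict_value` is simply empty.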