
feat: event listeners on Worker, Queue, and Job#147

Merged
jotarios merged 3 commits into main from feat/events-listeners
May 23, 2026

@jotarios

Owner

Summary

Close the BullMQ-parity gap for event-driven consumption. The engine already emits a cross-process events stream covering waiting / active / completed / failed / retry-scheduled / delayed / dlq / drained; this slice finishes the high-level shim surfaces on top so application code mirrors BullMQ exactly. Zero engine changes (empty git diff main -- chasquimq/src/).

Three new user-visible surfaces, mirrored across both shims:

  • Worker-level listeners. worker.on('drained' | 'paused' | 'resumed' | ...) on both shims. Node adds the three new events to the existing EventEmitter; Python gains a full .on() / .off() / .once() listener API alongside the existing native handler. 'drained' lazily spawns an embedded QueueEvents subscriber on the first listener attach (zero-cost when unused); 'paused' / 'resumed' fire from the local pause() / resume() methods. 'progress' / 'stalled' documented as accepted-but-no-op for parity (engine doesn't emit those transitions yet).
  • Queue-level listeners (QueueEvents). Per-id channels (active:<jobId> / completed:<jobId> / failed:<jobId>) emitted alongside the existing broadcast channels so targeted subscribers can listen on a single job without paying the O(N-listeners) broadcast dispatch cost. Python gains the .on() / .off() / .once() listener API alongside the existing async-iterator; sync and async callbacks both supported.
  • Job-level promise. Job.waitUntilFinished(queueEvents, ttl?) (Node) and job.wait_until_finished(queue_events, *, timeout=None) (Python). Event-driven (no polling), resolves with the stored handler result via Queue.getJobResult when storeResults=true, rejects with new Error(failedReason) on failed, throws new WaitUntilFinishedTimeoutError on ttl elapse.
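
The per-id channels and the job-level wait can be sketched in a few lines of Python. This is a minimal in-memory illustration under stated assumptions, not the shim's code: SketchQueueEvents, dispatch, fetch_result, and WaitTimeout are hypothetical names (the real QueueEvents reads a Redis stream, and the real error type is WaitUntilFinishedTimeoutError).

```python
import asyncio
from collections import defaultdict
from typing import Any, Awaitable, Callable, Optional


class WaitTimeout(Exception):
    """Hypothetical stand-in for WaitUntilFinishedTimeoutError."""


class SketchQueueEvents:
    """In-memory stand-in for QueueEvents; no Redis involved."""

    def __init__(self) -> None:
        self._listeners = defaultdict(list)

    def on(self, channel: str, callback: Callable[[dict], None]) -> None:
        self._listeners[channel].append(callback)

    def off(self, channel: str, callback: Callable[[dict], None]) -> None:
        if callback in self._listeners[channel]:
            self._listeners[channel].remove(callback)

    def listener_count(self, channel: str) -> int:
        return len(self._listeners[channel])

    def dispatch(self, event: str, job_id: str, payload: dict) -> None:
        # Fan out to the broadcast channel and to the per-id channel, so a
        # targeted subscriber only ever sees events for its own job.
        for channel in (event, f"{event}:{job_id}"):
            for callback in list(self._listeners[channel]):
                callback(payload)


async def wait_until_finished(
    events: SketchQueueEvents,
    job_id: str,
    fetch_result: Callable[[str], Awaitable[Any]],
    timeout: Optional[float] = None,
) -> Any:
    loop = asyncio.get_running_loop()
    done = loop.create_future()

    def on_completed(_payload: dict) -> None:
        if not done.done():
            done.set_result(None)

    def on_failed(payload: dict) -> None:
        if not done.done():
            done.set_exception(RuntimeError(payload.get("failedReason", "job failed")))

    # Subscribe only to this job's channels: event-driven, no polling.
    events.on(f"completed:{job_id}", on_completed)
    events.on(f"failed:{job_id}", on_failed)
    try:
        await asyncio.wait_for(done, timeout)
    except asyncio.TimeoutError:
        raise WaitTimeout(job_id) from None
    finally:
        events.off(f"completed:{job_id}", on_completed)
        events.off(f"failed:{job_id}", on_failed)
    # The completed event carries no return value; fetch it separately.
    return await fetch_result(job_id)
```

Removing both listeners in the finally block keeps a long-lived subscriber from accumulating per-id callbacks for jobs that have already settled or timed out.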

Design notes

  • Why completed events don't carry the return value. BullMQ's Job.waitUntilFinished exposes the value because BullMQ JSON-encodes it onto the events stream. Re-implementing that on ChasquiMQ would either regress the msgpack wire-format invariant or force every subscriber to depend on msgpack. Composing event + Queue.getJobResult keeps the events stream small (1 MB result × 50 dashboards = 50 MB/job otherwise), keeps subscribers msgpack-free, and works without the result backend for "just detect completion" cases. Documented as a deliberate cross-shim contract.
  • Lost-event race. Events emitted before a subscriber's first XREAD BLOCK lands are missed. Mitigations: Node Worker.run() awaits a ready promise on the embedded drained subscriber before kicking the engine; Python QueueEvents exposes await events.wait_until_ready() so callers can gate on subscriber-is-listening.
  • Events-emit-before-result-write race. The engine emits completed events before the per-entry JOB_OK_SCRIPT writes the result key (events emit lives off the ack hot path). waitUntilFinished handles this with a short retry loop (10× 50ms) on the getJobResult fetch; falls through to undefined for workers running without storeResults=true.
  • Lazy subscriber lifecycle. Workers that never subscribe to 'drained' open zero extra Redis connections. The embedded subscriber uses blockingTimeout: 1000 (vs the QueueEvents 10s default) for snappy shutdown.
  • Cancellation is not failure. Python Worker does not fire 'failed' or 'error' on asyncio.CancelledError — cancellation is a control-flow signal (test teardown, shutdown), not a handler failure. Matches BullMQ.
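
The events-emit-before-result-write mitigation amounts to a bounded retry on the result fetch. A minimal Python sketch of that loop follows, assuming a hypothetical get_result coroutine that returns None while the result key has not been written yet; the function name and the None-means-missing convention are illustrative, not the shim's actual internals.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def fetch_result_with_retry(
    get_result: Callable[[str], Awaitable[Any]],
    job_id: str,
    attempts: int = 10,   # 10 tries, per the design note
    delay: float = 0.05,  # 50 ms between tries
) -> Any:
    """Fetch a stored result, tolerating a completed event that arrives
    slightly before the result key is written."""
    for attempt in range(attempts):
        value = await get_result(job_id)
        if value is not None:
            return value
        # Sleep only between attempts, not after the last one.
        if attempt < attempts - 1:
            await asyncio.sleep(delay)
    # Nothing stored (for example, the worker runs without result storage).
    return None
```

With the defaults the worst case adds roughly 450 ms (nine sleeps of 50 ms) before falling through. One limitation of this sketch is that a handler result that is itself None cannot be told apart from a missing key.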

Test plan

  • Engine workspace tests: cargo test --workspace -- --include-ignored against live Redis 8.6.2 → 356 passed (38 suites).
  • Node shim: npm test (vitest) → 194 passed | 1 skipped | 1 todo (185 before + 9 new in __test__/event-listeners.test.ts).
  • Python shim: pytest → 160 passed (149 before + 11 new in tests/test_event_listeners.py).
  • cargo fmt --all -- --check clean; cargo clippy --all-targets --workspace -- -D warnings clean (only pre-existing non-root profile warnings).
  • Engine source diff vs main is empty — pure shim work, no hot-path change.
  • Same-host bench (contended ~3-4 load): queue-add-bulk 185.7k → 186.9k (+0.7%, flat); worker-concurrent retried run shows branch 116.8k vs main 109.8k (+6.3%, contention-driven). Per the CLAUDE.md host-load gate, the empty engine diff means any small worker-concurrent movement is attributable to host noise rather than a code change.
  • Bug-hunt fix-pass: queue-mismatch guard on waitUntilFinished; cancellation-vs-failure distinction; sync .on() outside coroutine surfaces clear RuntimeError; lazy subscriber error-forwarding skips close-time races.

Docs updated symmetrically on both shim READMEs (Subscribing to events + Awaiting a single job's completion), Node + Python API reference, new concepts/events-and-listeners.md registered in site/astro.config.mjs, root README feature table, and docs/history.md.

jotarios added 3 commits May 23, 2026 02:24
Adds an EventEmitter-style listener surface across the Node shim:

- Worker emits ready / active / completed / failed / error / closing /
  closed / drained / paused / resumed. The drained subscriber is
  lazily wired the first time someone calls worker.on('drained', ...)
  and torn down on close(), so workers that never subscribe pay no
  extra Redis connections.
- QueueEvents now fans broadcast events onto per-id channels
  (active:<jobId>, completed:<jobId>, failed:<jobId>) so targeted
  subscribers (like Job.waitUntilFinished) avoid the O(N-listeners)
  broadcast dispatch cost.
- Job.waitUntilFinished(queueEvents, ttl?) — event-driven completion
  wait that subscribes to the per-id channels and resolves with the
  stored result (when storeResults: true) or undefined, or rejects
  with the engine failedReason. WaitUntilFinishedTimeoutError is a
  new public error type.
- progress / stalled listener names are accepted but currently no-op
  (engine doesn't emit those transitions yet). Listed in the docstring
  so subscriber code keeps type-checking.

Includes an integration test suite gated on REDIS_URL covering
drained, paused/resumed, per-id channels, waitUntilFinished
(completed / undefined / failed / timeout / queue mismatch).

Mirrors the Node shim's listener surface in Python:

- Worker emits ready / active / completed / failed / error / closing /
  closed / drained / paused / resumed. Sync and async def callbacks
  both work — async callbacks are scheduled on the running loop. The
  drained subscriber is lazily wired on the first on('drained', ...)
  call and torn down on close().
- QueueEvents gains an EventEmitter-style listener API alongside the
  existing async iterator (the two are mutually exclusive — they
  share the Redis connection but only one XREAD consumer at a time).
  Broadcast events plus per-id channels (active:<jobId> /
  completed:<jobId> / failed:<jobId>). wait_until_ready() lets
  callers deterministically gate on the subscriber's first XREAD
  BLOCK landing.
- Job.wait_until_finished(queue_events, timeout=...) — event-driven
  completion wait that returns the stored result (when
  store_results=True) or None, or raises with the engine
  failedReason. WaitUntilFinishedTimeoutError is a new public error.
- progress / stalled listener names are accepted but currently no-op
  (engine doesn't emit those transitions yet).

Includes an integration test suite gated on REDIS_URL covering
drained, paused/resumed, per-id channels, async/sync callbacks,
wait_until_finished (completed / None / failed / timeout /
queue mismatch), and CancelledError handling.
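
A rough Python sketch of the two behaviours described above: sync and async def callbacks sharing one emitter, and cancellation propagating without a 'failed' event. SketchEmitter, flush, and run_handler are hypothetical names for illustration only; the real Worker wires this into the native engine.

```python
import asyncio
import inspect
from collections import defaultdict


class SketchEmitter:
    """Hypothetical minimal emitter; must be used from inside a running loop."""

    def __init__(self) -> None:
        self._listeners = defaultdict(list)
        self._tasks = set()

    def on(self, event, callback) -> None:
        self._listeners[event].append(callback)

    def emit(self, event, *args) -> None:
        for callback in list(self._listeners[event]):
            result = callback(*args)
            if inspect.isawaitable(result):
                # async def callback: schedule it on the running loop and
                # keep a reference so the task is not garbage-collected.
                task = asyncio.ensure_future(result)
                self._tasks.add(task)
                task.add_done_callback(self._tasks.discard)

    async def flush(self) -> None:
        # Demo helper: wait for async callbacks scheduled so far.
        if self._tasks:
            await asyncio.gather(*self._tasks)


async def run_handler(emitter: SketchEmitter, handler, job):
    try:
        result = await handler(job)
    except asyncio.CancelledError:
        # Control-flow signal (shutdown, test teardown): propagate it
        # without emitting 'failed' or 'error'.
        raise
    except Exception as exc:
        emitter.emit("failed", job, exc)
        return None
    emitter.emit("completed", job, result)
    return result
```

On current Python versions asyncio.CancelledError derives from BaseException, so `except Exception` alone would already skip it; the explicit branch just makes the intent visible.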

User-visible documentation for the event listener slice:

- README: adds event listeners and waitUntilFinished to the feature
  comparison table.
- Node + Python shim READMEs: new "Subscribing to events" and
  "Awaiting a single job's completion" sections, mirrored across
  shims (same headings, language-specific examples).
- Starlight site:
  - new concepts/events-and-listeners.md explaining the two layers
    (in-process Worker EventEmitter vs cross-process QueueEvents),
    per-id channels, the return-value choice, the two completion-
    wait helpers, the lazy-subscriber lifecycle, and the lost-event
    race.
  - reference/node-api.md and reference/python-api.md gain the new
    Worker event table, the QueueEvents listener API, per-id
    channels, and waitUntilFinished / wait_until_finished.
  - registered in astro.config.mjs sidebar under Concepts.
- docs/history.md: slice entry covering the engine surface and the
  cross-shim API.
@jotarios jotarios force-pushed the feat/events-listeners branch from e076435 to 6c950f4 Compare May 23, 2026 05:30
@jotarios jotarios changed the title feat: BullMQ-style event listeners on Worker, Queue, and Job feat: event listeners on Worker, Queue, and Job May 23, 2026
@jotarios jotarios merged commit 736325a into main May 23, 2026
26 checks passed
@jotarios jotarios deleted the feat/events-listeners branch May 23, 2026 16:18