
fix(queue): share one ioredis connection across BullMQ queues and workers#1009

Open
LorenzoGalassi wants to merge 1 commit into Crosstalk-Solutions:dev from LorenzoGalassi:fix/885-shared-bullmq-redis-connection

@LorenzoGalassi

Summary

Closes #885

config/queue.ts hands BullMQ a plain {host, port} connection object. BullMQ treats that as a recipe: every Queue and Worker instantiates its own ioredis client from it, and script commands executed against those clients can spawn further short-lived connections. As documented in #885, under sustained ZIM ingestion this leaks ~1 client/sec on the admin process until Redis maxclients (10k) is exhausted in ~2–3 hours, forcing periodic admin restarts.
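A simplified model may help make the "recipe versus instance" distinction concrete. This is not BullMQ's source: FakeClient, commandClient, blockingClient and openClients are hypothetical stand-ins, each FakeClient counts as one Redis connection, and the model covers only per-Queue/per-Worker construction inside a single process (not the short-lived connections from script commands mentioned above). It assumes, following this PR's own counts, one client per Queue and two per Worker when BullMQ is handed plain options.

```typescript
// Simplified model of how a connection setting turns into Redis clients.
// NOT BullMQ's source: every name here is a hypothetical stand-in.
type ClientOptions = { host: string; port: number }

class FakeClient {
  static opened = 0 // total "connections" ever opened in this process
  readonly options: ClientOptions

  constructor(options: ClientOptions) {
    this.options = options
    FakeClient.opened++
  }

  // Stands in for ioredis' duplicate(): a new connection with the same options.
  duplicate(): FakeClient {
    return new FakeClient(this.options)
  }
}

// A connection setting is either a recipe (plain options) or a live client.
type ConnectionSetting = ClientOptions | FakeClient

// Command client: a recipe is built into a fresh client, an instance is reused.
function commandClient(setting: ConnectionSetting): FakeClient {
  return setting instanceof FakeClient ? setting : new FakeClient(setting)
}

// Blocking client (Workers only): always a dedicated connection.
function blockingClient(setting: ConnectionSetting): FakeClient {
  return setting instanceof FakeClient ? setting.duplicate() : new FakeClient(setting)
}

// Counts connections opened while wiring up `queues` Queues and `workers` Workers.
function openClients(build: () => ConnectionSetting, queues: number, workers: number): number {
  const before = FakeClient.opened
  const setting = build()
  for (let i = 0; i < queues; i++) commandClient(setting)
  for (let i = 0; i < workers; i++) {
    commandClient(setting)
    blockingClient(setting)
  }
  return FakeClient.opened - before
}

const recipe = (): ConnectionSetting => ({ host: '127.0.0.1', port: 6379 })
const sharedInstance = (): ConnectionSetting => new FakeClient({ host: '127.0.0.1', port: 6379 })

console.log(openClients(recipe, 7, 7)) // 21: 7 Queues + 2 per Worker
console.log(openClients(sharedInstance, 7, 7)) // 8: 1 shared + 7 blocking duplicates
```

With 7 queues and 7 workers the model gives 21 connections for the options object and 8 for a shared instance. The measured counts under Testing are close to, but not identical with, these figures, so treat the model as an illustration only.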

This PR implements the fix direction suggested in the issue: the BullMQ-documented production pattern of passing a single shared ioredis instance, so that all Queues and Workers reuse one connection instead of each opening their own:

  • admin/config/queue.ts: construct one shared Redis instance (preserving the REDIS_DB support added in #939, "feat(config): respect REDIS_DB env var for queue and transmit") and export it as queueConfig.connection. maxRetriesPerRequest: null is set, as BullMQ requires for connections used by Workers.
  • admin/package.json / lockfile: add ioredis@5.10.1 as a direct dependency (previously only transitive via bullmq; pinned to the version already in the lockfile).
  • admin/app/services/queue_service.ts: comment-only update. The singleton rationale from #877 ("fix(queue): singleton QueueService to stop ioredis connection leak") still applies, but the per-Queue connection-count claim is no longer accurate with a shared instance.
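For readers less familiar with the pattern, the shape of the admin/config/queue.ts change can be sketched as a small pure function plus a construct-once step. This is a hypothetical sketch, not the code in this PR: the helper name buildSharedRedisOptions, the REDIS_HOST/REDIS_PORT variable names and the defaults are assumptions; only REDIS_DB and maxRetriesPerRequest: null come from the description above.

```typescript
// Hypothetical helper: options for the ONE shared ioredis client per process.
// REDIS_HOST / REDIS_PORT and all defaults are assumptions; REDIS_DB follows
// the support added in #939.
interface SharedRedisOptions {
  host: string
  port: number
  db: number
  maxRetriesPerRequest: null // BullMQ requires null on connections used by Workers
}

function buildSharedRedisOptions(env: Record<string, string | undefined>): SharedRedisOptions {
  const port = Number(env.REDIS_PORT ?? '6379')
  const db = Number(env.REDIS_DB ?? '0')
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`invalid REDIS_PORT: ${env.REDIS_PORT}`)
  }
  if (!Number.isInteger(db) || db < 0) {
    throw new Error(`invalid REDIS_DB: ${env.REDIS_DB}`)
  }
  return { host: env.REDIS_HOST ?? '127.0.0.1', port, db, maxRetriesPerRequest: null }
}

// The options are turned into a client exactly once per process, and that
// instance (not the options object) is what gets exported, e.g. with ioredis:
//
//   import { Redis } from 'ioredis'
//   const connection = new Redis(buildSharedRedisOptions(process.env))
//   export default { connection } // every Queue and Worker reads this
```

Keeping the option parsing separate from client construction makes the important property easy to see: `new Redis(...)` appears once, at module load, rather than once per Queue or Worker.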

No call sites change: QueueService and commands/queue/work.ts already read queueConfig.connection, so both the web process and the worker process pick up the shared instance automatically. Workers duplicate the connection once for their blocking client, which is expected and bounded (1 per worker).

Testing

  • npm run typecheck and eslint pass clean.

  • Synthetic steady-state test against a throwaway redis:7-alpine container, simulating NOMAD's topology (7 queues + 7 workers at the same concurrency settings; sustained add + getJobCounts + getJobs dispatch for 45 s; CLIENT LIST sampled every 5 s):

    connection style                     clients at t=0   clients at t=45s   notes
    {host, port} object (current)        15               22                 one per Queue + two per Worker
    shared ioredis instance (this PR)    9                9                  flat: 1 shared + 1 blocking dup per worker

    The post-fix steady state is consistent with the ~20–30 client expectation from the issue once scaled to the full app. Caveat, in the interest of full transparency: the synthetic run did not reproduce the unbounded ~1 client/sec growth, which appears to need the real embed pipeline's dispatch pattern. Verification on a NOMAD instance under sustained multi-batch ZIM ingestion would therefore be a valuable confirmation before merge, using the repro from #885: watch docker exec nomad_redis redis-cli CLIENT LIST | wc -l over ~10 min of embedding.
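To turn that manual repro into a number, the CLIENT LIST output can be parsed and the samples reduced to a growth rate. The helpers below are hypothetical (not part of NOMAD or any Redis client library) and assume the raw text from redis-cli CLIENT LIST is collected separately, for example every minute; each line of that output is one client made of space-separated key=value fields, and a flags value containing b marks a client blocked in a blocking command.

```typescript
// Hypothetical helpers for analysing sampled `CLIENT LIST` output.
type ClientInfo = Record<string, string>

// One entry per connected client; keys are CLIENT LIST's field names.
function parseClientList(output: string): ClientInfo[] {
  return output
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => {
      const info: ClientInfo = {}
      for (const token of line.split(' ')) {
        const eq = token.indexOf('=')
        if (eq > 0) info[token.slice(0, eq)] = token.slice(eq + 1)
      }
      return info
    })
}

// Clients whose flags include 'b' are waiting in a blocking command.
function countBlocked(clients: ClientInfo[]): number {
  return clients.filter((c) => (c.flags ?? '').includes('b')).length
}

interface Sample {
  tSeconds: number // seconds since sampling started
  clients: number // number of CLIENT LIST lines at that moment
}

// Least-squares slope of client count over time, in clients per second.
function growthPerSecond(samples: Sample[]): number {
  if (samples.length < 2) throw new Error('need at least two samples')
  const n = samples.length
  const meanT = samples.reduce((sum, s) => sum + s.tSeconds, 0) / n
  const meanC = samples.reduce((sum, s) => sum + s.clients, 0) / n
  let num = 0
  let den = 0
  for (const s of samples) {
    num += (s.tSeconds - meanT) * (s.clients - meanC)
    den += (s.tSeconds - meanT) ** 2
  }
  if (den === 0) throw new Error('samples must be taken at different times')
  return num / den
}

// Seconds until `maxclients` is reached at the given growth rate.
function secondsUntilExhausted(current: number, maxclients: number, perSecond: number): number {
  return perSecond > 0 ? (maxclients - current) / perSecond : Infinity
}

// Usage with made-up, abbreviated CLIENT LIST lines and synthetic samples.
const listing = 'id=3 addr=127.0.0.1:50001 name= flags=N db=0\nid=4 addr=127.0.0.1:50002 name= flags=b db=0'
console.log(parseClientList(listing).length) // 2
console.log(countBlocked(parseClientList(listing))) // 1
const leaking: Sample[] = [0, 60, 120, 180].map((t) => ({ tSeconds: t, clients: 20 + t }))
const flat: Sample[] = [0, 5, 10, 15].map((t) => ({ tSeconds: t, clients: 9 }))
console.log(growthPerSecond(leaking)) // 1
console.log(growthPerSecond(flat)) // 0
```

A least-squares slope is used rather than last-minus-first so that a single noisy sample does not dominate. As a sanity check on the numbers in the Summary, secondsUntilExhausted(0, 10000, 1) is 10,000 s, about 2.8 hours, consistent with the ~2–3 hour window reported in #885.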

Suggested release note

Fixed a Redis connection leak during sustained Knowledge Base ingestion that could make the admin UI unresponsive after a few hours (#885).

BullMQ instantiates a fresh ioredis client per Queue/Worker when handed a
plain {host, port} config object, and under sustained ZIM ingestion the
embed pipeline leaked ~1 client/sec until Redis maxclients was exhausted.
Pass a single shared ioredis instance (maxRetriesPerRequest: null, as
required by BullMQ) so all queues and workers reuse one client pool.
Workers still duplicate the connection once for their blocking client,
which is expected and bounded.

Closes Crosstalk-Solutions#885