
Conversation


@Arseniy-Popov Arseniy-Popov commented Dec 2, 2025

Description

Work-in-progress for a relational database based broker.
Fixes #799

Motivation

The primary benefit of a message queue built on top of a relational database is the ability to insert messages transactionally, atomically with other database operations, which enables the transactional outbox pattern. The relational database is also usually the most readily available, already-provisioned piece of infrastructure for a given service. While implementing all the patterns and semantics of a full-blown message queue or streaming platform (e.g. Kafka-like partition-based horizontal scaling with local ordering) would be problematic, a relational-database-based queue is, given a proper understanding of the trade-offs involved, an appropriate tool for many low-to-medium-throughput, latency-tolerant uses, including as part of a larger messaging flow that involves a "proper" queue (e.g. as a transactional layer between a service and the queue).
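
For illustration, an outbox-style publish could look roughly like the sketch below. The table and column names (`outbox_messages`, etc.) are placeholders, not this PR's actual schema; the point is only that the message insert shares a transaction with the business write.

```python
from sqlalchemy import create_engine, text

# Placeholder DSN and schema; illustrative only.
engine = create_engine("postgresql+psycopg://user:pass@localhost/app")


def place_order_and_publish(order_id: int, payload: str) -> None:
    # Both inserts commit (or roll back) together, so the message is published
    # if and only if the business change is persisted.
    with engine.begin() as conn:
        conn.execute(text("INSERT INTO orders (id) VALUES (:id)"), {"id": order_id})
        conn.execute(
            text(
                "INSERT INTO outbox_messages (queue, body, status, next_attempt_at) "
                "VALUES (:queue, :body, 'PENDING', now())"
            ),
            {"queue": "order-created", "body": payload},
        )
```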

Design

The key components are in message.py, usecase.py, and client.py.

Flow:

On start, the subscriber spawns four types of concurrent loops:

  1. Fetch loop: Periodically fetches PENDING or RETRYABLE messages from the database, simultaneously updating them in the database: marking them as PROCESSING, setting acquired_at to now, and incrementing attempts_count. Only messages with next_attempt_at <= now are fetched, ordered by next_attempt_at. The fetched messages are placed into an internal queue. The fetch limit is the minimum of fetch_batch_size and the free buffer capacity (fetch_batch_size * overfetch_factor minus currently queued messages). If the last fetch was "full" (returned as many messages as the limit), the next fetch happens after min_fetch_interval; otherwise after max_fetch_interval. A sketch of the claim query is included after this list.

  2. Worker loops (max_workers instances): Each worker takes a message from the internal queue and checks whether the attempt is allowed by the retry_strategy. If allowed, the message is processed; if not, it is Reject'ed. Depending on the processing result, the AckPolicy, and manual Ack/Nack/Reject, the message is Ack'ed, Nack'ed, or Reject'ed. For Nack'ed messages the retry_strategy is consulted to determine if and when the message may be retried: if a retry is allowed, the message is marked as RETRYABLE, otherwise as FAILED. Ack'ed messages are marked as COMPLETED and Reject'ed messages are marked as FAILED. The message is then buffered for flushing.

  3. Flush loop: Periodically flushes the buffered message state changes to the database. COMPLETED and FAILED messages are moved from the primary table to the archive table. The state of RETRYABLE messages is updated in the primary table. A sketch of the flush step is also included after this list.

  4. Release stuck loop: Periodically releases messages that have been stuck in PROCESSING state for longer than release_stuck_timeout since acquired_at. These messages are marked back as PENDING.
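
A rough sketch of what the fetch loop's claim query could look like on Postgres, assuming the column names used above (status, acquired_at, attempts_count, next_attempt_at) and a `messages` table; the actual query in the PR's usecase/client code may differ:

```python
from sqlalchemy import text

# Illustrative only: claims a batch atomically, so concurrent workers skip
# rows already locked by another fetch ("SELECT ... FOR UPDATE SKIP LOCKED").
CLAIM_BATCH = text(
    """
    UPDATE messages
    SET status = 'PROCESSING',
        acquired_at = now(),
        attempts_count = attempts_count + 1
    WHERE id IN (
        SELECT id
        FROM messages
        WHERE status IN ('PENDING', 'RETRYABLE')
          AND next_attempt_at <= now()
        ORDER BY next_attempt_at
        LIMIT :limit
        FOR UPDATE SKIP LOCKED
    )
    RETURNING *
    """
)


def fetch_batch(conn, fetch_batch_size: int, overfetch_factor: float, queued: int):
    # The limit is the smaller of the batch size and the remaining buffer
    # capacity (fetch_batch_size * overfetch_factor minus queued messages).
    limit = min(fetch_batch_size, int(fetch_batch_size * overfetch_factor) - queued)
    if limit <= 0:
        return []
    return conn.execute(CLAIM_BATCH, {"limit": limit}).fetchall()
```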
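
Similarly, the flush step could be sketched as below; the archive table name, columns, and single-transaction layout are assumptions about the design, not the PR's actual code:

```python
from sqlalchemy import text

# Illustrative only: move finished messages to the archive and reschedule
# retryable ones, all within one transaction.
ARCHIVE = text(
    "INSERT INTO messages_archive (id, body, status) VALUES (:id, :body, :status)"
)
DELETE = text("DELETE FROM messages WHERE id = :id")
RESCHEDULE = text(
    "UPDATE messages SET status = 'RETRYABLE', next_attempt_at = :next_attempt_at "
    "WHERE id = :id"
)


def flush(engine, finished, retryable) -> None:
    # `finished` holds messages whose buffered state is COMPLETED or FAILED;
    # `retryable` holds messages to be rescheduled.
    with engine.begin() as conn:
        for msg in finished:
            conn.execute(ARCHIVE, {"id": msg.id, "body": msg.body, "status": msg.status})
            conn.execute(DELETE, {"id": msg.id})
        for msg in retryable:
            conn.execute(
                RESCHEDULE, {"id": msg.id, "next_attempt_at": msg.next_attempt_at}
            )
```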

On stop, all loops are gracefully stopped. Messages that have been acquired but are not yet being processed are drained from the internal queue and marked back as PENDING. The subscriber waits for all tasks to complete within graceful_shutdown_timeout, then performs a final flush.

Notes:

This design allows work sharing between processes/nodes because messages are claimed with "SELECT ... FOR UPDATE SKIP LOCKED".

This design adheres to the "at least once" processing guarantee because changes are flushed to the database only after a processing attempt. Messages might be processed more times than the retry_strategy allows if, among other things, the flush does not happen because of a crash or failure after a message has been processed.

This design handles the poison-message problem (messages that crash the worker without the ability to catch the exception, e.g. due to OOM termination) because attempts_count is incremented and retry_strategy is consulted prior to the processing attempt.
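
A rough sketch of the worker-side flow described above; the retry_strategy interface (max_attempts, next_delay) and the Message fields are assumptions for illustration, not the PR's actual API:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Message:
    body: bytes
    attempts_count: int
    status: str = "PROCESSING"
    next_attempt_at: Optional[datetime] = None


async def handle(message: Message, retry_strategy, process) -> None:
    # attempts_count was already incremented by the claim query, so a message
    # that keeps killing the worker (e.g. via OOM) eventually exhausts its
    # attempts without ever reaching user code again.
    if message.attempts_count > retry_strategy.max_attempts:
        message.status = "FAILED"  # Reject'ed before processing
        return
    try:
        await process(message)
        message.status = "COMPLETED"  # Ack'ed
    except Exception:
        delay: Optional[timedelta] = retry_strategy.next_delay(message.attempts_count)
        if delay is None:
            message.status = "FAILED"  # Nack'ed, no retries left
        else:
            message.status = "RETRYABLE"  # Nack'ed, will be retried
            message.next_attempt_at = datetime.now(timezone.utc) + delay
```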

Why not use LISTEN/NOTIFY? It is specific to Postgres, and it is preferable to start with functionality universal to any database. When using multiple nodes/processes, distributing messages among them would still require "SELECT FOR UPDATE SKIP LOCKED", because a notification is delivered to all nodes/processes. A notification may also fail to arrive, especially if a node restarts. That is, polling is needed in any case. And once polling is in place, LISTEN/NOTIFY can be integrated to "wake up" the polling loop earlier than the interval-based schedule would.
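
If LISTEN/NOTIFY is added later, it could nudge the polling loop roughly as in the sketch below (not part of this PR; the channel name and DSN are placeholders, and asyncpg is used directly for brevity):

```python
import asyncio
import contextlib

import asyncpg

wakeup = asyncio.Event()


def _on_notify(connection, pid, channel, payload) -> None:
    # Called by asyncpg whenever a NOTIFY arrives; just nudges the poller.
    wakeup.set()


async def fetch_loop(dsn: str, max_fetch_interval: float) -> None:
    conn = await asyncpg.connect(dsn)
    await conn.add_listener("faststream_messages", _on_notify)
    try:
        while True:
            # Poll on a timer, but wake up early if a notification arrives.
            with contextlib.suppress(asyncio.TimeoutError):
                await asyncio.wait_for(wakeup.wait(), timeout=max_fetch_interval)
            wakeup.clear()
            ...  # run the claim query (see the sketch above)
    finally:
        await conn.close()
```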

Type of change

Please delete options that are not relevant.

  • Documentation (typos, code examples, or any documentation updates)
  • Bug fix (a non-breaking change that resolves an issue)
  • New feature (a non-breaking change that adds functionality)
  • Breaking change (a fix or feature that would disrupt existing functionality)
  • This change requires a documentation update

Checklist

  • My code adheres to the style guidelines of this project (just lint shows no errors)
  • I have conducted a self-review of my own code
  • I have made the necessary changes to the documentation
  • My changes do not generate any new warnings
  • I have added tests to validate the effectiveness of my fix or the functionality of my new feature
  • Both new and existing unit tests pass successfully on my local environment by running just test-coverage
  • I have ensured that static analysis tests are passing by running just static-analysis
  • I have included code examples to illustrate the modifications


CLAassistant commented Dec 2, 2025

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions bot added the dependencies Pull requests that update a dependency file label Dec 2, 2025
@Arseniy-Popov Arseniy-Popov force-pushed the feat/sqla-broker branch 3 times, most recently from 9b0d481 to 90721ee on December 3, 2025 at 15:56