ref(batcher): Only flush the bucket that triggered the flush event #6168
2 issues
code-review: Found 2 issues (1 medium, 1 low)
Medium
_flush_queue grows unboundedly when same trace_id repeatedly hits flush threshold - `sentry_sdk/_span_batcher.py:104-109`
In `add()`, every span that pushes a bucket past `MAX_BEFORE_FLUSH` or `MAX_BYTES_BEFORE_FLUSH` enqueues `span.trace_id` onto the unbounded `_flush_queue` (default `maxsize=0`). For a hot trace that produces many spans before the flusher wakes, the same `trace_id` is appended repeatedly. After the bucket is flushed, the queue still holds the stale duplicates. The flusher dequeues each one and calls `_flush(trace_id=...)`, which re-acquires the lock and no-ops, but while producers outpace the flusher the queue can accumulate without bound. The author's own `# XXX remove trace_id from queue` comment in `_flush` confirms this is a known gap. Under sustained high span throughput on a single trace, the flush queue's memory grows as O(n) in the number of over-threshold `add()` calls, and nothing bounds it for the life of the process.
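One possible mitigation, sketched under assumptions: track which trace IDs are already waiting in the queue and skip the enqueue when a flush for that trace is still pending. `DedupFlushQueue`, `request_flush`, and `next_request` are hypothetical names for illustration, not part of `sentry_sdk`, and the sketch leaves out the `None` shutdown sentinel.

```python
import queue
import threading


class DedupFlushQueue:
    """Hypothetical helper: a flush queue that holds each trace ID at most once."""

    def __init__(self):
        self._queue = queue.Queue()  # unbounded, like the default maxsize=0
        self._pending = set()        # trace IDs currently waiting in the queue
        self._lock = threading.Lock()

    def request_flush(self, trace_id):
        """Enqueue trace_id unless a flush for it is already pending.

        Returns True if the ID was enqueued, False if it was a duplicate.
        """
        with self._lock:
            if trace_id in self._pending:
                return False
            self._pending.add(trace_id)
            # put() on an unbounded queue does not block, so holding the
            # lock here keeps the set and the queue consistent.
            self._queue.put(trace_id)
            return True

    def next_request(self, timeout=None):
        """Block for the next trace ID and mark it as no longer pending."""
        trace_id = self._queue.get(timeout=timeout)
        with self._lock:
            self._pending.discard(trace_id)
        return trace_id

    def qsize(self):
        return self._queue.qsize()
```

With this shape the queue length is bounded by the number of distinct traces with a pending flush, not by the number of over-threshold `add()` calls. A span that crosses the threshold after its trace ID has been dequeued simply enqueues the ID again.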
Low
kill() does not join flusher thread, so pending spans may be lost on shutdown - `sentry_sdk/_span_batcher.py:113-119`
`kill()` sets `_running = False`, puts `None` on the queue, and immediately sets `self._flusher = None`. There is no `self._flusher.join()` to wait for the flusher to drain the remaining buckets. The base-class behavior was similar, but the new code relies on the queued `None` triggering a full `_flush()` in the loop. If the interpreter is shutting down at the same time, the daemon thread can be killed before that flush completes, dropping spans. This preserves the existing behavior rather than introducing a regression, but it is worth confirming against the test in `test_span_buffer.py`.
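A minimal sketch of a `kill()` that waits for the final flush, assuming a bounded join is acceptable at shutdown. `TinyBatcher` is a hypothetical stand-in, not the SDK's class. It only mirrors the shape described above: a daemon flusher thread, a queue carrying trace IDs, and `None` as the shutdown sentinel.

```python
import queue
import threading


class TinyBatcher:
    """Hypothetical stand-in for the batcher, used only to show the join."""

    def __init__(self):
        self._flush_queue = queue.Queue()
        self._buckets = {}   # trace_id -> list of spans
        self._lock = threading.Lock()
        self.flushed = []    # stands in for "sent to transport"
        self._flusher = threading.Thread(target=self._flush_loop, daemon=True)
        self._flusher.start()

    def add(self, trace_id, span):
        with self._lock:
            self._buckets.setdefault(trace_id, []).append(span)

    def _flush(self, trace_id=None):
        # Take either one bucket or all buckets out from under the lock.
        with self._lock:
            if trace_id is None:
                buckets, self._buckets = self._buckets, {}
            else:
                spans = self._buckets.pop(trace_id, None)
                buckets = {} if spans is None else {trace_id: spans}
        self.flushed.extend(buckets.items())

    def _flush_loop(self):
        while True:
            item = self._flush_queue.get()
            if item is None:
                self._flush()  # final full flush before the thread exits
                return
            self._flush(trace_id=item)

    def kill(self, timeout=2.0):
        flusher = self._flusher
        if flusher is None:
            return  # already killed
        self._flush_queue.put(None)
        # Wait (bounded) for the final flush before dropping the reference.
        flusher.join(timeout=timeout)
        self._flusher = None
```

The `timeout` value is an assumption. It trades a bounded shutdown delay against the chance that a slow final flush is still cut short.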
Duration: 1m 16s · Tokens: 167.5k in / 3.6k out · Cost: $1.12 (+merge: $0.00)
Annotations
Check warning on line 109 in sentry_sdk/_span_batcher.py
sentry-warden / warden: code-review
_flush_queue grows unboundedly when same trace_id repeatedly hits flush threshold