ref(batcher): Only flush the bucket that triggered the flush event #6168

4 issues

Medium

_flush_queue grows unboundedly when same trace_id repeatedly hits flush threshold - `sentry_sdk/_span_batcher.py:104-109`

In add(), every span that pushes a bucket past MAX_BEFORE_FLUSH or MAX_BYTES_BEFORE_FLUSH enqueues span.trace_id onto the unbounded _flush_queue (default maxsize=0). For a hot trace producing many spans before the flusher wakes, the same trace_id is appended repeatedly. After the bucket is flushed, the queue still contains the stale duplicates, which the flusher dequeues and passes to _flush(trace_id=...). Each such call re-acquires the lock and no-ops, but under load the queue can accumulate entries faster than the flusher drains them. The author's own # XXX remove trace_id from queue comment in _flush confirms this is a known gap. Under sustained high span throughput on a single trace, flush-queue memory grows O(n) in the number of spans added past the threshold and is reclaimed only as the flusher works through the stale entries.
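One possible way to close the gap is to track which trace_ids already have a flush pending and skip the enqueue for those. The sketch below is illustrative only: DedupFlushQueue, request_flush and next_flush are hypothetical names, not part of sentry_sdk, and it assumes the flusher drains the bucket only after next_flush returns.

```python
import queue
import threading


class DedupFlushQueue:
    """Hypothetical wrapper: at most one pending queue entry per trace_id."""

    def __init__(self):
        self._queue = queue.Queue()
        self._pending = set()
        self._lock = threading.Lock()

    def request_flush(self, trace_id):
        """Enqueue trace_id unless a flush for it is already pending."""
        with self._lock:
            if trace_id in self._pending:
                return False
            self._pending.add(trace_id)
        self._queue.put(trace_id)
        return True

    def next_flush(self, timeout=None):
        """Block for the next trace_id and clear its pending mark."""
        trace_id = self._queue.get(timeout=timeout)
        with self._lock:
            self._pending.discard(trace_id)
        return trace_id

    def qsize(self):
        return self._queue.qsize()
```

With this shape, a hot trace contributes one queue entry between flushes regardless of how many spans cross the threshold, so the queue length is bounded by the number of distinct traces awaiting a flush.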

Low

kill() does not join flusher thread, so pending spans may be lost on shutdown - `sentry_sdk/_span_batcher.py:113-119`

kill() sets _running = False, puts None on the queue, and immediately sets self._flusher = None. There is no self._flusher.join() to wait for the flusher to drain remaining buckets. While the base-class behavior was similar, the new code relies on the queued None triggering a full _flush() in the loop; if the interpreter is shutting down at the same time, the daemon thread can be killed before that flush completes, dropping spans. This preserves the existing shutdown behavior rather than introducing a regression, but it is worth confirming against the test in test_span_buffer.py.

_flush_queue grows with duplicate trace_ids while bucket is awaiting flush - `sentry_sdk/_span_batcher.py:103-109`

In add(), every span added to a bucket that already exceeds MAX_BEFORE_FLUSH (or MAX_BYTES_BEFORE_FLUSH) puts another copy of the same trace_id onto _flush_queue. Until the bucket is actually drained by the flusher, up to (MAX_BEFORE_DROP - MAX_BEFORE_FLUSH) = 1000 redundant entries can accumulate per trace, multiplied across concurrent traces. The flusher then performs many no-op _flush() calls (each acquiring the lock) for buckets that have already been flushed. The 'XXX remove trace_id from queue' comment in _flush() acknowledges the missing dedup. Impact: wasted CPU, lock contention, and queue growth proportional to span volume; not a correctness bug.
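An alternative to deduplicating the queue is to enqueue only when a bucket first crosses the threshold, so later spans in the same over-threshold bucket add nothing. This is a minimal sketch under assumptions: add_span is a hypothetical helper, MAX_BEFORE_FLUSH is given a small made-up value, the comparison is assumed to be >=, and the byte-size threshold is left out for brevity.

```python
import queue

MAX_BEFORE_FLUSH = 3  # hypothetical small value, for illustration only


def add_span(buckets, flush_queue, trace_id, span):
    """Append span; enqueue trace_id only on the below-to-at-threshold transition."""
    bucket = buckets.setdefault(trace_id, [])
    was_below = len(bucket) < MAX_BEFORE_FLUSH
    bucket.append(span)
    if was_below and len(bucket) >= MAX_BEFORE_FLUSH:
        flush_queue.put(trace_id)
```

Once the flusher replaces or empties the bucket, the next crossing enqueues the trace_id again, so no flush request is lost. A byte-size threshold would need the same before/after comparison.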

kill() does not join the flusher thread, risking lost spans on shutdown - `sentry_sdk/_span_batcher.py:113-119`

kill() sets _running=False, puts None on _flush_queue (which causes _flush_loop to call _flush(trace_id=None), a full flush), and immediately sets self._flusher=None without join()ing. If the interpreter shuts down before the daemon flusher thread finishes its final full flush and the resulting envelope is captured/transported, in-flight spans can be lost. The parent Batcher.kill() has the same shape, but here the final flush is more expensive (multiple envelopes per trace) so the window is larger.
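A bounded join is one way to narrow the shutdown window described above. The following is a sketch, not the SDK's code: Flusher is a hypothetical stand-in for the batcher's thread handling, flush_all stands in for the full _flush(), and the 2-second timeout is an arbitrary assumption.

```python
import queue
import threading


class Flusher:
    """Hypothetical stand-in for the batcher's flusher thread handling."""

    def __init__(self, flush_all):
        self._flush_all = flush_all
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self):
        while True:
            item = self._queue.get()
            if item is None:
                # Sentinel: perform the final full flush, then exit.
                self._flush_all()
                return

    def kill(self, timeout=2.0):
        thread = self._thread
        if thread is None:
            return
        self._queue.put(None)
        # Wait (bounded) for the final flush instead of dropping the reference.
        thread.join(timeout)
        self._thread = None
```

The timeout keeps kill() from blocking shutdown indefinitely if the final flush stalls; the trade-off is that spans can still be lost when the flush takes longer than the timeout.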

4 skills analyzed

Skill	Findings	Duration	Cost
code-review	2	1m 16s	$1.12
find-bugs	2	1m 37s	$1.22
skill-scanner	0	23.6s	$0.52
security-review	0	11.2s	$0.66

Duration: 3m 27s · Tokens: 541.8k in / 9.2k out · Cost: $3.52 (+merge: $0.00, +dedup: $0.00)

Merge branch 'master' into ivana/batcher-flush-by-bucket
