ref(batcher): Only flush the bucket that triggered the flush event #6168
2 issues
code-review: Found 3 issues (1 medium, 2 low)
Medium
kill() can race with _flush_loop and skip the final full flush - `sentry_sdk/_span_batcher.py:113-119`
kill() sets self._running = False before putting the None sentinel onto _flush_queue. If the flusher thread is between iterations when kill() runs (it has just finished a _flush call and is about to re-evaluate the 'while self._running' condition), it observes _running=False and exits without ever consuming the sentinel. No final full flush is performed, and any spans still buffered are dropped on shutdown. The previous Event-based design had the same race, so moving to a queue does not fix it, even though the docstring and issue explicitly aim to preserve full-flush-on-shutdown semantics. Consider performing a final self._flush() inline in kill() (or before setting _running=False) to guarantee that buffers are drained on shutdown.
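The suggested fix could look roughly like the sketch below. It is not the SDK's code: `SketchBatcher`, its `sink` callback, and the `add(trace_id, span)` signature are hypothetical stand-ins, and only the names `_flush_queue`, `_running`, `_flush`, `_flush_loop`, and `kill` are borrowed from the finding. The point is that kill() joins the flusher thread and then flushes inline, so drainage no longer depends on whether the loop consumed the sentinel.

```python
import queue
import threading
from collections import defaultdict


class SketchBatcher:
    """Hypothetical stand-in for the span batcher, not the SDK's class."""

    def __init__(self, sink):
        self._sink = sink                  # assumed callback receiving a list of spans
        self._buckets = defaultdict(list)  # trace_id -> buffered spans
        self._lock = threading.Lock()
        self._flush_queue = queue.Queue()
        self._running = True
        self._thread = threading.Thread(target=self._flush_loop, daemon=True)
        self._thread.start()

    def add(self, trace_id, span):
        with self._lock:
            self._buckets[trace_id].append(span)

    def _flush(self, trace_id=None):
        # trace_id=None means "flush every bucket".
        with self._lock:
            if trace_id is None:
                drained = [s for spans in self._buckets.values() for s in spans]
                self._buckets.clear()
            else:
                drained = self._buckets.pop(trace_id, [])
        if drained:
            self._sink(drained)

    def _flush_loop(self):
        while self._running:
            trace_id = self._flush_queue.get()
            if trace_id is None:  # sentinel: only used to wake the thread
                break
            self._flush(trace_id)

    def kill(self):
        self._running = False
        self._flush_queue.put(None)  # wake the thread if it is blocked in get()
        self._thread.join()          # the loop exits via the flag or the sentinel
        self._flush()                # final full flush, independent of the race


flushed = []
batcher = SketchBatcher(flushed.extend)
batcher.add("trace-1", "span-a")
batcher.add("trace-2", "span-b")
batcher.kill()
```

Because the final flush runs on the caller's thread after the join, it does not matter which of the two exit paths the loop took.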
Low
Flush queue can grow unboundedly under sustained high span volume - `sentry_sdk/_span_batcher.py:103-109`
Once a bucket exceeds MAX_BEFORE_FLUSH (or MAX_BYTES_BEFORE_FLUSH), every subsequent add calls self._flush_queue.put(span.trace_id) until the flusher thread actually drains the bucket. Under sustained load on a single trace, many duplicate trace_id entries can pile up in _flush_queue while the flusher processes one entry per loop iteration. After the first flush deletes the bucket, the remaining duplicates are no-ops in _flush, but the queue keeps consuming memory until the flusher catches up. Consider deduplicating (e.g., only put when the trace_id is not already pending) or using a set-based signal to bound memory.
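One way to deduplicate is to track pending trace IDs in a set next to the queue. The sketch below is an illustration under assumptions: `DedupFlushSignal` and its `request`/`take` methods are hypothetical names, not part of the SDK. It keeps at most one queue entry per trace_id, so the queue length is bounded by the number of distinct traces awaiting a flush.

```python
import queue
import threading


class DedupFlushSignal:
    """Hypothetical helper: at most one pending queue entry per trace_id."""

    def __init__(self):
        self._queue = queue.Queue()
        self._pending = set()
        self._lock = threading.Lock()

    def request(self, trace_id):
        """Enqueue trace_id unless a flush request for it is already pending."""
        with self._lock:
            if trace_id in self._pending:
                return False
            self._pending.add(trace_id)
            self._queue.put(trace_id)  # unbounded queue, so put() does not block
            return True

    def take(self):
        """Block for the next trace_id and clear its pending mark."""
        trace_id = self._queue.get()
        with self._lock:
            self._pending.discard(trace_id)
        return trace_id

    def qsize(self):
        return self._queue.qsize()


signal = DedupFlushSignal()
accepted = [signal.request("trace-a") for _ in range(1000)]
```

The pending mark is cleared in take() rather than after the flush, so a span that arrives while the flush is in progress can still request another one; at worst that extra request is a no-op.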
Test coverage for shutdown/kill drainage of buckets is not evident in the diff
The PR changes shutdown semantics (kill() now uses a queue sentinel instead of an Event) and changes the flush loop to only flush on explicit triggers. The skill requires verifying that tests cover edge cases such as: kill() while buckets contain spans, multiple buckets receiving simultaneous threshold triggers, and the time-based full-flush path. The diff lists tests/tracing/test_span_buffer.py and test_span_streaming.py as updated, but reviewers should confirm these new behaviors are explicitly exercised, particularly that no spans are lost when kill() is called with non-empty buckets.
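A check for the kill()-with-non-empty-buckets case might take the shape sketched below. Everything here is an assumption made for illustration: `InlineBatcher` is a thread-free stand-in so the example runs on its own, and `check_kill_drains_all_buckets` and the `add(trace_id, span)` signature are hypothetical. A real test would build the SDK's batcher in place of the stand-in.

```python
from collections import defaultdict


class InlineBatcher:
    """Hypothetical stand-in without a background thread, only so the check runs."""

    def __init__(self, sink):
        self._sink = sink
        self._buckets = defaultdict(list)

    def add(self, trace_id, span):
        self._buckets[trace_id].append(span)

    def kill(self):
        # Drain every bucket on shutdown.
        for spans in self._buckets.values():
            self._sink(spans)
        self._buckets.clear()


def check_kill_drains_all_buckets(make_batcher, n_traces=3, spans_per_trace=4):
    """Fill several buckets, call kill(), and return (expected, received) spans."""
    received = []
    batcher = make_batcher(received.extend)
    expected = []
    for t in range(n_traces):
        for s in range(spans_per_trace):
            span = (f"trace-{t}", s)
            batcher.add(span[0], span)
            expected.append(span)
    batcher.kill()
    return sorted(expected), sorted(received)


expected, received = check_kill_drains_all_buckets(InlineBatcher)
assert expected == received  # no span lost, none delivered twice
```

Comparing sorted lists rather than sets also catches spans that are delivered more than once.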
Duration: 1m 6s · Tokens: 177.4k in / 4.5k out · Cost: $1.14 (+merge: $0.00)
Annotations
Check warning on line 119 in sentry_sdk/_span_batcher.py
sentry-warden / warden: code-review
kill() can race with _flush_loop and skip the final full flush