ref(batcher): Only flush the bucket that triggered the flush event #6168
2 issues
find-bugs: Found 2 issues (1 medium, 1 low)
Medium
kill() can lose buffered spans if flush loop exits before consuming the sentinel - `sentry_sdk/_span_batcher.py:113-119`
kill() sets self._running = False and then puts None on the flush queue. The flush loop only performs the shutdown flush as a side effect of consuming a queued trace_id (or the None sentinel) and calling _flush(). However, the loop's continuation is gated by `while self._running:`, checked at the top of each iteration. If kill() sets _running = False after the loop has finished an iteration but before it re-enters get(), the loop exits without ever consuming the None and without calling _flush(), so any spans still in self._span_buffer are silently dropped on shutdown. The previous Event-based implementation in the parent Batcher.kill() relied on _flush_loop calling _flush() unconditionally each iteration, which avoided this hazard.
Also found at:
sentry_sdk/_span_batcher.py:58-73
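One possible way to close this window is to make the shutdown flush independent of how the loop exits. The following is a minimal sketch, not the SDK's implementation: `SketchBatcher`, its `sent` list (standing in for the transport), and the `flush_interval` parameter are hypothetical, and the time-based full flush is omitted for brevity. The only point it illustrates is the `finally` block, which drains the buffer whether the loop saw `_running == False` at the top of an iteration or consumed the `None` sentinel.

```python
import queue
import threading
from collections import defaultdict


class SketchBatcher:
    """Hypothetical stand-in for the span batcher; not the SDK's class."""

    def __init__(self, flush_interval=0.05):
        self._running = True
        self._lock = threading.Lock()
        self._span_buffer = defaultdict(list)  # trace_id -> buffered spans
        self._flush_queue = queue.Queue()      # trace_ids, or None as sentinel
        self._flush_interval = flush_interval
        self.sent = []                         # stands in for the transport
        self._thread = threading.Thread(target=self._flush_loop, daemon=True)
        self._thread.start()

    def add(self, trace_id, span):
        with self._lock:
            self._span_buffer[trace_id].append(span)

    def _flush(self, trace_id=None):
        # Flush one bucket, or every bucket when trace_id is None.
        with self._lock:
            keys = list(self._span_buffer) if trace_id is None else [trace_id]
            for key in keys:
                self.sent.extend(self._span_buffer.pop(key, []))

    def _flush_loop(self):
        try:
            while self._running:
                try:
                    trace_id = self._flush_queue.get(timeout=self._flush_interval)
                except queue.Empty:
                    continue  # time-based full flush omitted in this sketch
                if trace_id is None:
                    break  # sentinel consumed: fall through to the final flush
                self._flush(trace_id)
        finally:
            # Runs on every exit path, including the case where _running was
            # already False and the sentinel was never read from the queue.
            self._flush()

    def kill(self):
        self._running = False
        self._flush_queue.put(None)  # wakes the loop if it is blocked in get()
        self._thread.join(timeout=2.0)
```

With this shape, the order in which `kill()` writes the flag and the loop re-checks it no longer decides whether buffered spans are sent. Spans added after the final flush would still be lost; that is a separate concern this sketch does not address.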
Low
add() can enqueue many duplicate trace_ids while a bucket is awaiting flush - `sentry_sdk/_span_batcher.py:103-109`
Once a bucket reaches the flush threshold (size+1 >= MAX_BEFORE_FLUSH, or the byte threshold), every subsequent span on the same trace_id, up to MAX_BEFORE_DROP, calls self._flush_queue.put(span.trace_id) again, because the bucket isn't drained until the flush loop consumes the entry. This can enqueue up to ~1000 duplicate entries per bucket. The flush loop processes each duplicate by calling _flush(trace_id) on an already-empty bucket, which wastes wakeups and, more importantly, can starve the time-based full flush check: each successful get() in the loop runs before the time-check, so sustained load can keep _last_full_flush from advancing.
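A common way to avoid duplicate queue entries is to track which trace_ids are already awaiting a flush. The sketch below assumes a hypothetical `_pending_flush` set and a `flush_one()` helper that stands in for one iteration of the flush loop; neither name comes from the SDK, and `MAX_BEFORE_FLUSH` is set to a tiny illustrative value. The byte threshold and MAX_BEFORE_DROP handling are left out.

```python
import queue
import threading
from collections import defaultdict

MAX_BEFORE_FLUSH = 3  # tiny threshold for illustration; not the SDK's value


class DedupQueueSketch:
    """Hypothetical sketch: enqueue each trace_id at most once per flush."""

    def __init__(self):
        self._lock = threading.Lock()
        self._span_buffer = defaultdict(list)  # trace_id -> buffered spans
        self._flush_queue = queue.Queue()
        self._pending_flush = set()  # trace_ids queued but not yet flushed

    def add(self, trace_id, span):
        with self._lock:
            bucket = self._span_buffer[trace_id]
            bucket.append(span)
            over_threshold = len(bucket) >= MAX_BEFORE_FLUSH
            if over_threshold and trace_id not in self._pending_flush:
                # First span to cross the threshold enqueues the bucket;
                # later spans see the pending marker and skip the put().
                self._pending_flush.add(trace_id)
                self._flush_queue.put(trace_id)

    def flush_one(self):
        """Stand-in for one flush-loop iteration; returns the drained spans."""
        trace_id = self._flush_queue.get_nowait()
        with self._lock:
            # Clear the marker so the next threshold crossing re-enqueues.
            self._pending_flush.discard(trace_id)
            return self._span_buffer.pop(trace_id, [])
```

Because the marker is cleared under the same lock that drains the bucket, a span arriving right after the flush starts a fresh bucket and can enqueue the trace_id again once the threshold is crossed.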
Duration: 1m 32s · Tokens: 171.3k in / 5.5k out · Cost: $1.22 (+merge: $0.00)
Annotations
Check warning on line 119 in sentry_sdk/_span_batcher.py
sentry-warden / warden: find-bugs
kill() can lose buffered spans if flush loop exits before consuming the sentinel
Check warning on line 73 in sentry_sdk/_span_batcher.py
sentry-warden / warden: find-bugs
[29F-LA3] kill() can lose buffered spans if flush loop exits before consuming the sentinel (additional location)