Fix channel backpressure causing keepalive timeout and stream drops #399

Draft
joshkautz wants to merge 1 commit into QuantumEntangledAndy:master from joshkautz:fix/channel-backpressure

@joshkautz joshkautz commented Mar 3, 2026

Problem

When the internal mpsc message channels reach capacity, send().await does not return until space frees up, which stalls the entire message processing loop in Poller::run(). While the loop is stalled:

  1. Keepalive ping messages from the camera cannot be processed
  2. The camera interprets the silence as a dead connection
  3. The camera drops the session after its ping timeout expires
  4. Neolink enters a reconnection cycle

This is the root cause of streams dying after hours of continuous operation, which users work around with periodic cron restarts.

Additionally, when RTSP clients cannot consume frames fast enough, the audio buffer overflows and triggers FlowError::Flushing, which cascades into the video pipeline and forces a full stream reconnection.

Fix

1. Increase channel capacities to reduce how often channels fill under normal load:

  • Outgoing message channel: 100 → 500
  • Poll command channel: 200 → 1000
  • Subscriber channels: 100 → 500

2. Use a non-blocking send when the channel is full. When sender.capacity() == 0, use try_send() instead of the blocking send().await. Dropping a single video frame is far better than blocking the message loop and losing the entire camera connection. The normal async send path is preserved when capacity is available.

3. Drop audio frames on buffer overflow. When the audio AppSrc buffer exceeds 90% capacity, audio frames are silently dropped instead of being pushed. This prevents cascading backpressure from stalling the video pipeline. Audio buffer overflow was triggering pipeline flushing, which caused full stream reconnection cycles.
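
As a rough illustration of point 2, here is a minimal sketch of the drop-instead-of-wait policy. It is not the code from this PR: it uses std::sync::mpsc::sync_channel as a stand-in for the async channel so it runs without a runtime, it always attempts try_send rather than checking sender.capacity() first, and the Msg, Forward, forward_or_drop and run_loop names are hypothetical.

```rust
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};

/// Hypothetical message type; the real protocol messages are richer.
#[derive(Debug, PartialEq)]
enum Msg {
    Frame(u32),
    Ping,
}

/// Outcome of one forwarding attempt.
#[derive(Debug, PartialEq)]
enum Forward {
    Sent,
    Dropped,
    Closed,
}

/// Forward a message without ever waiting for space in the channel.
fn forward_or_drop(tx: &SyncSender<Msg>, msg: Msg) -> Forward {
    match tx.try_send(msg) {
        Ok(()) => Forward::Sent,
        // Channel full: give up on this message instead of stalling the loop.
        Err(TrySendError::Full(_)) => Forward::Dropped,
        // Receiver is gone: nothing left to deliver to.
        Err(TrySendError::Disconnected(_)) => Forward::Closed,
    }
}

/// Process incoming messages; returns (frames dropped, pings handled).
fn run_loop(incoming: Vec<Msg>, tx: &SyncSender<Msg>) -> (usize, usize) {
    let mut dropped = 0;
    let mut pings = 0;
    for msg in incoming {
        match msg {
            // Keepalives are handled in the loop itself, so they are
            // still serviced while the subscriber channel is full.
            Msg::Ping => pings += 1,
            frame => {
                if forward_or_drop(tx, frame) == Forward::Dropped {
                    dropped += 1;
                }
            }
        }
    }
    (dropped, pings)
}

fn main() {
    // Capacity 2 and nobody draining the receiver: the channel fills quickly.
    let (tx, rx) = sync_channel::<Msg>(2);
    let incoming = vec![
        Msg::Frame(1),
        Msg::Frame(2),
        Msg::Frame(3),
        Msg::Ping,
        Msg::Frame(4),
    ];
    let (dropped, pings) = run_loop(incoming, &tx);
    println!("dropped={dropped} pings={pings}");
    drop(rx);
}
```

With a blocking send, the loop in this sketch would stop at the third frame and never reach the ping. With the drop policy it loses two frames but still services the keepalive.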

Changes

  • crates/core/src/bc_protocol/connection/bcconn.rs: Increase channel sizes, replace blocking send with try_send at capacity
  • src/rtsp/factory.rs: Add audio frame dropping when AppSrc buffer is near capacity
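
The audio rule described for src/rtsp/factory.rs amounts to a threshold check before each push. The sketch below shows only that check, under stated assumptions: should_drop_audio and the constant name are hypothetical, the fill level and limit are taken as plain byte counts because the PR text does not show how they are read from the AppSrc, and treating a zero limit as "never drop" is a choice made for this example.

```rust
/// Percentage of the buffer limit above which audio is dropped (90% per the PR text).
const AUDIO_DROP_THRESHOLD_PERCENT: u128 = 90;

/// Returns true when the buffer is more than 90% full and the
/// next audio frame should be dropped rather than pushed.
fn should_drop_audio(current_level_bytes: u64, max_bytes: u64) -> bool {
    if max_bytes == 0 {
        // Assumption for this sketch: a zero limit means no limit is set.
        return false;
    }
    // Compare level/max > 90/100 using integers; u128 avoids overflow.
    (current_level_bytes as u128) * 100 > (max_bytes as u128) * AUDIO_DROP_THRESHOLD_PERCENT
}

fn main() {
    let max = 1_000u64;
    for level in [0u64, 900, 901, 1_000] {
        println!("level={level} drop={}", should_drop_audio(level, max));
    }
}
```

The comparison is strict, so a buffer at exactly 90% still accepts the frame and dropping starts only once the level exceeds the threshold.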

Relationship to Other PRs

The audio frame dropping in this PR is complementary to #394, which adds a config option to disable audio entirely. The two approaches serve different use cases:

  • feat: audio and long latency as config #394: Hard disable — audio pipeline is never constructed. Best for users who know they don't need audio and want to save CPU permanently.
  • This PR: Graceful degradation — audio passes through normally but is dropped when the buffer is under pressure. Prevents cascading failures without requiring any configuration change.

Both could be merged. Users who disable audio via #394 would never hit the audio drop path in this PR.

Related Issues

When the mpsc message channel reaches capacity, the blocking
send().await stalls the entire message processing loop. This prevents
keepalive ping responses from being processed, causing the camera to
interpret the silence as a disconnection and drop the session.

This manifests as:
- "Reaching limit of channel" warnings followed by stream death
- Streams dying after hours of continuous operation
- Users resorting to periodic cron restarts as a workaround

Three changes to fix this:

1. Increase channel capacities to reduce how often channels fill:
   - Outgoing message channel: 100 -> 500
   - Poll command channel: 200 -> 1000
   - Subscriber channels: 100 -> 500

2. Use non-blocking try_send when channel is at capacity. When the
   channel is full, dropping a video frame is far better than blocking
   the message loop and losing the entire camera connection. Normal
   async send is still used when there is available capacity.

3. Drop audio frames when the audio AppSrc buffer is near capacity
   (>90% full) instead of pushing them and causing cascading
   backpressure that stalls the video pipeline. Audio buffer overflow
   was triggering pipeline flushing, which caused full stream
   reconnection cycles.

Addresses QuantumEntangledAndy#349, QuantumEntangledAndy#346, QuantumEntangledAndy#315, and the channel limit warnings in QuantumEntangledAndy#366.
@joshkautz
Author

Note: These are tweaks I ended up making for my own use case — a 24/7 ALPR system where streams need to stay up indefinitely and audio isn't critical. If the nature of these changes doesn't align with the trajectory or direction you want to take the project, feel free to close this — I'm simply opening it because they solved real stability issues for me and might help others in similar situations.

janost added a commit to MutuallyAssuredDeployment/neolink that referenced this pull request Mar 18, 2026
Upstream PR QuantumEntangledAndy#399. When mpsc channels reach capacity, the blocking send
prevented keepalive processing, causing camera session timeouts. Now uses
try_send when at capacity (drops one frame) instead of blocking. Also drops
audio frames when AppSrc buffer exceeds 90% to prevent cascading backpressure.
Channel capacities increased to reduce drop frequency.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
janost added a commit to MutuallyAssuredDeployment/neolink that referenced this pull request Mar 19, 2026