Fix resource exhaustion from per-client GStreamer pipelines #400
joshkautz wants to merge 1 commit into
With set_shared(false) and SuspendMode::Reset (the current defaults), every RTSP client connection creates its own independent GStreamer pipeline and camera stream session. Each connection to the RTSP server (go2rtc, ffprobe, health check probes) opens a new Baichuan session with the camera.

During long-running 24/7 operation, this causes:

- File descriptor exhaustion from accumulated CLOSE_WAIT connections
- Memory growth as pipeline resources are not fully reclaimed
- Camera firmware hitting its concurrent session limit
- RTSP server becoming unresponsive to new connections

Switch to set_shared(true) so all RTSP clients share a single GStreamer pipeline, and SuspendMode::None to keep the pipeline alive when the last client disconnects. This eliminates the constant pipeline teardown/rebuild cycle and keeps resource usage stable.

Addresses the file descriptor leaks reported in QuantumEntangledAndy#370 and QuantumEntangledAndy#380, and the memory growth reported in QuantumEntangledAndy#366.
Note: This change came out of running Neolink for a 24/7 ALPR system where go2rtc, health checks, and the dashboard are all connecting to the RTSP server concurrently. The shared pipeline + no-reset suspend mode eliminated the resource exhaustion I was seeing. If this doesn't align with the direction you want to take the project, feel free to close it. I'm just opening it in case it's useful to others hitting the same issues.
…austion

Upstream PR QuantumEntangledAndy#400. With set_shared(false), every RTSP client created its own GStreamer pipeline and camera session, causing FD exhaustion and memory growth during 24/7 operation. SuspendMode::None keeps the pipeline alive across disconnect/reconnect cycles.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Add a section crediting @wafgo, @Maaggs, @fromagge, @lorek123, and @joshkautz for their unmerged upstream PRs (QuantumEntangledAndy#373, QuantumEntangledAndy#389, QuantumEntangledAndy#394, QuantumEntangledAndy#395, QuantumEntangledAndy#396, QuantumEntangledAndy#398, QuantumEntangledAndy#399, QuantumEntangledAndy#400) that were ported into this fork.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
I'd like to boost this. I am seeing severe memory bloat over short periods of time: 16 cameras racked up nearly 20 GB in less than 5 hours. In addition, the video feed becomes delayed, and over a long period the entire match ends up delayed by roughly 10 minutes or more.
Apply PR QuantumEntangledAndy#400 - shared GStreamer pipeline across RTSP clients
- factory.set_shared(true): one pipeline per stream, not per client
- RTSPSuspendMode::None: keep pipeline alive between client connects
This eliminates per-client session buildup and CLOSE_WAIT FD leaks.

Apply PR QuantumEntangledAndy#373 - bucketed GStreamer BufferPool allocation
- Replace per-frame-size pools (unbounded growth) with power-of-two bucket pools (MAX 12 pools per appsrc, max 1 MiB bucket)
- Oversized frames fall back to non-pooled allocation
- Fixes steady RSS growth over hours of 24/7 operation

Add live-stream / sports optimisations
- make_queue: leaky=2 (downstream), which drops the OLDEST frame when full, ensuring the output is always the most recent camera data
- max-size-time=1s: one I-frame period (1x multiplier, 25fps) so the queue never holds more than one GOP of stale data
- buffer_size(): 1 second of compressed data (bitrate/8) floored at 256 KiB; matches the 1x I-frame / 25fps assumption
- send_to_appsrc: FlowError::Flushing now logs at debug level and returns Ok() cleanly; the leaky queue handles overflow silently

Deploy: 4 cameras per neolink instance on Debian 12 LXC (Proxmox)
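To make the bucketing and buffer-size rules in the commit message above concrete, here is a minimal stand-alone sketch. It is not the code from #373 or from this commit. The function names are hypothetical, and the 512 B minimum bucket is an assumption, picked only because the range 512 B to 1 MiB contains exactly 12 power-of-two sizes.

```rust
/// Largest pooled bucket: 1 MiB, as stated in the commit message.
const MAX_BUCKET: usize = 1 << 20;

/// Smallest bucket. ASSUMPTION: 512 B is used here only because
/// 512 B..=1 MiB yields exactly 12 power-of-two buckets.
const MIN_BUCKET: usize = 1 << 9;

/// Returns the bucket size a frame would be allocated from, or `None`
/// when the frame is oversized and should use a plain, non-pooled
/// allocation instead. (Hypothetical helper, for illustration only.)
fn bucket_for(frame_len: usize) -> Option<usize> {
    if frame_len > MAX_BUCKET {
        return None;
    }
    // Round up to the next power of two, never below the smallest bucket.
    Some(frame_len.max(MIN_BUCKET).next_power_of_two())
}

/// One second of compressed data (bitrate / 8 bytes), floored at 256 KiB.
/// (Hypothetical stand-in for the `buffer_size()` mentioned above.)
fn buffer_size(bitrate_bps: u64) -> u64 {
    (bitrate_bps / 8).max(256 * 1024)
}
```

Under these assumptions, `bucket_for(513)` returns `Some(1024)`, a frame of 1 MiB + 1 byte returns `None`, and `buffer_size(1_000_000)` returns 262144 because 125000 bytes falls below the 256 KiB floor. Because every frame size maps onto one of a fixed set of buckets, the number of pools can no longer grow with the number of distinct frame sizes.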
Okay, I'm adding some context to my earlier message (@QuantumEntangledAndy). I have been using between 16 and 32 Reolink cams to livestream a sports field. My main issue was extreme memory buffer runaway, which built up very quickly with that many cameras.

I have applied the code in #400 and #373, plus some of my own code for leaky frames (dump the oldest, keep the stream alive and up to date with no latency or lag). This has worked tremendously well: idle memory (with clients viewing the neolink RTSP streams across the network) now stays at nearly 50 MB for 4 cameras, down from gigabytes per hour. Attached are some screenshots of RSS and network usage over time from Proxmox cluster test containers. I ensured that there were at least ffplay instances for two cameras of a four-camera toml file.

I originally had to break 16 cameras into four groups of 4 to help dilute the memory buffer runaway. This PR has fixed this. I ended up testing a large swath of cameras of various qualities (2K, 4K, and 8K cams), with no issues on any of them. My implementation of this and my own code attempts have some bugs I need to work out regarding GStreamer in the gst-factory.rs.

In short, I support the pull request and believe it should be accepted, as it fixes the server memory overuse. Thanks to @joshkautz.


Problem
With the current defaults (set_shared(false) and SuspendMode::Reset), every RTSP client connection creates its own independent GStreamer pipeline and camera stream session. Each connection (go2rtc, ffprobe, VLC, health check probes) opens a separate Baichuan session with the camera.

During long-running operation this causes:

nofile limit, causing GStreamer critical errors (gst_poll_write_control: assertion 'set != NULL' failed) and eventually a fatal crash (Creating pipes for GWakeup: Too many open files).

Fix
Two configuration changes to NeoMediaFactory:

- set_shared(true): All RTSP clients share a single GStreamer pipeline. One pipeline, one camera session, regardless of how many clients connect.
- SuspendMode::None: Keep the pipeline alive when the last client disconnects, instead of tearing it down. This eliminates the teardown/rebuild cycle that causes resource churn.

Together these changes keep resource usage stable during 24/7 operation with monitoring tools constantly probing the stream.
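The effect of the two settings can be illustrated with a toy model. This is only a sketch that follows the description above; `Factory` and `Suspend` are hypothetical stand-ins, not the gstreamer-rtsp-server types. It also assumes, optimistically, that a per-client pipeline is fully released when its client disconnects.

```rust
/// Toy stand-in for the suspend setting; NOT the real GStreamer enum.
#[derive(Clone, Copy, PartialEq)]
enum Suspend {
    Reset, // tear the shared pipeline down when its last client leaves
    None,  // keep the shared pipeline alive
}

/// Toy stand-in for a media factory that only counts pipelines.
struct Factory {
    shared: bool,
    suspend: Suspend,
    clients: usize,        // currently connected clients
    live_pipelines: usize, // pipelines (and camera sessions) alive now
    builds: usize,         // pipelines constructed over the whole run
}

impl Factory {
    fn new(shared: bool, suspend: Suspend) -> Self {
        Factory { shared, suspend, clients: 0, live_pipelines: 0, builds: 0 }
    }

    fn connect(&mut self) {
        self.clients += 1;
        // Non-shared: every client builds its own pipeline.
        // Shared: build only when no pipeline is alive yet.
        if !self.shared || self.live_pipelines == 0 {
            self.live_pipelines += 1;
            self.builds += 1;
        }
    }

    fn disconnect(&mut self) {
        if self.clients == 0 {
            return; // nothing to disconnect
        }
        self.clients -= 1;
        if !self.shared {
            // ASSUMPTION: the per-client pipeline goes away with its client.
            self.live_pipelines -= 1;
        } else if self.clients == 0 && self.suspend == Suspend::Reset {
            self.live_pipelines = 0;
        }
    }
}

/// Simulates `rounds` probe cycles with `concurrent` clients each and
/// returns the peak number of simultaneously live pipelines.
fn run(factory: &mut Factory, rounds: usize, concurrent: usize) -> usize {
    let mut peak = 0;
    for _ in 0..rounds {
        for _ in 0..concurrent {
            factory.connect();
        }
        peak = peak.max(factory.live_pipelines);
        for _ in 0..concurrent {
            factory.disconnect();
        }
    }
    peak
}
```

In this model, 100 rounds of 3 concurrent probes against `Factory::new(false, Suspend::Reset)` build 300 pipelines with a peak of 3 alive at once, while `Factory::new(true, Suspend::None)` builds exactly 1 pipeline and keeps it alive for the whole run.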
Changes
src/rtsp/gst/factory.rs: Change set_shared(false) → set_shared(true), SuspendMode::Reset → SuspendMode::None

Relationship to Other PRs
This PR is complementary to #373 and #340, which address buffer pool proliferation within a single pipeline:
This PR operates at a higher level: it prevents the multiplication of pipelines that amplifies the pool leak. With set_shared(false), every client connection creates its own set of pools, so the FD/memory leak scales with num_clients × num_unique_frame_sizes. With set_shared(true), it's just 1 × num_unique_frame_sizes, which #373's bucketing then bounds to ~12 pools.

Commenters on #373 noted that the pool fix alone did not resolve streaming failures for 4K cameras; the pipeline architecture change in this PR addresses that remaining stability issue.
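The scaling argument can be written out as a small worked calculation. This is a sketch only: `pool_bound` is a hypothetical helper, it treats each pipeline as having a single appsrc, and the client and frame-size counts in the example are made up for illustration.

```rust
/// Upper bound on live buffer pools, following the scaling argument above.
/// `bucket_cap` is `Some(n)` when bucketing limits a pipeline to `n` pools,
/// and `None` when every distinct frame size gets its own pool.
/// ASSUMPTION: one appsrc per pipeline, to keep the arithmetic simple.
fn pool_bound(
    shared: bool,
    num_clients: usize,
    num_unique_frame_sizes: usize,
    bucket_cap: Option<usize>,
) -> usize {
    // Shared: a single pipeline regardless of how many clients connect.
    let pipelines = if shared { 1 } else { num_clients };
    let pools_per_pipeline = match bucket_cap {
        Some(cap) => num_unique_frame_sizes.min(cap),
        None => num_unique_frame_sizes,
    };
    pipelines * pools_per_pipeline
}
```

With 5 clients and 2000 distinct frame sizes (illustrative numbers), the bound is 10000 pools without either change, 2000 with only the shared pipeline, 60 with only a 12-pool bucket cap, and 12 with both.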
Related Issues