Fix resource exhaustion from per-client GStreamer pipelines#400

Draft
joshkautz wants to merge 1 commit into
QuantumEntangledAndy:masterfrom
joshkautz:fix/shared-pipeline

@joshkautz joshkautz commented Mar 3, 2026

Problem

With the current defaults (set_shared(false) and SuspendMode::Reset), every RTSP client connection creates its own independent GStreamer pipeline and camera stream session. Each connection — go2rtc, ffprobe, VLC, health check probes — opens a separate Baichuan session with the camera.

During long-running operation this causes:

  • File descriptor exhaustion: Each pipeline teardown/rebuild cycle leaks UNIX-STREAM socket file descriptors. Over hours, this hits the per-process nofile limit, causing GStreamer critical errors (gst_poll_write_control: assertion 'set != NULL' failed) and eventually a fatal crash (Creating pipes for GWakeup: Too many open files).
  • Memory growth: Pipeline resources are not fully reclaimed between sessions, causing memory to grow unbounded (users report 18-25GB before OOM).
  • Camera session limits: Reolink cameras have a limited number of concurrent Baichuan sessions. Multiple independent pipelines can exhaust this limit.
  • CLOSE_WAIT accumulation: The Reset suspend mode tears down and rebuilds the pipeline on every client connect/disconnect, accumulating TCP connections in CLOSE_WAIT state.

Fix

Two configuration changes to NeoMediaFactory:

  1. set_shared(true): All RTSP clients share a single GStreamer pipeline. One pipeline, one camera session, regardless of how many clients connect.

  2. SuspendMode::None: Keep the pipeline alive when the last client disconnects, instead of tearing it down. This eliminates the teardown/rebuild cycle that causes resource churn.

Together these changes keep resource usage stable during 24/7 operation with monitoring tools constantly probing the stream.
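
As a rough illustration of why the two settings matter together, here is a toy model (plain Rust, no GStreamer) that counts how many pipelines are alive at once and how many get built for a sequence of client connects and disconnects. The names `Suspend`, `Event` and `simulate` are hypothetical and exist only for this sketch. It also simplifies `Reset` to "tear down when the last client leaves", following the description above rather than the exact gst-rtsp-server semantics.

```rust
/// Hypothetical stand-in for the suspend mode; not the gst-rtsp-server type.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Suspend {
    Reset, // modelled as: tear the pipeline down when the last client leaves
    None,  // keep the pipeline alive with zero clients
}

#[derive(Clone, Copy, Debug)]
enum Event {
    Connect,
    Disconnect,
}

/// Returns (peak concurrent pipelines, total pipelines built).
fn simulate(shared: bool, suspend: Suspend, events: &[Event]) -> (usize, usize) {
    let mut clients = 0usize;
    let mut live = 0usize;
    let mut peak = 0usize;
    let mut built = 0usize;

    for ev in events {
        match ev {
            Event::Connect => {
                clients += 1;
                // Unshared: every client builds its own pipeline.
                // Shared: build only if no pipeline is currently alive.
                if !shared || live == 0 {
                    live += 1;
                    built += 1;
                }
            }
            Event::Disconnect => {
                if clients == 0 {
                    continue; // ignore a stray disconnect
                }
                clients -= 1;
                if !shared {
                    live -= 1; // the client's private pipeline goes away
                } else if clients == 0 && suspend == Suspend::Reset {
                    live = 0; // shared pipeline torn down with the last client
                }
            }
        }
        peak = peak.max(live);
    }
    (peak, built)
}
```

For three clients that connect, all disconnect, and then one probe that reconnects and leaves, the unshared case builds four pipelines (three at once), shared with `Reset` builds two, and shared with `None` builds exactly one.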

Changes

  • src/rtsp/gst/factory.rs: Change set_shared(false) → set_shared(true), SuspendMode::Reset → SuspendMode::None

Relationship to Other PRs

This PR is complementary to #373 and #340, which address buffer pool proliferation within a single pipeline.

This PR operates at a higher level: it prevents the multiplication of pipelines that amplifies the pool leak. With set_shared(false), every client connection creates its own set of pools, so the FD/memory leak scales with num_clients × num_unique_frame_sizes. With set_shared(true), it's just 1 × num_unique_frame_sizes, which #373's bucketing then bounds to ~12 pools.
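
The scaling argument above can be written out as simple arithmetic. This is only a sketch: the function names are made up for illustration, and the cap of 12 is taken from the "~12 pools" figure quoted for #373.

```rust
/// Approximate pool cap per pipeline once #373's bucketing is applied
/// (taken from the "~12 pools" figure above).
const MAX_BUCKETS: usize = 12;

/// set_shared(false): every client gets its own pipeline and its own pools.
fn pools_unshared(num_clients: usize, num_unique_frame_sizes: usize) -> usize {
    num_clients * num_unique_frame_sizes
}

/// set_shared(true): one pipeline, so the client count drops out.
fn pools_shared(num_unique_frame_sizes: usize) -> usize {
    num_unique_frame_sizes
}

/// set_shared(true) plus bucketing: bounded regardless of frame sizes seen.
fn pools_shared_bucketed(num_unique_frame_sizes: usize) -> usize {
    num_unique_frame_sizes.min(MAX_BUCKETS)
}
```

With, say, 4 clients and 300 distinct frame sizes, that is 1200 pools unshared, 300 shared, and 12 shared with bucketing.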

Commenters on #373 noted that the pool fix alone did not resolve streaming failures for 4K cameras — the pipeline architecture change in this PR addresses that remaining stability issue.

Related Issues

With set_shared(false) and SuspendMode::Reset (the current defaults),
every RTSP client connection creates its own independent GStreamer
pipeline and camera stream session. Each connection to the RTSP server
(go2rtc, ffprobe, health check probes) opens a new Baichuan session
with the camera.

During long-running 24/7 operation, this causes:
- File descriptor exhaustion from accumulated CLOSE_WAIT connections
- Memory growth as pipeline resources are not fully reclaimed
- Camera firmware hitting its concurrent session limit
- RTSP server becoming unresponsive to new connections

Switch to set_shared(true) so all RTSP clients share a single
GStreamer pipeline, and SuspendMode::None to keep the pipeline alive
when the last client disconnects. This eliminates the constant
pipeline teardown/rebuild cycle and keeps resource usage stable.

Addresses the file descriptor leaks reported in QuantumEntangledAndy#370 and QuantumEntangledAndy#380, and
the memory growth reported in QuantumEntangledAndy#366.
@joshkautz
Author

Note: This change came out of running Neolink for a 24/7 ALPR system where go2rtc, health checks, and the dashboard are all connecting to the RTSP server concurrently. The shared pipeline + no-reset suspend mode eliminated the resource exhaustion I was seeing. If this doesn't align with the direction you want to take the project, feel free to close it — just opening it in case it's useful to others hitting the same issues.

janost added a commit to MutuallyAssuredDeployment/neolink that referenced this pull request Mar 18, 2026
…austion

Upstream PR QuantumEntangledAndy#400. With set_shared(false), every RTSP client created its own
GStreamer pipeline and camera session, causing FD exhaustion and memory growth
during 24/7 operation. SuspendMode::None keeps pipeline alive across
disconnect/reconnect cycles.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
janost added a commit to MutuallyAssuredDeployment/neolink that referenced this pull request Mar 19, 2026
@LinuxMainframe

I'd like to boost this. I am seeing crazy memory bloat over short periods of time: 16 cameras can rack up nearly 20GB in less than 5 hours. In addition, the video feed becomes delayed, and over a long enough run the entire match ends up delayed by 10 minutes or more.

LinuxMainframe pushed a commit to LinuxMainframe/neolink that referenced this pull request May 22, 2026
Apply PR QuantumEntangledAndy#400 - shared GStreamer pipeline across RTSP clients
  - factory.set_shared(true): one pipeline per stream, not per client
  - RTSPSuspendMode::None: keep pipeline alive between client connects
  This eliminates per-client session buildup and CLOSE_WAIT FD leaks.

Apply PR QuantumEntangledAndy#373 - bucketed GStreamer BufferPool allocation
  - Replace per-frame-size pools (unbounded growth) with power-of-two
    bucket pools (MAX 12 pools per appsrc, max 1 MiB bucket)
  - Oversized frames fall back to non-pooled allocation
  - Fixes steady RSS growth over hours of 24/7 operation

Add live-stream / sports optimisations
  - make_queue: leaky=2 (downstream) — drop OLDEST frame when full,
    ensuring the output is always the most recent camera data
  - max-size-time=1s — one I-frame period (1x multiplier, 25fps)
    so the queue never holds more than one GOP of stale data
  - buffer_size(): 1 second of compressed data (bitrate/8) floored
    at 256 KiB; matches the 1x I-frame / 25fps assumption
  - send_to_appsrc: FlowError::Flushing now logs at debug level and
    returns Ok() cleanly — the leaky queue handles overflow silently

Deploy: 4 cameras per neolink instance on Debian 12 LXC (Proxmox)
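
The bucketing and buffer-size rules in the commit message above can be sketched in plain Rust. This is not the code from #373 or from the commit; the names are hypothetical, and the 512-byte minimum bucket is an assumption chosen so that power-of-two buckets up to 1 MiB come out to exactly 12.

```rust
/// Assumed smallest bucket (2^9 bytes). Not stated in the commit message.
const MIN_BUCKET: usize = 512;
/// Largest pooled bucket: 1 MiB (2^20 bytes), as stated in the commit message.
const MAX_BUCKET: usize = 1 << 20;

/// Picks the power-of-two bucket for a frame of `size` bytes.
/// Returns None for oversized frames, which would use non-pooled allocation.
fn bucket_for(size: usize) -> Option<usize> {
    if size > MAX_BUCKET {
        return None;
    }
    Some(size.next_power_of_two().max(MIN_BUCKET))
}

/// One second of compressed data (bitrate in bits per second divided by 8),
/// floored at 256 KiB.
fn buffer_size(bitrate_bps: u32) -> u32 {
    (bitrate_bps / 8).max(256 * 1024)
}
```

Under these assumptions a 700-byte frame lands in the 1024-byte bucket, a frame one byte over 1 MiB is not pooled, and a 1 Mbit/s stream gets the 256 KiB floor rather than 125,000 bytes.
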
@LinuxMainframe

Okay, I'm adding some context to my earlier message. (@QuantumEntangledAndy)

I have been using between 16 and 32 Reolink cams to livestream a sports field.

My main issue was extreme memory buffer runaway, and it happened really fast with that many cameras.

I have applied the code in #400, #373, and some of my own code for leaky frames (dump oldest, keep alive and up to date with no latency/lag behind). This has worked tremendously well, and has now kept my idle usage (with clients viewing the RTSP streams in neolink across the network) at nearly 50MB for 4 cameras, down from gigabytes per hour.
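
For anyone unfamiliar with the "dump oldest" idea, here is a minimal sketch of the behaviour in plain Rust. It is not my actual patch, which sets properties on GStreamer's queue element; `LeakyQueue` is a made-up name that only models what a downstream-leaky queue does when it is full.

```rust
use std::collections::VecDeque;

/// Bounded FIFO that discards the oldest item when full, so the consumer
/// always sees the most recent frames.
struct LeakyQueue<T> {
    frames: VecDeque<T>,
    capacity: usize,
}

impl<T> LeakyQueue<T> {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self { frames: VecDeque::with_capacity(capacity), capacity }
    }

    /// Pushes a frame; returns the dropped (oldest) frame if the queue was full.
    fn push(&mut self, frame: T) -> Option<T> {
        let dropped = if self.frames.len() == self.capacity {
            self.frames.pop_front()
        } else {
            None
        };
        self.frames.push_back(frame);
        dropped
    }

    /// Takes the oldest frame still held, if any.
    fn pop(&mut self) -> Option<T> {
        self.frames.pop_front()
    }

    fn len(&self) -> usize {
        self.frames.len()
    }
}
```

With a capacity of 2, pushing frames 1, 2, 3 drops frame 1 and leaves 2 and 3 queued, which is why a slow client falls behind by at most the queue depth instead of accumulating minutes of lag.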

Attached are some screenshots of the RSS and network usage over time via proxmox cluster test containers.
[Two screenshots: RSS and network usage over time]
The RSS was taken over a ten-minute period, so these are reasonably spaced apart. I did so because my original RSS was climbing extremely fast, and this has stayed stable for much longer than ever before.

I made sure that at least two cameras of a four-camera toml file had ffplay instances attached. I originally had to break 16 cameras into four groups of 4 to help dilute the memory buffer runaway. This PR has fixed this.

I ended up testing a wide range of cameras at various resolutions (2K, 4K and the 8K cams), with no issues on any of them. My implementation of this, along with my own code, still has some GStreamer-related bugs I need to work out in gst-factory.rs.

In short, I support the pull request and believe it should be accepted as it fixes the server memory overuse.

Thanks to @joshkautz
