Fix resource exhaustion from per-client GStreamer pipelines by joshkautz · Pull Request #400 · QuantumEntangledAndy/neolink

joshkautz · 2026-03-03T23:32:19Z

Problem

With the current defaults (set_shared(false) and SuspendMode::Reset), every RTSP client connection creates its own independent GStreamer pipeline and camera stream session. Each connection — go2rtc, ffprobe, VLC, health check probes — opens a separate Baichuan session with the camera.

During long-running operation this causes:

File descriptor exhaustion: Each pipeline teardown/rebuild cycle leaks UNIX-STREAM socket file descriptors. Over hours, this hits the per-process nofile limit, causing GStreamer critical errors (gst_poll_write_control: assertion 'set != NULL' failed) and eventually a fatal crash (Creating pipes for GWakeup: Too many open files).
Memory growth: Pipeline resources are not fully reclaimed between sessions, causing memory to grow unbounded (users report 18-25GB before OOM).
Camera session limits: Reolink cameras have a limited number of concurrent Baichuan sessions. Multiple independent pipelines can exhaust this limit.
CLOSE_WAIT accumulation: The Reset suspend mode tears down and rebuilds the pipeline on every client connect/disconnect, accumulating TCP connections in CLOSE_WAIT state.

Fix

Two configuration changes to NeoMediaFactory:

set_shared(true): All RTSP clients share a single GStreamer pipeline. One pipeline, one camera session, regardless of how many clients connect.
SuspendMode::None: Keep the pipeline alive when the last client disconnects, instead of tearing it down. This eliminates the teardown/rebuild cycle that causes resource churn.

Together these changes keep resource usage stable during 24/7 operation with monitoring tools constantly probing the stream.
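
To make the effect of the two settings concrete, here is a toy model in plain Rust. It does not call GStreamer or neolink: `Factory`, `connect`, `disconnect`, and the two counters are hypothetical names. The lifecycle rules are assumptions that follow the behaviour described above: an unshared pipeline lives and dies with its client, and a shared pipeline is torn down on the last disconnect only under Reset.

```rust
/// Toy model only. Every name here is hypothetical; this is not neolink code.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SuspendMode {
    /// Tear the pipeline down when the last client leaves.
    Reset,
    /// Keep the pipeline alive even with no clients attached.
    None,
}

#[derive(Debug)]
pub struct Factory {
    shared: bool,
    suspend: SuspendMode,
    clients: usize,
    /// Pipelines (and, in this model, camera sessions) currently alive.
    pub live_pipelines: usize,
    /// Total pipelines ever built: a proxy for teardown/rebuild churn.
    pub pipelines_built: usize,
}

impl Factory {
    pub fn new(shared: bool, suspend: SuspendMode) -> Self {
        Factory { shared, suspend, clients: 0, live_pipelines: 0, pipelines_built: 0 }
    }

    /// A client connects. Unshared: always build a new pipeline.
    /// Shared: build one only if none is alive.
    pub fn connect(&mut self) {
        self.clients += 1;
        if !self.shared || self.live_pipelines == 0 {
            self.live_pipelines += 1;
            self.pipelines_built += 1;
        }
    }

    /// A client disconnects.
    pub fn disconnect(&mut self) {
        if self.clients == 0 {
            return; // nothing to disconnect
        }
        self.clients -= 1;
        if !self.shared {
            // Assumption: a per-client pipeline goes away with its client.
            self.live_pipelines -= 1;
        } else if self.clients == 0 && self.suspend == SuspendMode::Reset {
            // Shared pipeline, last client gone, Reset mode: tear it down.
            self.live_pipelines = 0;
        }
    }
}
```

Under this model, 100 sequential probe connections (connect, then disconnect) build 100 pipelines with `Factory::new(false, SuspendMode::Reset)` and a single pipeline with `Factory::new(true, SuspendMode::None)`, which then stays alive.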

Changes

src/rtsp/gst/factory.rs: Change set_shared(false) → set_shared(true), SuspendMode::Reset → SuspendMode::None

Relationship to Other PRs

This PR is complementary to #373 and #340, which address buffer pool proliferation within a single pipeline:

feat(rtsp): fix file descriptor exhaustion and memory fragmentation #373 fixes the unbounded pool-per-frame-size growth with power-of-two bucketing — a valuable optimization even with a shared pipeline, since the single pipeline's pools can still grow.
Don't try to maintain a pool of buffers #340 removes pools entirely — simpler but loses the allocation throughput benefits.

This PR operates at a higher level: it prevents the multiplication of pipelines that amplifies the pool leak. With set_shared(false), every client connection creates its own set of pools, so the FD/memory leak scales with num_clients × num_unique_frame_sizes. With set_shared(true), it's just 1 × num_unique_frame_sizes, which #373's bucketing then bounds to ~12 pools.
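
The scaling argument above can be checked with a small sketch. This is not the code from #373: `bucket_for` and `pool_count` are hypothetical helpers, the 512 B minimum bucket is an assumption chosen only so that the power-of-two buckets up to 1 MiB number exactly 12, and the model assumes one set of pools per pipeline.

```rust
use std::collections::BTreeSet;

/// Largest pooled bucket (1 MiB), as described for #373 in this thread.
pub const MAX_BUCKET: usize = 1 << 20;
/// ASSUMPTION: smallest bucket of 512 B, so that 2^9..=2^20 gives 12 buckets.
pub const MIN_BUCKET: usize = 1 << 9;

/// Round a frame length up to its power-of-two bucket.
/// Returns `None` for oversized frames, which would bypass the pools.
pub fn bucket_for(frame_len: usize) -> Option<usize> {
    if frame_len > MAX_BUCKET {
        return None;
    }
    Some(frame_len.next_power_of_two().max(MIN_BUCKET))
}

/// Number of buffer pools alive in the model.
/// Unshared: one set of pools per client. Shared: one set in total.
pub fn pool_count(shared: bool, clients: usize, frame_sizes: &[usize], bucketed: bool) -> usize {
    // Distinct pool keys within one pipeline: raw sizes or bucket sizes.
    let keys: BTreeSet<usize> = frame_sizes
        .iter()
        .filter_map(|&len| if bucketed { bucket_for(len) } else { Some(len) })
        .collect();
    let pipelines = if shared { 1 } else { clients };
    pipelines * keys.len()
}
```

With 4 clients and 1000 distinct frame sizes, the model gives 4 × 1000 pools unshared, 1000 pools shared, and at most 12 once bucketing is applied on top of the shared pipeline.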

Commenters on #373 noted that the pool fix alone did not resolve streaming failures for 4K cameras — the pipeline architecture change in this PR addresses that remaining stability issue.

Related Issues

neolink:latest runs away using up file descriptors when ffmpeg is connected to the rtsp port #370 — File descriptor runaway when ffmpeg connects to RTSP port
neolink in a reconnect/push loop and leaking file descriptors ? #380 — Reconnect loop leaking file descriptors, memory balloon to 18GB
Neolink V0.6.3 - Memory Leak #366 — Memory leak, 25GB RAM consumed overnight

With set_shared(false) and SuspendMode::Reset (the current defaults), every RTSP client connection creates its own independent GStreamer pipeline and camera stream session. Each connection to the RTSP server (go2rtc, ffprobe, health check probes) opens a new Baichuan session with the camera.

During long-running 24/7 operation, this causes:

- File descriptor exhaustion from accumulated CLOSE_WAIT connections
- Memory growth as pipeline resources are not fully reclaimed
- Camera firmware hitting its concurrent session limit
- RTSP server becoming unresponsive to new connections

Switch to set_shared(true) so all RTSP clients share a single GStreamer pipeline, and SuspendMode::None to keep the pipeline alive when the last client disconnects. This eliminates the constant pipeline teardown/rebuild cycle and keeps resource usage stable.

Addresses the file descriptor leaks reported in QuantumEntangledAndy#370 and QuantumEntangledAndy#380, and the memory growth reported in QuantumEntangledAndy#366.

joshkautz · 2026-03-03T23:41:11Z

Note: This change came out of running Neolink for a 24/7 ALPR system where go2rtc, health checks, and the dashboard are all connecting to the RTSP server concurrently. The shared pipeline + no-reset suspend mode eliminated the resource exhaustion I was seeing. If this doesn't align with the direction you want to take the project, feel free to close it — just opening it in case it's useful to others hitting the same issues.

…austion Upstream PR QuantumEntangledAndy#400. With set_shared(false), every RTSP client created its own GStreamer pipeline and camera session, causing FD exhaustion and memory growth during 24/7 operation. SuspendMode::None keeps pipeline alive across disconnect/reconnect cycles. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

@wafgo

Add a section crediting @wafgo, @Maaggs, @fromagge, @lorek123, and @joshkautz for their unmerged upstream PRs (QuantumEntangledAndy#373, QuantumEntangledAndy#389, QuantumEntangledAndy#394, QuantumEntangledAndy#395, QuantumEntangledAndy#396, QuantumEntangledAndy#398, QuantumEntangledAndy#399, QuantumEntangledAndy#400) that were ported into this fork. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

LinuxMainframe · 2026-05-21T02:16:41Z

I'd like to boost this. I am seeing severe memory bloat over short periods of time: 16 cameras can rack up nearly 20GB in less than 5 hours. In addition, the video feed falls further behind over time, until the entire match is delayed by roughly 10 minutes or more.

Apply PR QuantumEntangledAndy#400 - shared GStreamer pipeline across RTSP clients

- factory.set_shared(true): one pipeline per stream, not per client
- RTSPSuspendMode::None: keep pipeline alive between client connects

This eliminates per-client session buildup and CLOSE_WAIT FD leaks.

Apply PR QuantumEntangledAndy#373 - bucketed GStreamer BufferPool allocation

- Replace per-frame-size pools (unbounded growth) with power-of-two bucket pools (MAX 12 pools per appsrc, max 1 MiB bucket)
- Oversized frames fall back to non-pooled allocation
- Fixes steady RSS growth over hours of 24/7 operation

Add live-stream / sports optimisations

- make_queue: leaky=2 (downstream): drop OLDEST frame when full, ensuring the output is always the most recent camera data
- max-size-time=1s: one I-frame period (1x multiplier, 25fps) so the queue never holds more than one GOP of stale data
- buffer_size(): 1 second of compressed data (bitrate/8) floored at 256 KiB; matches the 1x I-frame / 25fps assumption
- send_to_appsrc: FlowError::Flushing now logs at debug level and returns Ok() cleanly; the leaky queue handles overflow silently

Deploy: 4 cameras per neolink instance on Debian 12 LXC (Proxmox)
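
The buffer sizing rule and the drop-oldest queue behaviour described in this commit message can be sketched in plain Rust. This is an illustration, not the fork's code: the body of `buffer_size` is reconstructed from the one-line description (bitrate/8, floored at 256 KiB), and `LeakyQueue` is a hypothetical stand-in for a GStreamer queue with leaky=2. It bounds the queue by item count, whereas the real element is bounded here by time (max-size-time=1s).

```rust
use std::collections::VecDeque;

/// Floor for the buffer size: 256 KiB.
pub const MIN_BUFFER_BYTES: u64 = 256 * 1024;

/// One second of compressed data at `bitrate_bps` bits per second,
/// never smaller than 256 KiB.
pub fn buffer_size(bitrate_bps: u64) -> u64 {
    (bitrate_bps / 8).max(MIN_BUFFER_BYTES)
}

/// Hypothetical bounded queue that drops the OLDEST item when full,
/// so a slow consumer always sees the most recent data.
pub struct LeakyQueue<T> {
    items: VecDeque<T>,
    capacity: usize,
    /// Number of items discarded because the queue was full.
    pub dropped: u64,
}

impl<T> LeakyQueue<T> {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        LeakyQueue { items: VecDeque::with_capacity(capacity), capacity, dropped: 0 }
    }

    /// Push never blocks and never fails: when full, evict the oldest item.
    pub fn push(&mut self, item: T) {
        if self.items.len() == self.capacity {
            self.items.pop_front();
            self.dropped += 1;
        }
        self.items.push_back(item);
    }

    /// Take the oldest item still held.
    pub fn pop(&mut self) -> Option<T> {
        self.items.pop_front()
    }

    pub fn len(&self) -> usize {
        self.items.len()
    }

    pub fn is_empty(&self) -> bool {
        self.items.is_empty()
    }
}
```

In this sketch an 8 Mbit/s stream gets a 1,000,000-byte buffer, while a 1 Mbit/s stream is raised to the 256 KiB floor. Pushing five frames into a three-slot queue drops the first two, and the consumer resumes from the third.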

LinuxMainframe · 2026-05-22T03:51:43Z

Okay, I'm adding some context to my earlier message. (@QuantumEntangledAndy)

I have been using between 16 and 32 Reolink cams to livestream a sports field.

My main issue was extreme memory buffer runaway, which happened very quickly given the number of cameras.

I have applied the code in #400, #373, and some of my own code for leaky frames (drop the oldest frames so the stream stays alive and up to date, with no lag). This has worked tremendously well: my idle memory use (with clients viewing the RTSP streams from neolink across the network) now sits at around 50MB for 4 cameras, down from growth of gigabytes per hour.

Attached are some screenshots of the RSS and network usage over time from Proxmox cluster test containers.

The RSS readings were taken over a ten-minute period, so they are reasonably spaced apart. I did so because my original RSS was climbing extremely fast, and this has stayed stable for much longer than ever before.

I ensured that ffplay instances were running for at least two cameras of a four-camera TOML file. I had originally had to break 16 cameras into four groups of 4 to help dilute the memory buffer runaway. This PR has fixed this.

I did end up testing a large swath of cameras of various qualities (2k, 4k and the 8k cams). No issues on any of them. My implementation of this, together with my own code changes, still has some GStreamer-related bugs I need to work out in gst-factory.rs.

In short, I support the pull request and believe it should be accepted as it fixes the server memory overuse.

Thanks to @joshkautz

joshkautz wants to merge 1 commit into QuantumEntangledAndy:master from joshkautz:fix/shared-pipeline
