New file: `docs/design/streaming-video-webrtc-ui.md` (343 additions, 0 deletions)
**Reviewer comment (Contributor):**

My general impression of this implementation is of complexity. Supporting WebRTC streaming of all DLS-PS inputs is not a trivial task and involves many moving parts. Even with all of this complexity, the core capabilities needed for SceneScape are not met, including access to raw, unannotated and undistorted frames for camera calibration (manual or auto) in addition to access to undistorted and annotated frames for pipeline monitoring and visual verification.

By design, in production the frames were not published over MQTT; the system would free-run, publishing only metadata with no frames on MQTT (in Percebro, at least). Only when the GUI connected and "live" mode was on, or calibration was actively happening, would occasional preview frames appear in the GUI to simulate a live feed. This worked well even in very bandwidth-constrained environments because the client would only request the next frame once the previous frame had been received. It also does not compete with running an independent NVR against the same cameras that is designed for monitoring live feeds.

In general, use of WebRTC is interesting for minimizing the network overhead when people are monitoring the system. For SceneScape running scene analytics in production mode with no human in the loop, there may be a negative impact.

**Reviewer comment (Contributor):**

Let us whiteboard the entire pipeline factoring in distortion handling, frigate/NVR storage, uploading snippets to Geti etc. so that what we build does not have to be restructured as new requests come in.

Agree with the complexity comment. Also, in production "live view" will be off and thus should not overload the MQTT broker. The issue is that in all conference and customer demos, sample apps, etc., "live view" is turned on. As the number of cameras increased, we observed unstable behavior that did not exist when "live view" was turned off. Also, Grafana dashboards that show live video feeds commonly use a WebRTC connection to display the stream.

Dumping larger frames on MQTT is not a standard practice for a reason, and our demos are increasingly using 4K video feeds. The Mosquitto broker may have a max message size of 256 MB, but other brokers like AWS IoT have a limit of 128 KB. Kafka has a default max size of 1 MB.

For the future, when looking at supporting moving cameras, the camera could be streaming continually rather than being requested to calibrate by a user event from the UI, and the feed needs to be accessible to the calibration service.

Vibhu has requested this feature. Having seen the complex architecture, I request that an official communique be sent from CIFA regarding this irrespective of what the decision is.

# Design Document: Using WebRTC for Video Streaming

- **Author(s)**: [Patryk Iracki](https://github.com/Irakus)
- **Date**: 2025-09-15
- **Status**: [Proposed]
- **Related ADRs**: N/A

---

## 1. Overview

Replace the current MQTT-based video streaming with WebRTC-based streaming to improve performance and user experience.

## 2. Goals

- Stop publishing video frames over MQTT
- Implement WebRTC for video streaming
- Reduce latency for each stream
- Reduce resource consumption on DLStreamer Pipeline Server

## 3. Non-Goals

- Using WebRTC for calibration service
- Removing the Python script from the DLStreamer pipeline - pre-processing will still be used; for post-processing, only image publishing for calibration will be kept

## 4. Background / Context

### Current Design

```mermaid
flowchart LR
subgraph Cameras["IP Cameras"]
C1["Camera 1<br/>(RTSP H.264)"]
C2["Camera 2<br/>(RTSP MJPEG)"]
end

subgraph Mqtt["MQTT<br/>"]
RawStream["Raw video stream"]
AnnotatedStream["Annotated video stream"]
VideoMetadata["Video metadata"]
end

subgraph AI["DL Streamer Pipeline"]
subgraph gvapython["gvapython"]
CustomPreProcess["Custom pre-processing"]
CustomPostProcess["Custom post-processing"]
end
Detect["Inference<br/>(Object Detection)"]
CustomPreProcess --> Detect --> CustomPostProcess
end

subgraph Browser["Web Browser"]
Scene["Scene Page<br/>(AI Stream)"]
AutoCalib["Autocalibration Page<br/>(Raw Stream)"]
end

%% Camera flows into DLS
C1 --> CustomPreProcess
C2 --> CustomPreProcess

%% DLS publishes to 3 MQTT topics
CustomPostProcess --> RawStream
CustomPostProcess --> AnnotatedStream
CustomPostProcess --> VideoMetadata

%% Web UI subscribes to 2 MQTT topics
AnnotatedStream --> Scene
RawStream --> AutoCalib
```

Until now, MQTT has been used as a single channel for all data, including video frames. To achieve this, a custom Python script in the DLStreamer pipeline takes raw video frames, draws overlays and watermarks, encodes them to JPEG, and publishes them to the MQTT broker. On the client side, the web application subscribes to the MQTT topic, decodes the JPEG frames, and displays them in HTML image and canvas elements (a minimal sketch of this flow follows below). This approach is not optimal for real-time video streaming due to the encoding/decoding overhead and the limitations of MQTT for high-frequency data transmission. Its main drawbacks are:

- High latency due to MQTT protocol overhead
- Increased CPU and memory usage on the server side (assumed; a benchmark is needed for exact numbers)
- Scalability issues with multiple concurrent video streams

It does, however, have some advantages:

- Even though the current solution is not optimal or efficient, it ensures that all data is synchronized, since everything is transmitted over a single channel.
- MQTT is a reliable protocol: all messages are delivered even in the case of temporary network issues, which is particularly important for scenarios where data integrity is crucial.
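
For illustration only, here is a minimal sketch of how the current MQTT-based viewer could look in the web app; the broker URL, topic name, and element ID are hypothetical and not taken from the actual implementation:

```typescript
// Sketch of the current MQTT-based live view (hypothetical broker URL, topic, element ID).
// Assumes the mqtt.js client library is available in the browser bundle.
import mqtt from "mqtt";

const client = mqtt.connect("wss://scenescape.example/mqtt"); // assumed broker endpoint
const img = document.getElementById("camera1-view") as HTMLImageElement;

client.on("connect", () => {
  // Hypothetical topic carrying JPEG-encoded annotated frames.
  client.subscribe("scenescape/image/camera1");
});

client.on("message", (_topic, payload) => {
  // Each MQTT message carries one JPEG frame; turn it into an object URL for the <img>.
  const previous = img.src;
  img.src = URL.createObjectURL(new Blob([payload], { type: "image/jpeg" }));
  if (previous.startsWith("blob:")) URL.revokeObjectURL(previous);
});
```

Every frame goes through a JPEG encode on the server and a decode in the browser, which is the overhead this proposal aims to remove.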

## 5. Proposed Design

**Reviewer comment (Contributor):**

_(attached image)_

One concern I have is added latency. If transcoding and routing via RTSP happen prior to inferencing and publishing metadata, then we have significant added latency that will directly impact live scene analytics. Anything that relates to a human viewing the feed for calibration or monitoring purposes should happen as a secondary workload with minimal impact on live processing.

**Reviewer comment (Contributor):**

Agreed regarding latency. Can we quantify and then make a decision?

```mermaid
flowchart LR
subgraph Cameras["IP Cameras"]
C1["Camera 1<br/>(RTSP H.264)"]
C2["Camera 2<br/>(RTSP MJPEG)"]
end

subgraph FFMPEG["Video Adapter<br/>(FFMPEG)"]
Transcode["Transcoding<br/>(MJPEG β†’ H.264)"]
end

subgraph Nginx["NGINX<br/>(Reverse Proxy)"]
TLS["TLS Termination"]
end

subgraph MediaServer["Media Server + TURN <br/>(mediamtx + coturn)"]
RouteRTSP["Routing<br/>(RTSP)"]
Repack["Protocol Repackaging<br/>(H.264 RTSP β†’ WebRTC)"]
RouteWebRTC["Routing<br/>(WebRTC)"]
end

subgraph AI["DL Streamer Pipeline"]
subgraph gvapython["gvapython"]
CustomPreProcess["Custom pre-processing"]
CustomPostProcess["Custom post-processing"]
end
Detect["Inference<br/>(Object Detection)"]
Overlay["Overlay Bounding Boxes"]
end

subgraph Mqtt["MQTT<br/>"]
VideoMetadata["Video metadata"]
end

subgraph Browser["Web Browser"]
Scene["Scene Page<br/>(AI Stream)"]
AutoCalib["Autocalibration Page<br/>(Raw Stream)"]
end

%% Camera flows into Media Server
C1 --> Transcode
C2 --> Transcode

%% FFMPEG converts video and sends to Media Server
Transcode --> RouteRTSP

%% Inference metadata
CustomPostProcess --> VideoMetadata

%% Raw stream path β†’ Autocalibration
RouteRTSP --> Repack
Repack --> TLS
TLS --> AutoCalib

%% AI pipeline path β†’ Scene
RouteRTSP --> CustomPreProcess
CustomPreProcess --> Detect --> Overlay --> RouteWebRTC
Detect --> CustomPostProcess
RouteWebRTC --> TLS
TLS --> Scene

```

### New components

- **MediaMTX**: An open-source media server that supports various streaming protocols, including WebRTC. It will handle the WebRTC connections and stream routing.
- **FFMPEG-based adapter**: A lightweight component that will convert camera streams to a WebRTC-compatible format with `zerolatency` tuning (see the sketch after this list).
- **NGINX as a reverse proxy**: To handle TLS termination and provide a secure connection for Web app.
- **COTURN**: WebRTC connection requires a TURN server for NAT traversal in some network configurations. This can be set up using open-source solutions like Coturn.
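
To make the adapter concrete, here is a rough sketch of what it could do per camera. This is an assumption-laden illustration, not the final implementation; the hostnames, stream names, and exact ffmpeg flags would need to be validated against the supported camera formats:

```typescript
// Sketch of an FFMPEG-based adapter (hypothetical camera URL, stream name, and MediaMTX address).
// Transcodes an MJPEG RTSP camera to H.264 and republishes it to MediaMTX over RTSP.
import { spawn } from "node:child_process";

function startAdapter(cameraUrl: string, streamName: string) {
  const args = [
    "-rtsp_transport", "tcp",
    "-i", cameraUrl,                      // e.g. an MJPEG RTSP camera
    "-c:v", "libx264",
    "-preset", "ultrafast",
    "-tune", "zerolatency",               // minimize encoder buffering for low latency
    "-g", "30",                           // keyframe roughly every second at 30 fps
    "-bf", "0",                           // no B-frames, which WebRTC playback cannot handle
    "-f", "rtsp",
    `rtsp://mediamtx:8554/${streamName}`, // assumed MediaMTX RTSP ingest endpoint
  ];
  const proc = spawn("ffmpeg", args, { stdio: "inherit" });
  proc.on("exit", (code) => console.error(`ffmpeg for ${streamName} exited with code ${code}`));
  return proc;
}

startAdapter("rtsp://camera2/stream", "camera2");
```

Cameras that already produce H.264 could be forwarded with `-c:v copy` instead of being re-encoded, keeping the adapter's CPU cost low at the price of not being able to adjust the keyframe interval.
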
**Reviewer comment (Contributor):**

NGINX & Coturn add extra complexity to our compose & k8s deployments. I need some more time to understand whether this is really unavoidable; will get back to you on this tomorrow.

**Reviewer comment (Contributor, author):**

Coturn is really needed to establish the connection - tested without it and it just didn't work.
NGINX is more complicated. It solves the mixed-content error, which occurs when a web app served over HTTPS tries to make HTTP calls. Adding exceptions to the Apache server doesn't solve the issue, as web browsers block such content with their built-in security features.

### Key changes

- In the Python script, only the frames needed for autocalibration will be published to MQTT, since they are transmitted one-time and on demand, when the user presses the autocalibration button.
- A MediaMTX server will be used to handle WebRTC connections.
- On the client side, the web application will establish a WebRTC connection to the MediaMTX server to receive video streams. This involves setting up signaling, ICE candidates, and media tracks (see the sketch after this list).
- Overlays and watermarks provided by the custom Python script will be dropped. Instead, native DLStreamer bounding boxes will be used.
**Reviewer comment (Contributor):**

Can this change be included in a separate PR and merged irrespective of WebRTC enablement?

**Reviewer comment (Contributor, author):**

As long as we publish frames through gvapython, we need to stay with custom boxes.

- The live-view button will be removed from the Scene Details page, as a WebRTC stream is not as easy to start/stop as an MQTT stream. Instead, live view will always be active while the user is on the Scene Details page.
- For the raw camera feeds, which are already available in the MediaMTX server, at least a consistent naming convention will be needed, since the web app only knows the topic names of the DLStreamer output streams.
- With MQTT there were no requirements on the video format, as each frame was encoded as a JPEG image. With WebRTC, the video codec must be supported by both the MediaMTX server and the web browsers. Videos can no longer contain B-frames.
- NGINX will be added as a reverse proxy in front of the MediaMTX server to handle TLS termination and provide a secure connection for the web app. For the browser to connect to the MediaMTX server, a valid TLS certificate must be used. Instead of accepting an insecure connection in the browser, the user guide should include instructions on how to import the SceneScape CA certificate.
- A TURN server will be set up using Coturn to ensure WebRTC connections can be established in various network configurations.
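
To illustrate the client side of the key changes above, here is a minimal, hypothetical sketch of a browser WebRTC (WHEP-style) viewer against MediaMTX; the proxy URL, endpoint path, stream name, and TURN credentials are assumptions:

```typescript
// Sketch of a WHEP-style WebRTC viewer (hypothetical URLs, path, and TURN credentials).
async function playStream(streamName: string, video: HTMLVideoElement) {
  const pc = new RTCPeerConnection({
    iceServers: [
      // TURN server (Coturn) for NAT traversal; placeholder credentials.
      { urls: "turn:scenescape.example:3478", username: "user", credential: "secret" },
    ],
  });

  // Receive-only video; MediaMTX sends the media track once negotiation completes.
  pc.addTransceiver("video", { direction: "recvonly" });
  pc.ontrack = (event) => {
    video.srcObject = event.streams[0];
  };

  // WHEP signaling: POST the SDP offer, receive the SDP answer.
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  // Note: a production client should also wait for ICE gathering or use trickle ICE.
  const response = await fetch(`https://scenescape.example/webrtc/${streamName}/whep`, {
    method: "POST",
    headers: { "Content-Type": "application/sdp" },
    body: pc.localDescription!.sdp,
  });
  await pc.setRemoteDescription({ type: "answer", sdp: await response.text() });
}

void playStream("camera1", document.getElementById("scene-video") as HTMLVideoElement);
```

The NGINX reverse proxy described above would terminate TLS for both this signaling request and the web app itself, which avoids the mixed-content problem discussed in the review comments.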

## 6. Alternatives Considered

### Displaying DLStreamer output in all places

```mermaid
flowchart LR
subgraph Cameras["IP Cameras"]
C1["Camera 1<br/>(RTSP H.264)"]
C2["Camera 2<br/>(RTSP MJPEG)"]
end

subgraph MediaServer["Media Server + TURN <br/>(mediamtx + coturn)"]
RouteWebRTC["Routing<br/>(WebRTC)"]
end

subgraph Nginx["NGINX<br/>(Reverse Proxy)"]
TLS["TLS Termination"]
end

subgraph AI["DL Streamer Pipeline"]
subgraph gvapython["gvapython"]
CustomPreProcess["Custom pre-processing"]
end
Detect["Inference<br/>(Object Detection)"]
Overlay["Overlay Bounding Boxes"]
end

subgraph Browser["Web Browser"]
Scene["Scene Page<br/>(AI Stream)"]
AutoCalib["Autocalibration Page<br/>(Raw Stream)"]
end

%% Camera flows into Media Server
C1 --> CustomPreProcess
C2 --> CustomPreProcess

%% Raw stream path β†’ Autocalibration
TLS --> AutoCalib

%% AI pipeline path β†’ Scene
CustomPreProcess --> Detect --> Overlay --> RouteWebRTC --> TLS --> Scene


```

#### New components

- **MediaMTX**: An open-source media server that supports various streaming protocols, including WebRTC. It will handle the WebRTC connections and stream routing.
- **NGINX as a reverse proxy**: To handle TLS termination and provide a secure connection for Web app.
- **COTURN**: WebRTC connection requires a TURN server for NAT traversal in some network configurations. This can be set up using open-source solutions like Coturn.

#### Key changes

- No adapter component is needed, as DLStreamer will handle all camera formats.
- No change in currently supported video formats.
- DLStreamer will output only one video stream per camera, with bounding boxes overlaid. This means the camera calibration page will also show bounding boxes, which may be distracting for the user.

### Splitting streams in DLStreamer Pipeline Server
**Reviewer comment (Contributor):**

I'm opting for such a solution in the long term and would try to push the DLSPS team to make it possible.

Key advantages that I see:

- No need to route the input streams from cameras through the media server, so lower latency, which is crucial for real-time tracking use cases.
- The configuration of the cameras will be much simpler from the user's point of view (just define the source in the DLSPS pipeline; no need to publish camera feeds into the media server and configure it).
- This will simplify the architecture and improve maintainability.

In the short term we could possibly choose the "Displaying DLStreamer output in all places" alternative, i.e. use annotated frames in the camera calibration page as a trade-off.

**Reviewer comment (Contributor):**

I recall DLSPS having the ability to restream WebRTC without needing a separate MediaMTX + TURN component. It may have been removed during the build refactor to reduce size and the dependency list. That would have simplified the design further at the expense of overloading the DLSPS "microservice". @rawatts10 what are your thoughts on the complexity of these alternative approaches? In the short term, I believe we can live with using the watermarked stream for calibration until the DLSPS team supports split streams.

**Reviewer comment (Contributor, author):**

Split streams are supported; I've turned the alternative approach into another PR so we can compare these side by side:
#441


```mermaid
flowchart LR
subgraph Cameras["IP Cameras"]
C1["Camera 1<br/>(RTSP H.264)"]
C2["Camera 2<br/>(RTSP MJPEG)"]
end

subgraph MediaServer["Media Server + TURN <br/>(mediamtx + coturn)"]
RouteWebRTC["Routing<br/>(WebRTC)"]
RouteWebRTCOverlay["Routing (with overlay)<br/>(WebRTC)"]
end

subgraph Nginx["NGINX<br/>(Reverse Proxy)"]
TLS["TLS Termination"]
end

subgraph AI["DL Streamer Pipeline"]
subgraph gvapython["gvapython"]
CustomPreProcess["Custom pre-processing"]
end
Detect["Inference<br/>(Object Detection)"]
Overlay["Overlay Bounding Boxes"]
end

subgraph Browser["Web Browser"]
Scene["Scene Page<br/>(AI Stream)"]
AutoCalib["Autocalibration Page<br/>(Raw Stream)"]
end

%% Camera flows into Media Server
C1 --> CustomPreProcess
C2 --> CustomPreProcess

%% Raw stream path β†’ Autocalibration
RouteWebRTC --> TLS
TLS --> AutoCalib

%% AI pipeline path β†’ Scene
CustomPreProcess --> Detect --> Overlay --> RouteWebRTCOverlay
Detect --> RouteWebRTC
RouteWebRTCOverlay --> TLS--> Scene
```

#### New components

- **MediaMTX**: An open-source media server that supports various streaming protocols, including WebRTC. It will handle the WebRTC connections and stream routing.
- **NGINX as a reverse proxy**: To handle TLS termination and provide a secure connection for Web app.
- **COTURN**: WebRTC connection requires a TURN server for NAT traversal in some network configurations. This can be set up using open-source solutions like Coturn.

#### Key changes

- No adapter component is needed, as DLStreamer will handle all camera formats.
- No change in currently supported video formats
- DLStreamer would split the stream into two before applying watermarks. This would allow us to use different streams for Scene and Autocalibration pages.

#### Issues

- Although DLStreamer supports split pipelines, DLSPS doesn't, as it only accepts one destination in the payload. An experiment showed that it used only the last defined appsink as output. An update to DLSPS would be needed to support multiple outputs.

### Staying with current implementation

Staying with MQTT: for a few cameras and low frame rates, MQTT might be sufficient, but it does not scale well with more cameras and higher frame rates.

## 7. Risks and Mitigations

- When a video element is out of the user's view, browsers stop buffering it, and reconnection can take a while - subject for further discussion.
- Synchronization between video and other DLStreamer data can be lost - subject for further discussion.
- Only the DLStreamer output topics are known to the web app - a naming convention for raw camera feed topics must be established.
- WebRTC is less reliable at delivering every single frame than MQTT - subject for further discussion.
- WebRTC has stricter requirements on the video format - mitigated by adding an ffmpeg-based adapter component between the cameras and the media server to ensure a WebRTC-compatible video format.

## 8. Rollout / Migration Plan

Upgrading from the current version would require the user to restart the DLStreamer pipelines and the web app.
Inference is not affected by this change, so no retraining of models is needed.
Persistent data is not affected by this change, so no database migration is needed.
With the new components, system requirements will increase, so server specs must be checked to ensure they meet the new requirements.

## 9. Testing & Monitoring

### Video format support

With the adapter, it is essential to ensure that all currently supported camera formats are compatible and can be processed correctly.

### Performance Improvements

A setup involving numerous cameras and/or higher frame rates is required to effectively observe performance improvements.

## 10. Open Questions

### Video Formats

WebRTC has limited video codec support, which may also vary between browsers.
Documentation for supported codecs:

- https://www.rfc-editor.org/rfc/rfc7742.txt
- https://developer.mozilla.org/en-US/docs/Web/Media/Guides/Formats/WebRTC_codecs

Overview:

- Mandatory (Must Support)
  - VP8
  - H.264 (Constrained Baseline Profile)
- Optional (May Support)
  - VP9
  - AV1
  - H.265/HEVC (limited browser support)
- Legacy/Deprecated
  - H.264 (other profiles - limited support)
- Browser Support Notes:
  - Chrome/Edge: VP8, H.264, VP9, AV1
  - Firefox: VP8, H.264, VP9, AV1 (experimental)
  - Safari: VP8, H.264, VP9 (limited), H.265 (Safari-specific)
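
If needed, the web app could also check at runtime which codecs the current browser can receive, using the standard WebRTC capabilities API; a minimal sketch (logging only, for illustration):

```typescript
// Query the browser's WebRTC receive capabilities to list supported video codecs.
const capabilities = RTCRtpReceiver.getCapabilities("video");
const codecs = capabilities?.codecs.map((c) => c.mimeType) ?? [];
// Typical results include "video/VP8" and "video/H264"; others vary by browser.
console.log("Supported video codecs:", codecs, "H.264:", codecs.includes("video/H264"));
```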

Aside from that, for a quick start of transmission, more frequent keyframes are needed. Our sample videos have keyframes every 10 seconds, which causes long delays when starting the stream. The ideal keyframe interval is 1-2 seconds.

## 11. References

- https://www.rfc-editor.org/rfc/rfc7742.txt
- https://developer.mozilla.org/en-US/docs/Web/Media/Guides/Formats/WebRTC_codecs