
ESP WebRTC Solution Release v1.0

14 May 11:26

Overview

ESP WebRTC Solution v1.0 is the first stable release of Espressif’s WebRTC implementation designed specifically for lightweight embedded devices. This version delivers a comprehensive protocol stack for building real-time communication applications on ESP32 series chips, supporting audio/video streaming, data channel communication, and customizable signaling mechanisms.

🚀 Highlights

  • High-level esp_webrtc API for easy WebRTC application development
  • Support for peer-to-peer media and data communication via RTP and SCTP
  • TURN support, NACK handling, SCTP SACK support
  • Flexible signaling abstraction with built-in support for AppRTC, WHIP, OpenAI Realtime, and local HTTP SSE
  • Audio/video capture and rendering modules with codec abstraction
  • Out-of-the-box support for key audio/video codecs: H.264, MJPEG, OPUS, G.711, AAC
  • Demo projects for doorbell, OpenAI chatbot, WHIP publishing, and more
  • Minimal dependencies: only libSRTP is required; all other modules are provided by ESP-IDF
  • Lightweight, with low memory consumption, designed specifically for embedded devices

1. Core WebRTC Components

1.1 High-level esp_webrtc API

The esp_webrtc API internally manages PeerConnection state and the signaling flow, making it easy to build WebRTC applications. In most cases, users only need to adapt it to their own board and signaling; esp_webrtc handles everything else.

1.2 WebRTC Peer Connection (esp_peer)

esp_peer abstracts WebRTC PeerConnection logic on ESP32 devices and includes a default implementation (peer_default) derived from libpeer, with the following features:

  • Full TURN support (RFC 5766 and RFC 8656)
  • Optimized connection speed with multiple ICE candidates
  • Supports both ICE controlling and controlled roles
  • RTP NACK support for retransmission
  • SCTP SACK support for large data transmission
  • Separate tasks for sending/receiving to avoid blocking
  • Codec support:
    • Video: H.264 (baseline), MJPEG (data channel only)
    • Audio: G.711 (PCMA/PCMU), OPUS
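To illustrate what RTP NACK handling involves, the sketch below decodes one Generic NACK entry as defined in RFC 4585 (Section 6.2.1): a 16-bit packet ID (PID) plus a 16-bit bitmask (BLP) that flags up to 16 further lost packets, where the least significant BLP bit refers to PID + 1. It then checks a small send history to decide whether a packet can still be retransmitted. This is a minimal, self-contained sketch, not peer_default's actual code; `nack_fci_expand`, `rtp_history_t`, `HISTORY_SIZE`, `history_store` and `history_can_resend` are hypothetical names, and payload storage is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Expand one Generic NACK FCI entry (RFC 4585, Section 6.2.1) into the RTP
 * sequence numbers it reports as lost. `fci` points at the 4-byte entry in
 * network byte order: 16-bit PID, then 16-bit BLP. At most 17 values are
 * written to `lost`; the return value is how many were written. */
size_t nack_fci_expand(const uint8_t fci[4], uint16_t lost[17])
{
    assert(fci != NULL && lost != NULL);
    uint16_t pid = (uint16_t)((fci[0] << 8) | fci[1]);
    uint16_t blp = (uint16_t)((fci[2] << 8) | fci[3]);
    size_t n = 0;

    lost[n++] = pid; /* the PID packet itself is always reported lost */
    for (unsigned bit = 0; bit < 16; bit++) {
        if (blp & (1u << bit)) {
            /* LSB of BLP refers to PID + 1; the cast wraps modulo 2^16 */
            lost[n++] = (uint16_t)(pid + bit + 1);
        }
    }
    return n;
}

/* Hypothetical send history: remembers which sequence numbers are still
 * available for retransmission (payload storage omitted for brevity). */
#define HISTORY_SIZE 256

typedef struct {
    uint16_t seq[HISTORY_SIZE];
    uint8_t valid[HISTORY_SIZE];
} rtp_history_t;

void history_store(rtp_history_t *h, uint16_t seq)
{
    h->seq[seq % HISTORY_SIZE] = seq; /* newer packets overwrite older ones */
    h->valid[seq % HISTORY_SIZE] = 1;
}

/* Returns 1 if `seq` is still in the history and can be resent, else 0. */
int history_can_resend(const rtp_history_t *h, uint16_t seq)
{
    size_t i = seq % HISTORY_SIZE;
    return h->valid[i] && h->seq[i] == seq;
}
```

For example, an entry with PID 10 and BLP 0x0005 reports packets 10, 11 and 13 as lost.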

1.3 WebRTC Signaling (esp_peer_signaling)

Signaling is used to discover peers and exchange SDP and control messages. esp_peer_signaling abstracts the signaling logic, so custom signaling can be integrated without modifying the core WebRTC stack. Built-in implementations:

  • esp_signaling_get_apprtc_impl: AppRTC signaling via WebSocket
  • esp_signaling_get_whip_impl: WHIP protocol for publishing to WebRTC servers
  • esp_signaling_get_openai_signaling: OpenAI Realtime API for chatbot integration
  • esp_signaling_get_http_impl: Local signaling via HTTP SSE for testing
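The benefit of a signaling abstraction is that the core stack calls through a table of function pointers and never depends on a particular backend. The sketch below shows that pattern with a hypothetical `sig_impl_t` interface, a hypothetical `send_local_sdp` caller, and an in-memory mock backend (`mock_signaling`). It is an illustration under those assumptions and does not reproduce the real esp_peer_signaling types or callbacks.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical message types exchanged over signaling */
typedef enum { SIG_MSG_SDP, SIG_MSG_CANDIDATE, SIG_MSG_BYE } sig_msg_type_t;

/* Hypothetical interface: each backend (AppRTC, WHIP, HTTP SSE, ...)
 * would provide one of these tables. */
typedef struct {
    const char *name;
    int (*start)(void *ctx, const char *url);
    int (*send_msg)(void *ctx, sig_msg_type_t type, const char *payload);
    int (*stop)(void *ctx);
} sig_impl_t;

/* Core code only talks to the table, never to a concrete backend. */
int send_local_sdp(const sig_impl_t *impl, void *ctx, const char *sdp)
{
    if (impl == NULL || impl->send_msg == NULL) {
        return -1;
    }
    return impl->send_msg(ctx, SIG_MSG_SDP, sdp);
}

/* A trivial in-memory backend that records the last message sent. */
typedef struct {
    char last[256];
    int sent;
    int started;
} mock_ctx_t;

static int mock_start(void *ctx, const char *url)
{
    (void)url; /* a real backend would connect to this URL */
    ((mock_ctx_t *)ctx)->started = 1;
    return 0;
}

static int mock_send(void *ctx, sig_msg_type_t type, const char *payload)
{
    mock_ctx_t *m = (mock_ctx_t *)ctx;
    (void)type;
    assert(payload != NULL);
    if (!m->started) {
        return -1; /* sending before start() is an error */
    }
    snprintf(m->last, sizeof(m->last), "%s", payload);
    m->sent++;
    return 0;
}

static int mock_stop(void *ctx)
{
    ((mock_ctx_t *)ctx)->started = 0;
    return 0;
}

const sig_impl_t mock_signaling = { "mock", mock_start, mock_send, mock_stop };
```

A mock backend like this can stand in for a network-based signaling implementation when unit-testing the code that drives the stack.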

2. Media Provider

2.1 Media Capture (esp_capture)

Integrated audio/video capture framework supporting various devices and codecs:

  • Video: H.264 (baseline), MJPEG
  • Audio: G.711 (PCMA/PCMU), AAC, OPUS
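G.711 PCMU (μ-law) is simple enough to show in full. The following sketch implements the classic μ-law companding step (bias 132, clip at 32635, 3-bit segment and 4-bit mantissa, bits inverted on output) for 16-bit PCM samples, plus the matching decoder. It is a generic illustration, not esp_capture's encoder; `ulaw_encode`, `ulaw_decode` and `ulaw_encode_buffer` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ULAW_BIAS 0x84  /* 132: bias added before the segment search */
#define ULAW_CLIP 32635 /* clip so that sample + bias fits in 15 bits */

/* Encode one 16-bit linear PCM sample to an 8-bit mu-law (PCMU) byte. */
uint8_t ulaw_encode(int16_t pcm)
{
    int sample = pcm;
    uint8_t sign = 0;
    if (sample < 0) {
        sign = 0x80;
        sample = -sample; /* done in int, so -(-32768) cannot overflow */
    }
    if (sample > ULAW_CLIP) {
        sample = ULAW_CLIP;
    }
    sample += ULAW_BIAS;

    /* Segment (exponent) = position of the highest set bit among bits 7..14 */
    int exponent = 7;
    for (int mask = 0x4000; (sample & mask) == 0 && exponent > 0; mask >>= 1) {
        exponent--;
    }
    int mantissa = (sample >> (exponent + 3)) & 0x0F;

    /* G.711 transmits the bitwise complement of sign|exponent|mantissa */
    return (uint8_t)~(sign | (exponent << 4) | mantissa);
}

/* Decode one mu-law byte back to 16-bit linear PCM (interval midpoint). */
int16_t ulaw_decode(uint8_t code)
{
    code = (uint8_t)~code;
    int exponent = (code >> 4) & 0x07;
    int mantissa = code & 0x0F;
    int sample = (((mantissa << 3) + ULAW_BIAS) << exponent) - ULAW_BIAS;
    return (int16_t)((code & 0x80) ? -sample : sample);
}

/* Encode a block of samples, e.g. one packet's worth of audio. */
void ulaw_encode_buffer(const int16_t *in, uint8_t *out, size_t n)
{
    assert((in != NULL && out != NULL) || n == 0);
    for (size_t i = 0; i < n; i++) {
        out[i] = ulaw_encode(in[i]);
    }
}
```

At the usual 8 kHz sample rate, a 20 ms packet holds 160 samples, which encode to 160 bytes.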

Capture device abstraction allows plug-and-play development:

  • esp_capture_new_audio_codec_src: I2S audio (via esp_codec_dev)
  • esp_capture_new_audio_aec_src: I2S audio with AEC
  • esp_capture_new_video_v4l2_src: V4L2 (MIPI CSI/DVP, ESP32-P4 only)
  • esp_capture_new_video_dvp_src: DVP camera (ESP32-S3 and others)

2.2 Media Player (av_render)

A lightweight media player supporting a "push" model for playback:

  • Video: H.264 (baseline), MJPEG
  • Audio: G.711 (PCMA/PCMU), AAC, OPUS
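In a push model, the receiving side hands frames to the player as they arrive, and the player renders them at its own pace. A bounded queue between the two is one common way to decouple them. The sketch below uses a fixed-depth ring buffer that drops the oldest frame when full, a policy that favors low latency over completeness. `frame_queue_t`, `media_frame_t`, `frame_queue_push`, `frame_queue_pop` and `FRAME_QUEUE_DEPTH` are hypothetical and not part of the av_render API. The sketch is single-threaded; splitting producer and consumer across tasks would additionally need a FreeRTOS queue or a mutex.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_QUEUE_DEPTH 4 /* hypothetical depth; bounds latency and memory */

typedef struct {
    const uint8_t *data; /* frame data (ownership stays with the caller here) */
    size_t size;
    uint32_t pts_ms;     /* presentation timestamp in milliseconds */
} media_frame_t;

typedef struct {
    media_frame_t slots[FRAME_QUEUE_DEPTH];
    size_t head;    /* index of the oldest queued frame */
    size_t count;   /* number of queued frames */
    size_t dropped; /* frames discarded because the queue was full */
} frame_queue_t;

/* Producer side: called as each frame arrives from the network. */
void frame_queue_push(frame_queue_t *q, media_frame_t f)
{
    assert(q != NULL);
    if (q->count == FRAME_QUEUE_DEPTH) {
        /* Full: drop the oldest frame so playback stays close to real time */
        q->head = (q->head + 1) % FRAME_QUEUE_DEPTH;
        q->count--;
        q->dropped++;
    }
    q->slots[(q->head + q->count) % FRAME_QUEUE_DEPTH] = f;
    q->count++;
}

/* Consumer side: the render loop pops the next frame; returns 0 if empty. */
int frame_queue_pop(frame_queue_t *q, media_frame_t *out)
{
    assert(q != NULL && out != NULL);
    if (q->count == 0) {
        return 0;
    }
    *out = q->slots[q->head];
    q->head = (q->head + 1) % FRAME_QUEUE_DEPTH;
    q->count--;
    return 1;
}
```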

Rendering device abstraction:

  • av_render_alloc_i2s_render: Audio playback via I2S
  • av_render_alloc_lcd_render: Video output via esp_lcd

3. Board Configuration (codec_board)

Provides default board configurations so that audio boards can be adapted quickly for testing and verification.


4. Demo Solutions

  • Peer Demo: Peer-to-peer communication between two ESP32 devices
    • Audio and data channel communication
  • OpenAI Demo: Real-time chatbot using OpenAI Realtime API
    • Audio capture, AEC, and AI response
    • Supports function calls for device control
    • Supports OPUS for better audio quality
    • Supports changing the voice type
  • Doorbell Demo: Smart video doorbell solution
    • AppRTC signaling, video/audio stream, two-way talk, MAC-based auto connect
  • Doorbell Local: Local version of Doorbell Demo
    • Uses HTTP SSE signaling for LAN tests
  • Video Call Demo: Full-featured two-way audio/video calling
    • MJPEG transmission over data channel
  • WHIP Demo: Publish media to WHIP-compatible servers
    • Uses WHIP signaling and media provider APIs
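When MJPEG frames travel over the data channel, the receiver sees raw message payloads rather than RTP-packetized video. Assuming each message carries one JPEG frame (an assumption, not a documented detail of the Video Call Demo), a cheap sanity check before decoding is to look for the JPEG Start Of Image marker (0xFF 0xD8) at the front and the End Of Image marker (0xFF 0xD9) near the end. `mjpeg_frame_length` is a hypothetical helper; searching backwards from the end tolerates trailing padding after the EOI marker.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the length of the JPEG frame in `buf` up to and including the
 * EOI marker, or 0 if the buffer does not hold a complete frame. */
size_t mjpeg_frame_length(const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    /* Need at least SOI + EOI, and the buffer must start with SOI */
    if (len < 4 || buf[0] != 0xFF || buf[1] != 0xD8) {
        return 0;
    }
    /* Scan backwards for EOI; i >= 3 keeps EOI from overlapping SOI */
    for (size_t i = len - 1; i >= 3; i--) {
        if (buf[i - 1] == 0xFF && buf[i] == 0xD9) {
            return i + 1;
        }
    }
    return 0; /* SOI present but no EOI: truncated frame */
}
```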

5. Compatibility

  • Supported Chips: ESP32, ESP32-S2, ESP32-S3, ESP32-P4
  • ESP-IDF Version: v5.4 or later recommended
  • Requirements:
    • PSRAM for video/audio processing
    • Compatible camera/audio drivers depending on your board

6. Obtaining v1.0.0

Use either of the following two methods to obtain the release code.

Using Git

git clone -b v1.0.0 https://github.com/espressif/esp-webrtc-solution.git esp-webrtc-solution-v1.0.0
cd esp-webrtc-solution-v1.0.0/

This is the recommended way of obtaining v1.0.0 of ESP WebRTC Solution.

Download an Archive

An esp-webrtc-solution-v1.0.0.zip archive is attached to the release.
You can also download it directly from GitHub:
esp-webrtc-solution-v1.0.0.zip


7. Getting Started

After obtaining the v1.0.0 code, start with the Peer Demo or Doorbell Local for initial testing, following the instructions in the README.


8. Contributing / Feedback

Your feedback is highly appreciated and helps us improve the solution!
Feel free to open an issue or submit a pull request on GitHub.