Skip to content

Conversation

j0sh
Copy link
Collaborator

@j0sh j0sh commented Jul 31, 2025

NB: This requires golang 1.25; it is probably best to merge this PR first and let it soak on staging for a bit.


Adds a native WHEP server to the gateway for playback. This
offers:

  * Immediate connectivity. WHEP playback can begin as soon as
    the WHIP response is received, rather than having to poll
    the WHEP endpoint repeatedly until the output is available.
    The WHEP response will simply hang until the output is ready.

  * Faster startup times, especially in no-audio cases. We use the
    mpegts PMT to know immediately which tracks are available. This
    avoids having to probe several seconds for an audio track that
    may never come.

  * Removes the requirement to have a MediaMTX server in the mix.
    This can substantially simplify the infrastructure requirements
    for many deployments. However, MediaMTX would still be needed for
    RTMP ingest or non-WHEP playback.

  * The promise of much finer control over tracks and stats reports.

/// Detailed Changes

  * Adds a WHEP server. Supports video (H.264) and audio (Opus).

  * Adds a mpegts-to-RTP timestamp converter to handle roll-overs
    between 33-bit mpegts (ugh) and 32-bit RTP timestamps.

  * Adds the output RingBuffer to the LivePipelines struct so it can be
    accessible via the WHEP handler. If the output is not yet ready,
    the WHEP handler waits until it is (via condvar). This allows WHEP
    sessions to be set up immediately as soon as ingest begins,
    regardless of whether the output is ready yet.

  * Updates the output RingBuffer to follow the io.Closer interface.
    Needed so we close the output buffer instead of writing to it
    indefinitely after a WHEP peerconnection goes away.

  * Off by default. The `LIVE_AI_WHEP_ADDR` environment variable
    needs to be set with at least a port number. This should be
    different from the WHIP port at `LIVE_AI_WHIP_ADDR`.

  * MediaMTX playback is still available; nothing changes there.

  * The swapping behavior does not change for now. If we swap
    orchestrators, the WHEP peerconnection will close and the client
    will have to re-connect WHEP (swaps do not affect ingest).

/// Rollout Plan

  * Enable on staging via configuring a port via the
    `LIVE_AI_WHEP_ADDR` env var. Update `LIVE_AI_WHEP_URL` to
    point to the gateway instead of MediaMTX.

  * Leave things on staging for a couple weeks

  * When we are satisfied with staging, enable on production.

Note that we can always enable WHEP on the gateway for manual testing
without making it the playback default as long as the `LIVE_AI_WHEP_URL`
env var points to MediaMTX.

@github-actions github-actions bot added dependencies Pull requests that update a dependency file go Pull requests that update Go code AI Issues and PR related to the AI-video branch. labels Jul 31, 2025
Adds a native WHEP server to the gateway for playback. This
offers:

  * Immediate connectivity. WHEP playback can begin as soon as
    the WHIP response is received, rather than having to poll
    the WHEP endpoint repeatedly until the output is available.
    The WHEP response will simply hang until the output is ready.

  * Faster startup times, especially in no-audio cases. We use the
    mpegts PMT to know immediately which tracks are available. This
    avoids having to probe several seconds for an audio track that
    may never come.

  * Removes the requirement to have a MediaMTX server in the mix.
    This can substantially simplify the infrastructure requirements
    for many deployments. However, MediaMTX would still be needed for
    RTMP ingest or non-WHEP playback.

  * The promise of much finer control over tracks and stats reports.

/// Detailed Changes

  * Adds a WHEP server. Supports video (H.264) and audio (Opus).

  * Adds a mpegts-to-RTP timestamp converter to handle roll-overs
    between 33-bit mpegts (ugh) and 32-bit RTP timestamps.

  * Adds the output RingBuffer to the LivePipelines struct so it can be
    accessible via the WHEP handler. If the output is not yet ready,
    the WHEP handler waits until it is (via condvar). This allows WHEP
    sessions to be set up immediately as soon as ingest begins,
    regardless of whether the output is ready yet.

  * Updates the output RingBuffer to follow the io.Closer interface.
    Needed so we close the output buffer instead of writing to it
    indefinitely after a WHEP peerconnection goes away.

  * Off by default. The `LIVE_AI_WHEP_ADDR` environment variable
    needs to be set with at least a port number. This should be
    different from the WHIP port at `LIVE_AI_WHIP_ADDR`.

  * MediaMTX playback is still available; nothing changes there.

  * The swapping behavior does not change for now. If we swap
    orchestrators, the WHEP peerconnection will close and the client
    will have to re-connect WHEP (swaps do not affect ingest).

/// Rollout Plan

  * Enable on staging via configuring a port via the
    `LIVE_AI_WHEP_ADDR` env var. Update `LIVE_AI_WHEP_URL` to
    point to the gateway instead of MediaMTX.

  * Leave things on staging for a couple weeks

  * When we are satisfied with staging, enable on production.

Note that we can always enable WHEP on the gateway for manual testing
without making it the playback default as long as the `LIVE_AI_WHEP_URL`
env var points to MediaMTX.
@j0sh j0sh marked this pull request as ready for review September 18, 2025 01:27
@j0sh j0sh requested review from leszko, mjh1 and victorges September 18, 2025 01:27
Copy link
Member

@victorges victorges left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sending partial

u.initialized = true
}
u.lastRaw = rawPts
return uint32(rawPts + u.wrapCount*mpegtsMaxPts)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this overflow as soon as u.wrapCount > 0? Am I missing sth or maybe you need a % with sth?

https://go.dev/play/p/dVuB5eJTbbQ

./prog.go:8:21: constant 8589963326 overflows uint32

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it would overflow a uint32 by wrapping back to zero. The main problem is there would be two discontinuous overflows: first when it overflows 32 bits during the cast to RTP timestamp, then again when the mpegts timestamp overflows 33 bits. So this bit of plumbing is to make those timestamps consistent.

The unit tests aren't the best at describing this intended behavior right now, so I'll add some comments here and rework the tests a bit so things are clearer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I assume we want it to overflow and "mod" to a value that fits into a uint32, instead of a panic. How come this doesn't panic like the playground snippet I shared?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, probably some compile-time constant propagation + bounds checking. With real variables this doesn't happen, maybe because all the intermediate types are int64 until the last minute. The uint32 conversion does mod the value during overflows.

Here is the conversion code with a simplified set of test cases that shows the overflow behavior: https://go.dev/play/p/_6NNdC7LbSN

BTW had GPT math this out and seems like the explicit wraparound tracking isn't actually necessary since 2^33 - 2^32 = 2^32 ... so maybe we can just do a straight cast from the int64 timestamp to uint32 unless we want to also track the actual timestamps for some reason, idk

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the timestamp handling for now in favor of a straight cast to unit32. If we ever need explicit rollover tracking then we can add something similar back in. 935c462

}

func (s *WHEPServer) CreateWHEP(ctx context.Context, w http.ResponseWriter, r *http.Request, mediaReader io.ReadCloser) {
clog.Info(ctx, "creating whep")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add some more fields from the req?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do try to add the stream key, etc to the context a few lines down ... I can reshuffle so we validate and add that before printing this. What other fields would you add to the context?

I'd avoid things like the user-agent or IP; those can be printed once but doesn't need to be on every log line for this request.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing specific that I was thinking of, just thought that a log with only "creating whep" wouldn't be very useful

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's useful because it is the first log line that tells us there's a new WHEP request coming in, similar to this one for WHIP. Added the user-agent and IP to this log line in 556bd84

http.Error(w, "Failed to create PeerConnection", http.StatusInternalServerError)
return
}
peerConnection.OnICEConnectionStateChange(func(connectionState webrtc.ICEConnectionState) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stopped review here, will continue monday

The golang 1.25 update turned out into something of an ordeal, so
splitting it up from #3739
to allow for better review.

While everything compiled in 1.25, some tests and CI steps broke.

* Go lint needed to be updated to v2+ to support golang 1.25
* Bumping go vet from v1 to v2 entailed a bunch of CLI changes.
  Tried to make those 1-to-1 as much as possible, however:

  Go fmt now needs to be invoked separately for golint. We check
  go fmt in the CI step right before lint ... so just remove the
  go fmt step?

  The revive linter flagged a ton of missing comments. But it
  seems the linter only applies to the `pm` and `verification`
  packages which hardly change. Rather than update those packages,
  remove the use of the `revive` linter.

* Some linting step (not sure which one? vet?) now complains
  about format string functions being used incorrectly, eg
  using a variable in place of a format string literal.
  Those were fixed in the code.

* The `rand.Seed` function apparently became a no-op as of
  golang 1.24 and that broke most of our tests that use the RNG.
  Removed the init() function that sets the seed, and added a
  package level RNG context (using the same seed as the old init)
  and updated any failing tests to use that context.

  Not all uses of the RNG were updated, just ones that broke tests.
  The global RNG is safely initialized so we can continue to use it
  by default in the code that still uses it without test coverage.
Copy link
Member

@victorges victorges left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

if err != nil {
clog.InfofErr(ctx, "Error creating track for h264", err)
http.Error(w, "Error creating track for h264", http.StatusInternalServerError)
return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no context: I wonder if we need to be closing the pion "session" on these early returns? (if it's not automatic or sth)
Just in case not to leave the client hanging. Could have some kind of defer that automatically closes on errors idk

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By 'the client' I presume you mean the webrtc client making the WHEP request, eg the browser.

At this point in time, the HTTP error return is enough to close the client since it won't receive a valid WHEP response to fully complete its own setup.

The Pion session (mostly ICE) will terminate eventually if that's active (not sure if it actually is at this point) but yes it is probably a good idea to close the it ... 826e240

hasVideo = true

case *mpegts.CodecOpus:
webrtcTrack, err := NewLocalVideoTrack(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

misnamed struct? NewLocalVideoTrack

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed in 935c462

Comment on lines 133 to 142
if !hasAudio && !hasVideo {
clog.InfofErr(ctx, "No audio or video in media stream", errors.New("no audio or video"))
http.Error(w, "No audio or video in media stream", http.StatusInternalServerError)
return
} else if !hasVideo {
clog.Info(ctx, "No video in output")
} else if !hasAudio {
clog.Info(ctx, "No audio in output")
}
clog.Info(ctx, "Outputs", " tracks", trackCodecs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now should we always err if !hasVideo and only log if no audio? Or maybe just log both props, together in the log below

Suggested change
if !hasAudio && !hasVideo {
clog.InfofErr(ctx, "No audio or video in media stream", errors.New("no audio or video"))
http.Error(w, "No audio or video in media stream", http.StatusInternalServerError)
return
} else if !hasVideo {
clog.Info(ctx, "No video in output")
} else if !hasAudio {
clog.Info(ctx, "No audio in output")
}
clog.Info(ctx, "Outputs", " tracks", trackCodecs)
if !hasVideo {
clog.InfofErr(ctx, "No video in media stream", errors.New("no video"))
http.Error(w, "No video in media stream", http.StatusInternalServerError)
return
}
clog.Info(ctx, "Outputs", "hasVideo", hasVideo, "hasAudio, hasAudio, "tracks", trackCodecs)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm inclined to keep it going for output even if there's no video, but consolidated the logging in 556bd84


writeErrs := []error{}
for _, p := range packets {
p.Timestamp = ts // ¯\_(ツ)_/¯
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this (comment) mean

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improved the comment in 935c462 .. basically all the LocalTrack cruft here is for this one single line

stream, requestID := params.liveParams.stream, params.liveParams.requestID
sess, exists := params.node.LivePipelines[stream]
if !exists || sess.RequestID != requestID {
clog.Info(ctx, "Did not set output writer due to nonexistent stream or mismatched request ID")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we have the check and log, maybe specify in the log which one happened?

Suggested change
clog.Info(ctx, "Did not set output writer due to nonexistent stream or mismatched request ID")
clog.Info(ctx, "Did not set output writer due to nonexistent stream or mismatched request ID", "exists", exists)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improved the logging in 556bd84

module github.com/livepeer/go-livepeer

go 1.23.2
go 1.25.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess this will be already in main

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it's in main now

Copy link

codecov bot commented Oct 15, 2025

Codecov Report

❌ Patch coverage is 2.88462% with 303 lines in your changes missing coverage. Please review.
✅ Project coverage is 31.52753%. Comparing base (0b54a8e) to head (575aa2e).
⚠️ Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
media/whep_server.go 0.00000% 241 Missing ⚠️
server/ai_mediaserver.go 0.00000% 29 Missing and 1 partial ⚠️
server/ai_live_video.go 0.00000% 29 Missing ⚠️
media/whip_server.go 0.00000% 3 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@                 Coverage Diff                 @@
##              master       #3691         +/-   ##
===================================================
- Coverage   31.71055%   31.52753%   -0.18302%     
===================================================
  Files            158         159          +1     
  Lines          47593       47901        +308     
===================================================
+ Hits           15092       15102         +10     
- Misses         31613       31910        +297     
- Partials         888         889          +1     
Files with missing lines Coverage Δ
core/livepeernode.go 60.22099% <ø> (ø)
media/ring.go 100.00000% <100.00000%> (ø)
media/whip_server.go 0.00000% <0.00000%> (ø)
server/ai_live_video.go 0.00000% <0.00000%> (ø)
server/ai_mediaserver.go 6.65529% <0.00000%> (-0.17483%) ⬇️
media/whep_server.go 0.00000% <0.00000%> (ø)

... and 1 file with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0b54a8e...575aa2e. Read the comment docs.

Files with missing lines Coverage Δ
core/livepeernode.go 60.22099% <ø> (ø)
media/ring.go 100.00000% <100.00000%> (ø)
media/whip_server.go 0.00000% <0.00000%> (ø)
server/ai_live_video.go 0.00000% <0.00000%> (ø)
server/ai_mediaserver.go 6.65529% <0.00000%> (-0.17483%) ⬇️
media/whep_server.go 0.00000% <0.00000%> (ø)

... and 1 file with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@j0sh j0sh enabled auto-merge (squash) October 15, 2025 15:49
@j0sh j0sh merged commit f56f5d5 into master Oct 15, 2025
15 of 16 checks passed
@j0sh j0sh deleted the ja/new-whep branch October 15, 2025 16:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AI Issues and PR related to the AI-video branch. dependencies Pull requests that update a dependency file go Pull requests that update Go code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants