Skip to content

obs-outputs: Correct FLV CTS calculation #12151

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 19, 2025

Conversation

dsaedtler
Copy link
Contributor

@dsaedtler dsaedtler commented May 13, 2025

Description

Fixes the CTS (composition timestamp/offset) calculation in OBS's native FLV muxer.

Motivation and Context

Currently the offset calculation will round down both the DTS and CTS, resulting in an off-by-one error when in the assembled PTS (DTS+CTS) timestamp.

To illustrate the problem:

def get_ms_time(ts: int, den: int):
    return ts * 1000 // den

timebase_den = 60 # 60 FPS: 1/60

# 0 b-frames
dts = pts = 120
cts = pts - dts # == 0

dts_ms = get_ms_time(dts, timebase_den) # 2000
cts_ms = get_ms_time(cts, timebase_den) # 0

pts_ms = dts_ms + cts_ms
print(pts_ms) # prints "2000"

# 1 b-frame
pts = 120
dts = pts - 1 # 1 frame of delay
cts = pts - dts # == 1

dts_ms = get_ms_time(dts, timebase_den) # 1983 (we lost 0.3̅)
cts_ms = get_ms_time(cts, timebase_den) # 16 (we lost 0.6̅)

pts_ms = dts_ms + cts_ms
print(pts_ms) # prints "1999"

However, if we convert both parts to milliseconds before calculating the difference, the math works out:

# 1 b-frame
pts = 120
dts = pts - 1 # 1 frame of delay

dts_ms = get_ms_time(dts, timebase_den) # 1983 (we lost 0.3̅)
cts_ms = get_ms_time(pts, timebase_den) - get_ms_time(dts, timebase_den) # 17 (we gained 0.3̅)

pts_ms = dts_ms + cts_ms
print(pts_ms) # prints "2000"

How Has This Been Tested?

Copious amounts of logging. Also verified that FFmpeg will produce a file with the expected PTS alignments.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)

Checklist:

  • My code has been run through clang-format.
  • I have read the contributing document.
  • My code is not on the master branch.
  • The code has been tested.
  • All commit messages are properly formatted and commits squashed where appropriate.
  • I have included updates to all appropriate documentation.

@dsaedtler dsaedtler force-pushed the rodney/fix-flv-cts branch from 8bdf61e to 1372148 Compare May 13, 2025 01:05
@RytoEX RytoEX added the Bug Fix Non-breaking change which fixes an issue label May 13, 2025
@RytoEX RytoEX requested a review from Lain-B May 13, 2025 01:10
@dsaedtler dsaedtler force-pushed the rodney/fix-flv-cts branch from 1372148 to 499ac38 Compare May 13, 2025 01:27
@RytoEX RytoEX requested a review from tt2468 May 13, 2025 18:06
@RytoEX
Copy link
Member

RytoEX commented May 13, 2025

cc @norihiro who may more quickly understand the math.

@stephematician
Copy link
Contributor

stephematician commented May 14, 2025

I found a few secondary sources that confirm that CTS (composition time) should be calculated as PTS - DTS - specifically in milliseconds; so these changes make sense. That said, I don't have access to a 'source of truth' for the FLV format.

Copy link
Member

@RytoEX RytoEX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking the example:

def get_ms_time(ts: int, den: int):
    return ts * 1000 // den

timebase_den = 60 # 60 FPS: 1/60

# 0 b-frames
dts = pts = 120
cts = pts - dts # == 0

dts_ms = get_ms_time(dts, timebase_den) # 2000
cts_ms = get_ms_time(cts, timebase_den) # 0

The relevant OBS function get_ms_time() is:

#define MILLISECOND_DEN 1000

static int32_t get_ms_time(struct encoder_packet *packet, int64_t val)
{
	return (int32_t)(val * MILLISECOND_DEN / packet->timebase_den);
}

In the example, dts = pts = 120 and timebase_den = 60 passed as ts and den returns 2000.
In OBS' get_ms_time, packet->timebase_den = 60 and packet->pts = 120 (passed as packet and packet->pts returns 2000.
Trying to follow between the example and OBS code, I got a little confused because the parameters are swapped.

I'm trying to follow along here, but I'm having trouble finding where the math goes wrong within the changed lines. With the sample values in a mocked up test, changing between 0 and 1 b-frames, I notice that time_ms (line 311) is the value that changes from 2000 to 1999, which itself seems odd. All other values seem consistent.

@dsaedtler
Copy link
Contributor Author

dsaedtler commented May 14, 2025

I'm trying to follow along here, but I'm having trouble finding where the math goes wrong within the changed lines. With the sample values in a mocked up test, changing between 0 and 1 b-frames, I notice that time_ms (line 311) is the value that changes from 2000 to 1999, which itself seems odd. All other values seem consistent.

time_ms is the DTS (dts_ms in the example) which would end up being 1983. The value 1999 is never calculated by OBS itself, but when the receiver attempts to calculate the PTS by adding up the DTS of 1983 and the incorrectly calculated CTS of 16 it will get 1999.

In order for this to be correct, we have to convert both PTS and DTS to milliseconds first, then calculate the offset, which will give us 1983 and 17, which correctly adds up to 2000.

Edit: Also to clarify, with this change all the PTS values when b-frames are used will match the ones when no b-frames are used. Right now in OBS at 60 FPS 40 out of 60 frames in a second will have a wrong timestamp.

@RytoEX
Copy link
Member

RytoEX commented May 14, 2025

Okay, I think I follow now. Thanks.

@stephematician
Copy link
Contributor

In order for this to be correct, we have to convert both PTS and DTS to milliseconds first, then calculate the offset, which will give us 1983 and 17, which correctly adds up to 2000.

This is correct; get PTS and DTS in milliseconds first, and then the (consistent) CTS is evaluated by calculating PTS - DTS.

@norihiro
Copy link
Contributor

Using x264 encoder with bf=3, dts_offset is -33 ms so that DTS and CTS become as below.

Using the PR 12151,

  • DTS=1949 CTS=17 DTS+CTS=1966 (packet->pts is 116/60 s)
  • DTS=1966 CTS=33 DTS+CTS=1999 (packet->pts is 118/60 s)
  • DTS=1983 CTS=33 DTS+CTS=2016 (packet->pts is 119/60 s)
  • DTS=1999 CTS=34 DTS+CTS=2033 (packet->pts is 120/60 s) ... The PR changed this.
  • DTS=2016 CTS=33 DTS+CTS=2049 (packet->pts is 121/60 s)

On the master (FYI):

  • DTS=1949 CTS=33 DTS+CTS=1982
  • DTS=1966 CTS=33 DTS+CTS=1999
  • DTS=1983 CTS=33 DTS+CTS=2016
  • DTS=1999 CTS=33 DTS+CTS=2032 ... This is changed by the PR.
  • DTS=2016 CTS=33 DTS+CTS=2049

So, DTS+CTS is still 1999 instead of 2000. Is this the expected result?
If you want to get DTS+CTS=2000, probably, we should calculate dts_offset before converting to milliseconds.

@dsaedtler
Copy link
Contributor Author

dsaedtler commented May 15, 2025

Using x264 encoder with bf=3, dts_offset is -33 ms so that DTS and CTS become as below.

[snip]

So, DTS+CTS is still 1999 instead of 2000. Is this the expected result? If you want to get DTS+CTS=2000, probably, we should calculate dts_offset before converting to milliseconds.

The PR description is just an illustrative example for how the math goes wrong, not what OBS will actually produce. The DTS in FLV is an unsigned integer1, meaning that we cannot start at PTS 0 if we use b-frames because that would require the starting DTS to be negative (since DTS + CTS = 0 with a CTS > 0 requires DTS < 0). So the real-world numbers will be slightly different than the example.

The main issue that this PR solves is that the PTS for frames (but especially keyframes) between multiple renditions in multitrack does not line up when different amounts of b-frames are used (e.g. with AMD cards or NVIDIA Turing/10-series not supporting b-frames for HEVC).

To further explain the reasoning here, when we convert the timestamps to milliseconds the deltas between DTS timestamps will be 2/3 17 ms and 1/3 16 ms, which over time will average out to 16.6... ms. The same needs to apply to the CTS as it is also meant to average out to the b-frames delay (1 frame, or 16.6... ms in my 60 FPS example) over time, but currently it gets always rounded down so the average ends up being too low.

Footnotes

  1. There is some debate around this (see also: https://github.com/obsproject/obs-studio/pull/11013), since technically the FLV spec says it can be signed, but RTMP specifies it as unsigned. Currently, every implementation I'm aware of treats it as an unsigned integer and offsets the DTS to start at 0.

@dsaedtler
Copy link
Contributor Author

dsaedtler commented May 15, 2025

Here's a real world sample of timestamps that show the problem with a multitrack output that intentionally has different b-frame numbers configured as follows (from track 0 to 3): 0, 1, 2, 3

Without the fix:

--- initial keyframe:
track: 0, PTS: 50 (DTS: 50, CTS: 0, DTS Offset: -50)
track: 1, PTS: 50 (DTS: 34, CTS: 16, DTS Offset: -50)
track: 2, PTS: 50 (DTS: 17, CTS: 33, DTS Offset: -50)
track: 3, PTS: 50 (DTS: 0, CTS: 50, DTS Offset: -50)
--- 2 seconds later, next keyframe:
track: 0, PTS: 2050 (DTS: 2050, CTS: 0, DTS Offset: -50)
track: 1, PTS: 2049 (DTS: 2033, CTS: 16, DTS Offset: -50) -- off-by-one
track: 2, PTS: 2049 (DTS: 2016, CTS: 33, DTS Offset: -50) -- off-by-one
track: 3, PTS: 2050 (DTS: 2000, CTS: 50, DTS Offset: -50)

With the fix:

--- initial keyframe:
track: 0, PTS: 50 (DTS: 50, CTS: 0, DTS Offset: -50)
track: 1, PTS: 50 (DTS: 34, CTS: 16, DTS Offset: -50)
track: 2, PTS: 50 (DTS: 17, CTS: 33, DTS Offset: -50)
track: 3, PTS: 50 (DTS: 0, CTS: 50, DTS Offset: -50)
--- 2 seconds later, next keyframe
track: 0, PTS: 2050 (DTS: 2050, CTS: 0, DTS Offset: -50)
track: 1, PTS: 2050 (DTS: 2033, CTS: 17, DTS Offset: -50) -- corrected
track: 2, PTS: 2050 (DTS: 2016, CTS: 34, DTS Offset: -50) -- corrected
track: 3, PTS: 2050 (DTS: 2000, CTS: 50, DTS Offset: -50)

@norihiro
Copy link
Contributor

Thank you for the explanation.
If I understand correctly, the motivation is matching PTS values across tracks for the same timing. If that's the case, this math looks good to me.

The conversion to milliseconds rounds down. Rounding down DTS and CT
values independently results in a cumulative error that results in the
PTS (DTS + CT) being incorrect (off-by-one).
@dsaedtler dsaedtler force-pushed the rodney/fix-flv-cts branch from 499ac38 to 0778a7e Compare May 15, 2025 16:18
@RytoEX RytoEX self-assigned this May 19, 2025
@RytoEX RytoEX added this to the OBS Studio 31.1 milestone May 19, 2025
@RytoEX RytoEX merged commit 8e129b1 into obsproject:master May 19, 2025
15 checks passed
@dsaedtler dsaedtler deleted the rodney/fix-flv-cts branch May 19, 2025 21:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Fix Non-breaking change which fixes an issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants