
Optimize CELT inverse transform #119

Draft
zshang-oai wants to merge 1 commit into main from codex/optimize-celt-inverse-transform

Conversation

@zshang-oai
Contributor

Summary

  • reuse CELT and hybrid decode scratch buffers across frames
  • replace the recursive inverse complex DFT with a preplanned mixed-radix inverse FFT
  • keep the transform in Go with precomputed CELT twiddles and bit-reversal tables
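
To illustrate the second and third points, here is a minimal sketch of a planned mixed-radix inverse FFT. It is not the code from this PR: the `plan` type, its fields, and the function names are hypothetical, it uses `complex128` rather than the decoder's `complex32`, and it factors the size into radices 2, 3, and 5 with a generic butterfly. Like the replaced DFT, it is unnormalized and uses the `exp(+2πi·km/n)` sign.

```go
package main

import (
	"fmt"
	"math"
	"math/cmplx"
)

// plan holds tables computed once per transform size (hypothetical layout).
type plan struct {
	n        int
	factors  []int        // radix used at each recursion level
	twiddles []complex128 // twiddles[k] = exp(+2*pi*i*k/n)
	scratch  []complex128 // one butterfly's worth of temporaries
}

func newPlan(n int) *plan {
	p := &plan{n: n, twiddles: make([]complex128, n)}
	for k := range p.twiddles {
		p.twiddles[k] = cmplx.Rect(1, 2*math.Pi*float64(k)/float64(n))
	}
	rest, maxRadix := n, 1
	for _, r := range []int{2, 3, 5} {
		for rest%r == 0 {
			p.factors = append(p.factors, r)
			rest /= r
			if r > maxRadix {
				maxRadix = r
			}
		}
	}
	if rest != 1 { // leftover factor, handled by the same generic butterfly
		p.factors = append(p.factors, rest)
		if rest > maxRadix {
			maxRadix = rest
		}
	}
	p.scratch = make([]complex128, maxRadix)
	return p
}

// inverse writes the unnormalized inverse DFT of in to out. The slices must
// both have length p.n and must not overlap.
func (p *plan) inverse(out, in []complex128) { p.step(out, in, 1, 0, p.n) }

// step transforms the m samples in[0], in[stride], in[2*stride], ... into out[:m].
func (p *plan) step(out, in []complex128, stride, level, m int) {
	if m == 1 {
		out[0] = in[0]
		return
	}
	r := p.factors[level]
	q := m / r
	for j := 0; j < r; j++ { // r interleaved sub-transforms of size q
		p.step(out[j*q:(j+1)*q], in[j*stride:], stride*r, level+1, q)
	}
	tw := p.n / m // maps a size-m twiddle index into the size-n table
	for k := 0; k < q; k++ {
		t := p.scratch[:r]
		for j := 0; j < r; j++ {
			t[j] = out[j*q+k] * p.twiddles[j*k*tw]
		}
		for u := 0; u < r; u++ { // size-r DFT across the sub-transform outputs
			var sum complex128
			for j := 0; j < r; j++ {
				sum += t[j] * p.twiddles[((j*u)%r)*(p.n/r)]
			}
			out[u*q+k] = sum
		}
	}
}

// naiveInverse is the O(n^2) reference the fast path is checked against.
func naiveInverse(in []complex128) []complex128 {
	n := len(in)
	out := make([]complex128, n)
	for k := range out {
		for m, v := range in {
			out[k] += v * cmplx.Rect(1, 2*math.Pi*float64(k*m%n)/float64(n))
		}
	}
	return out
}

func maxAbsDiff(a, b []complex128) float64 {
	worst := 0.0
	for i := range a {
		if d := cmplx.Abs(a[i] - b[i]); d > worst {
			worst = d
		}
	}
	return worst
}

func main() {
	const n = 60 // a size with factors 2, 2, 3, 5, chosen for illustration
	in := make([]complex128, n)
	for i := range in {
		in[i] = complex(float64(i%7)-3, float64(i%5)-2)
	}
	out := make([]complex128, n)
	newPlan(n).inverse(out, in)
	fmt.Printf("n=%d max |fast - naive| = %.3g\n", n, maxAbsDiff(out, naiveInverse(in)))
}
```

The plan is built once per size and reused, so the per-frame work is only the butterflies. A production version would typically replace the recursion's input reordering with a precomputed index table and specialize the radix-2/3/5 butterflies.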

Performance

Production-like decode-only conformance harness: RFC8251 packets, 48 kHz stereo, verification hidden, OPUS_STRESS_REPEATS=1.

  • current main: about 799 packets/s median from the earlier run
  • this PR: about 20,424 packets/s median with -benchtime=3x -count=5
  • libopus/hraban reference: about 60,405 packets/s median from the earlier run
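
Taken at face value, these medians imply the ratios computed below. This is only arithmetic on the three figures quoted above; the `speedup` helper is illustrative, and the main and libopus numbers come from an earlier run, so the comparison is approximate.

```go
package main

import "fmt"

// speedup reports how many times faster `after` is than `before`.
func speedup(before, after float64) float64 { return after / before }

func main() {
	const (
		mainRate    = 799.0   // packets/s, current main (earlier run)
		prRate      = 20424.0 // packets/s, this PR
		libopusRate = 60405.0 // packets/s, libopus/hraban reference (earlier run)
	)
	fmt.Printf("PR vs main:    %.1fx\n", speedup(mainRate, prRate))    // prints 25.6x
	fmt.Printf("libopus vs PR: %.1fx\n", speedup(prRate, libopusRate)) // prints 3.0x
}
```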

This PR is the first CELT optimization split. A follow-up CWRS/PVQ decode optimization is intentionally left out of this draft.

Validation

  • go test ./...
  • go test -tags conformance -run 'TestRFC6716Conformance/vectors/rate_48000/channels_2' -count=1 -parallel=1 .

@codecov

codecov Bot commented May 22, 2026

Codecov Report

❌ Patch coverage is 93.44262% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.89%. Comparing base (ee1d1de) to head (b8af911).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/celt/decoder.go | 69.23% | 4 Missing ⚠️ |
| internal/celt/synthesis.go | 96.07% | 3 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #119      +/-   ##
==========================================
+ Coverage   82.75%   82.89%   +0.13%     
==========================================
  Files          26       26              
  Lines        5626     5706      +80     
==========================================
+ Hits         4656     4730      +74     
- Misses        745      750       +5     
- Partials      225      226       +1     
| Flag | Coverage Δ |
|---|---|
| go | 82.89% <93.44%> (+0.13%) ⬆️ |


@github-actions

RFC 6716 / 8251 conformance

Status: pass

The action extracts the RFC 6716 reference implementation, applies the RFC 8251 decoder update patch, and then builds the patched reference tools.

Legend: numeric cells are opus_compare quality percentages; FAIL means the vector did not pass.

Inputs use the shared RFC 6716 / RFC 8251 bitstream corpus; accepted references follow RFC 8251 Section 11.

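| rate  | ch | 01   | 02   | 03   | 04   | 05   | 06   | 07   | 08   | 09   | 10   | 11   | 12   |
|-------|----|------|------|------|------|------|------|------|------|------|------|------|------|
| 8000  | 1  | 91.4 | 59.7 | 66.3 | 75.1 | 75.0 | 67.8 | 76.0 | 70.0 | 75.5 | 85.9 | 91.0 | 43.4 |
| 8000  | 2  | 93.3 | 57.6 | 66.1 | 75.3 | 75.2 | 67.9 | 76.0 | 70.4 | 76.2 | 86.0 | 93.0 | 43.7 |
| 12000 | 1  | 95.6 | 83.4 | 71.8 | 79.1 | 77.0 | 69.0 | 85.1 | 81.6 | 84.8 | 88.1 | 94.9 | 66.0 |
| 12000 | 2  | 96.0 | 83.3 | 71.3 | 79.2 | 77.3 | 69.1 | 85.1 | 81.8 | 85.2 | 87.0 | 95.8 | 66.1 |
| 16000 | 1  | 95.3 | 91.4 | 88.1 | 81.6 | 77.2 | 68.9 | 89.9 | 86.2 | 78.8 | 89.5 | 96.3 | 56.5 |
| 16000 | 2  | 94.7 | 90.7 | 88.1 | 80.6 | 77.6 | 69.1 | 89.8 | 87.6 | 78.9 | 87.5 | 96.4 | 56.7 |
| 24000 | 1  | 96.7 | 92.0 | 83.2 | 85.9 | 77.5 | 68.4 | 93.9 | 92.4 | 89.2 | 95.4 | 97.9 | 68.5 |
| 24000 | 2  | 96.8 | 90.6 | 82.8 | 86.1 | 77.8 | 68.8 | 93.9 | 93.5 | 92.1 | 87.7 | 98.1 | 68.6 |
| 48000 | 1  | 98.4 | 92.1 | 87.7 | 85.9 | 77.4 | 68.3 | 98.1 | 96.2 | 95.9 | 96.0 | 98.4 | 88.8 |
| 48000 | 2  | 99.8 | 90.6 | 87.8 | 86.1 | 77.7 | 68.6 | 99.6 | 93.7 | 94.4 | 87.7 | 99.7 | 88.9 |
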
Run output
=== CONT  TestRFC6716Conformance/vectors/rate_16000/channels_1/testvector12
TestRFC6716Conformance/vectors/rate_16000/channels_2/testvector04: Opus quality metric: 80.6 %
=== CONT  TestRFC6716Conformance/vectors/rate_16000/channels_1/testvector11
TestRFC6716Conformance/vectors/rate_16000/channels_2/testvector02: Opus quality metric: 90.7 %
=== CONT  TestRFC6716Conformance/vectors/rate_16000/channels_1/testvector10
TestRFC6716Conformance/vectors/rate_16000/channels_1/testvector12: Opus quality metric: 56.5 %
=== CONT  TestRFC6716Conformance/vectors/rate_16000/channels_1/testvector09
TestRFC6716Conformance/vectors/rate_16000/channels_1/testvector11: Opus quality metric: 96.3 %
=== CONT  TestRFC6716Conformance/vectors/rate_16000/channels_1/testvector03
TestRFC6716Conformance/vectors/rate_16000/channels_2/testvector01: Opus quality metric: 94.7 %
=== CONT  TestRFC6716Conformance/vectors/rate_16000/channels_1/testvector07
TestRFC6716Conformance/vectors/rate_16000/channels_1/testvector03: Opus quality metric: 88.1 %
=== CONT  TestRFC6716Conformance/vectors/rate_16000/channels_1/testvector06
TestRFC6716Conformance/vectors/rate_16000/channels_1/testvector09: Opus quality metric: 78.8 %
=== CONT  TestRFC6716Conformance/vectors/rate_16000/channels_1/testvector05
TestRFC6716Conformance/vectors/rate_16000/channels_1/testvector10: Opus quality metric: 89.5 %
=== CONT  TestRFC6716Conformance/vectors/rate_16000/channels_1/testvector04
TestRFC6716Conformance/vectors/rate_16000/channels_1/testvector07: Opus quality metric: 89.9 %
=== CONT  TestRFC6716Conformance/vectors/rate_12000/channels_2/testvector12
TestRFC6716Conformance/vectors/rate_16000/channels_1/testvector06: Opus quality metric: 68.9 %
=== CONT  TestRFC6716Conformance/vectors/rate_16000/channels_1/testvector02
TestRFC6716Conformance/vectors/rate_16000/channels_1/testvector04: Opus quality metric: 81.6 %
=== CONT  TestRFC6716Conformance/vectors/rate_16000/channels_1/testvector01
TestRFC6716Conformance/vectors/rate_16000/channels_1/testvector05: Opus quality metric: 77.2 %
=== CONT  TestRFC6716Conformance/vectors/rate_12000/channels_2/testvector11
TestRFC6716Conformance/vectors/rate_16000/channels_1/testvector02: Opus quality metric: 91.4 %
=== CONT  TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector12
TestRFC6716Conformance/vectors/rate_16000/channels_1/testvector01: Opus quality metric: 95.3 %
=== CONT  TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector07
TestRFC6716Conformance/vectors/rate_12000/channels_2/testvector12: Opus quality metric: 66.1 %
=== CONT  TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector06
TestRFC6716Conformance/vectors/rate_12000/channels_2/testvector11: Opus quality metric: 95.8 %
=== CONT  TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector05
TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector07: Opus quality metric: 98.1 %
=== CONT  TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector04
TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector12: Opus quality metric: 68.6 %
=== CONT  TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector03
TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector06: Opus quality metric: 68.3 %
=== CONT  TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector02
TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector05: Opus quality metric: 77.4 %
=== CONT  TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector01
TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector03: Opus quality metric: 87.7 %
=== CONT  TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector08
TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector04: Opus quality metric: 85.9 %
=== CONT  TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector11
TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector02: Opus quality metric: 92.1 %
=== CONT  TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector10
TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector01: Opus quality metric: 98.4 %
=== CONT  TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector09
TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector08: Opus quality metric: 93.5 %
=== CONT  TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector06
TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector11: Opus quality metric: 98.1 %
=== CONT  TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector07
TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector10: Opus quality metric: 87.7 %
=== CONT  TestRFC6716Conformance/vectors/rate_12000/channels_2/testvector10
TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector09: Opus quality metric: 92.1 %
TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector06: Opus quality metric: 68.8 %
TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector07: Opus quality metric: 93.9 %
TestRFC6716Conformance/vectors/rate_12000/channels_2/testvector10: Opus quality metric: 87.0 %
Opus conformance matrix
Legend: numeric cells are opus_compare quality percentages; FAIL means the vector did not pass.
Inputs use the shared RFC 6716 / RFC 8251 bitstream corpus; accepted references follow RFC 8251 Section 11.
+----------+----+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
| rate     | ch | 01    | 02    | 03    | 04    | 05    | 06    | 07    | 08    | 09    | 10    | 11    | 12    |
+----------+----+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
| 8000     | 1  | 91.4  | 59.7  | 66.3  | 75.1  | 75.0  | 67.8  | 76.0  | 70.0  | 75.5  | 85.9  | 91.0  | 43.4  |
| 8000     | 2  | 93.3  | 57.6  | 66.1  | 75.3  | 75.2  | 67.9  | 76.0  | 70.4  | 76.2  | 86.0  | 93.0  | 43.7  |
| 12000    | 1  | 95.6  | 83.4  | 71.8  | 79.1  | 77.0  | 69.0  | 85.1  | 81.6  | 84.8  | 88.1  | 94.9  | 66.0  |
| 12000    | 2  | 96.0  | 83.3  | 71.3  | 79.2  | 77.3  | 69.1  | 85.1  | 81.8  | 85.2  | 87.0  | 95.8  | 66.1  |
| 16000    | 1  | 95.3  | 91.4  | 88.1  | 81.6  | 77.2  | 68.9  | 89.9  | 86.2  | 78.8  | 89.5  | 96.3  | 56.5  |
| 16000    | 2  | 94.7  | 90.7  | 88.1  | 80.6  | 77.6  | 69.1  | 89.8  | 87.6  | 78.9  | 87.5  | 96.4  | 56.7  |
| 24000    | 1  | 96.7  | 92.0  | 83.2  | 85.9  | 77.5  | 68.4  | 93.9  | 92.4  | 89.2  | 95.4  | 97.9  | 68.5  |
| 24000    | 2  | 96.8  | 90.6  | 82.8  | 86.1  | 77.8  | 68.8  | 93.9  | 93.5  | 92.1  | 87.7  | 98.1  | 68.6  |
| 48000    | 1  | 98.4  | 92.1  | 87.7  | 85.9  | 77.4  | 68.3  | 98.1  | 96.2  | 95.9  | 96.0  | 98.4  | 88.8  |
| 48000    | 2  | 99.8  | 90.6  | 87.8  | 86.1  | 77.7  | 68.6  | 99.6  | 93.7  | 94.4  | 87.7  | 99.7  | 88.9  |
+----------+----+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
--- PASS: TestRFC6716Conformance (101.22s)
    --- PASS: TestRFC6716Conformance/vectors (0.00s)
        --- PASS: TestRFC6716Conformance/vectors/rate_8000/channels_1/testvector01 (1.87s)
        --- PASS: TestRFC6716Conformance/vectors/rate_12000/channels_2/testvector09 (3.45s)
        --- PASS: TestRFC6716Conformance/vectors/rate_8000/channels_2/testvector11 (3.52s)
        --- PASS: TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector05 (3.95s)
        --- PASS: TestRFC6716Conformance/vectors/rate_12000/channels_2/testvector08 (3.23s)
        --- PASS: TestRFC6716Conformance/vectors/rate_12000/channels_2/testvector07 (2.71s)
        --- PASS: TestRFC6716Conformance/vectors/rate_12000/channels_2/testvector06 (3.01s)
        --- PASS: TestRFC6716Conformance/vectors/rate_12000/channels_2/testvector05 (3.26s)
        --- PASS: TestRFC6716Conformance/vectors/rate_12000/channels_2/testvector04 (2.96s)
        --- PASS: TestRFC6716Conformance/vectors/rate_12000/channels_2/testvector03 (2.34s)
        --- PASS: TestRFC6716Conformance/vectors/rate_12000/channels_2/testvector02 (2.75s)
        --- PASS: TestRFC6716Conformance/vectors/rate_12000/channels_1/testvector12 (1.53s)
        --- PASS: TestRFC6716Conformance/vectors/rate_12000/channels_1/testvector11 (1.90s)
        --- PASS: TestRFC6716Conformance/vectors/rate_12000/channels_2/testvector01 (3.63s)
        --- PASS: TestRFC6716Conformance/vectors/rate_12000/channels_1/testvector10 (2.17s)
        --- PASS: TestRFC6716Conformance/vectors/rate_12000/channels_1/testvector09 (1.88s)
        --- PASS: TestRFC6716Conformance/vectors/rate_12000/channels_1/testvector08 (1.68s)
        --- PASS: TestRFC6716Conformance/vectors/rate_12000/channels_1/testvector07 (1.40s)
        --- PASS: TestRFC6716Conformance/vectors/rate_12000/channels_1/testvector06 (1.63s)
        --- PASS: TestRFC6716Conformance/vectors/rate_12000/channels_1/testvector05 (1.71s)
        --- PASS: TestRFC6716Conformance/vectors/rate_12000/channels_1/testvector03 (1.19s)
        --- PASS: TestRFC6716Conformance/vectors/rate_12000/channels_1/testvector04 (1.54s)
        --- PASS: TestRFC6716Conformance/vectors/rate_12000/channels_1/testvector02 (1.44s)
        --- PASS: TestRFC6716Conformance/vectors/rate_12000/channels_1/testvector01 (2.01s)
        --- PASS: TestRFC6716Conformance/vectors/rate_8000/channels_2/testvector12 (2.88s)
        --- PASS: TestRFC6716Conformance/vectors/rate_8000/channels_2/testvector10 (3.91s)
        --- PASS: TestRFC6716Conformance/vectors/rate_8000/channels_2/testvector09 (3.39s)
        --- PASS: TestRFC6716Conformance/vectors/rate_8000/channels_2/testvector08 (3.11s)
        --- PASS: TestRFC6716Conformance/vectors/rate_8000/channels_2/testvector07 (2.61s)
        --- PASS: TestRFC6716Conformance/vectors/rate_8000/channels_2/testvector06 (2.93s)
        --- PASS: TestRFC6716Conformance/vectors/rate_8000/channels_2/testvector05 (3.14s)
        --- PASS: TestRFC6716Conformance/vectors/rate_8000/channels_2/testvector04 (2.84s)
        --- PASS: TestRFC6716Conformance/vectors/rate_8000/channels_2/testvector03 (2.26s)
        --- PASS: TestRFC6716Conformance/vectors/rate_8000/channels_1/testvector12 (1.47s)
        --- PASS: TestRFC6716Conformance/vectors/rate_8000/channels_1/testvector11 (1.84s)
        --- PASS: TestRFC6716Conformance/vectors/rate_8000/channels_2/testvector02 (2.66s)
        --- PASS: TestRFC6716Conformance/vectors/rate_8000/channels_2/testvector01 (3.53s)
        --- PASS: TestRFC6716Conformance/vectors/rate_8000/channels_1/testvector08 (1.63s)
        --- PASS: TestRFC6716Conformance/vectors/rate_8000/channels_1/testvector10 (2.13s)
        --- PASS: TestRFC6716Conformance/vectors/rate_8000/channels_1/testvector09 (1.84s)
        --- PASS: TestRFC6716Conformance/vectors/rate_8000/channels_1/testvector07 (1.37s)
        --- PASS: TestRFC6716Conformance/vectors/rate_8000/channels_1/testvector06 (1.52s)
        --- PASS: TestRFC6716Conformance/vectors/rate_8000/channels_1/testvector04 (1.47s)
        --- PASS: TestRFC6716Conformance/vectors/rate_8000/channels_1/testvector05 (1.64s)
        --- PASS: TestRFC6716Conformance/vectors/rate_8000/channels_1/testvector03 (1.17s)
        --- PASS: TestRFC6716Conformance/vectors/rate_8000/channels_1/testvector02 (1.35s)
        --- PASS: TestRFC6716Conformance/vectors/rate_16000/channels_2/testvector07 (2.87s)
        --- PASS: TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector03 (2.84s)
        --- PASS: TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector04 (3.60s)
        --- PASS: TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector02 (3.36s)
        --- PASS: TestRFC6716Conformance/vectors/rate_24000/channels_1/testvector12 (1.89s)
        --- PASS: TestRFC6716Conformance/vectors/rate_24000/channels_1/testvector11 (2.26s)
        --- PASS: TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector01 (4.30s)
        --- PASS: TestRFC6716Conformance/vectors/rate_24000/channels_1/testvector10 (2.58s)
        --- PASS: TestRFC6716Conformance/vectors/rate_24000/channels_1/testvector09 (2.30s)
        --- PASS: TestRFC6716Conformance/vectors/rate_24000/channels_1/testvector08 (1.98s)
        --- PASS: TestRFC6716Conformance/vectors/rate_24000/channels_1/testvector07 (1.72s)
        --- PASS: TestRFC6716Conformance/vectors/rate_24000/channels_1/testvector06 (1.90s)
        --- PASS: TestRFC6716Conformance/vectors/rate_24000/channels_1/testvector04 (1.85s)
        --- PASS: TestRFC6716Conformance/vectors/rate_24000/channels_1/testvector05 (2.03s)
        --- PASS: TestRFC6716Conformance/vectors/rate_24000/channels_1/testvector03 (1.45s)
        --- PASS: TestRFC6716Conformance/vectors/rate_24000/channels_1/testvector02 (1.71s)
        --- PASS: TestRFC6716Conformance/vectors/rate_24000/channels_1/testvector01 (2.27s)
        --- PASS: TestRFC6716Conformance/vectors/rate_16000/channels_2/testvector12 (3.18s)
        --- PASS: TestRFC6716Conformance/vectors/rate_16000/channels_2/testvector11 (3.83s)
        --- PASS: TestRFC6716Conformance/vectors/rate_16000/channels_2/testvector10 (4.23s)
        --- PASS: TestRFC6716Conformance/vectors/rate_16000/channels_2/testvector09 (3.68s)
        --- PASS: TestRFC6716Conformance/vectors/rate_16000/channels_2/testvector08 (3.40s)
        --- PASS: TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector08 (2.98s)
        --- PASS: TestRFC6716Conformance/vectors/rate_48000/channels_2/testvector12 (5.51s)
        --- PASS: TestRFC6716Conformance/vectors/rate_48000/channels_2/testvector11 (6.48s)
        --- PASS: TestRFC6716Conformance/vectors/rate_48000/channels_2/testvector09 (6.07s)
        --- PASS: TestRFC6716Conformance/vectors/rate_48000/channels_2/testvector10 (7.04s)
        --- PASS: TestRFC6716Conformance/vectors/rate_48000/channels_2/testvector08 (5.81s)
        --- PASS: TestRFC6716Conformance/vectors/rate_48000/channels_2/testvector07 (4.83s)
        --- PASS: TestRFC6716Conformance/vectors/rate_48000/channels_2/testvector06 (5.42s)
        --- PASS: TestRFC6716Conformance/vectors/rate_48000/channels_2/testvector05 (5.88s)
        --- PASS: TestRFC6716Conformance/vectors/rate_48000/channels_2/testvector03 (4.37s)
        --- PASS: TestRFC6716Conformance/vectors/rate_48000/channels_2/testvector04 (5.49s)
        --- PASS: TestRFC6716Conformance/vectors/rate_48000/channels_2/testvector02 (5.15s)
        --- PASS: TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector12 (2.80s)
        --- PASS: TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector11 (3.32s)
        --- PASS: TestRFC6716Conformance/vectors/rate_48000/channels_2/testvector01 (6.43s)
        --- PASS: TestRFC6716Conformance/vectors/rate_16000/channels_1/testvector08 (1.78s)
        --- PASS: TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector10 (3.71s)
        --- PASS: TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector09 (3.31s)
        --- PASS: TestRFC6716Conformance/vectors/rate_16000/channels_2/testvector06 (3.16s)
        --- PASS: TestRFC6716Conformance/vectors/rate_16000/channels_2/testvector03 (2.48s)
        --- PASS: TestRFC6716Conformance/vectors/rate_16000/channels_2/testvector05 (3.41s)
        --- PASS: TestRFC6716Conformance/vectors/rate_16000/channels_2/testvector04 (3.13s)
        --- PASS: TestRFC6716Conformance/vectors/rate_16000/channels_2/testvector02 (2.91s)
        --- PASS: TestRFC6716Conformance/vectors/rate_16000/channels_1/testvector12 (1.62s)
        --- PASS: TestRFC6716Conformance/vectors/rate_16000/channels_1/testvector11 (2.02s)
        --- PASS: TestRFC6716Conformance/vectors/rate_16000/channels_2/testvector01 (3.82s)
        --- PASS: TestRFC6716Conformance/vectors/rate_16000/channels_1/testvector03 (1.27s)
        --- PASS: TestRFC6716Conformance/vectors/rate_16000/channels_1/testvector09 (2.02s)
        --- PASS: TestRFC6716Conformance/vectors/rate_16000/channels_1/testvector10 (2.27s)
        --- PASS: TestRFC6716Conformance/vectors/rate_16000/channels_1/testvector07 (1.51s)
        --- PASS: TestRFC6716Conformance/vectors/rate_16000/channels_1/testvector06 (1.66s)
        --- PASS: TestRFC6716Conformance/vectors/rate_16000/channels_1/testvector04 (1.61s)
        --- PASS: TestRFC6716Conformance/vectors/rate_16000/channels_1/testvector05 (1.78s)
        --- PASS: TestRFC6716Conformance/vectors/rate_16000/channels_1/testvector02 (1.48s)
        --- PASS: TestRFC6716Conformance/vectors/rate_16000/channels_1/testvector01 (2.03s)
        --- PASS: TestRFC6716Conformance/vectors/rate_12000/channels_2/testvector12 (2.98s)
        --- PASS: TestRFC6716Conformance/vectors/rate_12000/channels_2/testvector11 (3.66s)
        --- PASS: TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector07 (2.51s)
        --- PASS: TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector12 (3.60s)
        --- PASS: TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector06 (2.79s)
        --- PASS: TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector05 (3.01s)
        --- PASS: TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector03 (2.22s)
        --- PASS: TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector04 (2.79s)
        --- PASS: TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector02 (2.62s)
        --- PASS: TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector01 (3.36s)
        --- PASS: TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector08 (3.88s)
        --- PASS: TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector11 (4.34s)
        --- PASS: TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector10 (4.77s)
        --- PASS: TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector09 (4.16s)
        --- PASS: TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector06 (3.62s)
        --- PASS: TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector07 (3.04s)
        --- PASS: TestRFC6716Conformance/vectors/rate_12000/channels_2/testvector10 (3.34s)
PASS
ok  	github.com/pion/opus	101.225s


Copilot AI left a comment


Pull request overview

This PR focuses on improving CELT (and some hybrid) decode performance by reusing scratch buffers across frames and replacing the inverse complex DFT used in the IMDCT path with a planned mixed-radix inverse FFT implementation, while keeping the transform code in pure Go.

Changes:

  • Introduces persistent per-decoder scratch buffers to reduce per-frame allocations in CELT synthesis and postfiltering.
  • Replaces the inverse complex DFT inner step with a mixed-radix recursive inverse FFT using precomputed twiddle/rotation tables.
  • Reuses scratch buffers for hybrid SILK internal/resampled PCM buffers to reduce allocations in hybrid decode.
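
The buffer-reuse pattern described in these points can be sketched as follows. The `resize` helper and the `decoder` type here are hypothetical stand-ins, not the identifiers used in the PR, and zeroing the reused region is an assumption about what callers expect.

```go
package main

import "fmt"

// resize returns a zeroed slice of length n. It reuses buf's backing array
// when the capacity is sufficient and allocates only when it must grow.
func resize[T any](buf []T, n int) []T {
	if cap(buf) < n {
		return make([]T, n)
	}
	buf = buf[:n]
	var zero T
	for i := range buf {
		buf[i] = zero // assumption: callers expect fresh, zeroed scratch
	}
	return buf
}

// decoder keeps its scratch between frames instead of allocating per frame.
type decoder struct {
	scratch []float32
}

// decodeFrame stands in for per-frame work that needs n samples of scratch.
func (d *decoder) decodeFrame(n int) []float32 {
	d.scratch = resize(d.scratch, n)
	return d.scratch
}

func main() {
	d := &decoder{}
	first := d.decodeFrame(960)
	second := d.decodeFrame(480) // smaller frame fits in the existing array
	fmt.Println("backing array reused:", &first[0] == &second[0])
}
```

Because the returned slice aliases the decoder's scratch, the result of one `decodeFrame` call is only valid until the next call on the same decoder.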

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| internal/celt/synthesis.go | Adds decoder scratch structs, precomputed inverse-transform plans, and replaces inverse complex DFT with a mixed-radix inverse FFT; switches synthesis to reuse scratch buffers. |
| internal/celt/decoder.go | Adds a lazily allocated scratch buffer to the CELT decoder and updates decode paths to reuse it. |
| internal/celt/celt.go | Introduces maxFrameSampleCount constant to size reusable CELT scratch buffers. |
| decoder.go | Reuses hybrid SILK buffers across frames via a shared resize helper to avoid allocations. |


Comment on lines 532 to +545
  // inverseComplexDFT is the complex inverse transform used by the current IMDCT
  // implementation. It is kept separate so a later FFT implementation can replace
  // this step without changing the surrounding RFC 6716 Section 4.3.7 mapping.
  func inverseComplexDFT(in []complex32) []complex32 {
- 	n := len(in)
- 	out := make([]complex32, n)
- 	for k := range n {
- 		sumR := float32(0)
- 		sumI := float32(0)
- 		for m, value := range in {
- 			angle := 2 * math.Pi * float64(k*m) / float64(n)
- 			cosine := float32(math.Cos(angle))
- 			sine := float32(math.Sin(angle))
- 			sumR += value.r*cosine - value.i*sine
- 			sumI += value.r*sine + value.i*cosine
+ 	out := make([]complex32, len(in))
+ 	work := make([]complex32, len(in))
+ 	inverseComplexDFTInto(in, out, work, inverseTransformPlanForFrameSampleCount(len(in)*2))
+
+ 	return out
+ }
+
+ func inverseComplexDFTInto(in []complex32, out []complex32, work []complex32, plan *inverseTransformPlan) {
+ 	inverseComplexFFTRecursive(in, 1, out, work, len(in), plan)
+ }
