Optimize CELT inverse transform #119
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #119 +/- ##
==========================================
+ Coverage 82.75% 82.89% +0.13%
==========================================
Files 26 26
Lines 5626 5706 +80
==========================================
+ Hits 4656 4730 +74
- Misses 745 750 +5
- Partials 225 226 +1
RFC 6716 / 8251 conformance
Status: pass
The action extracts the RFC 6716 reference implementation, applies the RFC 8251 decoder update patch, and then builds the patched reference tools.
Legend: numeric cells are
Inputs use the shared RFC 6716 / RFC 8251 bitstream corpus; accepted references follow RFC 8251 Section 11.
Pull request overview
This PR focuses on improving CELT (and some hybrid) decode performance by reusing scratch buffers across frames and replacing the inverse complex DFT used in the IMDCT path with a planned mixed-radix inverse FFT implementation, while keeping the transform code in pure Go.
Changes:
- Introduces persistent per-decoder scratch buffers to reduce per-frame allocations in CELT synthesis and postfiltering.
- Replaces the inverse complex DFT inner step with a mixed-radix recursive inverse FFT using precomputed twiddle/rotation tables.
- Reuses scratch buffers for hybrid SILK internal/resampled PCM buffers to reduce allocations in hybrid decode.
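The buffer reuse described in the first and third bullets can be illustrated with a small sketch. The names `resizeScratch`, `frameScratch`, and `decodeFrame` are hypothetical and not taken from the PR; the sketch only assumes that a scratch slice is kept on the decoder and resliced when its capacity already covers the requested frame length.

```go
package main

import "fmt"

// resizeScratch returns a slice of length n that reuses buf's backing array
// when its capacity allows, and allocates only when it does not.
// Hypothetical helper; the PR's actual resize helper may differ.
func resizeScratch[T any](buf []T, n int) []T {
	if cap(buf) < n {
		return make([]T, n)
	}
	buf = buf[:n]
	clear(buf) // zero stale samples left over from the previous frame
	return buf
}

// frameScratch stands in for a per-decoder scratch struct (assumed shape).
type frameScratch struct {
	pcm []float32
}

// decodeFrame pretends to decode one frame of n samples into reused storage.
func (s *frameScratch) decodeFrame(n int) []float32 {
	s.pcm = resizeScratch(s.pcm, n)
	for i := range s.pcm {
		s.pcm[i] = float32(i)
	}
	return s.pcm
}

func main() {
	var s frameScratch
	first := s.decodeFrame(960)
	second := s.decodeFrame(480)
	// The second, shorter frame fits in the first frame's backing array,
	// so both slices start at the same address.
	fmt.Println(len(first), len(second), &first[0] == &second[0])
}
```

Because the returned slice aliases the scratch storage, a caller in this sketch must finish with one frame's samples before decoding the next.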
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| internal/celt/synthesis.go | Adds decoder scratch structs, precomputed inverse-transform plans, and replaces inverse complex DFT with a mixed-radix inverse FFT; switches synthesis to reuse scratch buffers. |
| internal/celt/decoder.go | Adds a lazily allocated scratch buffer to the CELT decoder and updates decode paths to reuse it. |
| internal/celt/celt.go | Introduces maxFrameSampleCount constant to size reusable CELT scratch buffers. |
| decoder.go | Reuses hybrid SILK buffers across frames via a shared resize helper to avoid allocations. |
      // inverseComplexDFT is the complex inverse transform used by the current IMDCT
      // implementation. It is kept separate so a later FFT implementation can replace
      // this step without changing the surrounding RFC 6716 Section 4.3.7 mapping.
      func inverseComplexDFT(in []complex32) []complex32 {
    - 	n := len(in)
    - 	out := make([]complex32, n)
    - 	for k := range n {
    - 		sumR := float32(0)
    - 		sumI := float32(0)
    - 		for m, value := range in {
    - 			angle := 2 * math.Pi * float64(k*m) / float64(n)
    - 			cosine := float32(math.Cos(angle))
    - 			sine := float32(math.Sin(angle))
    - 			sumR += value.r*cosine - value.i*sine
    - 			sumI += value.r*sine + value.i*cosine
    + 	out := make([]complex32, len(in))
    + 	work := make([]complex32, len(in))
    + 	inverseComplexDFTInto(in, out, work, inverseTransformPlanForFrameSampleCount(len(in)*2))

      	return out
      }

    + func inverseComplexDFTInto(in []complex32, out []complex32, work []complex32, plan *inverseTransformPlan) {
    + 	inverseComplexFFTRecursive(in, 1, out, work, len(in), plan)
    + }
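For readers unfamiliar with the technique, here is a minimal sketch of a recursive mixed-radix inverse FFT checked against the direct inverse DFT. It is not the PR's code: the `plan` type, `newPlan`, `smallestFactor`, and `inverseFFT` are hypothetical names, the sketch uses Go's built-in `complex128` rather than the repository's `complex32`, and it allocates a small temporary per call for brevity where the PR passes a `work` buffer. Like the removed loop, the transform uses the positive exponent and applies no 1/n scaling.

```go
package main

import (
	"fmt"
	"math"
	"math/cmplx"
)

// plan holds twiddles w[k] = exp(+2*pi*i*k/n) for one transform size.
// Hypothetical stand-in for the PR's inverseTransformPlan.
type plan struct {
	n int
	w []complex128
}

func newPlan(n int) *plan {
	p := &plan{n: n, w: make([]complex128, n)}
	for k := range p.w {
		p.w[k] = cmplx.Rect(1, 2*math.Pi*float64(k)/float64(n))
	}
	return p
}

// smallestFactor returns the smallest prime factor of n, for n >= 2.
func smallestFactor(n int) int {
	for f := 2; f*f <= n; f++ {
		if n%f == 0 {
			return f
		}
	}
	return n
}

// inverseFFT computes out[k] = sum over m of in[m*stride] * exp(+2*pi*i*k*m/n),
// unscaled, by decimation in time over the smallest prime factor of n.
func inverseFFT(in []complex128, stride int, out []complex128, n int, p *plan) {
	if n == 1 {
		out[0] = in[0]
		return
	}
	radix := smallestFactor(n)
	sub := n / radix
	// Sub-transform r reads inputs r, r+radix, r+2*radix, ... of this level.
	for r := 0; r < radix; r++ {
		inverseFFT(in[r*stride:], stride*radix, out[r*sub:(r+1)*sub], sub, p)
	}
	step := p.n / n // maps size-n twiddle indices into the full-size table
	tmp := make([]complex128, radix)
	for q := 0; q < sub; q++ {
		for r := 0; r < radix; r++ {
			tmp[r] = out[r*sub+q]
		}
		// Butterfly: combine the radix sub-results that share residue q.
		for s := 0; s < radix; s++ {
			k := q + s*sub
			var sum complex128
			for r := 0; r < radix; r++ {
				sum += tmp[r] * p.w[(r*k%n)*step]
			}
			out[k] = sum
		}
	}
}

// naiveInverseDFT is the O(n^2) reference with the same sign and scaling.
func naiveInverseDFT(in []complex128) []complex128 {
	n := len(in)
	out := make([]complex128, n)
	for k := range out {
		for m, v := range in {
			out[k] += v * cmplx.Rect(1, 2*math.Pi*float64(k*m%n)/float64(n))
		}
	}
	return out
}

func main() {
	const n = 60 // 2*2*3*5, so radix-2, radix-3, and radix-5 steps all run
	in := make([]complex128, n)
	for i := range in {
		in[i] = complex(float64(i%7)-3, float64(i%5)-2)
	}
	fast := make([]complex128, n)
	inverseFFT(in, 1, fast, n, newPlan(n))
	slow := naiveInverseDFT(in)
	maxErr := 0.0
	for k := range fast {
		maxErr = math.Max(maxErr, cmplx.Abs(fast[k]-slow[k]))
	}
	fmt.Println("max abs difference:", maxErr)
}
```

Comparing the fast path against the direct sum on a composite size is a cheap regression check when swapping one for the other, since both are defined by the same formula.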
Summary
Performance
Production-like decode-only conformance harness: RFC 8251 packets, 48 kHz stereo, verification hidden, OPUS_STRESS_REPEATS=1.
main: about 799 packets/s median from the earlier run
Benchmark flags: -benchtime=3x -count=5
This PR is the first CELT optimization split. A follow-up CWRS/PVQ decode optimization is intentionally left out of this draft.
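As a rough illustration of how a median packets/s figure can be derived from repeated runs, the sketch below converts each run to a throughput and takes the middle value. The helper names are hypothetical and the sample values in `main` are placeholders chosen for readability, not measurements from this PR or from main.

```go
package main

import (
	"fmt"
	"slices"
	"time"
)

// packetsPerSecond converts one run's packet count and wall time to throughput.
func packetsPerSecond(packets int, elapsed time.Duration) float64 {
	return float64(packets) / elapsed.Seconds()
}

// median returns the middle value of a non-empty sample set without
// modifying the caller's slice. For an even count it averages the two
// middle values.
func median(samples []float64) float64 {
	sorted := slices.Clone(samples)
	slices.Sort(sorted)
	mid := len(sorted) / 2
	if len(sorted)%2 == 1 {
		return sorted[mid]
	}
	return (sorted[mid-1] + sorted[mid]) / 2
}

func main() {
	// Placeholder runs only: five samples, matching a -count=5 invocation.
	runs := []float64{
		packetsPerSecond(1000, 2*time.Second),
		packetsPerSecond(1000, 4*time.Second),
		packetsPerSecond(1000, 1*time.Second),
		packetsPerSecond(1000, 5*time.Second),
		packetsPerSecond(1000, 8*time.Second),
	}
	fmt.Println("median packets/s:", median(runs))
}
```

The median is less sensitive than the mean to a single slow run, which matters when only a handful of repetitions are taken.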
Validation
go test ./...
go test -tags conformance -run 'TestRFC6716Conformance/vectors/rate_48000/channels_2' -count=1 -parallel=1 .