[3/4] Optimize SILK LPC synthesis #116
Codecov Report❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #116      +/-   ##
==========================================
+ Coverage   82.75%   82.80%   +0.04%
==========================================
  Files          26       26
  Lines        5626     5652      +26
==========================================
+ Hits         4656     4680      +24
- Misses        745      746       +1
- Partials      225      226       +1
RFC 6716 / 8251 conformance
Status: pass
The action extracts the RFC 6716 reference implementation, applies the RFC 8251 decoder update patch, and then builds the patched reference tools.
Legend: numeric cells are
Inputs use the shared RFC 6716 / RFC 8251 bitstream corpus; accepted references follow RFC 8251 Section 11.
f861bef to
7a0a95c
Compare
3845541 to
7a0a95c
Compare
Pull request overview
This PR further optimizes the SILK decoder LPC synthesis hot path by removing remaining per-sample branching in the first-subframe case and adding a specialized wideband (16-tap) steady-state implementation.
Changes:
- Replaces dual normalized/reversed LPC coefficient prep with a single reversed+normalized coefficient array.
- Reworks first-subframe LPC synthesis to reuse the steady-state path by constructing a contiguous history+output buffer.
- Adds an unrolled 16-tap steady-state LPC synthesis fast path for wideband frames while keeping the generic loop for 10-tap cases.
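The first two changes can be illustrated with a minimal sketch. This is not the PR's code: `reverseAndNormalize` and `synthesize` are hypothetical names, and the Q12 coefficient scaling (divide by 4096) is assumed from the SILK coefficient format. The idea is that once the history samples sit directly in front of the output samples in one buffer, the first subframe needs no per-tap "is this sample in the history or in the output?" branch.

```go
package main

import "fmt"

// reverseAndNormalize is a hypothetical helper. It converts Q12 LPC
// coefficients to float32 and stores them in reverse order, so that rev[j]
// lines up with the sample at buf[i-order+j] during synthesis.
func reverseAndNormalize(aQ12 []int16) []float32 {
	order := len(aQ12)
	rev := make([]float32, order)
	for k, a := range aQ12 {
		rev[order-1-k] = float32(a) / 4096.0
	}
	return rev
}

// synthesize runs the LPC recursion over buf in place. On entry buf[:order]
// holds the history (oldest sample first) and buf[order:] holds the scaled
// excitation; on return buf[order:] holds the synthesized samples.
func synthesize(buf, rev []float32) {
	order := len(rev)
	for i := order; i < len(buf); i++ {
		window := buf[i-order : i] // contiguous: no history/output branch
		sum := buf[i]
		for j, c := range rev {
			sum += window[j] * c
		}
		buf[i] = sum
	}
}

func main() {
	aQ12 := []int16{2048, -1024} // toy order-2 filter: 0.5, -0.25
	rev := reverseAndNormalize(aQ12)

	history := []float32{1, 2} // oldest first
	excitation := []float32{1, 0, 0}

	// Build the contiguous history+output buffer once per frame.
	buf := make([]float32, 0, len(history)+len(excitation))
	buf = append(buf, history...)
	buf = append(buf, excitation...)

	synthesize(buf, rev)
	fmt.Println(buf[len(history):]) // prints [1.75 0.375 -0.25]
}
```

The copy into `buf` is the price of the branch-free loop; whether that trade pays off depends on the frame size and filter order.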
02c9e11 to
a19859e
Compare
Abandoning this draft; the measured gain on the production-like 48 kHz stereo conformance decode workload is too small to carry forward.
Summary
Reduce the remaining low-risk LPC synthesis overhead in the SILK decode hot path. This now sits directly on main after #115 merged.
Major changes
96 as an implicit assumption.
Why
After #114 and #115 remove the larger allocation and copy costs, the remaining SILK decode profile still spends visible time in the branchy first-subframe LPC reconstruction and in the generic 16-tap loop used by wideband frames.
This keeps the same floating-point decoder behavior, but removes per-tap history branching on the first subframe and lets the common 16-tap case run without the generic inner loop.
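A specialized 16-tap path alongside a generic fallback might look like the sketch below. The function names (`lpcGeneric`, `lpc16`, `synthesizeLPC`) are hypothetical and the code is not taken from the PR. It assumes reversed, normalized coefficients and a buffer whose first `order` entries are history, and it uses the slice-to-array-pointer conversion available since Go 1.17. The unrolled expression keeps the accumulation order of the generic loop; whether the two paths are bit-identical on every target is not something this sketch verifies.

```go
package main

import "fmt"

// lpcGeneric is the reference loop for any filter order. buf[:len(rev)] is
// history, buf[len(rev):] is excitation on entry and output on return.
func lpcGeneric(buf, rev []float32) {
	order := len(rev)
	for i := order; i < len(buf); i++ {
		w := buf[i-order : i]
		sum := buf[i]
		for j, c := range rev {
			sum += w[j] * c
		}
		buf[i] = sum
	}
}

// lpc16 is a hypothetical fast path for order-16 (wideband) filters. The
// fixed-size array pointers let the compiler drop per-tap bounds checks.
func lpc16(buf []float32, rev *[16]float32) {
	for i := 16; i < len(buf); i++ {
		w := (*[16]float32)(buf[i-16 : i]) // requires Go 1.17+
		buf[i] = buf[i] +
			w[0]*rev[0] + w[1]*rev[1] + w[2]*rev[2] + w[3]*rev[3] +
			w[4]*rev[4] + w[5]*rev[5] + w[6]*rev[6] + w[7]*rev[7] +
			w[8]*rev[8] + w[9]*rev[9] + w[10]*rev[10] + w[11]*rev[11] +
			w[12]*rev[12] + w[13]*rev[13] + w[14]*rev[14] + w[15]*rev[15]
	}
}

// synthesizeLPC dispatches to the unrolled path when the order is 16 and
// falls back to the generic loop otherwise (for example order 10).
func synthesizeLPC(buf, rev []float32) {
	if len(rev) == 16 {
		lpc16(buf, (*[16]float32)(rev))
		return
	}
	lpcGeneric(buf, rev)
}

func main() {
	rev := make([]float32, 16)
	for j := range rev {
		rev[j] = float32(j-8) / 64 // toy coefficients, not real SILK data
	}
	a := make([]float32, 16+8) // 16 history samples + 8 output samples
	for i := range a {
		a[i] = float32(i%5) - 2
	}
	b := append([]float32(nil), a...)

	lpcGeneric(a, rev)
	synthesizeLPC(b, rev)

	var maxDiff float32
	for i := range a {
		d := a[i] - b[i]
		if d < 0 {
			d = -d
		}
		if d > maxDiff {
			maxDiff = d
		}
	}
	fmt.Println("max abs difference:", maxDiff)
}
```

A check like the one in `main` compares the fast path against the generic loop, but it says nothing about speed; that would need a benchmark on representative frames.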
Validation
Ran:
GOCACHE=/private/tmp/opus-go-build-pr116-rebase GOLANGCI_LINT_CACHE=/private/tmp/opus-golangci-lint-pr116-rebase golangci-lint run
GOCACHE=/private/tmp/opus-go-build-pr116-rebase go test ./...
End-to-end stress harness only, no focused microbenchmark.