[2/3] Reduce SILK decode hot-path copies by zshang-oai · Pull Request #115 · pion/opus

zshang-oai · 2026-05-18T23:15:47Z

Summary

Reduce the remaining low-risk scalar overhead in the SILK decode hot path. This is the follow-up to #114, now rebased onto current main.

Major changes

Packet and output staging

For Opus code-0 packets, reuse a decoder-owned [1][]byte instead of building a fresh [][]byte through parsePacketFrames.
For mono SILK resampling, skip the deinterleave/reinterleave scratch path and resample the already-contiguous samples directly.
In decodeToFloat32, write decoded/resampled SILK output directly into the caller buffer when the decoded channel layout already matches and there are no SILK/CELT redundancy fades to apply. The old path always staged through resampleBuffer and copied out afterward.
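
The code-0 reuse described above can be sketched roughly as follows. This is an illustration only, not the code from this PR: the `decoder` type, the `singleFrame` field, and the `frames` method are hypothetical names, and only the single-frame case is handled. The one fact it relies on is that the low two bits of the Opus TOC byte carry the frame count code.

```go
package main

import (
	"errors"
	"fmt"
)

// decoder is a hypothetical stand-in for the real decoder type.
type decoder struct {
	// singleFrame is owned by the decoder and reused on every call, so a
	// code-0 packet does not need a freshly allocated [][]byte.
	singleFrame [1][]byte
}

var errEmptyPacket = errors.New("empty packet")

// frames returns the frames carried by packet. Only code 0 (one frame after
// the TOC byte) is sketched here; other frame count codes are out of scope.
func (d *decoder) frames(packet []byte) ([][]byte, error) {
	if len(packet) == 0 {
		return nil, errEmptyPacket
	}
	if packet[0]&0x03 == 0 { // low two TOC bits hold the frame count code
		d.singleFrame[0] = packet[1:]
		return d.singleFrame[:], nil
	}
	return nil, errors.New("multi-frame packets are not covered by this sketch")
}

func main() {
	d := &decoder{}
	frames, err := d.frames([]byte{0x08, 0xAA, 0xBB})
	fmt.Println(len(frames), len(frames[0]), err)
}
```

The returned slice aliases decoder state, so in a design like this the caller has to finish with it before the next call overwrites `singleFrame[0]`.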

Float32 to s16 packing

Replace math.Min/math.Max clamping in Float32ToSigned16 with simple branches.
Add a resampleCount == 1 fast path in ConvertFloat32LittleEndianToSigned16LittleEndian, avoiding the general nested-loop path for the common case here.
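
A minimal sketch of branch-based clamping and a one-sample-per-input packing loop is shown below. The function names and the scale factor of 32768 are assumptions made for illustration; the real `Float32ToSigned16` may scale or round differently.

```go
package main

import "fmt"

// float32ToSigned16 clamps with plain comparisons instead of
// math.Min/math.Max, which avoids the float64 round trip.
// Assumption: samples are scaled by 32768. NaN input is not handled.
func float32ToSigned16(sample float32) int16 {
	scaled := sample * 32768
	if scaled > 32767 {
		return 32767
	}
	if scaled < -32768 {
		return -32768
	}
	return int16(scaled)
}

// packS16LE writes each input sample once as little-endian s16. This is the
// shape a "resample count of one" fast path could take: a single flat loop
// with no inner repetition loop. It returns the number of samples written.
func packS16LE(in []float32, out []byte) int {
	n := len(in)
	if len(out)/2 < n {
		n = len(out) / 2
	}
	for i := 0; i < n; i++ {
		s := uint16(float32ToSigned16(in[i]))
		out[2*i] = byte(s)
		out[2*i+1] = byte(s >> 8)
	}
	return n
}

func main() {
	out := make([]byte, 4)
	n := packS16LE([]float32{0.5, -2}, out)
	fmt.Println(n, out)
}
```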

LPC synthesis

Normalize LPC coefficients once per synthesis call instead of dividing inside the sample/coefficient loop.
Split first-subframe handling from steady-state subframes.
In the steady-state path, use contiguous LPC history slices to avoid repeated branchy “current subframe vs previous frame vs zero” lookups for every coefficient.
Move previousFrameLPCValues handoff out of the per-sample hot loop.
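
The following sketch illustrates the two ideas of normalizing coefficients once and reading history from a contiguous slice. It is not the PR's implementation: `synthesize` and its parameters are hypothetical, it assumes `len(history) == len(aQ12)`, and it allocates scratch with `make` for brevity where a hot path would reuse decoder-owned buffers.

```go
package main

import "fmt"

// synthesize writes len(res) samples into out using the recurrence
// out[i] = gain*res[i] + sum_k (aQ12[k]/4096) * out[i-k-1].
// history holds the len(aQ12) output samples preceding this subframe,
// oldest first (assumption: len(history) == len(aQ12)).
func synthesize(out, res, history []float32, aQ12 []int16, gain float32) {
	order := len(aQ12)

	// Normalize once per call instead of dividing per tap per sample.
	coeffs := make([]float32, order)
	for k, a := range aQ12 {
		coeffs[k] = float32(a) / 4096
	}

	// One contiguous buffer: history followed by the new samples, so the
	// inner loop never has to branch on where a past sample lives.
	buf := make([]float32, order+len(res))
	copy(buf, history)

	for i := range res {
		acc := gain * res[i]
		past := buf[i : i+order] // the `order` samples before position i
		for k := 0; k < order; k++ {
			acc += coeffs[k] * past[order-1-k]
		}
		buf[order+i] = acc
	}
	copy(out, buf[order:])
}

func main() {
	out := make([]float32, 3)
	synthesize(out, []float32{1, 0, 0}, []float32{0}, []int16{2048}, 1)
	fmt.Println(out)
}
```

With a single coefficient of 2048 (0.5 after normalization) and an impulse as the residual, each output sample is half of the previous one.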

Why

After #114 removed most repeated SILK scratch allocations, the remaining local stress cost was mostly scalar copy and output staging work around the decode path.

This PR keeps the decode algorithm unchanged and removes avoidable hot-path movement between temporary buffers.

Validation

Ran:

GOCACHE=/private/tmp/opus-go-build GOLANGCI_LINT_CACHE=/private/tmp/opus-golangci-lint golangci-lint run
GOCACHE=/private/tmp/opus-go-build go test ./...

End-to-end stress benchmark only, no focused microbenchmarks:

go test -run '^$' -bench 'BenchmarkPionDecodeSerial$' -benchmem -benchtime=5s -count=1

Observed locally on Apple M4 Max, darwin/arm64:

| Branch | Throughput | Allocations |
| --- | --- | --- |
| `main` after #114 | ~54.8k packets/s | 4096 allocs/op |
| This PR | ~95.4k packets/s | 0 allocs/op |

codecov · 2026-05-18T23:17:01Z

Codecov Report

❌ Patch coverage is 86.33094% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.41%. Comparing base (f682a5f) to head (7a0a95c).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| decoder.go | 78.37% | 12 Missing and 4 partials ⚠️ |
| internal/bitdepth/bitdepth.go | 76.92% | 1 Missing and 2 partials ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #115      +/-   ##
==========================================
- Coverage   82.56%   82.41%   -0.15%     
==========================================
  Files          22       22              
  Lines        4742     4828      +86     
==========================================
+ Hits         3915     3979      +64     
- Misses        635      654      +19     
- Partials      192      195       +3

| Flag | Coverage Δ |
| --- | --- |
| go | `82.41% <86.33%> (-0.15%)` ⬇️ |


Copilot

Pull request overview

This PR reduces scalar overhead and avoids unnecessary buffer copies and allocations in the Opus SILK decode hot path, aiming to improve throughput while keeping the decode algorithm's behavior unchanged.

Changes:

Avoid per-packet [][]byte allocation for Code 0 packets by reusing a decoder-owned single-frame holder.
Reduce SILK resampling/copy staging by writing directly into caller output when safe, and add a mono resample fast path.
Optimize hot loops in LPC synthesis and float32→s16 packing by reducing per-sample work and adding common-case fast paths.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `internal/silk/decoder.go` | Refactors LPC synthesis to normalize coefficients once per call and uses a steady-state history slice to reduce per-sample branching. |
| `internal/bitdepth/bitdepth.go` | Adds a `resampleCount==1` fast path and replaces min/max clamp calls with branches for float→int16 conversion. |
| `decoder.go` | Reduces staging/copies in SILK decode output handling, adds mono resample shortcut, and reuses a single-frame slice for Code 0 packets. |

zshang-oai mentioned this pull request May 18, 2026

Reduce SILK decode hot-path copies zshang-oai/opus#2

Closed

zshang-oai force-pushed the codex/reduce-silk-decode-hot-path branch 2 times, most recently from 20933b3 to f861bef on May 20, 2026 23:12

zshang-oai marked this pull request as ready for review May 20, 2026 23:39

zshang-oai requested review from JoTurk, Sean-Der and Copilot May 20, 2026 23:41

Copilot started reviewing on behalf of zshang-oai May 20, 2026 23:42

Copilot AI reviewed May 20, 2026

Comment thread decoder.go

Comment thread decoder.go

zshang-oai mentioned this pull request May 20, 2026

[3/4] Optimize SILK LPC synthesis #116

Closed

Reduce SILK decode hot-path copies

7a0a95c

zshang-oai force-pushed the codex/reduce-silk-decode-hot-path branch from f861bef to 7a0a95c on May 21, 2026 03:25

JoTurk approved these changes May 21, 2026

zshang-oai merged commit ee1d1de into pion:main May 21, 2026
21 checks passed

zshang-oai merged 1 commit into pion:main from zshang-oai:codex/reduce-silk-decode-hot-path
