Conversation

Member
@jschwinger233 jschwinger233 commented Dec 1, 2025

The current ringbuf consumer APIs Read/ReadInto always copy samples into a user-provided buffer, see https://github.com/cilium/ebpf/blob/v0.20.0/ringbuf/ring.go#L96. For callers that consume data synchronously, this extra copy is unnecessary.

This change adds a zero-copy consumer API:

  • Peek / PeekInto – return a view into the mmap’d ring buffer without advancing the consumer position.
  • Consume – advance the consumer position once the view has been processed.
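
For illustration, a minimal single-consumer loop with the new API could look like the sketch below; the exact Peek signature (returning (*View, error)) and the View's RawSample field are assumptions based on this description, not the final API.

```go
// Sketch of a single-consumer loop with the proposed zero-copy API.
// Assumed shapes: Peek returns (*View, error) and View exposes the
// raw sample bytes as RawSample.
func drain(rd *ringbuf.Reader, handle func([]byte)) error {
	for {
		view, err := rd.Peek() // view into the mmap'd ring; cons_pos unchanged
		if err != nil {
			return err // e.g. ringbuf.ErrClosed after rd.Close()
		}
		handle(view.RawSample) // process in place; do not retain past Consume
		rd.Consume(view)       // advance cons_pos; the kernel may reuse this memory
	}
}
```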

Existing Read/ReadInto semantics are unchanged and continue to work as before.

A preliminary microbenchmark [1] shows that the zero-copy advantage grows with record size: copy throughput falls as records get larger, while view throughput stays roughly flat.

Single-core run on CPU 2, ring size 1 GiB, Go 1.24.4. Throughput is in million events/second (Mev/s); "speedup" is zero-copy / copy.

| event-size (B) | events/run | copy (Mev/s) | zero-copy (Mev/s) | speedup |
| --- | --- | --- | --- | --- |
| 128 | 7,895,160 | 45.63 | 49.83 | 1.09x |
| 512 | 2,064,888 | 25.03 | 34.94 | 1.40x |
| 1024 | 1,040,447 | 8.90 | 34.94 | 3.93x |
| 2048 | 522,247 | 4.57 | 29.56 | 6.47x |

[1] https://github.com/jschwinger233/bpf_ringbuf_zc_benchmark

@jschwinger233 jschwinger233 force-pushed the gray/zc branch 4 times, most recently from 1866173 to c2d44c8 on December 2, 2025 at 01:32

// Consume advances the consumer position after a successful Peek/PeekInto.
func (r *Reader) Consume(view *View) {
	atomic.StoreUintptr(r.ring.cons_pos, view.nextCons)
}
Contributor

It feels like this change introduces a race condition for the following call chain:

  • Peek() or PeekInto()
  • Read() or ReadInto()
  • Consume()

Similar undefined behaviour might also be possible if different goroutines call Peek() or PeekInto() and Consume() in various orders.
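
For illustration (using the assumed API shapes from the snippet above), the hazard might look like this, even on a single goroutine:

```go
view, _ := rd.Peek() // view.nextCons is derived from the current cons_pos
rec, _ := rd.Read()  // advances cons_pos past the next record
rd.Consume(view)     // stores the stale view.nextCons, rewinding cons_pos:
                     // rec may be delivered again, and the kernel may already
                     // be overwriting memory the view still points into
_ = rec
```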

Member Author
@jschwinger233 jschwinger233 Dec 3, 2025

Thanks for pointing that out. I have to confess: it's by design that the peek-based APIs support only a single consumer. 😅

My first version used a boolean viewInFlight in the ring buffer to take care of multiple calls to Peek, Read after Peek, etc. However, it turned out that the additional synchronization inside the ring buffer (a mutex or an atomic for viewInFlight) hurt performance enough to drag down the zero-copy advantage, so I redesigned the APIs into their current form.
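
For context, the abandoned guard might have looked roughly like the sketch below; viewInFlight is the name from the comment above, and everything else here is reconstructed guesswork.

```go
// Every Peek/Consume pair crosses this atomic, which is the overhead
// described above. Only the viewInFlight name comes from the comment.
type guardedReader struct {
	inner        *ringbuf.Reader
	viewInFlight atomic.Bool
}

func (g *guardedReader) Peek() (*ringbuf.View, error) {
	if !g.viewInFlight.CompareAndSwap(false, true) {
		return nil, errors.New("ringbuf: a view is already in flight")
	}
	v, err := g.inner.Peek()
	if err != nil {
		g.viewInFlight.Store(false) // release the guard on failure
		return nil, err
	}
	return v, nil
}

func (g *guardedReader) Consume(v *ringbuf.View) {
	g.inner.Consume(v)
	g.viewInFlight.Store(false)
}
```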

(BTW, I was wondering if I could get some advice on fixing the CI failures. 🙏 The new test cases are commented out — it seems like my changes regarding Read/ReadInto are problematic? Which test case in particular failed? I don't understand the error log: https://github.com/cilium/ebpf/actions/runs/19843613354/job/56856983722?pr=1915 😬)

Collaborator
@ti-mo ti-mo Dec 3, 2025

That error would be due to lmb/vimto#29. A re-run should fix it; the patch was already reverted in vimto.

I feel like it's always going to be difficult to marry Read(Into) and Peek/Consume when they're implemented on a single reader and thus share state. Users of a Reader expect some form of concurrency safety, so interfaces with different safety guarantees should probably be separated. Alternatively, instead of Peek/Consume, design a different interface: for example, one that takes a caller-provided function letting the caller inspect (part of) a message to decide whether they'd be interested in receiving a copy (kind of like pin.Pin.Take()), or one where they can unmarshal (part of) a message and send it to a channel. After invoking the callback, you move the consumer position. That still saves one copy compared to Read(Into); a rough sketch of that shape follows below.
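
A minimal sketch of that callback shape, assuming the Peek/Consume primitives from this PR; the ReadFunc name and everything else here is hypothetical:

```go
// Hypothetical callback-based read: the caller inspects the sample in place
// and the consumer position advances only after the callback returns, so no
// view can outlive the record's memory. Saves one copy versus Read(Into).
func (r *Reader) ReadFunc(fn func(sample []byte)) error {
	view, err := r.Peek()
	if err != nil {
		return err
	}
	fn(view.RawSample) // inspect, unmarshal, or copy out here; do not retain
	r.Consume(view)    // record memory may be reused from this point on
	return nil
}
```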

Also, a 13% raw throughput improvement sounds rather... disappointing? I think allocation statistics are missing from this picture. Have you profiled it to find other potential gains? Would it be possible to write this as a plain Go benchmark? (Might be difficult due to needing to spin up a producer somewhere, etc.)
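
For reference, a plain Go benchmark for the existing copy path could look roughly like this; ReadInto and ringbuf.Record are the library's real APIs, while newTestReader (which would need to load a BPF producer and keep the ring full) is hypothetical:

```go
// Sketch of a testing.B benchmark for the copy path.
func BenchmarkReadInto(b *testing.B) {
	rd := newTestReader(b) // hypothetical: *ringbuf.Reader backed by a busy producer
	var rec ringbuf.Record
	b.ReportAllocs() // surface the allocation statistics mentioned above
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		if err := rd.ReadInto(&rec); err != nil {
			b.Fatal(err)
		}
	}
}
```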

Member Author

Hi Timo, long time no see.

I did another round of benchmarks; the latest results are in the updated PR description. The throughput improvement grows with larger records: the zero-copy API is roughly 4x faster at 1024-byte records (and about 6.5x at 2048 bytes). I think it's worthwhile to switch to the zero-copy APIs when ringbuf events are large.

@jschwinger233 jschwinger233 force-pushed the gray/zc branch 2 times, most recently from 0345238 to 82ac374 on December 4, 2025 at 01:27
Member Author
@jschwinger233

Thank you for the inputs 🙏

  • API: Zero-copy APIs are intentionally single-consumer and mirror existing Read/ReadInto semantics. I’m open to reshaping this (separate interfaces, callback-based API, etc.) if you prefer another direction.
  • Performance: Benchmarks in the PR description are updated; for 1024-byte records the zero-copy path is up to ~4× faster.
  • CI: after a re-run, the checks are green.

From my side this is ready for review. Please feel free to request changes if any part of the design or implementation doesn’t make sense.

@jschwinger233 jschwinger233 marked this pull request as ready for review December 4, 2025 01:52