Conversation

Member
@jschwinger233 jschwinger233 commented Dec 1, 2025

The current ringbuf consumer APIs Read/ReadInto always copy samples into a user-provided buffer, see https://github.com/cilium/ebpf/blob/v0.20.0/ringbuf/ring.go#L96. For callers that consume data synchronously, this extra copy is unnecessary.

This change adds a zero-copy consumer API:

  • Peek / PeekInto – return a view into the mmap’d ring buffer without advancing the consumer position.
  • Consume – advance the consumer position once the view has been processed.
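
For illustration, a minimal single-consumer loop with the new API could look like the sketch below; the exact Peek signature (returning (*View, error)) and the View's RawSample field are assumptions based on this description, not the final API.

```go
// Sketch of a single-consumer loop with the proposed zero-copy API.
// Assumed shapes: Peek returns (*View, error) and View exposes the
// raw sample bytes as RawSample.
func drain(rd *ringbuf.Reader, handle func([]byte)) error {
	for {
		view, err := rd.Peek() // view into the mmap'd ring; cons_pos unchanged
		if err != nil {
			return err // e.g. ringbuf.ErrClosed after rd.Close()
		}
		handle(view.RawSample) // process in place; do not retain past Consume
		rd.Consume(view)       // advance cons_pos; the kernel may reuse this memory
	}
}
```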

Existing Read/ReadInto semantics are unchanged and continue to work as before.

A preliminary microbenchmark [1] shows that the zero-copy advantage grows with record size: copy throughput falls as records get larger, while view throughput stays roughly flat.

Single-core run on CPU 2, ring size 1 GiB, Go 1.24.4. Throughput is in million events/second (Mev/s); "speedup" is zero-copy / copy.

| event-size (B) | events/run | copy (Mev/s) | zero-copy (Mev/s) | speedup |
| --- | --- | --- | --- | --- |
| 128 | 7,895,160 | 45.63 | 49.83 | 1.09x |
| 512 | 2,064,888 | 25.03 | 34.94 | 1.40x |
| 1024 | 1,040,447 | 8.90 | 34.94 | 3.93x |
| 2048 | 522,247 | 4.57 | 29.56 | 6.47x |

[1] https://github.com/jschwinger233/bpf_ringbuf_zc_benchmark

@jschwinger233 jschwinger233 force-pushed the gray/zc branch 4 times, most recently from 1866173 to c2d44c8 on December 2, 2025 at 01:32

// Consume advances the consumer position after a successful Peek/PeekInto.
func (r *Reader) Consume(view *View) {
	atomic.StoreUintptr(r.ring.cons_pos, view.nextCons)
}
Contributor

It feels like this change introduces a race condition for the following call chain:

  • Peek() or PeekInto()
  • Read() or ReadInto()
  • Consume()

Similar undefined behaviour might also be possible if different goroutines call Peek() or PeekInto() and Consume() in various orders.
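
For illustration (using the assumed API shapes from the snippet above), the hazard might look like this, even on a single goroutine:

```go
view, _ := rd.Peek() // view.nextCons is derived from the current cons_pos
rec, _ := rd.Read()  // advances cons_pos past the next record
rd.Consume(view)     // stores the stale view.nextCons, rewinding cons_pos:
                     // rec may be delivered again, and the kernel may already
                     // be overwriting memory the view still points into
_ = rec
```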

Member Author
@jschwinger233 jschwinger233 Dec 3, 2025

Thanks for pointing that out. I have to confess: it's by design that the peek-based APIs support only a single consumer. 😅

My first version used a boolean viewInFlight in the ring buffer to take care of multiple calls to Peek, Read after Peek, etc. However, it turned out that the additional synchronization inside the ring buffer (a mutex or an atomic for viewInFlight) hurt performance enough to drag down the zero-copy advantage, so I redesigned the APIs into their current form.
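
For context, the abandoned guard might have looked roughly like the sketch below; viewInFlight is the name from the comment above, and everything else here is reconstructed guesswork.

```go
// Every Peek/Consume pair crosses this atomic, which is the overhead
// described above. Only the viewInFlight name comes from the comment.
type guardedReader struct {
	inner        *ringbuf.Reader
	viewInFlight atomic.Bool
}

func (g *guardedReader) Peek() (*ringbuf.View, error) {
	if !g.viewInFlight.CompareAndSwap(false, true) {
		return nil, errors.New("ringbuf: a view is already in flight")
	}
	v, err := g.inner.Peek()
	if err != nil {
		g.viewInFlight.Store(false) // release the guard on failure
		return nil, err
	}
	return v, nil
}

func (g *guardedReader) Consume(v *ringbuf.View) {
	g.inner.Consume(v)
	g.viewInFlight.Store(false)
}
```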

(BTW, I was wondering if I could get some advice on fixing the CI failures. 🙏 The new test cases are commented out — it seems like my changes regarding Read/ReadInto are problematic? Which test case in particular failed? I don't understand the error log: https://github.com/cilium/ebpf/actions/runs/19843613354/job/56856983722?pr=1915 😬)

Collaborator
@ti-mo ti-mo Dec 3, 2025

That error would be due to lmb/vimto#29. A re-run should fix it; the patch was already reverted in vimto.

I feel like it's always going to be difficult to marry Read(Into) and Peek/Consume when they're implemented on a single reader and thus share state. Users of a Reader expect some form of concurrency safety, so interfaces with different safety guarantees should probably be separated. Alternatively, instead of Peek/Consume, design a different interface: for example, one that takes a caller-provided function letting the caller inspect (part of) a message to decide whether they'd be interested in receiving a copy (kind of like pin.Pin.Take()), or one where they can unmarshal (part of) a message and send it to a channel. After invoking the callback, you move the consumer position. That still saves one copy compared to Read(Into); a rough sketch of that shape follows below.
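
A minimal sketch of that callback shape, assuming the Peek/Consume primitives from this PR; the ReadFunc name and everything else here is hypothetical:

```go
// Hypothetical callback-based read: the caller inspects the sample in place
// and the consumer position advances only after the callback returns, so no
// view can outlive the record's memory. Saves one copy versus Read(Into).
func (r *Reader) ReadFunc(fn func(sample []byte)) error {
	view, err := r.Peek()
	if err != nil {
		return err
	}
	fn(view.RawSample) // inspect, unmarshal, or copy out here; do not retain
	r.Consume(view)    // record memory may be reused from this point on
	return nil
}
```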

Also, a 13% raw throughput improvement sounds rather... disappointing? I think allocation statistics are missing from this picture. Have you profiled it to find other potential gains? Would it be possible to write this as a plain Go benchmark? (Might be difficult due to needing to spin up a producer somewhere, etc.)
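
For reference, a plain Go benchmark for the existing copy path could look roughly like this; ReadInto and ringbuf.Record are the library's real APIs, while newTestReader (which would need to load a BPF producer and keep the ring full) is hypothetical:

```go
// Sketch of a testing.B benchmark for the copy path.
func BenchmarkReadInto(b *testing.B) {
	rd := newTestReader(b) // hypothetical: *ringbuf.Reader backed by a busy producer
	var rec ringbuf.Record
	b.ReportAllocs() // surface the allocation statistics mentioned above
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		if err := rd.ReadInto(&rec); err != nil {
			b.Fatal(err)
		}
	}
}
```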

Member Author

Hi Timo, long time no see.

I did another round of benchmarks; the latest results are in the updated PR description. The throughput improvement grows with larger records: the zero-copy API is roughly 4x faster at 1024-byte records (and about 6.5x at 2048 bytes). I think it's worthwhile to switch to the zero-copy APIs when ringbuf events are large.

@jschwinger233 jschwinger233 force-pushed the gray/zc branch 2 times, most recently from 0345238 to 82ac374 on December 4, 2025 at 01:27
Member Author
@jschwinger233

Thank you for the inputs 🙏

  • API: Zero-copy APIs are intentionally single-consumer and mirror existing Read/ReadInto semantics. I’m open to reshaping this (separate interfaces, callback-based API, etc.) if you prefer another direction.
  • Performance: Benchmarks in the PR description are updated; for 1024-byte records the zero-copy path is up to ~4× faster.
  • CI: after a re-run, the checks are green.

From my side this is ready for review. Please feel free to request changes if any part of the design or implementation doesn’t make sense.

@jschwinger233 jschwinger233 marked this pull request as ready for review December 4, 2025 01:52