-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Direct writing to/reading from of underlying buffer without Bincode #9
base: master
Are you sure you want to change the base?
Conversation
@andrew-manifold Thanks so much for this, and sorry for the delayed response. I'm planning to review this as soon as I have the chance -- probably this Friday. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again for this. I think the send_direct*
parts look good overall (see some comments inline), but I have concerns about the {try_}recv_direct
parts, and I think ZeroCopyContext::{try_}recv::<_, &[u8]>
can do the job just as efficiently and more correctly.
If you can demonstrate that bincode
is adding significant overhead even for the &[u8]
case, then perhaps we could add {try_}recv_direct
functions to ZeroCopyContext
, but I don't believe they can be added soundly to Receiver
.
src/lib.rs
Outdated
/// Attempt to read a message without blocking. Returns a slice directly from the ring buffer | ||
/// | ||
/// This will return `Ok(None)` if there are no messages immediately available. | ||
pub fn try_recv_direct<'a>(&'a self) -> Result<Option<&'a [u8]>>{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What advantage does this have over using ZeroCopyContext::try_recv<_, &[u8]>
? I would expect it to give identical performance. Have you run any benchmarks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as you've mentioned below it's more around giving an assumption less means for the developer to get access to the underlying bytes. ShmWriter
puts all the ownership on the user to properly implement encoding and it was helpful to have a means to allow for proper decoding too. Ideally there would be another trait for ShmReader
but the lifetime pollution issues were getting pretty horrific so I opted to just return the backing buffer
src/lib.rs
Outdated
@@ -267,6 +287,45 @@ impl Receiver { | |||
}) | |||
} | |||
|
|||
fn try_recv_direct_0<'a>(&'a self) -> Result<Option<(&'a [u8], u32)>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's significant code duplication across this and try_recv_0
(and likewise across recv_direct_timeout_0
and recv_timeout_0
). Assuming these provide a significant performance improvement over ZeroCopyContext
, and that they can be used safely and soundly (per my comment above), perhaps we could re-implement try_recv_0
in terms of try_recv_direct_0
(and likewise recv_timeout_0
in terms of recv_direct_timeout_0
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fully agree, the code duplication isn't great. I'll try to come up with a more unified approach that would generalize across the safe + unsafe implementations.
/// This will return `Ok(None)` if there are no messages immediately available. | ||
pub fn try_recv_direct<'a>(&'a self) -> Result<Option<&'a [u8]>>{ | ||
Ok(if let Some((value, position)) = self.try_recv_direct_0()? { | ||
self.seek(position)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is both unsafe and unsound. The call to seek
tells the writer(s) that we're done reading the buffer, which is not true, since we're about to return a shared reference to the caller. That violates Rust's aliasing rules because any writer(s) will believe they have unique, mutable access to something the caller has shared, immutable access to.
ZeroCopyContext
exists precisely to avoid this problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's a valid point, I'll drop that method for the direct
version, it indeed should only every be called within the usage of ZeroCopyContext
/// `Err(`[`Error::ZeroSizedMessage`](enum.Error.html#variant.ZeroSizedMessage)`))`. If the message size is | ||
/// greater than the ring buffer capacity, this method will return | ||
/// `Err(`[`Error::MessageTooLarge`](enum.Error.html#variant.MessageTooLarge)`))`. | ||
pub fn send_direct(&self, value: &impl ShmWriter) -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think all the send_direct*
methods should be marked unsafe
given the inherent unsafety of ShmWriter
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, can easily update those
src/lib.rs
Outdated
self.send_direct_timeout_0(value, true, None).map(drop) | ||
} | ||
|
||
fn send_direct_timeout_0( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we could reduce code duplication here by re-implementing send_timeout_0
in terms of send_direct_timeout_0
, using a bincode
-based ShmWriter
implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I like that idea lot. It will keep things much cleaner and easier to maintain
src/lib.rs
Outdated
/// to account for padding, alignment, proper offsetting etc. or you will cause mayhem. | ||
pub trait ShmWriter { | ||
fn msg_len(&self) -> u32; | ||
fn write_to_shm(&self, shm_ptr: *mut u8, msg_len: u32) -> Result<()>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the advantage of using a raw pointer here rather than a &mut [u8]
slice? That would make the API safe, but still allow the implementer to convert the slice to a raw pointer if they want to do something unusual (in which case soundness is their responsibility).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't necessarily think there is an advantage for using a raw pointer, probably more so a result of some bad habits that have been picked up over time haha. Will update to use &mut [u8]
as it does give more flexibility
Ah, I just realized that using |
The context of my interest in adding these methods is that I've run into some corner cases with All the feedback is very much appreciated, I'll try to get updates into this PR soon that address all the issues you've pointed out. |
@dicej I believe all major issues have been fixed, let me know if you think other tweaks would be helpful. I managed to unify both the sender and receiver api's to use one common implementation underneath for each |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking; thanks for the updates.
Three requests (plus another inline, below):
- Would you please run
cargo fmt
on the code? - Would you mind adding one or more tests for the
_direct
cases so we have some coverage there? That will help ensure this feature doesn't regress in the future. - I just noticed that there seems to be a subtle regression such that
cargo test arbitrary_case
hangs indefinitely on your branch but works on themaster
branch. It's not consuming any CPU, so it might be a deadlock. Would you mind taking a look at that?
src/lib.rs
Outdated
/// `Err(`[`Error::ZeroSizedMessage`](enum.Error.html#variant.ZeroSizedMessage)`))`. If the message size is | ||
/// greater than the ring buffer capacity, this method will return | ||
/// `Err(`[`Error::MessageTooLarge`](enum.Error.html#variant.MessageTooLarge)`))`. | ||
pub unsafe fn send_direct(&self, value: &impl ShmWriter) -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know I recommended adding unsafe
to these, but now that ShmWriter::write_to_shim
takes a &mut [u8]
instead of a *mut u8
, I think we can remove the unsafe
keywords from the send_direct*
methods. AFAICT, there's no way for the caller to use them unsoundly unless the caller (or ShmWriter
implementer) itself uses unsafe
. Sorry for the confusion.
I have everything besides the
Currently looking into what is setup incorrectly to cause this as it's this section in the tests:
|
@dicej so after doing more digging I've found a few takeaways:
As fas as how this is happening with my PR and not on master is still an unknown to me. My hypothesis is that the compiler is able to possibly do some optimizations in one of the two versions that allows for faster processing. I'll need to look at the assembly to see where things are inconsistent. running this test on my branch on one of my dev servers and not a Mac book pro (my laptop) has it pass (machine specs below):
|
Additional unit tests are now included that cover |
This crate has a fantastically designed API but at times the hard dependency on
bincode
introduces excessive overhead for very low latency/high throughput applications. I've included an identical set of methods forSender
andReceiver
(the_direct_
versions) that allow for:ShmWriter
trait allows implementing structs to customize their encoding logic and directly write to the mmap backing buffer without additional costs around bincode's layer and additional allocations this may have originally resulted in (i.e. no intermediaryVec<u8>
gets setup)&[u8]
's contained within a message should be able to be easily offset to with no real overhead