Account for CPU time spent on handshakes in bach sims #2717
Conversation
Force-pushed from 51c112b to a838ac7
// In practice this was chosen to make s2n-quic-sim simulate an uncontended mTLS handshake
// as taking 2ms (in combination with the transition edge adding some extra cost), which is
// fairly close to what we see in one scenario with real handshakes.
s2n_quic_core::io::event_loop::attribute_cpu(core::time::Duration::from_micros(100));
I'm fine with this for now. I do wonder if @maddeleine's TLS offload work would be a bit less intrusive, though, since you could essentially intercept the task and inject this delay. But I'm fine with unblocking you for now.
I could in principle put this in a wrapper around s2n-quic-tls's Provider (similar to slow_tls), it's just fairly painful to write that. I think @maddeleine's work doesn't change that -- it's not offloading just the crypto (what we'd probably ideally do here), so there's no obvious attach point. Every poll is too much.
That's a good point, yeah.
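For illustration, here is a minimal sketch of the slow_tls-style wrapper idea mentioned above. The HandshakePoller trait is a heavily simplified, hypothetical stand-in for the real s2n-quic-tls Provider surface (which is exactly what makes the real wrapper painful to write), and DelayedTls is an illustrative name:

use std::time::Duration;

// Hypothetical, heavily simplified stand-in for the TLS provider surface.
trait HandshakePoller {
    // Drive the handshake one step; returns true once it has completed.
    fn poll_handshake(&mut self) -> bool;
}

// slow_tls-style wrapper: charge a fixed cost every time the inner provider
// is polled during the handshake.
struct DelayedTls<T> {
    inner: T,
    cost_per_poll: Duration,
}

impl<T: HandshakePoller> HandshakePoller for DelayedTls<T> {
    fn poll_handshake(&mut self) -> bool {
        // In a bach simulation this would be an attribute_cpu/record_cost-style
        // call rather than a real sleep.
        std::thread::sleep(self.cost_per_poll);
        self.inner.poll_handshake()
    }
}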
A note: I don't know if attributing some micros per on_read call will lead to the most accurate depiction of TLS handshake times. s2n-tls calls on_read twice for every TLS record (once to retrieve the record header and then once to retrieve the full record). Additionally, it can be called even if there is no TLS data to provide to s2n-tls.
Maybe we don't care about being precise here; we're just trying to attribute some time to the TLS ops. But I dunno, it feels like this estimate could be wildly off depending on which handshake you're performing.
I think the approach here makes sense. My thought was something like bach::record_cost(Duration::from_millis(20)), where bach will not schedule that task again until the cost has been met in simulated time. What do you think?
I guess that's basically a sleep -- the tricky part is that what we really want here is something like tokio's block_in_place, since the nature of the work we're injecting is that it's not modeled as async and so it's hard to stick async stuff in it :)
For sure. But I think the difference with having it integrated is:
It could cause some inconsistencies though, since time is now relative for each task. So if, for example, you have multiple tasks interacting with a mutex, it won't really accurately reflect the CPU costs and will interleave those tasks differently from how they would actually behave in a real system.
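For reference, the block_in_place shape mentioned above looks roughly like this (assuming tokio with its multi-threaded runtime and macros enabled); the 2ms sleep is just a stand-in for handshake crypto that isn't modeled as async:

use std::time::Duration;

#[tokio::main(flavor = "multi_thread")]
async fn main() {
    // Tell the runtime this task is about to do blocking, non-async work so
    // other tasks can keep running on the remaining worker threads.
    let result = tokio::task::block_in_place(|| {
        // Stand-in for CPU-heavy handshake work.
        std::thread::sleep(Duration::from_millis(2));
        "handshake done"
    });
    println!("{result}");
}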
Force-pushed from a838ac7 to f5e8737
Force-pushed from f5e8737 to d549077
use core::time::Duration;

// CPU today is attributed within the event loop, which is at least today always single
// threaded, and we never yield while there's still unspent CPU.
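As a rough illustration of that model, here is a minimal sketch of per-event-loop CPU attribution: a thread-local accumulator that gets spent as a simulated sleep at the yield point. attribute_cpu mirrors the name used in the diff above; spend_attributed_cpu and bach_sleep are hypothetical names, not the actual s2n-quic or bach APIs:

use core::time::Duration;
use std::cell::Cell;

thread_local! {
    // CPU attributed since the last yield; a thread local works because
    // attribution only happens inside the single-threaded event loop.
    static UNSPENT_CPU: Cell<Duration> = Cell::new(Duration::ZERO);
}

// Synchronous, so callers deep inside s2n-quic can record work without awaiting.
fn attribute_cpu(amount: Duration) {
    UNSPENT_CPU.with(|c| c.set(c.get() + amount));
}

// Called at the event loop's yield point: convert the accumulated cost into a
// simulated-time sleep so the scheduler observes the elapsed CPU time.
async fn spend_attributed_cpu() {
    let unspent = UNSPENT_CPU.with(|c| c.replace(Duration::ZERO));
    if !unspent.is_zero() {
        bach_sleep(unspent).await;
    }
}

// Stand-in for bach's simulated sleep; hypothetical name.
async fn bach_sleep(_d: Duration) { /* advance simulated time here */ }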
Another thought I kind of wanted to jot down: this breaks with offloading, since it messes with the assumptions being made in this PR. If the TLS task is now async, then you could be attributing CPU while the event loop task is sleeping.
I actually think we should build on the changes we made for the async TLS task to accomplish this. Basically we could use that runtime trait to spawn a wrapped TLS task that adds delays. We may want to also add support for delaying responses? Not sure... Anyway that area of the code seems like the right place to hook in.
Hmmm yeah, that does sound doable. It wouldn't help if you wanted to simulate a non-offload handshake though.
That's true, yeah
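A rough sketch of the wrapped-task idea from this thread, assuming the offloaded TLS handshake runs as its own future spawned through the runtime trait; with_handshake_cost and simulated_sleep are hypothetical names, and a finer-grained version could charge per poll instead of once at completion:

use core::future::Future;
use core::time::Duration;

// Stand-in for the simulator's async sleep (e.g. bach's); hypothetical name.
async fn simulated_sleep(_d: Duration) { /* advance simulated time here */ }

// Instead of spawning the offloaded TLS task directly, the runtime hook could
// spawn a wrapper that charges some simulated time for the handshake before
// handing the result back.
fn with_handshake_cost<F>(task: F, cost: Duration) -> impl Future<Output = F::Output>
where
    F: Future,
{
    async move {
        let output = task.await;
        // Model the CPU the handshake would have consumed.
        simulated_sleep(cost).await;
        output
    }
}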
Release Summary:
n/a, purely internal
Resolved issues:
n/a
Description of changes:
This adds internal sleeps during the s2n-quic event loop which account for CPU time spent doing work. That time is not otherwise tracked during s2n-quic-sim simulations, which makes it hard to simulate handshake workloads -- at least where concurrency etc. are involved.
Call-outs:
Requesting review from @camshaft to discuss whether this seems like an OK way to communicate this to bach, or if we should be trying to add some kind of advance_time to bach's simulated time. Tokio has a somewhat similar function (https://docs.rs/tokio/latest/tokio/time/fn.advance.html), but I'm not sure how much bach's executor cares about the differences vs. sleep noted in the Tokio docs.
If bach did offer such an API, ideally it would be a non-async function so we could call it directly from the innards of s2n-quic.
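For comparison, tokio's paused-clock advance looks like this (assuming tokio with the rt, time, test-util, and macros features); note that advance is itself async, which is exactly what makes it awkward to call from non-async internals:

use std::time::Duration;
use tokio::time::{advance, pause, Instant};

#[tokio::main(flavor = "current_thread")]
async fn main() {
    // Freeze tokio's clock, then move it forward without actually waiting.
    pause();
    let start = Instant::now();
    advance(Duration::from_millis(2)).await;
    // The clock moved even though no wall-clock time passed.
    assert!(start.elapsed() >= Duration::from_millis(2));
}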
As-is this can't land, since we need some way to detect which time to use -- these sleeps should be no-ops unless we're in a bach context. I'm not sure what the best way to achieve that is; maybe there's a bach-set thread local we could check.
Testing:
Ran s2n-quic-sim locally successfully, where even a zero-latency network now takes some time during handshakes (this is sort of a bad graph since it's really just a single data point).
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.