
Conversation

Mark-Simulacrum
Collaborator

Release Summary:

n/a, purely internal

Resolved issues:

n/a

Description of changes:

This adds internal sleeps during the s2n-quic event loop that account for CPU time spent doing work. That time is not otherwise tracked during s2n-quic-sim simulations, which makes it hard to simulate handshake workloads -- at least where concurrency is involved.

Call-outs:

Requesting review from @camshaft to discuss whether this seems like an OK way to communicate this to bach, or whether we should be trying to add some kind of advance_time to bach's simulated time. Tokio has a somewhat similar function (https://docs.rs/tokio/latest/tokio/time/fn.advance.html), but I'm not sure how much bach's executor cares about the differences vs. sleep noted in the Tokio docs.

If bach did offer such an API, ideally it would be a non-async function so we could call it directly from the innards of s2n-quic.

As-is this can't land, since we need some way to detect which time source to use -- these sleeps should be no-ops unless we're in a bach context. I'm not sure what the best way to achieve that is; maybe there's a bach-set thread local we could check.
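To make the idea concrete, here's a minimal sketch of such a thread-local gate. `IN_SIMULATION` and the standalone `attribute_cpu` below are hypothetical names for illustration -- bach doesn't expose a flag like this today -- but they show how the sleeps could become no-ops outside a simulation:

```rust
use core::time::Duration;
use std::cell::Cell;

thread_local! {
    // Hypothetical flag a bach executor would set before polling tasks on
    // this thread; nothing in bach actually does this today.
    static IN_SIMULATION: Cell<bool> = Cell::new(false);
}

/// Returns the cost to sleep for inside a simulation, or `None` (a no-op)
/// when running against the real clock.
fn attribute_cpu(cost: Duration) -> Option<Duration> {
    IN_SIMULATION.with(|flag| flag.get().then_some(cost))
}

fn main() {
    // Outside a simulation the call is a no-op.
    assert_eq!(attribute_cpu(Duration::from_micros(100)), None);

    // A simulated executor would flip the flag before driving the event loop.
    IN_SIMULATION.with(|flag| flag.set(true));
    assert_eq!(
        attribute_cpu(Duration::from_micros(100)),
        Some(Duration::from_micros(100))
    );
}
```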

Testing:

Ran s2n-quic-sim locally successfully, where even a zero-latency network now takes some time during handshakes (this is sort of a bad graph, since it's really just a single data point...):

[image: graph of simulated handshake timing]

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

// In practice this was chosen to make s2n-quic-sim simulate an uncontended mTLS handshake
// as taking 2ms (in combination with the transition edge adding some extra cost), which is
// fairly close to what we see in one scenario with real handshakes.
s2n_quic_core::io::event_loop::attribute_cpu(core::time::Duration::from_micros(100));
Contributor

I'm fine with this for now. I do wonder if @maddeleine's TLS offload work would be a bit less intrusive, though, since you could essentially intercept the task and inject this delay. But I'm fine with unblocking you for now.

Collaborator Author

I could in principle put this in a wrapper around s2n-quic-tls's Provider (similar to slow_tls); it's just fairly painful to write. I think @maddeleine's work doesn't change that -- it's not offloading just the crypto (which is what we'd probably ideally do here), so there's no obvious attach point. Every poll is too much.

Contributor

That's a good point, yeah.

Contributor

A note: I don't know whether attributing some micros per on_read call will lead to the most accurate depiction of TLS handshake times. s2n-tls calls on_read twice for every TLS record (once to retrieve the record header and once to retrieve the full record). Additionally, it can be called even when there is no TLS data to provide to s2n-tls.
Maybe we don't care about being precise here; we're just trying to attribute some time to the TLS ops. But I dunno, it feels like this estimation could get wildly off if you vary which handshake you're performing.

@camshaft
Contributor

Requesting review from @camshaft to discuss whether this seems like an OK way to communicate this to bach, or if we should be trying to add some kind of advance_time to bach's simulated time.

I think the approach here makes sense. My problem with advance_time is that it's a global value, and since bach is simulating actors across a network it's a bit of a blunt instrument -- especially in this context, where one machine's CPU load shouldn't affect another machine. So I think injecting sleeps is exactly how to do it today with the provided tools. I do think it would be interesting for bach to optionally capture this information, though, where you could say something like

bach::record_cost(Duration::from_millis(20));

and it will not schedule that task until the cost has been met on the simulated time. What do you think?

@Mark-Simulacrum
Collaborator Author

and it will not schedule that task until the cost has been met on the simulated time. What do you think?

I guess that's basically a sleep -- the tricky part is that what we really want here is something like tokio's block_in_place, since the nature of the work we're injecting is that it's not modeled as async and so it's hard to stick async stuff in it :)

@camshaft
Contributor

For sure. But I think the difference with having it integrated is:

  • It automatically injects the sleep for you - you don't have to have explicit await points in the task (though I guess we could also support that)
  • We could do a similar thing as tokio where calls to Instant::now return the underlying now + current cost, so from the perspective of that task time is advancing.

It could cause some inconsistencies, though, since time is now relative to each task. So if, for example, you have multiple tasks interacting with a mutex, it won't accurately reflect the CPU costs and will interleave those tasks differently from how they would behave in a real system.
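For what it's worth, a rough sketch of how such a per-task cost clock could behave. `SimClock`, `record_cost`, and `yield_task` are made-up names for illustration, not actual bach APIs; the point is the "task-local now = global now + unspent cost" idea, and why the cost only settles into global time when the task yields:

```rust
use core::time::Duration;

/// Illustrative per-task cost clock; not a real bach API.
struct SimClock {
    now: Duration,       // global simulated time
    task_cost: Duration, // CPU cost recorded by the current task, not yet settled
}

impl SimClock {
    fn new() -> Self {
        Self { now: Duration::ZERO, task_cost: Duration::ZERO }
    }

    /// The task reports synchronous CPU work it just performed.
    fn record_cost(&mut self, cost: Duration) {
        self.task_cost += cost;
    }

    /// From the task's own perspective, time has already advanced by its
    /// recorded cost (similar to tokio's paused-clock behavior).
    fn task_now(&self) -> Duration {
        self.now + self.task_cost
    }

    /// When the task yields, the scheduler settles the cost into global
    /// time; the task won't be polled again until then.
    fn yield_task(&mut self) {
        self.now += self.task_cost;
        self.task_cost = Duration::ZERO;
    }
}

fn main() {
    let mut clock = SimClock::new();
    clock.record_cost(Duration::from_millis(20));

    // The recording task sees time advance immediately...
    assert_eq!(clock.task_now(), Duration::from_millis(20));
    // ...but global time (what other tasks see) hasn't moved yet -- the
    // inconsistency described above.
    assert_eq!(clock.now, Duration::ZERO);

    clock.yield_task();
    assert_eq!(clock.now, Duration::from_millis(20));
}
```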

use core::time::Duration;

// CPU today is attributed within the event loop, which is at least today always single
// threaded, and we never yield while there's still unspent CPU.
Contributor

Another thought I kind of wanted to jot down: this breaks with offloading; it messes with the assumptions being made in this PR, because if the TLS task is now async, then you could be attributing CPU time while the event loop task is sleeping.

Contributor

I actually think we should build on the changes we made for the async TLS task to accomplish this. Basically we could use that runtime trait to spawn a wrapped TLS task that adds delays. We may want to also add support for delaying responses? Not sure... Anyway that area of the code seems like the right place to hook in.

Contributor

Hmmm yeah, that does sound doable. It wouldn't help if you wanted to simulate a non-offload handshake though.

Contributor

That's true, yeah
