Enable guest tracing #695
The output data should be changed to something that other tools can open #704.
Just to document this: it was confirmed that a host running under nested MSHV will incorrectly not see the invariant TSC bit, and this should be ignored; on the Azure platforms, at least, the TSC is in fact invariant.
Great work! Could we add a step where the test binaries are executed in the CI pipeline? Right now it looks like none of this code is being exercised, since it is all behind a feature flag.
@@ -616,6 +617,7 @@ impl ExclusiveSharedMemory {
/// Return the address of memory at an offset to this `SharedMemory` checking
/// that the memory is within the bounds of the `SharedMemory`.
#[instrument(err(Debug), skip_all, parent = Span::current(), level= "Trace")]
#[allow(dead_code)]
Why all the `dead_code` allows?
I think these are to get rid of clippy warnings when you build with the basic `trace_guest` infrastructure support but not with any of the things that use it. Maybe another option here is to replace the feature with a cfg item that is `any(features_which_use_this)`.
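A minimal sketch of that cfg-based alternative. The feature names `unwind_guest` and `mem_profile`, and the helper itself, are assumptions for illustration rather than the crate's actual list of consumers:

```rust
// Hypothetical consumer features; substitute the features that really call the helper.
// The helper is compiled only when at least one of them is enabled, so a build with
// just the base tracing feature never sees it and needs no `#[allow(dead_code)]`.
#[cfg(any(feature = "unwind_guest", feature = "mem_profile"))]
pub(crate) fn checked_offset(base: usize, offset: usize, len: usize) -> Option<usize> {
    // Reject offsets outside the region, then guard against address overflow.
    if offset < len {
        base.checked_add(offset)
    } else {
        None
    }
}

/// Reports whether the gated helper above was compiled into this build.
pub fn helper_is_compiled() -> bool {
    cfg!(any(feature = "unwind_guest", feature = "mem_profile"))
}
```

The trade-off is that the `any(...)` list has to be kept in sync whenever a new feature starts using the helper.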
Actually, this is something weird: I spent some time on it and wasn't able to find out why it happens.
The commit in question is: [hyperlight_host/trace] Support collecting guest stacktraces
after which all these clippy warnings appear. The weird thing is I couldn't find where this commit actually stopped using these functions.
I'll have another look at it.
In the future, the outb handler will need to take a Hypervisor instance in order to be able to access register and memory state of the VM, so it doesn't make sense for these interfaces to be more public than the `Hypervisor` trait. Nobody outside of Hyperlight seems to use these at the moment, so it's probably simplest to restrict these to `pub(crate)`. Signed-off-by: Lucy Menon <[email protected]>
This adds (unused) support for creating trace files for sandboxes and passing them around to relevant sandbox event handler code. This will be used for collecting debug trace and profiling information. Signed-off-by: Lucy Menon <[email protected]> Signed-off-by: Doru Blânzeanu <[email protected]>
This will be useful in the near future, when it will allow transforming the exe_info into unwind information without an extra copy. Signed-off-by: Lucy Menon <[email protected]>
This adds a new interface which tracing code can use to request the values of registers from the hypervisor supervising a sandbox. Signed-off-by: Lucy Menon <[email protected]>
Signed-off-by: Lucy Menon <[email protected]> Signed-off-by: Doru Blânzeanu <[email protected]>
Signed-off-by: Doru Blânzeanu <[email protected]>
This allows for producing profiles of memory usage. Signed-off-by: Lucy Menon <[email protected]> Signed-off-by: Doru Blânzeanu <[email protected]>
Signed-off-by: Lucy Menon <[email protected]>
Signed-off-by: Doru Blânzeanu <[email protected]>
…ords - `hyperlight-guest-tracing` defines a `TraceBuffer` that keeps `TraceRecord`s that the guest issues. When the buffer capacity is reached, it automatically issues an `Out` instruction with the corresponding info for the host to retrieve the buffer. - The guest can issue `TraceRecord`s by using the `hyperlight-guest-tracing-macro` crate that pushes new records to the buffer. - `hyperlight_common` contains the definitions for the frame types a guest can send to the host using the `Out` instruction. Signed-off-by: Doru Blânzeanu <[email protected]>
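The buffering behaviour described in this commit message can be sketched roughly as follows. The record fields, the const-generic capacity, and the `sink` callback (which stands in for the `Out` instruction) are assumptions made for illustration, not the actual `hyperlight-guest-tracing` types:

```rust
/// One trace record; these fields are illustrative, not the crate's real layout.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct TraceRecord {
    /// Raw cycle count, e.g. as read by `rdtsc` in the guest.
    pub cycles: u64,
    /// Identifier of the traced event (hypothetical).
    pub event_id: u32,
}

/// Fixed-capacity buffer (N must be > 0) that hands its contents to `sink`
/// whenever it fills up. In a real guest, `sink` would issue the `Out` instruction.
pub struct TraceBuffer<const N: usize, F: FnMut(&[TraceRecord])> {
    records: [TraceRecord; N],
    len: usize,
    sink: F,
}

impl<const N: usize, F: FnMut(&[TraceRecord])> TraceBuffer<N, F> {
    pub fn new(sink: F) -> Self {
        Self { records: [TraceRecord::default(); N], len: 0, sink }
    }

    /// Stores a record and flushes automatically once capacity is reached.
    pub fn push(&mut self, record: TraceRecord) {
        self.records[self.len] = record;
        self.len += 1;
        if self.len == N {
            self.flush();
        }
    }

    /// Sends any pending records to the sink and empties the buffer.
    /// Intended to be called before a halt or any other anticipated exit.
    pub fn flush(&mut self) {
        if self.len > 0 {
            (self.sink)(&self.records[..self.len]);
            self.len = 0;
        }
    }

    /// Number of records waiting to be sent.
    pub fn pending(&self) -> usize {
        self.len
    }
}
```

Because `push` flushes as soon as the buffer is full, `len` never exceeds `N - 1` between calls, so the indexing in `push` stays in bounds.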
- When an `Out` instruction is intercepted, the Hypervisor checks for the frame Id, to verify what type of exit it is. Based on this, when a trace record type is received, we copy the array of trace records from the guest's memory, calculate the timestamp based on the cycles returned by the guest's RDTSC and write the record to the trace file. Signed-off-by: Doru Blânzeanu <[email protected]>
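As a rough illustration of the copy-and-decode step only, the sketch below reads an array of records out of a byte slice standing in for guest memory. The 12-byte little-endian layout (`u64` cycles followed by `u32` event id) and the function name are assumptions; the real record layout and memory accessors are defined by the Hyperlight crates:

```rust
/// Size of one encoded record in this sketch: u64 cycles + u32 event id.
const RECORD_SIZE: usize = 12;

/// Decodes `count` records starting at `offset` in `mem`.
/// Returns `None` if the requested range overflows or falls outside `mem`,
/// since the pointer and count come from the guest and cannot be trusted.
pub fn read_records(mem: &[u8], offset: usize, count: usize) -> Option<Vec<(u64, u32)>> {
    let len = count.checked_mul(RECORD_SIZE)?;
    let end = offset.checked_add(len)?;
    let bytes = mem.get(offset..end)?;

    let mut out = Vec::with_capacity(count);
    for chunk in bytes.chunks_exact(RECORD_SIZE) {
        let mut cycles = [0u8; 8];
        cycles.copy_from_slice(&chunk[..8]);
        let mut event = [0u8; 4];
        event.copy_from_slice(&chunk[8..]);
        out.push((u64::from_le_bytes(cycles), u32::from_le_bytes(event)));
    }
    Some(out)
}
```

The bounds checks run before any allocation, so a bogus guest-supplied count cannot trigger a large `Vec` reservation on the host.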
- Add traces wherever we think it might be of use to help us better profile the codebase. - Add flush instructions before halting or running an `OutB` instruction to ensure the records are sent before the guest exits due to an error or a normal exit Signed-off-by: Doru Blânzeanu <[email protected]>
Signed-off-by: Doru Blânzeanu <[email protected]>
Signed-off-by: Doru Blânzeanu <[email protected]>
Signed-off-by: Doru Blânzeanu <[email protected]>
Signed-off-by: Doru Blânzeanu <[email protected]>
Signed-off-by: Doru Blânzeanu <[email protected]>
- This adds tests for both the macro and the tracing crates - Some refactoring was needed to the tracing crate to test thoroughly - Improve macros by allowing expressions to be wrapped with trace records Signed-off-by: Doru Blânzeanu <[email protected]>
…lculated - Rely on TSC readings on the host from when the guest is started and when the first records are received. Use these two moments in time and TSC readings to calculate the TSC frequency. - Afterwards we store the TSC reading from inside the guest and calculate all incoming TSC readings relative to the first one and using the frequency we can calculate the amount of time it took. Signed-off-by: Doru Blânzeanu <[email protected]>
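The arithmetic behind this approach can be sketched as below. The function names are hypothetical, and the sketch assumes the guest and host TSC tick at the same constant rate (an invariant TSC):

```rust
use std::time::Duration;

/// Estimates the TSC frequency in Hz from two host-side samples:
/// the TSC readings at guest start and at first-record arrival, plus the
/// wall-clock time that elapsed between them.
pub fn tsc_freq_hz(start_tsc: u64, end_tsc: u64, elapsed: Duration) -> Option<u64> {
    let cycles = end_tsc.checked_sub(start_tsc)? as u128;
    let nanos = elapsed.as_nanos();
    if nanos == 0 {
        return None;
    }
    // cycles per nanosecond, scaled to cycles per second.
    let hz = cycles * 1_000_000_000 / nanos;
    if hz > u64::MAX as u128 {
        None
    } else {
        Some(hz as u64)
    }
}

/// Converts a guest TSC reading into the time elapsed since the first
/// guest reading, using the estimated frequency.
pub fn elapsed_since_first(first_guest_tsc: u64, guest_tsc: u64, freq_hz: u64) -> Option<Duration> {
    if freq_hz == 0 {
        return None;
    }
    let cycles = guest_tsc.checked_sub(first_guest_tsc)? as u128;
    let nanos = cycles * 1_000_000_000 / freq_hz as u128;
    if nanos > u64::MAX as u128 {
        None
    } else {
        Some(Duration::from_nanos(nanos as u64))
    }
}
```

For example, 3,000,000 cycles observed over 1 ms gives an estimate of 3 GHz, and a guest reading 6,000,000 cycles after the first one then maps to 2 ms. The wider the window between the two host samples, the less the estimate is affected by measurement jitter.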
Signed-off-by: Doru Blânzeanu <[email protected]>
Signed-off-by: Doru Blânzeanu <[email protected]>
Description
This PR updates the work @syntactically has done in #103 to latest main and extends it to support tracing the guest for performance.
This effort is part of issue #669.
Improvements to the trace file output will be addressed in #704.
How it works
- It defines a `TraceBuffer` in the guest that holds `TraceRecord`s. These records are only sent to the host when the buffer gets full, or when an exit (`Out`/`Halt` instructions) is anticipated.
- The records are sent to the host using an `Out` instruction that provides the pointer to an array of `TraceRecord`s.
- The host accesses the guest memory, reads these records and writes them to the trace file.
- The guest relies on the invariant TSC and uses the `rdtsc` instruction to read the clock cycles; the host uses those readings to calculate the timestamp.
- The timestamp is relative to when the TraceInfo started (before creating the Hypervisor Driver in this case).
Two additional crates define the guest logic for tracing: `hyperlight-guest-tracing` holds the buffer and records, and `hyperlight-guest-tracing-macro` defines macros meant to make annotating functions simpler for the user.
Note: The data processing could be improved (maybe with OTEL); I've just used what we had at hand.
Results
Below are some results I have generated based on our hello-world example in the hyperlight repository.
1. My host machine on WSL2 running KVM on Intel(R) Core(TM) Ultra 7 165H
Has Invariant TSC
2. Azure VM running mshv3 on AMD EPYC 7763 64-Core Processor
Does not report an invariant TSC. I am currently investigating if I can get an Azure VM that reports supporting an invariant TSC.
TODO
- `dump_trace` functionality to visualize the traces