Rework metrics to use metrics crate #361

Merged: 4 commits, Apr 17, 2025
731 changes: 528 additions & 203 deletions Cargo.lock

Large diffs are not rendered by default.

5 changes: 2 additions & 3 deletions Justfile
@@ -89,12 +89,11 @@ test-rust target=default-target features="": (test-rust-int "rust" target featur
# ignored tests - these tests need to run serially or with specific properties
cargo test {{ if features =="" {''} else if features=="no-default-features" {"--no-default-features" } else {"--no-default-features -F " + features } }} --profile={{ if target == "debug" { "dev" } else { target } }} test_trace -p hyperlight-host --lib -- --ignored
cargo test {{ if features =="" {''} else if features=="no-default-features" {"--no-default-features" } else {"--no-default-features -F " + features } }} --profile={{ if target == "debug" { "dev" } else { target } }} test_drop -p hyperlight-host --lib -- --ignored
cargo test {{ if features =="" {''} else if features=="no-default-features" {"--no-default-features" } else {"--no-default-features -F " + features } }} --profile={{ if target == "debug" { "dev" } else { target } }} hypervisor::metrics::tests::test_gather_metrics -p hyperlight-host --lib -- --ignored
cargo test {{ if features =="" {''} else if features=="no-default-features" {"--no-default-features" } else {"--no-default-features -F " + features } }} --profile={{ if target == "debug" { "dev" } else { target } }} sandbox::metrics::tests::test_gather_metrics -p hyperlight-host --lib -- --ignored
cargo test {{ if features =="" {''} else if features=="no-default-features" {"--no-default-features" } else {"--no-default-features -F " + features } }} --profile={{ if target == "debug" { "dev" } else { target } }} test_metrics -p hyperlight-host --lib -- --ignored
cargo test {{ if features =="" {''} else if features=="no-default-features" {"--no-default-features" } else {"--no-default-features -F " + features } }} --profile={{ if target == "debug" { "dev" } else { target } }} --test integration_test log_message -- --ignored
cargo test {{ if features =="" {''} else if features=="no-default-features" {"--no-default-features" } else {"--no-default-features -F " + features } }} --profile={{ if target == "debug" { "dev" } else { target } }} sandbox::uninitialized::tests::test_log_trace -p hyperlight-host --lib -- --ignored
cargo test {{ if features =="" {''} else if features=="no-default-features" {"--no-default-features" } else {"--no-default-features -F " + features } }} --profile={{ if target == "debug" { "dev" } else { target } }} hypervisor::hypervisor_handler::tests::create_1000_sandboxes -p hyperlight-host --lib -- --ignored
cargo test {{ if features =="" {''} else if features=="no-default-features" {"--no-default-features" } else {"--no-default-features -F " + features } }} --profile={{ if target == "debug" { "dev" } else { target } }} -p hyperlight-host --lib -- metrics::tests::test_metrics_are_emitted --exact --ignored
cargo test {{ if features =="" {''} else if features=="no-default-features" {"--no-default-features" } else {"--no-default-features -F function_call_metrics," + features } }} --profile={{ if target == "debug" { "dev" } else { target } }} -p hyperlight-host --lib -- metrics::tests::test_metrics_are_emitted --exact --ignored
{{ set-trace-env-vars }} cargo test {{ if features =="" {''} else if features=="no-default-features" {"--no-default-features" } else {"--no-default-features -F " + features } }} --profile={{ if target == "debug" { "dev" } else { target } }} --lib sandbox::outb::tests::test_log_outb_log -- --ignored

test-seccomp target=default-target features="":
35 changes: 10 additions & 25 deletions docs/hyperlight-metrics-logs-and-traces.md
@@ -2,46 +2,31 @@

Hyperlight provides the following observability features:

* [Metrics](#metrics) are provided using Prometheus.
* [Metrics](#metrics) are provided using the [metrics](https://docs.rs/metrics/latest/metrics/index.html) crate, which is a lightweight metrics facade.
* [Logs](#logs) are provided using the Rust [log crate](https://docs.rs/log/0.4.6/log/), and can be consumed by any Rust logger implementation, including LogTracer which can be used to emit log records as tracing events.
* [Tracing](#tracing) is provided using the Rust [tracing crate](https://docs.rs/tracing/0.1.37/tracing/), and can be consumed by any Rust tracing implementation. In addition, the [log feature](https://docs.rs/tracing/latest/tracing/#crate-feature-flags) is enabled which means that should a hyperlight host application not want to consume tracing events, you can still consume them as logs.

## Metrics

Hyperlight provides metrics using Prometheus. The metrics are registered using either the [default_registry](https://docs.rs/prometheus/latest/prometheus/fn.default_registry.html) or a registry instance provided by the host application.
Metrics are provided using the [metrics](https://docs.rs/metrics/latest/metrics/index.html) crate, which is a lightweight metrics facade. When an executable installs a [recorder](https://docs.rs/metrics/latest/metrics/trait.Recorder.html), Hyperlight will emit its metrics to that recorder, which allows library authors to seamlessly emit their own metrics without knowing or caring which exporter implementation is chosen, or even whether one is installed. If no recorder is installed, the metrics are emitted to the default recorder, which is a no-op implementation with minimal overhead.

To provide a registry to Hyperlight, use the `set_metrics_registry` function and pass a reference to a registry with `static` lifetime:

```rust
use hyperlight_host::metrics::set_metrics_registry;
use prometheus::Registry;
use lazy_static::lazy_static;

lazy_static! {
static ref REGISTRY: Registry = Registry::new();
}

set_metrics_registry(&REGISTRY);
```
There are many different implementations of recorders. One example is the [prometheus exporter](https://docs.rs/metrics-exporter-prometheus/latest/metrics_exporter_prometheus/) which can be used to export metrics to a Prometheus server. An example of how to use this is provided in the [examples/metrics](../src/hyperlight_host/examples/metrics) directory.
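
The facade idea described above can be sketched with the standard library alone. This is a minimal illustration, not the `metrics` crate's API: the `Recorder` trait, `install`, `counter_increment`, and `SumRecorder` below are hypothetical names invented for this sketch. It shows a library-side call that is silently dropped until the executable installs a recorder.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for a recorder trait (assumption: far simpler
/// than the real `metrics::Recorder`).
trait Recorder: Send + Sync {
    fn increment_counter(&self, name: &'static str, value: u64);
}

/// Global slot that the executable fills at most once.
static RECORDER: OnceLock<Box<dyn Recorder>> = OnceLock::new();

/// Installs a recorder; returns false if one was already installed.
fn install(recorder: Box<dyn Recorder>) -> bool {
    RECORDER.set(recorder).is_ok()
}

/// Library-side call: a no-op when no recorder has been installed.
fn counter_increment(name: &'static str, value: u64) {
    if let Some(recorder) = RECORDER.get() {
        recorder.increment_counter(name, value);
    }
}

/// Toy backend that sums every increment into one atomic.
static TOTAL: AtomicU64 = AtomicU64::new(0);

struct SumRecorder;

impl Recorder for SumRecorder {
    fn increment_counter(&self, _name: &'static str, value: u64) {
        TOTAL.fetch_add(value, Ordering::Relaxed);
    }
}

fn total() -> u64 {
    TOTAL.load(Ordering::Relaxed)
}

fn main() {
    // No recorder yet: this increment is dropped.
    counter_increment("guest_errors_total", 1);

    install(Box::new(SumRecorder));

    // These increments reach the installed recorder.
    counter_increment("guest_errors_total", 1);
    counter_increment("guest_cancellations_total", 2);

    println!("recorded total = {}", total());
}
```

The library code only ever calls `counter_increment`; which backend receives the value, if any, is decided by the executable.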

The following metrics are provided and are enabled by default:

* `hyperlight_guest_error_count` - a vector of counters that tracks the number of guest errors by code and message.
* `hyperlight_number_of_cancelled_guest_execution` - a counter that tracks the number of guest executions that have been cancelled because the execution time exceeded the time allowed.
* `guest_errors_total` - Counter that tracks the number of guest errors by error code.
* `guest_cancellations_total` - Counter that tracks the number of guest executions that have been cancelled because the execution time exceeded the time allowed.

The following metrics are provided but are disabled by default and require the feature `function_call_metrics` to be enabled:
The following metrics are provided but are disabled by default:

* `hyperlight_guest_function_call_duration_microseconds` - a vector of histograms that tracks the execution time of guest functions in microseconds by function name. The histogram also tracks the number of calls to each function.
* `hyperlight_host_function_calls_duration_microseconds` - a vector of histograms that tracks the execution time of host functions in microseconds by function name. The histogram also tracks the number of calls to each function.
* `guest_call_duration_seconds` - Histogram that tracks the execution time of guest functions in seconds by function name. The histogram also tracks the number of calls to each function.
* `host_call_duration_seconds` - Histogram that tracks the execution time of host functions in seconds by function name. The histogram also tracks the number of calls to each function.

The rationale for disabling the function call metrics by default is that:

* A Hyperlight host may wish to provide its own metrics for function calls.
* Enabling a trace subscriber will cause the function call metrics to be emitted as trace events, which may be sufficient for some use cases.

There is an example of how to gather metrics in the [examples/metrics](../src/hyperlight_host/examples/metrics) directory.

The metrics capabilities provided by Hyperlight can also be used by a library or host that is using Hyperlight to provide additional metrics, see the [hypervisor metrics module](../src/hyperlight_host/src/hypervisor/metrics.rs) for an example of how to define metrics.
* These 2 metrics require string clones for the function names, which may be too expensive for some use cases.
We might consider enabling these metrics by default in the future.
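
To make the cost mentioned above concrete, here is a rough standard-library sketch of a per-function duration tracker that records seconds. `CallDurations` and its methods are hypothetical and are not part of Hyperlight or the `metrics` crate. The owned `String` key built on every `record` call plays the role of the function-name clone.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical in-memory store: call count and summed duration (seconds)
/// per function name.
#[derive(Default)]
struct CallDurations {
    by_function: HashMap<String, (u64, f64)>,
}

impl CallDurations {
    /// Records one observation. `to_string` allocates an owned key on
    /// every call, which is the per-call cost the rationale refers to.
    fn record(&mut self, function_name: &str, elapsed: Duration) {
        let entry = self
            .by_function
            .entry(function_name.to_string())
            .or_insert((0, 0.0));
        entry.0 += 1;
        entry.1 += elapsed.as_secs_f64();
    }

    /// Times `f` and records the elapsed wall-clock time under `function_name`.
    fn time<T>(&mut self, function_name: &str, f: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        let out = f();
        self.record(function_name, start.elapsed());
        out
    }

    /// Returns (call count, total seconds) for a function, if any were recorded.
    fn get(&self, function_name: &str) -> Option<(u64, f64)> {
        self.by_function.get(function_name).copied()
    }
}

fn main() {
    let mut durations = CallDurations::default();
    let sum = durations.time("Echo", || (1..=10u64).sum::<u64>());
    if let Some((count, secs)) = durations.get("Echo") {
        println!("Echo returned {}: {} call(s), {:.6} s total", sum, count, secs);
    }
}
```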

## Logs

4 changes: 2 additions & 2 deletions src/hyperlight_common/src/flatbuffer_wrappers/guest_error.rs
@@ -28,7 +28,7 @@ use crate::flatbuffers::hyperlight::generated::{
ErrorCode as FbErrorCode, GuestError as FbGuestError, GuestErrorArgs,
};

#[derive(Debug, Clone, Eq, PartialEq)]
#[derive(Debug, Clone, Copy, Eq, PartialEq)]
#[repr(C)]
/// `ErrorCode` represents an error that occurred in the Hyperlight Guest.
pub enum ErrorCode {
@@ -222,7 +222,7 @@ impl TryFrom<&GuestError> for Vec<u8> {
let guest_error_fb = FbGuestError::create(
&mut builder,
&GuestErrorArgs {
code: value.code.clone().into(),
code: value.code.into(),
message: Some(message),
},
);
6 changes: 3 additions & 3 deletions src/hyperlight_host/Cargo.toml
@@ -34,7 +34,6 @@ lazy_static = "1.4.0"
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
log = "0.4.27"
once_cell = { version = "1.21.3" }
tracing = { version = "0.1.41", features = ["log"] }
tracing-log = "0.2.0"
tracing-core = "0.1.33"
@@ -43,11 +42,10 @@ vmm-sys-util = "0.13.0"
crossbeam = "0.8.0"
crossbeam-channel = "0.5.15"
thiserror = "2.0.12"
prometheus = "0.14.0"
strum = { version = "0.27", features = ["derive"] }
tempfile = { version = "3.19", optional = true }
serde_yaml = "0.9"
anyhow = "1.0"
metrics = "0.24.1"

[target.'cfg(windows)'.dependencies]
windows = { version = "0.61", features = [
@@ -103,6 +101,8 @@ opentelemetry_sdk = { version = "0.29", features = ["rt-tokio"] }
tokio = { version = "1.44.2", features = ["full"] }
criterion = "0.5.1"
tracing-chrome = "0.7.2"
metrics-util = "0.19.0"
metrics-exporter-prometheus = "0.16.2"

[target.'cfg(windows)'.dev-dependencies]
windows = { version = "0.61", features = [
107 changes: 25 additions & 82 deletions src/hyperlight_host/examples/metrics/main.rs
@@ -22,22 +22,29 @@ use hyperlight_common::flatbuffer_wrappers::function_types::{ParameterValue, Ret
use hyperlight_host::sandbox::uninitialized::UninitializedSandbox;
use hyperlight_host::sandbox_state::sandbox::EvolvableSandbox;
use hyperlight_host::sandbox_state::transition::Noop;
use hyperlight_host::{set_metrics_registry, GuestBinary, MultiUseSandbox, Result};
use hyperlight_host::{GuestBinary, MultiUseSandbox, Result};
use hyperlight_testing::simple_guest_as_string;
use lazy_static::lazy_static;
use prometheus::Registry;

lazy_static! {
static ref HOST_REGISTRY: Registry = Registry::new();
}
fn fn_writer(_msg: String) -> Result<i32> {
Ok(0)
}
// Run this rust example with the flag --features "function_call_metrics" to enable more metrics to be emitted

fn main() {
// Install prometheus metrics exporter.
// We only install the metrics recorder here, but you can also use the
// `metrics_exporter_prometheus::PrometheusBuilder::new().install()` method
// to install an HTTP listener that serves the metrics.
let prometheus_handle = metrics_exporter_prometheus::PrometheusBuilder::new()
.install_recorder()
.expect("Failed to install Prometheus exporter");

fn main() -> Result<()> {
// If this is not called then the default registry `prometheus::default_registry` will be used.
set_metrics_registry(&HOST_REGISTRY)?;
// Do some hyperlight stuff to generate metrics.
do_hyperlight_stuff();

// Get the metrics and print them in prometheus exposition format.
let payload = prometheus_handle.render();
println!("Prometheus metrics:\n{}", payload);
}

fn do_hyperlight_stuff() {
// Get the path to a simple guest binary.
let hyperlight_guest_path =
simple_guest_as_string().expect("Cannot find the guest binary at the expected location.");
@@ -60,7 +67,7 @@ fn main() -> Result<()> {

let no_op = Noop::<UninitializedSandbox, MultiUseSandbox>::default();

let mut multiuse_sandbox = usandbox.evolve(no_op)?;
let mut multiuse_sandbox = usandbox.evolve(no_op).expect("Failed to evolve sandbox");

// Call a guest function 5 times to generate some metrics.
for _ in 0..5 {
@@ -97,13 +104,14 @@ fn main() -> Result<()> {
None,
None,
None,
)?;
)
.expect("Failed to create UninitializedSandbox");

// Initialize the sandbox.

let no_op = Noop::<UninitializedSandbox, MultiUseSandbox>::default();

let mut multiuse_sandbox = usandbox.evolve(no_op)?;
let mut multiuse_sandbox = usandbox.evolve(no_op).expect("Failed to evolve sandbox");

// Call a function that gets cancelled by the host function 5 times to generate some metrics.

@@ -121,73 +129,8 @@ fn main() -> Result<()> {
let result = join_handle.join();
assert!(result.is_ok());
}

get_metrics();

Ok(())
}

fn get_metrics() {
// Get the metrics from the registry.

let metrics = HOST_REGISTRY.gather();

// Print the metrics.

print!("\nMETRICS:\n");

for metric in metrics.iter() {
match metric.get_field_type() {
prometheus::proto::MetricType::COUNTER => {
println!("Counter: {:?}", metric.help());
metric.get_metric().iter().for_each(|metric| {
let pair = metric.get_label();
for pair in pair.iter() {
println!("Label: {:?} Name: {:?}", pair.name(), pair.value());
}
println!("Value: {:?}", metric.get_counter().value());
});
}
prometheus::proto::MetricType::GAUGE => {
println!("Gauge: {:?}", metric.help());
metric.get_metric().iter().for_each(|metric| {
let pair = metric.get_label();
for pair in pair.iter() {
println!("Label: {:?} Name: {:?}", pair.name(), pair.value());
}
println!("Value: {:?}", metric.get_gauge().value());
});
}
prometheus::proto::MetricType::UNTYPED => {
println!("Metric: {:?}", metric.help());
}
prometheus::proto::MetricType::HISTOGRAM => {
println!("Histogram: {:?}", metric.help());
for metric in metric.get_metric() {
let pair = metric.get_label();
for pair in pair.iter() {
println!("Label: {:?} Name: {:?}", pair.name(), pair.value());
}
let count = metric.get_histogram().get_sample_count();
println!("Number of observations: {:?}", count);
let sm = metric.get_histogram().get_sample_sum();
println!("Sum of observations: {:?}", sm);
metric
.get_histogram()
.get_bucket()
.iter()
.for_each(|bucket| {
println!(
"Bucket: {:?} Count: {:?}",
bucket.upper_bound(),
bucket.cumulative_count()
)
});
}
}
prometheus::proto::MetricType::SUMMARY => {
println!("Summary: {:?}", metric.help());
}
}
}
fn fn_writer(_msg: String) -> Result<i32> {
Ok(0)
}
4 changes: 0 additions & 4 deletions src/hyperlight_host/src/error.rs
@@ -237,10 +237,6 @@ pub enum HyperlightError {
#[error("Failure processing PE File {0:?}")]
PEFileProcessingFailure(#[from] goblin::error::Error),

/// a Prometheus error occurred
#[error("Prometheus Error {0:?}")]
Prometheus(#[from] prometheus::Error),

/// Raw pointer is less than base address
#[error("Raw pointer ({0:?}) was less than the base address ({1})")]
RawPointerLessThanBaseAddress(RawPtr, u64),
28 changes: 8 additions & 20 deletions src/hyperlight_host/src/func/guest_err.rs
@@ -14,15 +14,13 @@ See the License for the specific language governing permissions and
limitations under the License.
*/

use hyperlight_common::flatbuffer_wrappers::guest_error::{
ErrorCode, GuestError as GuestErrorStruct,
};
use hyperlight_common::flatbuffer_wrappers::guest_error::ErrorCode;

use crate::error::HyperlightError::{GuestError, OutBHandlingError, StackOverflow};
use crate::mem::shared_mem::HostSharedMemory;
use crate::metrics::{METRIC_GUEST_ERROR, METRIC_GUEST_ERROR_LABEL_CODE};
use crate::sandbox::mem_mgr::MemMgrWrapper;
use crate::sandbox::metrics::SandboxMetric::GuestErrorCount;
use crate::{int_counter_vec_inc, log_then_return, Result};
use crate::{log_then_return, Result};
/// Check for a guest error and return an `Err` if one was found,
/// and `Ok` if one was not found.
pub(crate) fn check_for_guest_error(mgr: &MemMgrWrapper<HostSharedMemory>) -> Result<()> {
@@ -31,7 +29,8 @@ pub(crate) fn check_for_guest_error(mgr: &MemMgrWrapper<HostSharedMemory>) -> Re
ErrorCode::NoError => Ok(()),
ErrorCode::OutbError => match mgr.as_ref().get_host_error()? {
Some(host_err) => {
increment_guest_error_count(&guest_err);
metrics::counter!(METRIC_GUEST_ERROR, METRIC_GUEST_ERROR_LABEL_CODE => (guest_err.code as u64).to_string()).increment(1);

log_then_return!(OutBHandlingError(
host_err.source.clone(),
guest_err.message.clone()
@@ -41,23 +40,12 @@ pub(crate) fn check_for_guest_error(mgr: &MemMgrWrapper<HostSharedMemory>) -> Re
None => Ok(()),
},
ErrorCode::StackOverflow => {
increment_guest_error_count(&guest_err.clone());
metrics::counter!(METRIC_GUEST_ERROR, METRIC_GUEST_ERROR_LABEL_CODE => (guest_err.code as u64).to_string()).increment(1);
log_then_return!(StackOverflow());
}
_ => {
increment_guest_error_count(&guest_err.clone());
log_then_return!(GuestError(
guest_err.code.clone(),
guest_err.message.clone()
));
metrics::counter!(METRIC_GUEST_ERROR, METRIC_GUEST_ERROR_LABEL_CODE => (guest_err.code as u64).to_string()).increment(1);
log_then_return!(GuestError(guest_err.code, guest_err.message.clone()));
}
}
}

fn increment_guest_error_count(guest_err: &GuestErrorStruct) {
let guest_err_code_string: String = guest_err.code.clone().into();
int_counter_vec_inc!(
&GuestErrorCount,
&[&guest_err_code_string, guest_err.message.clone().as_str()]
);
}
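
The label construction used in this file, `(guest_err.code as u64).to_string()`, relies on the error code being a fieldless enum that can be cast to an integer, and the new `Copy` derive is what lets `.clone()` be dropped. A small standalone sketch of both points follows. `DemoErrorCode`, its variants, and its discriminant values are made up for illustration and do not reflect Hyperlight's actual `ErrorCode`.

```rust
/// Hypothetical error-code enum (assumption: variants and values are
/// illustrative only).
#[derive(Debug, Clone, Copy, Eq, PartialEq)]
#[repr(C)]
enum DemoErrorCode {
    NoError = 0,
    SomethingFailed = 42,
}

/// Turns a code into a label value by casting the discriminant to an integer.
fn code_label(code: DemoErrorCode) -> String {
    (code as u64).to_string()
}

fn main() {
    let code = DemoErrorCode::SomethingFailed;

    // `code` is passed by value here...
    let label = code_label(code);

    // ...and is still usable afterwards because the enum is `Copy`,
    // so no explicit `.clone()` is needed.
    println!("code = {:?}, label = {}", code, label);
    println!("no-error label = {}", code_label(DemoErrorCode::NoError));
}
```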