Commit 6e131b8

Rework metrics to use metrics crate
Signed-off-by: Ludvig Liljenberg <[email protected]>
1 parent f67cf90 commit 6e131b8

24 files changed, +657 -1882 lines changed

Diff for: Cargo.lock

+420 -25
Generated file; diff not rendered.

Diff for: docs/hyperlight-metrics-logs-and-traces.md

+8 -22
@@ -2,47 +2,33 @@

 Hyperlight provides the following observability features:

-* [Metrics](#metrics) are provided using Prometheus.
+* [Metrics](#metrics) are provided using the [metrics](https://docs.rs/metrics/latest/metrics/index.html) crate, which is a lightweight metrics facade.
 * [Logs](#logs) are provided using the Rust [log crate](https://docs.rs/log/0.4.6/log/), and can be consumed by any Rust logger implementation, including LogTracer which can be used to emit log records as tracing events.
 * [Tracing](#tracing) is provided using the Rust [tracing crate](https://docs.rs/tracing/0.1.37/tracing/), and can be consumed by any Rust tracing implementation. In addition, the [log feature](https://docs.rs/tracing/latest/tracing/#crate-feature-flags) is enabled which means that should a hyperlight host application not want to consume tracing events, you can still consume them as logs.

 ## Metrics

-Hyperlight provides metrics using Prometheus. The metrics are registered using either the [default_registry](https://docs.rs/prometheus/latest/prometheus/fn.default_registry.html) or a registry instance provided by the host application.
+Metrics are provided using the [metrics](https://docs.rs/metrics/latest/metrics/index.html) crate, which is a lightweight metrics facade. When an executable installs a recorder, Hyperlight will emit its metrics to that recorder, which allows library authors to seamlessly emit their own metrics without knowing or caring which exporter implementation is chosen, or even if one is installed. In case no recorder is installed, the metrics will be emitted to the default recorder, which is a no-op implementation with minimal overhead.

-To provide a registry to Hyperlight, use the `set_metrics_registry` function and pass a reference to a registry with `static` lifetime:
-
-```rust
-use hyperlight_host::metrics::set_metrics_registry;
-use prometheus::Registry;
-use lazy_static::lazy_static;
-
-lazy_static! {
-    static ref REGISTRY: Registry = Registry::new();
-}
-
-set_metrics_registry(&REGISTRY);
-```
+There are many different implementations of recorders. One example is the [prometheus exporter](https://docs.rs/metrics-exporter-prometheus/latest/metrics_exporter_prometheus/) which can be used to export metrics to a Prometheus server.
+Hyperlight provides metrics using Prometheus.

 The following metrics are provided and are enabled by default:

-* `hyperlight_guest_error_count` - a vector of counters that tracks the number of guest errors by code and message.
-* `hyperlight_number_of_cancelled_guest_execution` - a counter that tracks the number of guest executions that have been cancelled because the execution time exceeded the time allowed.
+* `NUM_GUEST_ERRORS` - Counter that tracks the number of guest errors by code and message.
+* `NUM_GUEST_CANCELLATIONS` - Counter that tracks the number of guest executions that have been cancelled because the execution time exceeded the time allowed.

 The following metrics are provided but are disabled by default and require the feature `function_call_metrics` to be enabled:

-* `hyperlight_guest_function_call_duration_microseconds` - a vector of histograms that tracks the execution time of guest functions in microseconds by function name. The histogram also tracks the number of calls to each function.
-* `hyperlight_host_function_calls_duration_microseconds` - a vector of histograms that tracks the execution time of host functions in microseconds by function name. The histogram also tracks the number of calls to each function.
+* `GUEST_CALL_DURATION` - Histogram that tracks the execution time of guest functions in seconds by function name. The histogram also tracks the number of calls to each function.
+* `HOST_CALL_DURATION` - Histogram that tracks the execution time of host functions in seconds by function name. The histogram also tracks the number of calls to each function.

 The rationale for disabling the function call metrics by default is that:

 * A Hyperlight host may wish to provide its own metrics for function calls.
 * Enabling a trace subscriber will cause the function call metrics to be emitted as trace events, which may be sufficient for some use cases.

 There is an example of how to gather metrics in the [examples/metrics](../src/hyperlight_host/examples/metrics) directory.
-
-The metrics capabilities provided by Hyperlight can also be used by a library or host that is using Hyperlight to provide additional metrics, see the [hypervisor metrics module](../src/hyperlight_host/src/hypervisor/metrics.rs) for an example of how to define metrics.
-
 ## Logs

 Hyperlight provides logs using the Rust [log crate](https://docs.rs/log/0.4.6/log/), and can be consumed by any Rust logger implementation, including LogTracer which can be used to emit log records as tracing events(see below for more details). To consume logs, the host application must provide a logger implementation either by using the `set_logger` function directly or using a logger implementation that is compatible with the log crate.
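
The facade behaviour described in the Metrics section of this doc change (library code emits unconditionally, the executable decides whether anything listens, and a no-op recorder is the fallback) can be pictured with a small standard-library-only model. This is a sketch of the pattern, not the `metrics` crate's actual API: the `Recorder` trait, `NoopRecorder`, `CountingRecorder`, `install_recorder`, and `increment_counter` are all hypothetical names chosen for illustration.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

/// Hypothetical recorder interface (assumption: the real crate's trait is richer).
pub trait Recorder: Send + Sync {
    fn increment_counter(&self, name: &'static str, value: u64);
}

/// Stand-in for the default recorder: it discards every update.
pub struct NoopRecorder;

impl Recorder for NoopRecorder {
    fn increment_counter(&self, _name: &'static str, _value: u64) {}
}

/// Toy in-memory recorder that an executable might install.
#[derive(Default)]
pub struct CountingRecorder {
    counters: Mutex<HashMap<&'static str, u64>>,
}

impl CountingRecorder {
    /// Returns the current value of a counter, or 0 if it was never incremented.
    pub fn get(&self, name: &str) -> u64 {
        self.counters.lock().unwrap().get(name).copied().unwrap_or(0)
    }
}

impl Recorder for CountingRecorder {
    fn increment_counter(&self, name: &'static str, value: u64) {
        *self.counters.lock().unwrap().entry(name).or_insert(0) += value;
    }
}

// Process-wide recorder slot; it can be set at most once.
static RECORDER: OnceLock<&'static dyn Recorder> = OnceLock::new();
static NOOP: NoopRecorder = NoopRecorder;

/// Called by the executable. Returns `false` if a recorder was already installed.
pub fn install_recorder(recorder: &'static dyn Recorder) -> bool {
    RECORDER.set(recorder).is_ok()
}

/// Called by library code. Falls back to the no-op recorder when none is installed.
pub fn increment_counter(name: &'static str, value: u64) {
    let recorder: &dyn Recorder = RECORDER.get().copied().unwrap_or(&NOOP);
    recorder.increment_counter(name, value);
}
```

In this model a library calls `increment_counter` without knowing whether `install_recorder` was ever called. Updates emitted before installation are simply dropped, which mirrors the "no-op with minimal overhead" default described in the doc text.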

Diff for: src/hyperlight_common/src/flatbuffer_wrappers/guest_error.rs

+2 -2
@@ -28,7 +28,7 @@ use crate::flatbuffers::hyperlight::generated::{
     ErrorCode as FbErrorCode, GuestError as FbGuestError, GuestErrorArgs,
 };

-#[derive(Debug, Clone, Eq, PartialEq)]
+#[derive(Debug, Clone, Copy, Eq, PartialEq)]
 #[repr(C)]
 /// `ErrorCode` represents an error that occurred in the Hyperlight Guest.
 pub enum ErrorCode {
@@ -222,7 +222,7 @@ impl TryFrom<&GuestError> for Vec<u8> {
         let guest_error_fb = FbGuestError::create(
             &mut builder,
             &GuestErrorArgs {
-                code: value.code.clone().into(),
+                code: value.code.into(),
                 message: Some(message),
             },
         );
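
The two changes in this file go together: once the enum derives `Copy`, a field of that type can be read out of a borrowed struct by value, so the `.clone()` before `.into()` becomes unnecessary. A minimal sketch of that effect follows. `Code`, `GuestErr`, `code_as_u64`, and the numeric discriminants are hypothetical stand-ins, not the real `ErrorCode` or `GuestError` definitions.

```rust
/// Hypothetical stand-in for `ErrorCode`; the discriminant values are illustrative only.
#[derive(Debug, Clone, Copy, Eq, PartialEq)]
#[repr(C)]
pub enum Code {
    NoError = 0,
    StackOverflow = 1,
}

impl From<Code> for u64 {
    fn from(code: Code) -> u64 {
        code as u64
    }
}

/// Hypothetical stand-in for `GuestError`.
pub struct GuestErr {
    pub code: Code,
    pub message: String,
}

/// Converts the code of a borrowed error.
/// Because `Code` is `Copy`, `err.code` is copied out of the borrow;
/// without `Copy` this line would need `err.code.clone().into()`.
pub fn code_as_u64(err: &GuestErr) -> u64 {
    err.code.into()
}
```

The struct remains fully usable after the call, since only a copy of the small enum value was consumed by `into()`.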

Diff for: src/hyperlight_host/Cargo.toml

+4 -4
@@ -34,7 +34,6 @@ lazy_static = "1.4.0"
 serde = { version = "1.0", features = ["derive"] }
 serde_json = "1.0"
 log = "0.4.26"
-once_cell = { version = "1.21.1" }
 tracing = { version = "0.1.41", features = ["log"] }
 tracing-log = "0.2.0"
 tracing-core = "0.1.33"
@@ -43,11 +42,10 @@ vmm-sys-util = "0.12.1"
 crossbeam = "0.8.0"
 crossbeam-channel = "0.5.14"
 thiserror = "2.0.12"
-prometheus = "0.13.3"
-strum = { version = "0.27", features = ["derive"] }
 tempfile = { version = "3.18", optional = true }
 serde_yaml = "0.9"
 anyhow = "1.0"
+metrics = "0.24.1"

 [target.'cfg(windows)'.dependencies]
 windows = { version = "0.59", features = [
@@ -104,6 +102,8 @@ opentelemetry_sdk = { version = "0.28", features = ["rt-tokio"] }
 tokio = { version = "1.43.0", features = ["full"] }
 criterion = "0.5.1"
 tracing-chrome = "0.7.2"
+metrics-util = "0.19.0"
+metrics-exporter-prometheus = "0.16.2"

 [target.'cfg(windows)'.dev-dependencies]
 windows = { version = "0.59", features = [
@@ -119,7 +119,7 @@ cfg_aliases = "0.2.1"
 built = { version = "0.7.7", features = ["chrono", "git2"] }

 [features]
-default = ["kvm", "mshv2", "seccomp"]
+default = ["kvm", "mshv2", "seccomp", "function_call_metrics"]
 seccomp = ["dep:seccompiler"]
 function_call_metrics = []
 executable_heap = []

Diff for: src/hyperlight_host/examples/metrics/main.rs

+23 -82
@@ -22,22 +22,27 @@ use hyperlight_common::flatbuffer_wrappers::function_types::{ParameterValue, Ret
 use hyperlight_host::sandbox::uninitialized::UninitializedSandbox;
 use hyperlight_host::sandbox_state::sandbox::EvolvableSandbox;
 use hyperlight_host::sandbox_state::transition::Noop;
-use hyperlight_host::{set_metrics_registry, GuestBinary, MultiUseSandbox, Result};
+use hyperlight_host::{GuestBinary, MultiUseSandbox, Result};
 use hyperlight_testing::simple_guest_as_string;
-use lazy_static::lazy_static;
-use prometheus::Registry;

-lazy_static! {
-    static ref HOST_REGISTRY: Registry = Registry::new();
-}
-fn fn_writer(_msg: String) -> Result<i32> {
-    Ok(0)
+fn main() {
+    // Install prometheus metrics exporter.
+    // We only install the metrics recorder here, but you can also use the
+    // `metrics_exporter_prometheus::PrometheusBuilder::new().install()` method
+    // to install a HTTP listener that serves the metrics.
+    let prometheus_handle = metrics_exporter_prometheus::PrometheusBuilder::new()
+        .install_recorder()
+        .expect("Failed to install Prometheus exporter");
+
+    // Do some hyperlight stuff to generate metrics.
+    do_hyperlight_stuff();
+
+    // Get the metrics and print them in prometheus exposition format.
+    let payload = prometheus_handle.render();
+    println!("Prometheus metrics:\n{}", payload);
 }

-fn main() -> Result<()> {
-    // If this is not called then the default registry `prometheus::default_registry` will be used.
-    set_metrics_registry(&HOST_REGISTRY)?;
-
+fn do_hyperlight_stuff() {
     // Get the path to a simple guest binary.
     let hyperlight_guest_path =
         simple_guest_as_string().expect("Cannot find the guest binary at the expected location.");
@@ -60,7 +65,7 @@ fn main() -> Result<()> {

     let no_op = Noop::<UninitializedSandbox, MultiUseSandbox>::default();

-    let mut multiuse_sandbox = usandbox.evolve(no_op)?;
+    let mut multiuse_sandbox = usandbox.evolve(no_op).expect("Failed to evolve sandbox");

     // Call a guest function 5 times to generate some metrics.
     for _ in 0..5 {
@@ -97,13 +102,14 @@ fn main() -> Result<()> {
         None,
         None,
         None,
-    )?;
+    )
+    .expect("Failed to create UninitializedSandbox");

     // Initialize the sandbox.

     let no_op = Noop::<UninitializedSandbox, MultiUseSandbox>::default();

-    let mut multiuse_sandbox = usandbox.evolve(no_op)?;
+    let mut multiuse_sandbox = usandbox.evolve(no_op).expect("Failed to evolve sandbox");

     // Call a function that gets cancelled by the host function 5 times to generate some metrics.

@@ -121,73 +127,8 @@ fn main() -> Result<()> {
         let result = join_handle.join();
         assert!(result.is_ok());
     }
-
-    get_metrics();
-
-    Ok(())
 }

-fn get_metrics() {
-    // Get the metrics from the registry.
-
-    let metrics = HOST_REGISTRY.gather();
-
-    // Print the metrics.
-
-    print!("\nMETRICS:\n");
-
-    for metric in metrics.iter() {
-        match metric.get_field_type() {
-            prometheus::proto::MetricType::COUNTER => {
-                println!("Counter: {:?}", metric.get_help());
-                metric.get_metric().iter().for_each(|metric| {
-                    let pair = metric.get_label();
-                    for pair in pair.iter() {
-                        println!("Label: {:?} Name: {:?}", pair.get_name(), pair.get_value());
-                    }
-                    println!("Value: {:?}", metric.get_counter().get_value());
-                });
-            }
-            prometheus::proto::MetricType::GAUGE => {
-                println!("Gauge: {:?}", metric.get_help());
-                metric.get_metric().iter().for_each(|metric| {
-                    let pair = metric.get_label();
-                    for pair in pair.iter() {
-                        println!("Label: {:?} Name: {:?}", pair.get_name(), pair.get_value());
-                    }
-                    println!("Value: {:?}", metric.get_gauge().get_value());
-                });
-            }
-            prometheus::proto::MetricType::UNTYPED => {
-                println!("Metric: {:?}", metric.get_help());
-            }
-            prometheus::proto::MetricType::HISTOGRAM => {
-                println!("Histogram: {:?}", metric.get_help());
-                for metric in metric.get_metric() {
-                    let pair = metric.get_label();
-                    for pair in pair.iter() {
-                        println!("Label: {:?} Name: {:?}", pair.get_name(), pair.get_value());
-                    }
-                    let count = metric.get_histogram().get_sample_count();
-                    println!("Number of observations: {:?}", count);
-                    let sm = metric.get_histogram().get_sample_sum();
-                    println!("Sum of observations: {:?}", sm);
-                    metric
-                        .get_histogram()
-                        .get_bucket()
-                        .iter()
-                        .for_each(|bucket| {
-                            println!(
-                                "Bucket: {:?} Count: {:?}",
-                                bucket.get_upper_bound(),
-                                bucket.get_cumulative_count()
-                            )
-                        });
-                }
-            }
-            prometheus::proto::MetricType::SUMMARY => {
-                println!("Summary: {:?}", metric.get_help());
-            }
-        }
-    }
+fn fn_writer(_msg: String) -> Result<i32> {
+    Ok(0)
 }

Diff for: src/hyperlight_host/src/error.rs

-4
@@ -237,10 +237,6 @@ pub enum HyperlightError {
     #[error("Failure processing PE File {0:?}")]
     PEFileProcessingFailure(#[from] goblin::error::Error),

-    /// a Prometheus error occurred
-    #[error("Prometheus Error {0:?}")]
-    Prometheus(#[from] prometheus::Error),
-
     /// Raw pointer is less than base address
     #[error("Raw pointer ({0:?}) was less than the base address ({1})")]
     RawPointerLessThanBaseAddress(RawPtr, u64),

Diff for: src/hyperlight_host/src/func/guest_err.rs

+8 -20
@@ -14,15 +14,13 @@ See the License for the specific language governing permissions and
 limitations under the License.
 */

-use hyperlight_common::flatbuffer_wrappers::guest_error::{
-    ErrorCode, GuestError as GuestErrorStruct,
-};
+use hyperlight_common::flatbuffer_wrappers::guest_error::ErrorCode;

 use crate::error::HyperlightError::{GuestError, OutBHandlingError, StackOverflow};
 use crate::mem::shared_mem::HostSharedMemory;
+use crate::metrics::{CounterMetric, Metric};
 use crate::sandbox::mem_mgr::MemMgrWrapper;
-use crate::sandbox::metrics::SandboxMetric::GuestErrorCount;
-use crate::{int_counter_vec_inc, log_then_return, Result};
+use crate::{log_then_return, Result};
 /// Check for a guest error and return an `Err` if one was found,
 /// and `Ok` if one was not found.
 pub(crate) fn check_for_guest_error(mgr: &MemMgrWrapper<HostSharedMemory>) -> Result<()> {
@@ -31,7 +29,8 @@ pub(crate) fn check_for_guest_error(mgr: &MemMgrWrapper<HostSharedMemory>) -> Re
         ErrorCode::NoError => Ok(()),
         ErrorCode::OutbError => match mgr.as_ref().get_host_error()? {
             Some(host_err) => {
-                increment_guest_error_count(&guest_err);
+                CounterMetric::guest_error(guest_err.code.into(), guest_err.message.clone()).emit();
+
                 log_then_return!(OutBHandlingError(
                     host_err.source.clone(),
                     guest_err.message.clone()
@@ -41,23 +40,12 @@ pub(crate) fn check_for_guest_error(mgr: &MemMgrWrapper<HostSharedMemory>) -> Re
             None => Ok(()),
         },
         ErrorCode::StackOverflow => {
-            increment_guest_error_count(&guest_err.clone());
+            CounterMetric::guest_error(guest_err.code.into(), guest_err.message.clone()).emit();
             log_then_return!(StackOverflow());
         }
         _ => {
-            increment_guest_error_count(&guest_err.clone());
-            log_then_return!(GuestError(
-                guest_err.code.clone(),
-                guest_err.message.clone()
-            ));
+            CounterMetric::guest_error(guest_err.code.into(), guest_err.message.clone()).emit();
+            log_then_return!(GuestError(guest_err.code, guest_err.message.clone()));
         }
     }
 }
-
-fn increment_guest_error_count(guest_err: &GuestErrorStruct) {
-    let guest_err_code_string: String = guest_err.code.clone().into();
-    int_counter_vec_inc!(
-        &GuestErrorCount,
-        &[&guest_err_code_string, guest_err.message.clone().as_str()]
-    );
-}

Diff for: src/hyperlight_host/src/hypervisor/hypervisor_handler.rs

+5 -9
@@ -37,8 +37,6 @@ use windows::Win32::System::Hypervisor::{WHvCancelRunVirtualProcessor, WHV_PARTI

 #[cfg(gdb)]
 use super::gdb::create_gdb_thread;
-#[cfg(feature = "function_call_metrics")]
-use crate::histogram_vec_observe;
 #[cfg(gdb)]
 use crate::hypervisor::handlers::DbgMemAccessHandlerWrapper;
 use crate::hypervisor::handlers::{MemAccessHandlerWrapper, OutBHandlerWrapper};
@@ -50,11 +48,10 @@ use crate::mem::mgr::SandboxMemoryManager;
 use crate::mem::ptr::{GuestPtr, RawPtr};
 use crate::mem::ptr_offset::Offset;
 use crate::mem::shared_mem::{GuestSharedMemory, HostSharedMemory, SharedMemory};
+use crate::metrics::Metric;
 #[cfg(gdb)]
 use crate::sandbox::config::DebugInfo;
 use crate::sandbox::hypervisor::{get_available_hypervisor, HypervisorType};
-#[cfg(feature = "function_call_metrics")]
-use crate::sandbox::metrics::SandboxMetric::GuestFunctionCallDurationMicroseconds;
 #[cfg(target_os = "linux")]
 use crate::signal_handlers::setup_signal_handlers;
 use crate::HyperlightError::{
@@ -438,6 +435,8 @@ impl HypervisorHandler {
     let res = {
         #[cfg(feature = "function_call_metrics")]
         {
+            use crate::metrics::HistogramMetric;
+
             let start = std::time::Instant::now();
             let result = hv.dispatch_call_from_host(
                 dispatch_function_addr,
@@ -447,11 +446,8 @@ impl HypervisorHandler {
                 #[cfg(gdb)]
                 configuration.dbg_mem_access_handler.clone(),
             );
-            histogram_vec_observe!(
-                &GuestFunctionCallDurationMicroseconds,
-                &[function_name.as_str()],
-                start.elapsed().as_micros() as f64
-            );
+            let elapsed = start.elapsed();
+            HistogramMetric::guest_call(elapsed).emit();
             result
         }
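
The timing pattern in the last hunk (capture an `Instant` before the dispatch, read `elapsed()` afterwards, and hand the `Duration` to the metric) can be sketched with the standard library alone. The `timed` and `as_seconds` helpers below are hypothetical and not part of Hyperlight. The conversion to fractional seconds is shown only as one plausible way to arrive at the "seconds" unit named in the updated docs, since the diff does not show how `HistogramMetric` converts the duration internally.

```rust
use std::time::{Duration, Instant};

/// Runs `f` and returns its result together with the elapsed wall-clock time.
/// Hypothetical helper, shown only to illustrate the measure-then-emit pattern.
pub fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let result = f();
    (result, start.elapsed())
}

/// Converts a duration to fractional seconds, the unit the updated docs name
/// for the call-duration histograms (assumption: the real conversion may differ).
pub fn as_seconds(elapsed: Duration) -> f64 {
    elapsed.as_secs_f64()
}
```

Passing a `Duration` around and converting only at the point of recording keeps the unit decision in one place, which is what makes a later switch such as microseconds to seconds a local change.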