Add ControllerStartupLatency metric for SandboxClaims#522
Add ControllerStartupLatency metric for SandboxClaims#522igooch wants to merge 1 commit intokubernetes-sigs:mainfrom
Conversation
✅ Deploy Preview for agent-sandbox canceled.
|
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: igooch The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
| // Inline patch, no early return, to avoid forcing a second reconcile cycle. | ||
| tc := r.Tracer.GetTraceContext(ctx) | ||
| if tc != "" && (claim.Annotations == nil || claim.Annotations[asmetrics.TraceContextAnnotation] == "") { | ||
| obsAnnotation := "agent-sandbox.kubernetes.io/controller-first-observed-at" |
There was a problem hiding this comment.
Nit: I would just call the variables traceContext, observabilityAnnotation, needObservabilityPatch, needTraceContextPatch; it makes it easier to read
| tc := r.Tracer.GetTraceContext(ctx) | ||
| if tc != "" && (claim.Annotations == nil || claim.Annotations[asmetrics.TraceContextAnnotation] == "") { | ||
| obsAnnotation := "agent-sandbox.kubernetes.io/controller-first-observed-at" | ||
| needObsPatch := claim.Annotations == nil || claim.Annotations[obsAnnotation] == "" |
There was a problem hiding this comment.
| needObsPatch := claim.Annotations == nil || claim.Annotations[obsAnnotation] == "" | |
| needObsPatch := claim.Annotations[obsAnnotation] == "" |
You can read from a nil map, it is treated as the empty map
| if tc != "" && (claim.Annotations == nil || claim.Annotations[asmetrics.TraceContextAnnotation] == "") { | ||
| obsAnnotation := "agent-sandbox.kubernetes.io/controller-first-observed-at" | ||
| needObsPatch := claim.Annotations == nil || claim.Annotations[obsAnnotation] == "" | ||
| needTcPatch := tc != "" && (claim.Annotations == nil || claim.Annotations[asmetrics.TraceContextAnnotation] == "") |
There was a problem hiding this comment.
| needTcPatch := tc != "" && (claim.Annotations == nil || claim.Annotations[asmetrics.TraceContextAnnotation] == "") | |
| needTcPatch := tc != "" && claim.Annotations[asmetrics.TraceContextAnnotation] == "" |
|
|
||
| // Record controller startup latency | ||
| obsAnnotation := "agent-sandbox.kubernetes.io/controller-first-observed-at" | ||
| if claim.Annotations != nil && claim.Annotations[obsAnnotation] != "" { |
There was a problem hiding this comment.
| if claim.Annotations != nil && claim.Annotations[obsAnnotation] != "" { | |
| if obsStr := claim.Annotations[obsAnnotation]; obsStr != "" { |
| } | ||
| claim.Annotations[asmetrics.TraceContextAnnotation] = tc | ||
| if needObsPatch { | ||
| claim.Annotations[obsAnnotation] = time.Now().Format(time.RFC3339Nano) |
There was a problem hiding this comment.
You might consider just keeping an in memory map, but ... given we're already writing to the apiserver for the trace... sgtm
|
Thanks Ivy. I think we also need to have a version of the original metric where it optionally looks at a client provided timestamp (in a pre-defined annotation). OR we can skip emitting the claim_latency_metric if that annotation is not set. |
This PR introduces a new metric,
agent_sandbox_claim_controller_startup_latency_ms, to provide higher precision tracking of SandboxClaim startup performance.Problem
Currently, startup latency is measured using the standard Kubernetes
creationTimestamp. However, this timestamp has one-second granularity. For fast-provisioning resources like SandboxClaims, where target latencies are often in the millisecond range, this granularity is too coarse and leads to inaccurate P50/P90 metrics.Proposed Solution
The controller now stamps a high-precision
controller-first-observed-atannotation during its first reconciliation cycle. The new metric measures the duration from this observation point to the "Ready" state.Notes for the reviewer