-
Notifications
You must be signed in to change notification settings - Fork 18
Rpc/execution related metrics #584
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
- needs to be cleaned up to handle all program loaders - unwrap needs to be removed as well
- covered in newer tests - not repeatable due to hardcoded accounts - testing obsolete behavior (non-eager cloning)
Manual Deploy AvailableYou can trigger a manual deploy of this PR branch to testnet: Alternative: Comment
Comment updated automatically when the PR is synchronized. |
WalkthroughAdds a metrics crate to the workspace and instruments aperture and processor code with new label-aware metrics: per-RPC request timers/counters, transaction processing/skip metrics, ensure-accounts timers, WebSocket subscription guards, and a failed-transaction counter; also adds RPC method string helpers and small cleanup in account mutation. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Client
participant HttpDispatcher
participant RequestHandler
participant AccountsEnsurer
participant TxProcessor
note over HttpDispatcher: Metrics: RPC_REQUESTS_COUNT, RPC_REQUEST_HANDLING_TIME
Client->>HttpDispatcher: send RPC request (method)
HttpDispatcher->>RPC: RPC_REQUESTS_COUNT.inc(method)
HttpDispatcher->>RPC: RPC_REQUEST_HANDLING_TIME.start(method)
HttpDispatcher->>RequestHandler: dispatch request
alt send_transaction
RequestHandler->>TxProcessor: prepare_transaction(...)
TxProcessor->>RPC: TRANSACTION_PROCESSING_TIME.start()
alt skip_preflight
TxProcessor->>RPC: TRANSACTION_SKIP_PREFLIGHT.inc()
end
TxProcessor->>AccountsEnsurer: ensure transaction accounts
AccountsEnsurer->>RPC: ENSURE_ACCOUNTS_TIME.start("transaction")
AccountsEnsurer-->>TxProcessor: ensured / warn on failure
TxProcessor-->>RequestHandler: result
else account read
RequestHandler->>AccountsEnsurer: read_account_with_ensure(...)
AccountsEnsurer->>RPC: ENSURE_ACCOUNTS_TIME.start("account"/"multi-account")
AccountsEnsurer-->>RequestHandler: ensured / warn on failure
end
RequestHandler-->>HttpDispatcher: response
HttpDispatcher->>RPC: RPC_REQUEST_HANDLING_TIME.observe_finish()
sequenceDiagram
autonumber
participant Client
participant WsDispatcher
participant SubscriptionManager
participant RPC
note over WsDispatcher: SubMetricGuard increments/decrements RPC_WS_SUBSCRIPTIONS_COUNT
Client->>WsDispatcher: subscribe(account/program/logs/slot)
WsDispatcher->>SubscriptionManager: create subscription
SubscriptionManager->>RPC: RPC_WS_SUBSCRIPTIONS_COUNT.inc(label) -- via SubMetricGuard::new
SubscriptionManager-->>WsDispatcher: subscription id
Client->>WsDispatcher: unsubscribe(id)
WsDispatcher->>SubscriptionManager: remove subscription
SubscriptionManager->>RPC: RPC_WS_SUBSCRIPTIONS_COUNT.dec(label) -- via SubMetricGuard drop
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes
Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: ASSERTIVE Plan: Pro 📒 Files selected for processing (5)
🧰 Additional context used🧠 Learnings (1)📚 Learning: 2025-10-21T14:00:54.642ZApplied to files:
🧬 Code graph analysis (4)magicblock-aperture/src/requests/http/mod.rs (1)
magicblock-aperture/src/state/subscriptions.rs (2)
magicblock-aperture/src/requests/http/send_transaction.rs (1)
magicblock-metrics/src/metrics/mod.rs (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
🔇 Additional comments (9)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
magicblock-aperture/src/state/subscriptions.rs (2)
109-121: Avoid metric leaks on shutdown: move guard to SubscriptionHandle (RAII), not async cleanupDecrementing via drop(metric) inside a spawned cleanup task can be skipped during runtime shutdown, leaving RPC_WS_SUBSCRIPTIONS_COUNT over-reported. Prefer owning the guard on SubscriptionHandle so the gauge decrements synchronously on handle drop; keep the async DB cleanup as-is.
Apply these changes:
@@ pub(crate) async fn subscribe_to_account(...) - let metric = SubMetricGuard::new("account"); + let metric = SubMetricGuard::new("account"); @@ - if entry.remove_subscriber(conid, &encoder) { + if entry.remove_subscriber(conid, &encoder) { let _ = entry.remove(); } - drop(metric) }; - let cleanup = CleanUp(Some(Box::pin(callback))); - SubscriptionHandle { id, cleanup } + let cleanup = CleanUp(Some(Box::pin(callback))); + SubscriptionHandle { id, cleanup, _metric: Some(metric) }And update the handle definition (see Lines 376-379).
372-379: Store metric in SubscriptionHandle to guarantee gauge decrement on dropThe metric is currently created and incremented at each subscription constructor (lines 109, 151, 219, 248) but only decremented when the async cleanup task runs. If that task fails or doesn't run, the gauge leaks. Add
_metric: Option<SubMetricGuard>to the struct and update all 4 constructors (lines 123, 162, 225, 254) to include it—this ensures synchronous decrement when the handle is dropped, even if async cleanup fails.pub(crate) struct SubscriptionHandle { pub(crate) id: SubscriptionID, pub(crate) cleanup: CleanUp, + _metric: Option<SubMetricGuard>, }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (11)
magicblock-aperture/Cargo.toml(1 hunks)magicblock-aperture/src/requests/http/mod.rs(4 hunks)magicblock-aperture/src/requests/http/send_transaction.rs(3 hunks)magicblock-aperture/src/requests/mod.rs(1 hunks)magicblock-aperture/src/server/http/dispatch.rs(2 hunks)magicblock-aperture/src/server/websocket/dispatch.rs(2 hunks)magicblock-aperture/src/state/subscriptions.rs(7 hunks)magicblock-metrics/src/metrics/mod.rs(3 hunks)magicblock-processor/Cargo.toml(1 hunks)magicblock-processor/src/executor/processing.rs(2 hunks)programs/magicblock/src/mutate_accounts/account_mod_data.rs(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-21T14:00:54.642Z
Learnt from: bmuddha
PR: magicblock-labs/magicblock-validator#578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579
Applied to files:
magicblock-aperture/src/state/subscriptions.rs
🧬 Code graph analysis (3)
magicblock-aperture/src/requests/http/send_transaction.rs (1)
magicblock-aperture/src/requests/mod.rs (1)
params(22-26)
magicblock-aperture/src/state/subscriptions.rs (2)
magicblock-aperture/src/server/websocket/dispatch.rs (1)
new(50-62)magicblock-aperture/src/server/websocket/connection.rs (1)
new(66-84)
magicblock-metrics/src/metrics/mod.rs (1)
magicblock-aperture/src/state/subscriptions.rs (2)
new(338-351)new(406-409)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: run_make_ci_test
- GitHub Check: run_make_ci_lint
- GitHub Check: run_make_ci_test
- GitHub Check: run_make_ci_format
🔇 Additional comments (21)
programs/magicblock/src/mutate_accounts/account_mod_data.rs (1)
74-76: LGTM! Clean refactor aligning with centralized metrics.The simplification to a single-line expression is idiomatic and preserves the original behavior. The removal of local metrics updates aligns with the broader PR objective of centralizing metrics in the
magicblock-metricscrate.magicblock-aperture/Cargo.toml (1)
36-36: LGTM! Dependency addition supports metrics instrumentation.The addition of the
magicblock-metricsworkspace dependency enables the aperture crate to instrument HTTP/WebSocket dispatch and request processing with the new centralized metrics system.magicblock-processor/src/executor/processing.rs (2)
11-11: LGTM! Import supports transaction failure tracking.The import of
FAILED_TRANSACTIONS_COUNTenables instrumentation of transaction load failures in the executor.
58-58: LGTM! Metric correctly placed in error path.The increment of
FAILED_TRANSACTIONS_COUNTis appropriately positioned after committing the failed transaction and before returning, accurately tracking transactions that fail to load.magicblock-aperture/src/requests/mod.rs (2)
91-150: LGTM! Comprehensive method-to-string mapping for HTTP methods.The
as_strimplementation provides consistent camelCase labels for all HTTP RPC methods, supporting metrics instrumentation. All enum variants are covered with correct mappings.
152-167: LGTM! Comprehensive method-to-string mapping for WebSocket methods.The
as_strimplementation provides consistent camelCase labels for all WebSocket RPC methods, supporting metrics instrumentation. All enum variants are covered with correct mappings.magicblock-processor/Cargo.toml (1)
19-19: LGTM! Dependency addition supports transaction failure metrics.The addition of the
magicblock-metricsworkspace dependency enables the processor crate to track failed transactions usingFAILED_TRANSACTIONS_COUNT.magicblock-aperture/src/server/http/dispatch.rs (2)
9-11: LGTM! Imports enable HTTP request metrics.The imports of
RPC_REQUESTS_COUNTandRPC_REQUEST_HANDLING_TIMEsupport per-method request tracking and latency measurement in the HTTP dispatch path.
117-121: LGTM! Well-positioned metrics instrumentation.The metrics are correctly placed at the start of the
processmethod:
- Request count incremented immediately with the method label
- Timer started using RAII pattern (automatically records on drop)
- Both metrics use the same method label for consistent correlation
magicblock-aperture/src/server/websocket/dispatch.rs (2)
20-21: LGTM! Import enables WebSocket request counting.The import of
RPC_REQUESTS_COUNTsupports per-method request tracking in the WebSocket dispatch path.
70-72: LGTM! Consistent WebSocket request metrics.The counter increment is appropriately placed before request handling, using the method label for consistent tracking across HTTP and WebSocket transports.
magicblock-aperture/src/requests/http/send_transaction.rs (4)
1-4: LGTM! Imports enable transaction metrics and logging.The imports support:
- Enhanced error logging via
warn!- Transaction processing time measurement
- Preflight skip tracking
21-21: LGTM! Timer correctly positioned for full request duration.The
TRANSACTION_PROCESSING_TIMEtimer is started at function entry and will automatically record the full processing duration when dropped (RAII pattern).
31-31: LGTM! Enhanced error observability.The
inspect_errpattern adds warning-level logging for transaction preparation failures without affecting the error propagation flow.
46-46: LGTM! Preflight skip tracking.The metric increment correctly tracks when transactions bypass preflight validation, providing valuable insights into transaction submission patterns.
magicblock-aperture/src/requests/http/mod.rs (2)
105-108: Good: Low-overhead timing instrumentationUsing ENSURE_ACCOUNTS_TIME.start_timer() scoped via _timer is correct; drop records duration reliably.
196-199: Metrics registration verified at startupMetrics are correctly registered early in the initialization chain. The
try_start_metrics_service()function inmagicblock-api/src/magic_validator.rs:216is called during validator startup viaMagicValidator::try_from_config(), which ensuresmetrics::register()executes before the metrics service spawns. This timing guarantees observations are available for export.magicblock-aperture/src/state/subscriptions.rs (2)
18-18: WS subscriptions metric importImport looks correct; label cardinality constrained by &'static str usage.
96-124: Note: account_subscribe config fields still ignored (tracking issue #579)Instrumentation doesn’t affect behavior; the known omission (data_slice, commitment, min_context_slot ignored) remains. Just a reminder; no change requested here.
Based on learnings
magicblock-metrics/src/metrics/mod.rs (2)
114-168: Metrics additions and registration look solidNew Histogram/Vec and Counter/Vec definitions use consistent buckets; all are registered.
Also applies to: 226-233
193-234: Verified: metrics::register() is invoked during startupThe call chain is confirmed:
magicblock-validator/src/main.rs:97callsMagicValidator::try_from_config(), which in turn callstry_start_metrics_service()atmagicblock-api/src/magic_validator.rs:216, which immediately invokesmetrics::register()atmagicblock-metrics/src/service.rs:20. Collectors are properly exported under the mbv registry during validator initialization.
| let metric = SubMetricGuard::new("slot"); | ||
| let callback = async move { | ||
| slot.write().txs.remove(&conid); | ||
| drop(metric) | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
Same fix for slot subscriptions
Store the guard on the handle; remove drop(metric) in the callback.
- let metric = SubMetricGuard::new("slot");
+ let metric = SubMetricGuard::new("slot");
@@
- slot.write().txs.remove(&conid);
- drop(metric)
+ slot.write().txs.remove(&conid);
};
- let cleanup = CleanUp(Some(Box::pin(callback)));
- SubscriptionHandle { id, cleanup }
+ let cleanup = CleanUp(Some(Box::pin(callback)));
+ SubscriptionHandle { id, cleanup, _metric: Some(metric) }Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In magicblock-aperture/src/state/subscriptions.rs around lines 248 to 252, the
SubMetricGuard created for "slot" is dropped inside the async callback which
prematurely ends the metric; instead store the guard on the subscription handle
so its lifetime matches the subscription and remove the explicit drop(metric)
from the callback; modify the subscription handle (or its enclosing struct) to
hold the SubMetricGuard for the slot subscription and move/assign the guard into
that handle when creating the callback, leaving the callback to only perform
slot.write().txs.remove(&conid).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good in general, just some nits we should address/clarify.
Also I feel sorry for @lucacillario since he'll have to update all the dashboards 😓
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but maybe address some of the coderabbit comments before merging.
This PR adds some metrics related to RPC methods processing and execution timings, which can provide useful insights into common usage patterns and areas of improvement.
Summary by CodeRabbit
New Features
Refactor
Bug Fixes
Chores