-
Notifications
You must be signed in to change notification settings - Fork 330
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce IC log volume #3638
base: master
Are you sure you want to change the base?
Reduce IC log volume #3638
Conversation
rs/consensus/dkg/src/lib.rs
Outdated
@@ -263,6 +263,7 @@ impl DkgImpl { | |||
.into() | |||
} | |||
Err(err) => { | |||
// TODO: Adjust frequency error log. There are instances of this log line > 200 times/min | |||
error!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should feel free to demote in this PR.
If it happens often the 99% is not an error :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, it's a transitive error which should go away after a couple of retries, so it should be a warning at best.
I've adjusted the log
@@ -131,6 +131,7 @@ impl Orchestrator { | |||
thread::sleep(Duration::from_secs(10 * 60)); | |||
let (ipv4, ipv6) = Self::get_ip_addresses(); | |||
|
|||
// TODO: group all newlines together currently split by newline |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not familiar with this code.
@andrewbattat: does Pietro's suggestion sound reasonable to you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the suggestion to turn this:
let message = indoc::formatdoc!(
r#"
Node-id: {node_id}
Replica version: {version}
IPv6: {ipv6}
IPv4: {ipv4}
"#
);
Into this:
let message = format!(
"Node-id: {node_id}, Replica version: {version}, IPv6: {ipv6}, IPv4: {ipv4}"
);
?
This message is logged to the host console. It's purpose is so that when a node provider opens their terminal, they have this info immediately surfaced—so the newlines are there for readability. So I'd rather not sacrifice this readability if we don't have to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for cleaning up the logs.
@@ -2015,6 +2021,7 @@ fn observe_replicated_state_metrics( | |||
// Log all old call contexts, but not (nearly) every round. | |||
if current_round.get() % SPAMMY_LOG_INTERVAL_ROUNDS == 0 { | |||
for (origin, origin_time) in &old_call_contexts { | |||
// TODO: Demote it to INFO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to make the change in this PR.
// sender: CanisterId(dtvyz-2iaaa-aaaao-a2xxq-cai), sender_reply_callback: 20791, | ||
// payment: Cycles(0), method_name: "index", method_payload: "10 bytes;4449444c016d7b010000", | ||
// metadata: RequestMetadata { call_tree_depth: 1, call_tree_start_time: Time(1737940399724249864) }, | ||
// deadline: CoarseTime(0) }) on same subnet failed with error 'Canister 7xpjo-2iaaa-aaaao-a2vxq-cai is stopped'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the error always that the canister is stopped? Maybe we should match on the error and log the other cases, but skip that one.
This PR provides a set of recommendations for reducing log volume and minimizing redundancy within the OBS stack.