Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce IC log volume #3638

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open

Conversation

pietrodimarco-dfinity
Copy link
Contributor

This PR provides a set of recommendations for reducing log volume and minimizing redundancy within the OBS stack.

@pietrodimarco-dfinity pietrodimarco-dfinity requested review from a team as code owners January 27, 2025 16:00
@@ -263,6 +263,7 @@ impl DkgImpl {
.into()
}
Err(err) => {
// TODO: Adjust frequency error log. There are instances of this log line > 200 times/min
error!(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should feel free to demote in this PR.
If it happens often the 99% is not an error :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, it's a transitive error which should go away after a couple of retries, so it should be a warning at best.
I've adjusted the log

@@ -131,6 +131,7 @@ impl Orchestrator {
thread::sleep(Duration::from_secs(10 * 60));
let (ipv4, ipv6) = Self::get_ip_addresses();

// TODO: group all newlines together currently split by newline
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar with this code.
@andrewbattat: does Pietro's suggestion sound reasonable to you?

Copy link
Member

@andrewbattat andrewbattat Jan 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the suggestion to turn this:

let message = indoc::formatdoc!(
                r#"
                    Node-id: {node_id}
                    Replica version: {version}
                    IPv6: {ipv6}
                    IPv4: {ipv4}

                "#
            );

Into this:

let message = format!(
    "Node-id: {node_id}, Replica version: {version}, IPv6: {ipv6}, IPv4: {ipv4}"
);

?

This message is logged to the host console. It's purpose is so that when a node provider opens their terminal, they have this info immediately surfaced—so the newlines are there for readability. So I'd rather not sacrifice this readability if we don't have to.

Copy link
Contributor

@adambratschikaye adambratschikaye left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for cleaning up the logs.

@@ -2015,6 +2021,7 @@ fn observe_replicated_state_metrics(
// Log all old call contexts, but not (nearly) every round.
if current_round.get() % SPAMMY_LOG_INTERVAL_ROUNDS == 0 {
for (origin, origin_time) in &old_call_contexts {
// TODO: Demote it to INFO
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to make the change in this PR.

// sender: CanisterId(dtvyz-2iaaa-aaaao-a2xxq-cai), sender_reply_callback: 20791,
// payment: Cycles(0), method_name: "index", method_payload: "10 bytes;4449444c016d7b010000",
// metadata: RequestMetadata { call_tree_depth: 1, call_tree_start_time: Time(1737940399724249864) },
// deadline: CoarseTime(0) }) on same subnet failed with error 'Canister 7xpjo-2iaaa-aaaao-a2vxq-cai is stopped'.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the error always that the canister is stopped? Maybe we should match on the error and log the other cases, but skip that one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants