-
Notifications
You must be signed in to change notification settings - Fork 87
chore: decouple ntx builder #1495
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: next
Are you sure you want to change the base?
Conversation
53330ae to
5267cbf
Compare
crates/ntx-builder/src/builder.rs
Outdated
| tokio::spawn(async move { | ||
| if let Err(err) = account_loader_store.stream_network_account_ids(account_tx).await { | ||
| tracing::error!(%err, "failed to load network accounts from store"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another question is whether we should be reacting to task failure here. As in, should we abort the ntx builder (or restart the task) if it fails? Or is logging a single error good enough.
Failure here would mean that we never load accounts with existing notes from the store unless they also get a new event coming in. So I think we should abort.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went for the abort approach
crates/ntx-builder/src/store.rs
Outdated
| Err(err) => { | ||
| // Exponential backoff with base 500ms and max 30s. | ||
| let backoff = Duration::from_millis(500) | ||
| .saturating_mul(1 << retry_counter) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will fail/rollover after.. 32 or 64 retries 😁 Also need to saturating exponential (or limit counter I guess).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fixed this here, and in some other places where we were using this pattern
crates/ntx-builder/src/store.rs
Outdated
| /// This method is designed to be run in a background task, sending accounts to the main event | ||
| /// loop as they are loaded. This allows the ntx-builder to start processing mempool events | ||
| /// without waiting for all accounts to be preloaded. | ||
| #[instrument(target = COMPONENT, name = "store.client.load_committed_accounts", skip_all, err)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a bit of a problem with the instrumentation as is. One needs to be careful with long-running tasks and traces. This is because a trace is only complete once the root span completes.
In this case, what we'll have is:
start load_committed_accounts
fetch_page(0)
submit_page(0)
fetch_page(1)
submit_page(1)
...
...
... a long, long, long time later
fetch_page(chain tip)
submit_page(chain tip)
close load_committed_accounts
Instead we shouldn't instrument this method at all, and each iteration of the loop should be its own root span. This means you'll want to reshuffle things a bit.
See here for an example with retries. The first example is what we have per each loop iteration. Let me know if this is still ambiguous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, perfect. I will check the network-monitor for this too and open an issue if I find something.
closes #1487
depends on #1453 and #1501 to fully work.