-
Notifications
You must be signed in to change notification settings - Fork 979
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Harden ledger state invariance: move LCL header, HAS and Soroban config to read-only BucketList snapshot #4597
Conversation
575bbb2
to
4ebf3fa
Compare
Open question: should we allow snapshot auto-updates at all? It feels pretty evil, as consumers of a snapshot can't reason about its consistency while accessing the snapshot at arbitrary times. For example, we may have a function that calls loadAccount multiple times on the same snapshot. The problem is, if a snapshot changed in the background due to ledgerClose, we'd get different results on different calls to loadAccount within the same function, potentially invalidating previously executed logic. I think it'd be more sane to force callers to update the snapshots explicitly (this can happen when apply is done, but on the main thread). |
I'd vote for the snapshots to be completely immutable, unless that causes some issues. That would also open the way for the thread-safe ledger access if we ever come to that point. It would be nice if a thread could keep a local copy of a snapshot for as long as necessary. |
I would probably just get rid of auto updates entirely. Best interface is probably just to expose an |
I think we can expose a manual update function to achieve "thread can keep a local ledger copy indefinitely". I think immutability would be a bit annoying from the interface perspective and would require more calls to lock. If snapshots are immutable, threads would have to keep a local copy, check with the snapshot manager to see if its update (taking the lock), throw away the outdated version, then get the new version from the snapshot manager (taking the lock again). I think a mutable snapshot with an |
4ebf3fa
to
5832a26
Compare
return *mSorobanNetworkConfig; | ||
} | ||
else | ||
return *mSorobanNetworkConfig; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not particularly happy with this change, but it looks like our tests are still modifying soroban configs directly without closing ledgers, so I'm not sure what a better alternative would be. @SirTyson maybe you have some input?
5832a26
to
7d0b178
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general it looks good, but I think we need to be careful about how we expose LedgerHeader and network config state from the snapshots. I don't know if it makes sense anymore to get this info from the LedgerManager as apposed to just getting it directly from the snapshot. We also need to be careful about references to state inside a snapshot, as these can be invalidated very easily.
// LedgerHeader associated with this ledger state snapshot | ||
LedgerHeader const mHeader; | ||
// Last closed ledger associated with this snapshot | ||
LastClosedLedger const mLastClosedLedger; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we switch this to std::shared_ptr<LastClosedLedger const> const
? We're making a lot of copies of this struct and it's not trivial in size.
src/bucket/BucketManager.cpp
Outdated
auto searchableBL = | ||
mSnapshotManager->copySearchableLiveBucketListSnapshot(); | ||
auto searchableBL = mSnapshotManager->copySearchableLiveBucketListSnapshot( | ||
/* autoUpdate */ true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
autoUpdate should go away anyway, but want to point out that background eviction should not be ever calling update on its snapshot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, this is why this interface is problematic - autoupdate is enabled everywhere in master, which is definitely a footgun. I'm not sure it'll be easy to remove autoupdates in this PR (will it break other use cases?), but I'll try.
@@ -40,6 +40,7 @@ | |||
#include "util/XDRCereal.h" | |||
#include "util/XDRStream.h" | |||
#include "work/WorkScheduler.h" | |||
#include "xdrpp/printer.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: My linter is complaining that this is an unused include.
return *mSorobanNetworkConfig; | ||
#else | ||
return getLastClosedLedger(mApp).getSorobanNetworkConfig().value(); | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function (and all other getters that return a reference to snapshot held state) is dangerous from a memory perspective. We're returning a reference to a field in a snapshot, but that snapshot may change to a new memory address at any time when the snapshot is updated. Given that ledger state now lives in the snapshot, we should probably change the interface such that you query a snapshot for ledger header / network config instead of LedgerManager. That way it's more clear that the lifetime of the snapshot needs to be persisted when you get the reference. Returning a shared pointer may also work.
On second thought, I think I am pro immutable snapshots given the memory issue with references. |
FWIW this is how I envision thread-safe access to the ledger in the future. It's a necessary property for a thread to keep a thread-local snapshot pointer (the implementation details are not that important; it would be great if it can be lightweight). It's not just about reference, but in general about the ledger access patterns. We really want to avoid a thread having an inconsistent view of a ledger while its executing its business logic. Using references + immutability works around this issue for the most part. That said, we don't need to do everything right now, given that we don't need the thread-safety yet. |
11d243f
to
1566fe1
Compare
1566fe1
to
33b06e2
Compare
Revisiting this approach, I think it's too complicated: LCL is queried across the whole codebase, so switching every single use case to snapshots isn't particularly feasible at this time. I opened #4602 with something simpler: the change makes ledger snapshot const and removes auto-update and allows LedgerManager to store its own read-only snapshot that subsystems outside of apply can use. LM continues to manage rad-only LCL and Soroban config, but with some new invariance that those functions can only be called from the main thread (this becomes relevant for parallell ledger close). |
This change moves last closed ledger header, HAS and Soroban config into BucketListSnapshot to enforce the invariant that read-only ledger state remains unchanged, especially as we add ledger close in the background. This way main thread does not need to worry about synchronizing access to LCL, and it can be sure underlying LCL won't change unexpectedly.