Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Harden ledger state invariance: move LCL header, HAS and Soroban config to read-only BucketList snapshot #4597

Closed
wants to merge 8 commits into from

Conversation

marta-lokhova
Copy link
Contributor

This change moves last closed ledger header, HAS and Soroban config into BucketListSnapshot to enforce the invariant that read-only ledger state remains unchanged, especially as we add ledger close in the background. This way main thread does not need to worry about synchronizing access to LCL, and it can be sure underlying LCL won't change unexpectedly.

@marta-lokhova marta-lokhova force-pushed the bucketSnapshot branch 3 times, most recently from 575bbb2 to 4ebf3fa Compare December 20, 2024 21:47
@marta-lokhova
Copy link
Contributor Author

Open question: should we allow snapshot auto-updates at all? It feels pretty evil, as consumers of a snapshot can't reason about its consistency while accessing the snapshot at arbitrary times. For example, we may have a function that calls loadAccount multiple times on the same snapshot. The problem is, if a snapshot changed in the background due to ledgerClose, we'd get different results on different calls to loadAccount within the same function, potentially invalidating previously executed logic. I think it'd be more sane to force callers to update the snapshots explicitly (this can happen when apply is done, but on the main thread).

@dmkozh
Copy link
Contributor

dmkozh commented Dec 20, 2024

Open question: should we allow snapshot auto-updates at all? It feels pretty evil, as consumers of a snapshot can't reason about its consistency while accessing the snapshot at arbitrary times

I'd vote for the snapshots to be completely immutable, unless that causes some issues. That would also open the way for the thread-safe ledger access if we ever come to that point. It would be nice if a thread could keep a local copy of a snapshot for as long as necessary.

@SirTyson
Copy link
Contributor

Open question: should we allow snapshot auto-updates at all? It feels pretty evil, as consumers of a snapshot can't reason about its consistency while accessing the snapshot at arbitrary times. For example, we may have a function that calls loadAccount multiple times on the same snapshot. The problem is, if a snapshot changed in the background due to ledgerClose, we'd get different results on different calls to loadAccount within the same function, potentially invalidating previously executed logic. I think it'd be more sane to force callers to update the snapshots explicitly (this can happen when apply is done, but on the main thread).

I would probably just get rid of auto updates entirely. Best interface is probably just to expose an updateToLatest function with the expectation that it is called prior to a set of load calls that need to be ledger consistent.

@SirTyson
Copy link
Contributor

Open question: should we allow snapshot auto-updates at all? It feels pretty evil, as consumers of a snapshot can't reason about its consistency while accessing the snapshot at arbitrary times

I'd vote for the snapshots to be completely immutable, unless that causes some issues. That would also open the way for the thread-safe ledger access if we ever come to that point. It would be nice if a thread could keep a local copy of a snapshot for as long as necessary.

I think we can expose a manual update function to achieve "thread can keep a local ledger copy indefinitely". I think immutability would be a bit annoying from the interface perspective and would require more calls to lock. If snapshots are immutable, threads would have to keep a local copy, check with the snapshot manager to see if its update (taking the lock), throw away the outdated version, then get the new version from the snapshot manager (taking the lock again). I think a mutable snapshot with an updateToLatest function is preferable.

return *mSorobanNetworkConfig;
}
else
return *mSorobanNetworkConfig;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not particularly happy with this change, but it looks like our tests are still modifying soroban configs directly without closing ledgers, so I'm not sure what a better alternative would be. @SirTyson maybe you have some input?

Copy link
Contributor

@SirTyson SirTyson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general it looks good, but I think we need to be careful about how we expose LedgerHeader and network config state from the snapshots. I don't know if it makes sense anymore to get this info from the LedgerManager as apposed to just getting it directly from the snapshot. We also need to be careful about references to state inside a snapshot, as these can be invalidated very easily.

// LedgerHeader associated with this ledger state snapshot
LedgerHeader const mHeader;
// Last closed ledger associated with this snapshot
LastClosedLedger const mLastClosedLedger;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we switch this to std::shared_ptr<LastClosedLedger const> const? We're making a lot of copies of this struct and it's not trivial in size.

auto searchableBL =
mSnapshotManager->copySearchableLiveBucketListSnapshot();
auto searchableBL = mSnapshotManager->copySearchableLiveBucketListSnapshot(
/* autoUpdate */ true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

autoUpdate should go away anyway, but want to point out that background eviction should not be ever calling update on its snapshot.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, this is why this interface is problematic - autoupdate is enabled everywhere in master, which is definitely a footgun. I'm not sure it'll be easy to remove autoupdates in this PR (will it break other use cases?), but I'll try.

@@ -40,6 +40,7 @@
#include "util/XDRCereal.h"
#include "util/XDRStream.h"
#include "work/WorkScheduler.h"
#include "xdrpp/printer.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: My linter is complaining that this is an unused include.

return *mSorobanNetworkConfig;
#else
return getLastClosedLedger(mApp).getSorobanNetworkConfig().value();
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function (and all other getters that return a reference to snapshot held state) is dangerous from a memory perspective. We're returning a reference to a field in a snapshot, but that snapshot may change to a new memory address at any time when the snapshot is updated. Given that ledger state now lives in the snapshot, we should probably change the interface such that you query a snapshot for ledger header / network config instead of LedgerManager. That way it's more clear that the lifetime of the snapshot needs to be persisted when you get the reference. Returning a shared pointer may also work.

@SirTyson
Copy link
Contributor

I'd vote for the snapshots to be completely immutable, unless that causes some issues. That would also open the way for the thread-safe ledger access if we ever come to that point. It would be nice if a thread could keep a local copy of a snapshot for as long as necessary.

On second thought, I think I am pro immutable snapshots given the memory issue with references.

@dmkozh
Copy link
Contributor

dmkozh commented Dec 23, 2024

If snapshots are immutable, threads would have to keep a local copy, check with the snapshot manager to see if its update (taking the lock), throw away the outdated version, then get the new version from the snapshot manager

FWIW this is how I envision thread-safe access to the ledger in the future. It's a necessary property for a thread to keep a thread-local snapshot pointer (the implementation details are not that important; it would be great if it can be lightweight). It's not just about reference, but in general about the ledger access patterns. We really want to avoid a thread having an inconsistent view of a ledger while its executing its business logic. Using references + immutability works around this issue for the most part. That said, we don't need to do everything right now, given that we don't need the thread-safety yet.

@marta-lokhova marta-lokhova force-pushed the bucketSnapshot branch 2 times, most recently from 11d243f to 1566fe1 Compare December 29, 2024 08:27
@marta-lokhova
Copy link
Contributor Author

Revisiting this approach, I think it's too complicated: LCL is queried across the whole codebase, so switching every single use case to snapshots isn't particularly feasible at this time. I opened #4602 with something simpler: the change makes ledger snapshot const and removes auto-update and allows LedgerManager to store its own read-only snapshot that subsystems outside of apply can use. LM continues to manage rad-only LCL and Soroban config, but with some new invariance that those functions can only be called from the main thread (this becomes relevant for parallell ledger close).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants