Bonsai archive feature #7475
Conversation
@garyschulte @matkt I've added some commits that refactor the way the archive world state and the Bonsai context work together. I think it's generally much cleaner now, though there are some small things that could still be improved. I've updated accordingly and also added a new class.

I've re-run the tests.
Is there any time pressure for this feature? With the latest refactor, I'd like to at minimum archive- and non-archive-sync a network that predates Cancun (where we nerfed SELFDESTRUCT). Ideally we would have time to full sync mainnet without archive to ensure no regression, but that is a months-long process on commodity hardware. Either way - I will keep 👀 on this PR and kick off regression tests when you signal that it is ready again with your latest commits.
I think almost all of the latest refactoring is isolated to archive-specific logic or classes.
```java
private Optional<BonsaiContext> getStateArchiveContextForRead(
    final SegmentedKeyValueStorage storage) {
  // For Bonsai archive get the flat DB context to use for reading archive entries
  Optional<byte[]> archiveContext = storage.get(TRIE_BRANCH_STORAGE, WORLD_BLOCK_NUMBER_KEY);
```
So we cannot find a way to cache it and avoid reading it every time?
We possibly could. Maybe something to improve upon in the next PR(s) in this area?
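For illustration, one possible shape for such a cache - a minimal sketch, assuming a new `cachedArchiveContext` field, an invalidation hook on the commit path, and a hypothetical `BonsaiContext.fromBlockNumberBytes` decoder, none of which are in this PR:

```java
// Hypothetical sketch: cache the archive context instead of hitting storage on
// every read, and invalidate whenever WORLD_BLOCK_NUMBER_KEY is (re)committed.
private final AtomicReference<Optional<BonsaiContext>> cachedArchiveContext =
    new AtomicReference<>();

private Optional<BonsaiContext> getStateArchiveContextForRead(
    final SegmentedKeyValueStorage storage) {
  // Compute once, then serve from the cache until it is invalidated.
  return cachedArchiveContext.updateAndGet(
      cached ->
          cached != null
              ? cached
              : storage
                  .get(TRIE_BRANCH_STORAGE, WORLD_BLOCK_NUMBER_KEY)
                  .map(BonsaiContext::fromBlockNumberBytes)); // assumed decoder
}

// Called from the commit path whenever WORLD_BLOCK_NUMBER_KEY changes.
private void invalidateArchiveContextCache() {
  cachedArchiveContext.set(null);
}
```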
```diff
@@ -141,7 +143,7 @@ public Optional<MutableWorldState> getMutable(
     return cachedWorldStorageManager
         .getWorldState(blockHeader.getHash())
         .or(() -> cachedWorldStorageManager.getNearestWorldState(blockHeader))
-        .or(() -> cachedWorldStorageManager.getHeadWorldState(blockchain::getBlockHeader))
+        .or(() -> cachedWorldStorageManager.getWorldState(chainHeadBlockHeader.getHash()))
```
Why remove 'head'?
I found a race condition where `getHeadWorldState` can return empty, because `blockchain.getBlockHeader()` returns empty for the world state block hash in `rootWorldStateStorage`. For archive it's better that we get the latest block that `blockchain` can provide, even if that's not the absolute latest world state. If you think it's better for `DiffBasedWorldStateProvider` to return `Optional.empty` rather than that, I'm happy to revert the change in `DiffBasedWorldStateProvider`.
I understand the need. The problem is that with this modification, if your head is not in the cache, it won't work, so I think it's a bit risky. When exactly do you hit the race condition? Who is calling `getMutable` at that moment? Also, the head is normally there as a fallback: if nothing is in the cache, we take the persisted state.
```java
        WORLD_BLOCK_NUMBER_KEY,
        Long.toHexString(blockHeader == null ? 0L : blockHeader.getNumber())
            .getBytes(StandardCharsets.UTF_8));
    stateUpdater.commit();
```
I would prefer to remove this one and use the commit here: https://github.com/hyperledger/besu/pull/7475/files#diff-a914e260cf83b36ae4032f8bb562fd1a794ef1432cd8642e5483cbb7270121c4R235. That way we are sure to have only one commit for everything, and we avoid ending up with a mix if the next step fails.
I don't believe we can use a single DB commit, because the `PUT` of the `WORLD_BLOCK_NUMBER_KEY` needs committing before subsequent state access (in `calculateRootHash`) can do a `GET` of that key.

The latest commit just updates it to create an updater, commit it, then recreate it.
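Schematically, the pattern being described looks something like the sketch below - a rough illustration using the names from this thread, not the PR's exact code:

```java
// Sketch of the commit-then-recreate pattern (illustrative only).
// First transaction: persist the block number so reads during calculateRootHash()
// can GET it back out of storage.
DiffBasedWorldStateKeyValueStorage.Updater stateUpdater = worldStateKeyValueStorage.updater();
stateUpdater
    .getWorldStateTransaction()
    .put(
        TRIE_BRANCH_STORAGE,
        WORLD_BLOCK_NUMBER_KEY,
        Long.toHexString(blockHeader.getNumber()).getBytes(StandardCharsets.UTF_8));
stateUpdater.commit();

// Second transaction: recreate the updater; archive flat-db reads performed while
// calculating the root hash now see the committed block number.
stateUpdater = worldStateKeyValueStorage.updater();
final Hash calculatedRootHash = calculateRootHash(Optional.of(stateUpdater), accumulator);
```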
The `calculateRootHash` should read the head from the database, so it should be the last block number persisted, not the new one we are trying to persist. The old block number should already be saved in storage if we committed the block previously. Therefore, I don't think we need to do it again before calculating the root hash.
> The calculateRootHash should read the head from the database

Agreed, and it still does. It doesn't use `WORLD_BLOCK_NUMBER_KEY` to determine which block/state to read. Possibly we should rename `WORLD_BLOCK_NUMBER_KEY` -> `WORLD_ARCHIVE_BLOCK_KEY` and reserve it for archive-related use? In the future we could add `WORLD_BLOCK_NUMBER_KEY` if we thought it was useful, and it wouldn't be updated at the same time as `WORLD_ARCHIVE_BLOCK_KEY`.

> The old block number should already be saved in the storage if we have committed the block previously.

That's true, the old block number should already be persisted. But for the new block & flat-db archive keys we take `WORLD_BLOCK_NUMBER_KEY` and use that as the suffix - so it can't be the previous block number when we call `calculateRootHash`. For persisting a new block, the number in storage needs to be the new block number, not the previous one. So I see this code change as "this is the block number we're about to persist, use it for archive flat-db suffixes". Again, perhaps `WORLD_ARCHIVE_BLOCK_KEY` would help ensure it isn't misinterpreted elsewhere in the codebase?
So it's only for persisting the new keys in the database?
If that's the case, why not pass this block number to the updater? The reason I'm trying to avoid this approach is that we had many issues with state inconsistency in the past due to things like this, and I'd prefer to avoid that.
Yeah, I understand that concern. I think the current design is OK - I don't think we should pass it into the updater - but perhaps I need to change how we use the value of that key (sketched below):

- When using `WORLD_BLOCK_NUMBER_KEY` for archive read activity (i.e. look up this account in the archive for this state), just use the value of that key from the DB/state
- When using `WORLD_BLOCK_NUMBER_KEY` for archive write activity (i.e. put this new account), use `WORLD_BLOCK_NUMBER_KEY + 1`
  - E.g. we currently have the state for block 10 and we're creating block 11: read `WORLD_BLOCK_NUMBER_KEY` (which is 10) and put archive entries as `WORLD_BLOCK_NUMBER_KEY + 1` (11)

I've already introduced `getStateArchiveContextForRead()` and `getStateArchiveContextForWrite()` in `BonsaiArchiveFlatDbStrategy` because there are reasons for treating the cases differently (particularly error handling). So I think the above changes would meet your requirement for `WORLD_BLOCK_NUMBER_KEY` matching what the world state is.
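A minimal sketch of that read/write split, assuming a hypothetical `BonsaiContext(long)` constructor and `readWorldBlockNumber` helper (the real methods live in `BonsaiArchiveFlatDbStrategy`):

```java
// Illustrative sketch of the proposed read/write split, not the PR's exact code.
private Optional<BonsaiContext> getStateArchiveContextForRead(
    final SegmentedKeyValueStorage storage) {
  // Reads use the block number exactly as stored: "look up this account's state
  // as of the current world state block".
  return readWorldBlockNumber(storage).map(BonsaiContext::new);
}

private Optional<BonsaiContext> getStateArchiveContextForWrite(
    final SegmentedKeyValueStorage storage) {
  // Writes target the block being built: stored number + 1, e.g. state is at
  // block 10, so new archive entries are suffixed with 11.
  return readWorldBlockNumber(storage).map(number -> new BonsaiContext(number + 1));
}

// Assumed helper: decode WORLD_BLOCK_NUMBER_KEY, stored as a hex string (see the
// Long.toHexString(...) encoding in the quoted code above).
private Optional<Long> readWorldBlockNumber(final SegmentedKeyValueStorage storage) {
  return storage
      .get(TRIE_BRANCH_STORAGE, WORLD_BLOCK_NUMBER_KEY)
      .map(bytes -> Long.parseLong(new String(bytes, StandardCharsets.UTF_8), 16));
}
```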
`WORLD_BLOCK_NUMBER_KEY + 1` - but what about the case of a reorg, where for example we need to go back from head 11 to 10? I think that may be a problem, because we would write to 12 instead of 10.

I am not 100% confident about this modification, but I will not block the PR if you think it is good, knowing that it will only impact the Bonsai archive. I would like to have @garyschulte's opinion on this.

I would prefer not to commit beforehand and only do it on success, but I will go with the majority's opinion if Gary thinks this approach is OK.
On my side I see it more like this:

```java
public Updater(
    final BlockHeader blockContext, // block to persist
    final SegmentedKeyValueStorageTransaction composedWorldStateTransaction,
    final KeyValueStorageTransaction trieLogStorageTransaction,
    final FlatDbStrategy flatDbStrategy)
..

public Updater putAccountInfoState(final Hash accountHash, final Bytes accountValue) {
  if (accountValue.size() == 0) {
    // Don't save empty values
    return this;
  }
  // strategy uses it or not
  flatDbStrategy.putFlatAccount(
      blockContext.getNumber(), composedWorldStateTransaction, accountHash, accountValue);
  return this;
}

@Override
public void persist(final BlockHeader blockHeader) {
  final Optional<BlockHeader> maybeBlockHeader = Optional.ofNullable(blockHeader);
  LOG.atDebug()
      .setMessage("Persist world state for block {}")
      .addArgument(maybeBlockHeader)
      .log();
  final DiffBasedWorldStateUpdateAccumulator<?> localCopy = accumulator.copy();
  boolean success = false;
  final DiffBasedWorldStateKeyValueStorage.Updater stateUpdater =
      worldStateKeyValueStorage.updater(blockHeader);
  Runnable saveTrieLog = () -> {};
  try {
    final Hash calculatedRootHash;
    ...
    stateUpdater
        .getWorldStateTransaction()
        .put(TRIE_BRANCH_STORAGE, WORLD_BLOCK_HASH_KEY, blockHeader.getHash().toArrayUnsafe());
    worldStateBlockHash = blockHeader.getHash();
    stateUpdater
        .getWorldStateTransaction()
        .put(
            TRIE_BRANCH_STORAGE,
            WORLD_BLOCK_NUMBER_KEY,
            Bytes.ofUnsignedLong(blockHeader.getNumber()).toArrayUnsafe());
  } else {
    stateUpdater.getWorldStateTransaction().remove(TRIE_BRANCH_STORAGE, WORLD_BLOCK_HASH_KEY);
    worldStateBlockHash = null;
  }

@Override
public Updater updater(final BlockHeader blockContext) {
  return new Updater(
      blockContext,
      composedWorldStateStorage.startTransaction(),
      trieLogStorage.startTransaction(),
      getFlatDbStrategy());
}
```
If we want to set the block context optimistically and avoid a `commit()`, we could just remove the commit and read through the transaction whenever we are looking for the archive block number, i.e. add a `.get(..)` method to `SegmentedKeyValueStorageTransaction`. `RocksDBSnapshotTransaction` already implements `get()`, it just isn't exposed in the interface.

I think it would be trivial to add it to `SegmentedInMemoryTransaction`, where we read from the updated value map and, if not present, defer to the parent `SegmentedInMemoryKeyValueStorage.get()`. There are a few other variants of this, but these are the two main flavors - rocksdb transactions and in-memory maps.

We'd just need to ensure that when we are accessing the "context" block number, we always read through the transaction.

That being said, in this solution we are using the storage layer to communicate what I think are ephemeral query-time parameters. I don't really like persisting a 'latest query parameter' block number to storage. It is brittle and a potential source of contradiction that we have to handle in all of the corner cases like reorgs, block production, and now querying. IMO we have the block number when we need to produce a block, and we have the block number when we need to query a block. Persisting this somewhere and handling its update seems like an unnecessary complication to an already complex storage mechanism.
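A minimal sketch of the `SegmentedInMemoryTransaction.get()` being suggested - field names like `updatedValues`, `removedKeys` and `parent` are assumed for illustration:

```java
// Sketch only: read-through get() for the in-memory transaction. The fields
// (updatedValues, removedKeys, parent) are assumed, not actual Besu code.
@Override
public Optional<byte[]> get(final SegmentIdentifier segment, final byte[] key) {
  final Bytes wrapped = Bytes.wrap(key);
  // A key removed in this transaction must not fall through to the parent store.
  if (removedKeys.getOrDefault(segment, Collections.emptySet()).contains(wrapped)) {
    return Optional.empty();
  }
  // Prefer a value written in this transaction...
  final byte[] updated =
      updatedValues.getOrDefault(segment, Collections.emptyMap()).get(wrapped);
  if (updated != null) {
    return Optional.of(updated);
  }
  // ...otherwise defer to the parent SegmentedInMemoryKeyValueStorage.
  return parent.get(segment, key);
}
```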
I didn't get through the whole review due to time, but wanted to provide feedback on what I did get to.
```java
            .putStorageValueBySlotHash(
                accountHash, Hash.wrap(key), Bytes32.leftPad(RLP.decodeValue(value))));
      });
  if (!worldStateStorageCoordinator.isMatchingFlatMode(FlatDbMode.PARTIAL)) {
```
This is odd. The condition is: if we are *not* using the PARTIAL flat DB, we execute `putStorageValueBySlotHash(...)`. However, in the else branch - meaning we *are* using PARTIAL flat mode - we execute the exact same statement if the flat mode is FULL? What are we trying to accomplish here? As far as I can tell we will never execute the lambda in the else, and it is the same lambda as above. 🤔
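Schematically, the structure being questioned reduces to something like this (condensed, not the PR's literal code):

```java
// Condensed illustration of the branch structure under discussion.
if (!worldStateStorageCoordinator.isMatchingFlatMode(FlatDbMode.PARTIAL)) {
  // putStorageValueBySlotHash(...) - runs for FULL, ARCHIVE, etc.
} else if (worldStateStorageCoordinator.isMatchingFlatMode(FlatDbMode.FULL)) {
  // the identical putStorageValueBySlotHash(...) - unreachable, since reaching
  // this branch means the flat mode is PARTIAL, which is never FULL
}
```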
```java
          ((BonsaiWorldStateKeyValueStorage.Updater) updater)
              .putAccountInfoState(Hash.wrap(key), value));
      });
  if (!worldStateStorageCoordinator.isMatchingFlatMode(FlatDbMode.PARTIAL)) {
```
Same question here. The lambda in the else will never execute, and it is identical.
```diff
     if (worldStateKeyValueStorage.getDataStorageFormat().isBonsaiFormat()
-        && worldStateStorageCoordinator.isMatchingFlatMode(FlatDbMode.FULL)) {
+        && (worldStateStorageCoordinator.isMatchingFlatMode(FlatDbMode.FULL)
+            || worldStateStorageCoordinator.isMatchingFlatMode(FlatDbMode.ARCHIVE))) {
```
Have you tried serving snap from an archive node for anything besides a trivial state? I suspect that heavily used contracts and accounts will cause a lot of churn, and range responses might take longer than the 5-second timeout. I am thinking of the uniswap and hex contracts on mainnet, for example. Worst case, it might be an easy way to DoS an archive instance.
```java
  @SuppressWarnings("BannedMethod")
  @BeforeEach
  public void setup() {
    Configurator.setLevel(LogManager.getLogger(ArchiverTests.class).getName(), Level.TRACE);
```
Are you making assertions on log statements, or is this more of a debugging aid that got left in?
Agreed. This is a big PR, and not only is it a nightmare to keep up to date with an evolving storage subsystem, all those merges are potential sources of error. I am keen to get this tested and merged also.
## PR description

Introduces a new (experimental) "Bonsai Archive" DB mode which creates a full archive of the chain it syncs with. This allows JSON-RPC calls to be made with historic blocks as context, for example `eth_getBalance` to get the balance of an account at a historic block, or `eth_call` to simulate a transaction at a given block in history.

The PR is intended to provide part of the function currently offered by the (now deprecated) `FOREST` DB mode. Specifically, it allows state to be queried at an arbitrary block in history, but does not currently offer `eth_getProof` for said state. A subsequent PR will implement `eth_getProof` for historic blocks.

## Summary of the overall design & changes
This PR builds on PR #5865, which proved the basic concept of archiving state in the Bonsai flat DB by suffixing entries with the block in which they were changed. For example, the state for account `0x0e79065B5F11b5BD1e62B935A600976ffF3754B9` at block `37834` is stored with that block number as a key suffix.
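Illustratively, the archived key is the account's flat-DB key with the block number appended - a sketch only, since the exact byte encoding is the PR's, and `Hash.hash` of the address is assumed as the base key:

```java
// Sketch of key construction for an archived account entry (encoding assumed).
final Hash accountHash =
    Hash.hash(Address.fromHexString("0x0e79065B5F11b5BD1e62B935A600976ffF3754B9"));
final Bytes archiveKey =
    Bytes.concatenate(accountHash, Bytes.ofUnsignedLong(37_834L)); // 37834 = 0x93CA
```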
In order to minimise performance degradation over time, historic state and storage entries in the DB are "archived" by moving them into a separate DB segment. Where account state is stored in segment `ACCOUNT_INFO_STATE`, state that has been archived is stored in `ACCOUNT_INFO_STATE_ARCHIVE`. Likewise, where storage is held in segment `ACCOUNT_STORAGE_STORAGE`, archived storage entries are stored in `ACCOUNT_STORAGE_ARCHIVE`.

An example RocksDB query to retrieve the state of the example account above would be:
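A lookup of this shape seeks to the greatest key at or below `accountHash ++ blockNumber` - a sketch, assuming a `getNearestBefore` floor-seek helper on the segmented storage:

```java
// Sketch of an archive read: seek to the newest entry for this account at or
// before the requested block. getNearestBefore and its result accessors are
// assumed here for illustration.
final Bytes upperBound = Bytes.concatenate(accountHash, Bytes.ofUnsignedLong(37_834L));
final Optional<NearestKeyValue> nearest =
    storage.getNearestBefore(ACCOUNT_INFO_STATE_ARCHIVE, upperBound); // floor seek
// Only accept the hit if the found key still starts with accountHash; otherwise
// this account has no archived state at or before the requested block.
```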
## Creating a Bonsai Archive node
The PR introduces an entirely new data storage format (as opposed to making it a configuration option of the existing `BONSAI` storage format); an existing `FOREST` or `BONSAI` node cannot be migrated to `BONSAI_ARCHIVE` mode. To create a Bonsai archive node, simply set `--data-storage-format=x_bonsai_archive` when creating it.
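For example, a minimal invocation (the data path is illustrative; `FULL` sync is per the "Sync time" section below):

```
besu --data-storage-format=x_bonsai_archive --sync-mode=FULL --data-path=/var/lib/besu
```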
## Storage requirements
An archive node intrinsically requires more storage space than a non-archive node. Every state update is retained in the archive DB segments as outlined above. An archive node for the `holesky` testnet, as of the raising of this PR, requires approximately `160Gi` of storage.

## Sync time
In order to create an archive of an entire chain, `FULL` sync mode must be used. This PR does not prevent `SNAP` syncing an archive node, but this will result in only a partial archive of the chain.

While the node is performing a `FULL` sync with the chain, it is also migrating entries from the regular DB segments to the archive DB segments. Overall this increases the time to create the archive node. For a public chain this might require a week or more to complete syncing and archiving.