
Bonsai archive feature #7475

Open

matthew1001 wants to merge 64 commits into main from multi-version-flat-db-rebase

Conversation

matthew1001 (Contributor) commented Aug 16, 2024:

PR description

Introduces a new (experimental) "Bonsai Archive" DB mode which creates a full archive of the chain it syncs with. This allows JSON-RPC calls to be made with historic blocks as context: for example, eth_getBalance to get the balance of an account at a historic block, or eth_call to simulate a transaction at a given block in history.

The PR is intended to provide part of the functionality currently offered by the (now deprecated) FOREST DB mode. Specifically, it allows state to be queried at an arbitrary block in history, but it does not currently offer eth_getProof for that state. A subsequent PR will implement eth_getProof for historic blocks.

Summary of the overall design & changes

This PR builds on PR #5865 which proved the basic concept of archiving state in the Bonsai flat DB by suffixing entries with the block in which they were changed.

For example, the state for account 0x0e79065B5F11b5BD1e62B935A600976ffF3754B9 at block 37834 is stored under the key

<account-hash><block-num-hex> = 0x9ab656e8fa2a1029964289c9a189083db258ca4b46ebaa374477e069b8f47dec00000000000093ca
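
To make the key layout concrete, here is a minimal, hypothetical sketch (not the PR's actual code) using the Tuweni Bytes types that Besu builds on:

import org.apache.tuweni.bytes.Bytes;
import org.apache.tuweni.bytes.Bytes32;

// Hypothetical illustration of the archive key layout described above:
// <account-hash> suffixed with the 8-byte big-endian block number.
public class ArchiveKeySketch {
  static Bytes archiveKey(final Bytes32 accountHash, final long blockNumber) {
    // 37834 -> 0x00000000000093ca
    return Bytes.concatenate(accountHash, Bytes.ofUnsignedLong(blockNumber));
  }

  public static void main(final String[] args) {
    final Bytes32 accountHash =
        Bytes32.fromHexString("0x9ab656e8fa2a1029964289c9a189083db258ca4b46ebaa374477e069b8f47dec");
    // Prints 0x9ab656e8fa2a1029964289c9a189083db258ca4b46ebaa374477e069b8f47dec00000000000093ca
    System.out.println(archiveKey(accountHash, 37834L));
  }
}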

In order to minimise performance degradation over time, historic state and storage entries in the DB are "archived" by moving them into a separate DB segment.

Where account state is stored in segment ACCOUNT_INFO_STATE, state that has been archived is stored in ACCOUNT_INFO_STATE_ARCHIVE. Likewise where storage is held in segment ACCOUNT_STORAGE_STORAGE, archived storage entries are stored in ACCOUNT_STORAGE_ARCHIVE.

An example RocksDB query to retrieve the state of the example account above (the trailing 00000000000093ca is block 37834 in hex) would be:

ldb --db=. get --column_family=ACCOUNT_INFO_STATE_ARCHIVE --key_hex --value_hex 0x9ab656e8fa2a1029964289c9a189083db258ca4b46ebaa374477e069b8f47dec00000000000093ca

Creating a Bonsai Archive node

The PR introduces an entirely new data storage format (as opposed to making it a configuration option of the existing BONSAI storage format).

To create a Bonsai archive node, simply set --data-storage-format=x_bonsai_archive when creating it.
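
For example, a hypothetical invocation (the other flags shown are standard Besu options, not introduced by this PR):

besu --network=holesky \
     --data-path=/var/lib/besu-archive \
     --data-storage-format=x_bonsai_archive \
     --sync-mode=FULL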

An existing FOREST or BONSAI node cannot be migrated to BONSAI_ARCHIVE mode.

Storage requirements

An archive node intrinsically requires more storage space than a non-archive node, since every state update is retained in the archive DB segments as outlined above. An archive node for the Holesky testnet, as of the raising of this PR, requires approximately 160 GiB of storage.

Sync time

In order to create an archive of an entire chain, FULL sync mode must be used. This PR does not prevent SNAP syncing an archive node, but this will result in only a partial archive of the chain.

While the node is performing a FULL sync with the chain it is also migrating entries from the regular DB segments to the archive DB segments. Overall this increases the time to create the archive node. For a public chain this might require 1 week or more to complete syncing and archiving.

@matthew1001 force-pushed the multi-version-flat-db-rebase branch 2 times, most recently from 88f3968 to 7d4a524 on August 20, 2024
@matthew1001 changed the title from "Multi version flat db rebase" to "Bonsai archive feature" on September 4, 2024
@matthew1001 force-pushed the multi-version-flat-db-rebase branch 7 times, most recently from 782ae60 to 5752732 on October 2, 2024
@matthew1001 force-pushed the multi-version-flat-db-rebase branch 4 times, most recently from 5b06b50 to dce531e on October 4, 2024
@matthew1001 marked this pull request as ready for review on October 7, 2024
jframe and others added 15 commits October 8, 2024 08:38
Signed-off-by: Jason Frame <[email protected]>
…se constructor that reuses worldStateStorage so that we don't lose values in the EvmToolSpecTests

Signed-off-by: Jason Frame <[email protected]>
Signed-off-by: Matthew Whitehead <[email protected]>
…d state, and freeze it

Signed-off-by: Matthew Whitehead <[email protected]>
Signed-off-by: Matthew Whitehead <[email protected]>
…ten for blocks and move account state to new DB segment

Signed-off-by: Matthew Whitehead <[email protected]>
…t block state has been frozen for

Signed-off-by: Matthew Whitehead <[email protected]>
Signed-off-by: Matthew Whitehead <[email protected]>
…age from the freezer segment

Signed-off-by: Matthew Whitehead <[email protected]>
Signed-off-by: Matthew Whitehead <[email protected]>
Signed-off-by: Matthew Whitehead <[email protected]>
matthew1001 (Contributor, Author) commented Jan 14, 2025:

@garyschulte @matkt I've added some commits that refactor the way archive world state and bonsai context work together. I think it's generally much cleaner now.

There are some small things that I think could still be improved. I've updated BonsaiContext to allow either block number or block hash to be provided - perhaps that should just be one or the other, but I'm inclined to leave it as a pragmatic solution for now. But I'm generally pretty happy with the code structure now.

I also added a new class BonsaiArchiveWorldStateProvider which provides an override for getMutable(...) with archive-specific behaviour. In the case where the request is for world state <= 512 blocks ago, it calls the super version of the function which does a traditional bonsai rollback. But for older blocks it uses archive-specific logic to "rollback" to the chain head and set the block context to support flat DB requests at the historic block.
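
A minimal, self-contained sketch of that dispatch (all types and helper names here are simplified stand-ins, not Besu's real classes; only the 512-block threshold and the rollback-vs-archive split come from the description above):

import java.util.Optional;

class ArchiveDispatchSketch {
  static final long RETAINED_LAYERS = 512; // trie-log rollback window

  interface WorldState {}

  // Stand-in for the traditional Bonsai rollback path.
  static Optional<WorldState> rollbackFromHead(final long blockNumber) {
    return Optional.of(new WorldState() {});
  }

  // Stand-in for "stay at the chain head and set the archive block context
  // so that flat-DB reads resolve keys at the historic block".
  static Optional<WorldState> archiveStateAt(final long blockNumber) {
    return Optional.of(new WorldState() {});
  }

  static Optional<WorldState> getMutable(final long headNumber, final long requestedBlock) {
    if (headNumber - requestedBlock <= RETAINED_LAYERS) {
      return rollbackFromHead(requestedBlock); // recent block: normal Bonsai rollback
    }
    return archiveStateAt(requestedBlock); // older block: archive-specific lookup
  }
}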

@matthew1001 requested a review from matkt on January 14, 2025
matthew1001 (Contributor, Author) commented:

I've re-run ./gradlew acceptanceTestBftSoak, which on this PR includes an archive node as one of the four QBFT nodes and makes archive requests to the archive node at the end of the test. The test passed fine.

Signed-off-by: Matthew Whitehead <[email protected]>
garyschulte (Contributor) commented:

> @garyschulte @matkt I've added some commits that refactor the way archive world state and bonsai context work together. I think it's generally much cleaner now.
>
> There are some small things that I think could still be improved. I've updated BonsaiContext to allow either block number or block hash to be provided - perhaps that should just be one or the other, but I'm inclined to leave it as a pragmatic solution for now. But I'm generally pretty happy with the code structure now.
>
> ...

Is there any time pressure for this feature? With the latest refactor, I'd like to at minimum archive- and non-archive-sync a network that predates Cancun (where we nerfed SELFDESTRUCT).

Ideally we would have time to full sync mainnet without archive to ensure no regression, but that is a months long process on commodity hardware.

Either way - I will keep 👀 on this PR and kick off regression tests when you signal that it is ready again with your latest commits.

matthew1001 (Contributor, Author) commented:

>> @garyschulte @matkt I've added some commits that refactor the way archive world state and bonsai context work together. I think it's generally much cleaner now.
>>
>> There are some small things that I think could still be improved. I've updated BonsaiContext to allow either block number or block hash to be provided - perhaps that should just be one or the other, but I'm inclined to leave it as a pragmatic solution for now. But I'm generally pretty happy with the code structure now.
>>
>> ...
>
> Is there any time pressure for this feature? With the latest refactor, I'd like to at minimum archive- and non-archive-sync a network that predates Cancun (where we nerfed SELFDESTRUCT).
>
> Ideally we would have time to full sync mainnet without archive to ensure no regression, but that is a months long process on commodity hardware.
>
> Either way - I will keep 👀 on this PR and kick off regression tests when you signal that it is ready again with your latest commits.

I think almost all of the latest refactoring is isolated to archive-specific logic or classes. The clearStorage() implementation may be the one exception to that, which I'd be happy to revert if we want to merge the logic that we've tested with the full main merge. My concern with holding off too long is that the code-base is evolving and requires a fair amount of effort to keep rebasing/merging the PR with main. Unless we have big concerns that Bonsai (non-archive) has regressed I'd be keen to get this one merged sooner rather than later.

@matthew1001 force-pushed the multi-version-flat-db-rebase branch from 1b145a3 to 3873f8e on January 16, 2025
@matthew1001 force-pushed the multi-version-flat-db-rebase branch 2 times, most recently from 877a5f3 to 7d42770 on January 20, 2025
@matthew1001 force-pushed the multi-version-flat-db-rebase branch from 7d42770 to 633fad4 on January 20, 2025
Signed-off-by: Matthew Whitehead <[email protected]>
@matthew1001 force-pushed the multi-version-flat-db-rebase branch from 7809755 to 3d8d8c7 on January 21, 2025
matthew1001 and others added 2 commits January 21, 2025 15:35
@matthew1001 requested a review from matkt on January 22, 2025
private Optional<BonsaiContext> getStateArchiveContextForRead(
    final SegmentedKeyValueStorage storage) {
  // For Bonsai archive, get the flat DB context to use for reading archive entries
  Optional<byte[]> archiveContext = storage.get(TRIE_BRANCH_STORAGE, WORLD_BLOCK_NUMBER_KEY);
matkt (Contributor) commented:

So we cannot find a way to cache it and not read it every time?

matthew1001 (Contributor, Author) replied:

We possibly could. Maybe something to improve upon in the next PR(s) in this area?

@@ -141,7 +143,7 @@ public Optional<MutableWorldState> getMutable(
     return cachedWorldStorageManager
         .getWorldState(blockHeader.getHash())
         .or(() -> cachedWorldStorageManager.getNearestWorldState(blockHeader))
-        .or(() -> cachedWorldStorageManager.getHeadWorldState(blockchain::getBlockHeader))
+        .or(() -> cachedWorldStorageManager.getWorldState(chainHeadBlockHeader.getHash()))
matkt (Contributor) commented:

Why remove 'head'?

matthew1001 (Contributor, Author) replied:

I found a race condition where getHeadWorldState can return empty because blockchain.getBlockHeader() returns empty for the world state block hash in rootWorldStateStorage. For archive it's better that we get the latest block that blockchain can provide, even if that's not the absolute latest world state. If you think it's better for DiffBasedWorldStateProvider to return Optional.empty rather than that, I'm happy to revert the change in DiffBasedWorldStateProvider.

matkt (Contributor) replied:

I understand the need. The problem is that with this modification, if your head is not in the cache, it won't work. So I think it's a bit risky. When exactly do you have the race condition? Who is calling getMutable at that moment? Added to that, the head is normally there as a fallback if nothing is in the cache; we take the persisted state.

            WORLD_BLOCK_NUMBER_KEY,
            Long.toHexString(blockHeader == null ? 0L : blockHeader.getNumber())
                .getBytes(StandardCharsets.UTF_8));
    stateUpdater.commit();
matkt (Contributor) commented:

I would prefer to remove this one and use the commit here (https://github.com/hyperledger/besu/pull/7475/files#diff-a914e260cf83b36ae4032f8bb562fd1a794ef1432cd8642e5483cbb7270121c4R235). That way we are sure to have only one commit for everything, and we avoid having a mix if the next step fails.

matthew1001 (Contributor, Author) replied:

I don't believe we can use a single DB commit because the PUT of the WORLD_BLOCK_NUMBER_KEY needs committing before subsequent state access (in calculatedRootHash) can do a GET of that key.

The latest commit just updates it to create an updater, commit it, then recreate it.

matkt (Contributor) replied:

The calculateRootHash should read the head from the database, so it should be the last block number persisted and not the new one we are trying to persist. The old block number should already be saved in the storage if we have committed the block previously. Therefore, I don't think we need to do it again before calculating the root hash.

matthew1001 (Contributor, Author) replied:

> The calculateRootHash should read the head from the database

Agreed and it still does. It doesn't use WORLD_BLOCK_NUMBER_KEY to determine which block/state to read. Possibly we should rename WORLD_BLOCK_NUMBER_KEY -> WORLD_ARCHIVE_BLOCK_KEY and reserve it for archive-related use? In the future we could add WORLD_BLOCK_NUMBER_KEY if we thought it was useful and it wouldn't be updated at the same time as WORLD_ARCHIVE_BLOCK_KEY.

> The old block number should already be saved in the storage if we have committed the block previously.

That's true, the old block number should already be persisted. But for the new block & flat-db archive keys we take WORLD_BLOCK_NUMBER_KEY and use that as the suffix - so it can't be the previous block number when we call calculateRootHash. For persisting a new block the number in storage needs to be the new block number, not the previous one. So I see this code change as "this is the block number we're about to persist, use it for archive flat-db suffixes". Again, perhaps WORLD_ARCHIVE_BLOCK_KEY would help ensure it isn't misinterpreted elsewhere in the codebase?

matkt (Contributor) replied:

So it's for persisting the new keys in the database only?

matkt (Contributor) commented Jan 23, 2025:

If that's the case, why not pass this block number to the updater? The reason I'm trying to avoid this is that we had many issues with state inconsistency in the past due to things like this, and I'd prefer to avoid that.

matthew1001 (Contributor, Author) replied:

Yeah I understand that concern. I think the current design is OK - I don't think we should pass it into the updater - but perhaps I need to change how we use the value of that key:

  • When using WORLD_BLOCK_NUMBER_KEY for archive read activity (i.e. lookup this account in the archive for this state) just use the value of that key from the DB/state
  • When using WORLD_BLOCK_NUMBER_KEY for archive write activity (i.e. put this new account) we use WORLD_BLOCK_NUMBER_KEY+1
    • E.g. we have currently the state for block 10, we're creating block 11, read WORLD_BLOCK_NUMBER_KEY (which is 10) and put archive entries as WORLD_BLOCK_NUMBER_KEY+1 (11)

I've already introduced getStateArchiveContextForRead() and getStateArchiveContextForWrite() in BonsaiArchiveFlatDbStrategy because there are reasons for treating the cases differently (particularly error handling). So I think the above changes would meet your requirement for WORLD_BLOCK_NUMBER_KEY matching what the world state is.
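
A minimal sketch of that read/write split (hypothetical names and a simplified stand-in for the stored key value; not the PR's code):

class ArchiveContextSketch {
  private final long persistedBlockNumber; // stand-in for the WORLD_BLOCK_NUMBER_KEY value

  ArchiveContextSketch(final long persistedBlockNumber) {
    this.persistedBlockNumber = persistedBlockNumber;
  }

  // Reads (e.g. "look up this account at this state") use the stored value as-is.
  long getStateArchiveContextForRead() {
    return persistedBlockNumber;
  }

  // Writes (e.g. "put this new account entry") target the block being created,
  // i.e. the last persisted block number plus one.
  long getStateArchiveContextForWrite() {
    return persistedBlockNumber + 1;
  }
}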

matkt (Contributor) replied Jan 23, 2025:

Regarding WORLD_BLOCK_NUMBER_KEY+1: in the case of a reorg, for example, where we need to go back from head 11 to 10, I think it may be a problem, because we will write to 12 instead of 10.

I am not 100% confident about this modification, but I will not block the PR on it if you think it is good, knowing that it will only impact the Bonsai archive. I would like to have @garyschulte's opinion on this.

I would prefer not to commit beforehand and only do it in case of success, but I will go with the majority's opinion if Gary thinks this approach is OK.

On my side, I see it more like this:

public Updater(
    final BlockHeader blockContext, // block to persist
    final SegmentedKeyValueStorageTransaction composedWorldStateTransaction,
    final KeyValueStorageTransaction trieLogStorageTransaction,
    final FlatDbStrategy flatDbStrategy)
    ..

public Updater putAccountInfoState(final Hash accountHash, final Bytes accountValue) {
  if (accountValue.size() == 0) {
    // Don't save empty values
    return this;
  }
  // strategy uses it or not
  flatDbStrategy.putFlatAccount(
      blockContext.getBlockNumber(), composedWorldStateTransaction, accountHash, accountValue);
  return this;
}

@Override
public void persist(final BlockHeader blockHeader) {
  final Optional<BlockHeader> maybeBlockHeader = Optional.ofNullable(blockHeader);
  LOG.atDebug()
      .setMessage("Persist world state for block {}")
      .addArgument(maybeBlockHeader)
      .log();

  final DiffBasedWorldStateUpdateAccumulator<?> localCopy = accumulator.copy();

  boolean success = false;

  final DiffBasedWorldStateKeyValueStorage.Updater stateUpdater =
      worldStateKeyValueStorage.updater(blockHeader);
  Runnable saveTrieLog = () -> {};

  try {
    final Hash calculatedRootHash;

    ... // (elided; the lines below sit inside an "if (blockHeader != null)" branch)

      stateUpdater
          .getWorldStateTransaction()
          .put(TRIE_BRANCH_STORAGE, WORLD_BLOCK_HASH_KEY, blockHeader.getHash().toArrayUnsafe());
      worldStateBlockHash = blockHeader.getHash();
      stateUpdater
          .getWorldStateTransaction()
          .put(
              TRIE_BRANCH_STORAGE,
              WORLD_BLOCK_NUMBER_KEY,
              Bytes.ofUnsignedLong(blockHeader.getNumber()).toArrayUnsafe());
    } else {
      stateUpdater.getWorldStateTransaction().remove(TRIE_BRANCH_STORAGE, WORLD_BLOCK_HASH_KEY);
      worldStateBlockHash = null;
    }
  ...
}

@Override
public Updater updater(final BlockHeader blockContext) {
  return new Updater(
      blockContext,
      composedWorldStateStorage.startTransaction(),
      trieLogStorage.startTransaction(),
      getFlatDbStrategy());
}

garyschulte (Contributor) commented Jan 24, 2025:

If we want to set the block context optimistically and avoid a commit(), we could just remove the commit and read-through-the-transaction whenever we are looking for the archive block number.

i.e. add a .get(..) method to SegmentedKeyValueStorageTransaction. RocksDBSnapshotTransaction already implements get(), it just isn't exposed in the interface.

I think it would be trivial to add it to SegmentedInMemoryTransaction, where we read from the updated value map and if not present, defer to the parent SegmentedInMemoryKeyValueStorage.get(). There are a few other variants of this, but these are the two main flavors - rocksdb transactions and in-memory maps.

We'd just need to ensure that when we are accessing the "context" block number, we always read through the transaction.

That being said, in this solution we are using the storage layer to communicate what I think are ephemeral query-time parameters. I don't really like persisting a 'latest query parameter' block number to storage. It is brittle and a potential source of contradiction that we have to handle in all of the corner cases like reorgs, block production, and now querying. IMO we have the block number when we need to produce a block, and we have the block number when we need to query a block. Persisting this somewhere and handling its update seems like an unnecessary complication to an already complex storage mechanism.
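
A minimal sketch of the read-through-the-transaction idea proposed above (simplified stand-in types, not Besu's actual interfaces):

import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Stand-in for an in-memory transaction that exposes get(): uncommitted
// writes are consulted first, then the parent (committed) store.
class InMemoryTransactionSketch {
  private final Map<String, byte[]> parentStore; // committed values
  private final Map<String, byte[]> pendingUpdates = new HashMap<>();

  InMemoryTransactionSketch(final Map<String, byte[]> parentStore) {
    this.parentStore = parentStore;
  }

  void put(final String key, final byte[] value) {
    pendingUpdates.put(key, value);
  }

  // Read through the transaction: pending writes win over the parent store.
  Optional<byte[]> get(final String key) {
    if (pendingUpdates.containsKey(key)) {
      return Optional.ofNullable(pendingUpdates.get(key));
    }
    return Optional.ofNullable(parentStore.get(key));
  }
}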

garyschulte (Contributor) left a review:

I didn't get through the whole review due to time, but wanted to provide feedback on what I did get to.

              .putStorageValueBySlotHash(
                  accountHash, Hash.wrap(key), Bytes32.leftPad(RLP.decodeValue(value))));
        });
    if (!worldStateStorageCoordinator.isMatchingFlatMode(FlatDbMode.PARTIAL)) {
garyschulte (Contributor) commented:

This is odd. The condition is that if we are not using the PARTIAL flat DB, we execute putStorageValueBySlotHash(...).

However, in the else, meaning we are using PARTIAL flat mode, we execute the exact same statement if the flat mode is FULL? What are we trying to accomplish here? As far as I can tell we will never execute the lambda in the else, and it is the same lambda as above. 🤔

          ((BonsaiWorldStateKeyValueStorage.Updater) updater)
              .putAccountInfoState(Hash.wrap(key), value));
        });
    if (!worldStateStorageCoordinator.isMatchingFlatMode(FlatDbMode.PARTIAL)) {
garyschulte (Contributor) commented:

Same question here: the lambda in the else will never execute, and it is identical.

-        && worldStateStorageCoordinator.isMatchingFlatMode(FlatDbMode.FULL)) {
+    if (worldStateKeyValueStorage.getDataStorageFormat().isBonsaiFormat()
+        && (worldStateStorageCoordinator.isMatchingFlatMode(FlatDbMode.FULL)
+            || worldStateStorageCoordinator.isMatchingFlatMode(FlatDbMode.ARCHIVE))) {
garyschulte (Contributor) commented:

Have you tried to serve snap from an archive node for anything besides a trivial state? I suspect that heavily used contracts and accounts will cause a lot of churn and range responses might take longer than the 5 second timeout. I am thinking uniswap and hex contracts on mainnet for example. Worst case, it might be an easy way to DoS an archive instance.

@SuppressWarnings("BannedMethod")
@BeforeEach
public void setup() {
  Configurator.setLevel(LogManager.getLogger(ArchiverTests.class).getName(), Level.TRACE);
garyschulte (Contributor) commented:

Are you making assertions on log statements, or is this more of a debugging aid that got left in?


garyschulte (Contributor) commented:

> My concern with holding off too long is that the code-base is evolving and requires a fair amount of effort to keep rebasing/merging the PR with main. Unless we have big concerns that Bonsai (non-archive) has regressed I'd be keen to get this one merged sooner rather than later.

Agreed. This is a big PR, and not only is it a nightmare to keep up to date with an evolving storage subsystem, all those merges are potential sources of error. I am keen to get this tested and merged also.
