
can't freeze block issue with fastnode snapshot #189

@galaio

Description


Hi guys, I'm reporting a "can't freeze block" issue with the fastnode snapshot and describing my troubleshooting process.

Since BSC v1.5.19, features like pruneancient and prune-block have been replaced with Tail Deletion. I've noticed frequent errors when running nodes from fastnode snapshots on this version. See bnb-chain/bsc#3277:

t=08-19|07:25:48.660 lvl=error msg="Error in block freeze operation" err="canonical hash missing, can't freeze block 56624302"

Investigation revealed a significant discrepancy between the offsets stored in the fastnode snapshot (the former pruneancient and prune-block metadata, which record the pruning height) and the blocks actually present in the database. Manually querying pebbledb shows the following:

header: 57744288 hash: a8d13893ffe30ce22f02e36f85dc3515fa421159bb953e2d7e1fdf1ee29b9793
.....
header: 58104301 hash: 7cc7f0755cc1aa74b4e23efed91cbe786494e6c2f1addff9a5b00dc240ecc95b
offSetOfCurrentAncientFreezer: 57269088
offSetOfLastAncientFreezer: 56624202

The two offSet* values are the prune-block metadata. The pruning height is 57269088, but the oldest block in pebbledb is 57744288, a gap of 475200 blocks. When the freezer performs the migration, it cannot find block 57269088 or the blocks after it in that gap, which results in the error above.
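To make the failure mode concrete, here is a toy Go model of what the freezer runs into. It is a sketch under stated assumptions, not BSC code: `canonicalStore`, `oldestBlock`, and `freezeFrom` are hypothetical names, and a plain map stands in for the canonical-hash data in pebbledb. The freezer is modeled as walking block numbers sequentially from the stored offset, so it stops at the first number with no canonical hash. The error string is borrowed from the log above; the model does not try to reproduce the specific block number reported in bnb-chain/bsc#3277.

```go
package main

import "fmt"

// canonicalStore is a hypothetical, in-memory stand-in for the canonical
// block-number -> hash mapping kept in pebbledb. It is not the rawdb API.
type canonicalStore map[uint64]string

// oldestBlock returns the lowest block number present in the store,
// and false if the store is empty.
func oldestBlock(db canonicalStore) (uint64, bool) {
	var oldest uint64
	found := false
	for n := range db {
		if !found || n < oldest {
			oldest, found = n, true
		}
	}
	return oldest, found
}

// freezeFrom walks block numbers sequentially from offset to limit
// (inclusive), as a freezer migrating blocks into the ancient store would,
// and stops at the first block whose canonical hash is missing.
func freezeFrom(db canonicalStore, offset, limit uint64) error {
	for n := offset; n <= limit; n++ {
		if _, ok := db[n]; !ok {
			return fmt.Errorf("canonical hash missing, can't freeze block %d", n)
		}
	}
	return nil
}

func main() {
	const offset uint64 = 57269088 // offSetOfCurrentAncientFreezer from the snapshot

	// Only a few blocks starting at the oldest one found in pebbledb;
	// the hashes are placeholders.
	db := canonicalStore{}
	for n := uint64(57744288); n <= 57744301; n++ {
		db[n] = fmt.Sprintf("hash-%d", n)
	}

	oldest, _ := oldestBlock(db)
	fmt.Println("gap:", oldest-offset) // gap: 475200

	// Starting from the stale offset fails on the very first block.
	fmt.Println(freezeFrom(db, offset, 57744301))
	// prints: canonical hash missing, can't freeze block 57269088
}
```

Because the walk is strictly sequential, a single missing block at the offset is enough to stall freezing entirely, even though every block from 57744288 onward is present.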

After manually setting offSetOfCurrentAncientFreezer to 57744288, the node restarts normally with v1.5.19:

t=08-22|10:28:10.981 lvl=info msg="Found legacy offset in freezerDB, will reset freezer meta" offset=57744288
t=08-22|10:28:11.049 lvl=info msg="Opened ancient database" database=/server/snapshots/fastnode/data-seed/geth/chaindata/ancient/chain readonly=false tail=57744288 frozen=57744288
t=08-22|10:28:11.094 lvl=debug msg=freezeRangeWithBlobs from=57744288 to=57744301 err=<nil>
t=08-22|10:28:11.140 lvl=debug msg="Deep frozen chain segment" blocks=14 elapsed=83.951ms number=57744301 hash=0x332d5e0c16d6bf967567091065921b2ca7b514edae32b7e98a4fb047d3313ecc
t=08-22|10:28:11.140 lvl=debug msg="Chain freezer prune useless blobs, now ancient data is" from=55892461 to=58104301 cost=1.593µs
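The manual workaround can be summarized as a small Go helper. This is a hypothetical sketch (`planFreeze` is not a BSC function): if the stored offset points below the oldest block still present in pebbledb, move it up to that block, then freeze from there. The freeze target is taken as an input here, an assumption standing in for however the freezer picks its upper bound. With the values from the logs above (stored offset 57269088, oldest block 57744288, first freeze target 57744301) it yields a start of 57744288 and a first batch of 14 blocks, which matches `blocks=14` in the "Deep frozen chain segment" line.

```go
package main

import "fmt"

// planFreeze is a hypothetical helper mirroring the manual fix in this issue.
// If storedOffset is below the oldest block that still exists in the
// key-value store, the start is moved up to that block (reset == true).
// count is the number of blocks in [start, freezeTo], or 0 if freezeTo
// is below start.
func planFreeze(storedOffset, oldestInDB, freezeTo uint64) (start, count uint64, reset bool) {
	start = storedOffset
	if storedOffset < oldestInDB {
		start, reset = oldestInDB, true
	}
	if freezeTo < start {
		return start, 0, reset
	}
	return start, freezeTo - start + 1, reset
}

func main() {
	start, count, reset := planFreeze(57269088, 57744288, 57744301)
	fmt.Println(start, count, reset) // 57744288 14 true
}
```

Note that this only helps when the blocks from the oldest one up to the head are contiguous in pebbledb; it does not recover the 475200 missing blocks themselves.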

We also hit a similar issue with v1.5.16 when the node was started without --pruneancient. It had the same cause: a discontinuity between the offsets and the blocks in the database. With --pruneancient enabled, a previous bug prevented any freezing from happening at all, so blocks simply accumulated in pebbledb.

Reference: #183

By the way, this only affects the freeze logic: blocks accumulate in pebbledb, but the node still syncs normally.

Could you share how you generate the fastnode snapshots? It seems that after prune-block runs, some ancient data is lost when the snapshot is packaged.
