Description
(I'm still trying to reproduce this locally. I got the crash with --hardfork-mode auto enabled - a flag we still have to rename - but the code looks like it could be susceptible to the same bug without it. Unfortunately, I also did not save the exact crash message.)
I was periodically syncing a daemon to devnet, with and without --hardfork-mode auto, and the daemon crashed in this bit of the code while --hardfork-mode auto was enabled:
mina/src/lib/consensus/proof_of_stake.ml
Lines 2663 to 2672 in b0ab82e
    let root_ledger_of_snapshot snapshot snapshot_config =
      O1trace.sync_thread "root_ledger_of_snapshot" (fun () ->
          match snapshot.ledger with
          | Ledger_snapshot.Ledger_root ledger ->
              Ok ledger
          | Ledger_snapshot.Genesis_epoch_ledger packed ->
              Genesis_ledger.Packed.create_root packed
                ~config:snapshot_config
                ~depth:Context.constraint_constants.ledger_depth () )
    in
The create_root function threw an exception when trying to sync one of the epoch snapshots because the rocksdb checkpoint failed - the target directory of the checkpoint already existed. In other words, there was an epoch ledger snapshot already at the snapshot_config location while the daemon was still at the genesis epoch snapshot.
This was not failing in my local testing before - it may have started because of #17874. Before that PR, we'd do this in this situation:
mina/src/lib/consensus/proof_of_stake.ml
Lines 2668 to 2676 in 2c70f34
    | Ledger_snapshot.Genesis_epoch_ledger packed ->
        let fresh_root_ledger =
          Mina_ledger.Ledger.Root.create ~logger
            ~config:snapshot_config
            ~depth:Context.constraint_constants.ledger_depth
            ()
        in
        Genesis_ledger.Packed.populate_root packed
          fresh_root_ledger )
That Ledger.Root.create call would open whatever database is present at that config location. (The code before we made all these root ledger handling changes did the same thing.) It would then overwrite the contents of the database with the genesis ledger, and then sync the ledger to the network. Thus, the daemon did not have to care about cleaning up an old epoch ledger database that was lying around.
I'm unsure of a few things:
- If this can be reproduced with --hardfork-mode auto, or if I can get this to show up without that enabled. (I'm still looking at it.)
- If the daemon was correctly at the genesis epoch ledger snapshots at the moment it crashed.
We might want to add some code to delete any snapshot backing that might be present at the snapshot_config location before creating a new root from genesis. Though, if this only shows up with --hardfork-mode auto, then this kind of failure might be the result of a bug elsewhere.
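To make the suggested cleanup concrete, here is a minimal, standalone sketch of the idea using only the OCaml standard library (4.12 or later for Sys.rmdir and Sys.mkdir). Every name in it (remove_tree, create_checkpoint, create_root_from_genesis) is hypothetical and is not the Mina or rocksdb API. create_checkpoint only imitates the behaviour described above, where the operation fails if the target directory already exists.

```ocaml
(* Hypothetical helper, not part of the Mina codebase: recursively delete
   [path] if it exists. Caveat: Sys.is_directory follows symlinks, so a real
   implementation would need to treat symlinked directories with care. *)
let rec remove_tree path =
  if Sys.file_exists path then
    if Sys.is_directory path then begin
      Array.iter
        (fun entry -> remove_tree (Filename.concat path entry))
        (Sys.readdir path) ;
      Sys.rmdir path
    end
    else Sys.remove path

(* Stand-in for a checkpoint-style operation that refuses to write into an
   existing target directory. This is an assumption used for illustration,
   not the real rocksdb binding. *)
let create_checkpoint ~target =
  if Sys.file_exists target then Error (`Target_exists target)
  else begin
    Sys.mkdir target 0o755 ;
    Ok target
  end

(* Sketch of the proposed fix: clear any stale snapshot backing at
   [snapshot_dir] before creating the new root there. *)
let create_root_from_genesis ~snapshot_dir =
  remove_tree snapshot_dir ;
  create_checkpoint ~target:snapshot_dir
```

With a stale directory already in place, calling create_checkpoint directly returns the Target_exists error, while create_root_from_genesis removes the directory first and succeeds. Whether silently deleting the old backing is the right behaviour depends on the open question above: if the stale snapshot only appears because of a bug elsewhere, a cleanup like this could hide that bug.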