Skip to content

Conversation

@teor2345
Copy link
Contributor

@teor2345 teor2345 commented Sep 15, 2025

I'm working on fork monitoring, so I was looking at how our fork choice rule works in detail.

I think the current implementation is valid. But the code comments are an over-simplification - they don't clearly describe how the code chooses between forks.

The implementation sums the chain weights into a single u128, which means it is possible (but very unlikely) that a shorter fork can be temporarily chosen as the best fork. This could happen on an era transition, if the solution range differences between the forks are large.

Also, a solution can be accepted by one fork but rejected by others at the start of an era, due to different solution ranges (from different block times).

So downstream consumers can't depend on the longest fork always being chosen every time. But the longest fork will be chosen after more blocks, with increasingly high probability after each new block.

Test fixes

This PR also fixes the fork choice weights for the test runtime, where all blocks had zero weight (because the runtime accepts all solutions).

This is a particular issue in tests where multiple nodes produce blocks, because they will all choose their own blocks, and never reorg.

Feedback requests

Some of the detail in the code comments (or this PR description) might be better in the spec. Please let me know which parts you'd like me to move.

I'll only merge this after we've got both the spec and code changes sorted.

Why audit?

I would like this audited because the fork choice is an important security property, and it needs to be documented accurately. Missing or outdated documentation makes it easier to introduce security issues accidentally.

Next steps

After this has been checked, we need to update the spec in a similar way:
https://github.com/subspace/protocol-specs/blob/0591efc437a095237c01b1bc61212743aea0419e/docs/decex/workflow.md?plain=1#L286-L288

(The fork choice is important for compatible implementations.)

Code contributor checklist:

@teor2345 teor2345 self-assigned this Sep 15, 2025
@teor2345 teor2345 requested a review from nazar-pc as a code owner September 15, 2025 08:43
@teor2345 teor2345 added documentation Improvements or additions to documentation node Node (service library/node app) audit-P3 Low audit priority labels Sep 15, 2025
@immunefi-magnus
Copy link

🛡️ Immunefi PR Reviews

We noticed that your project isn't set up for automatic code reviews. If you'd like this PR reviewed by the Immunefi team, you can request it manually using the link below:

🔗 Send this PR in for review

Once submitted, we'll take care of assigning a reviewer and follow up here.

@teor2345 teor2345 marked this pull request as draft September 15, 2025 13:59
@teor2345 teor2345 marked this pull request as ready for review September 16, 2025 08:18
Copy link
Contributor Author

@teor2345 teor2345 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been heavily revised, and it's now ready for review again.

There are edge cases, but they're not the ones I originally expected. (And the worst one only happens on the test runtime.)

@teor2345
Copy link
Contributor Author

teor2345 commented Sep 22, 2025

I moved any future rule changes to a ticket: #3703

@vedhavyas vedhavyas added this pull request to the merge queue Oct 14, 2025
Merged via the queue into main with commit 177c423 Oct 14, 2025
16 checks passed
@vedhavyas vedhavyas deleted the fork-choice-doc branch October 14, 2025 09:18
//
// This almost always prioritises:
// - the longest chain (the largest number of solutions), and if there is a tie
// - the strictest solutions (the numerically smallest solution ranges).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solutions do not matter. As you can see above, the calculation of added weight is only based on the solution range in the era, and not the quality (strictness in your terminology) of individual solutions. It was (incorrectly) based on quality of individual solutions long time ago during one of the testnets and it was susceptible to block withholding attacks and maybe something else too. So it is just the heaviest chain rule like Jeremy commented originally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, this could be clearer as

Suggested change
// - the strictest solutions (the numerically smallest solution ranges).
// - the strictest solution ranges (the numerically smallest solution ranges).

Comment on lines +702 to +703
// - one fork accepts a solution as within its range, but another with a narrower range
// does not, or
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this description makes a lot of sense. Era transition is based on the block number. So if you have different solutions, the parents must be different as well. A different fork will not accept the block not because its solution is not good enough, but rather because the parent is invalid/unknown to begin with.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, this would be clearer as:

Suggested change
// - one fork accepts a solution as within its range, but another with a narrower range
// does not, or
// - one fork accepts a solution as within its range, but another with a narrower range
// does not accept another solution with similar difficulty, or

It was meant to be a security argument about difficulty, not solution re-use.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

audit-P3 Low audit priority documentation Improvements or additions to documentation node Node (service library/node app)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants