Skip to content

sweepbatcher: fix reorg detection #975

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jul 24, 2025
Merged

Conversation

starius
Copy link
Collaborator

@starius starius commented Jul 9, 2025

The reorg channel is now passed to RegisterSpendNtfn, and waiting for spend notifications remains active even after the transaction receives its first confirmation. The dedicated goroutine previously used to wait for the spend is no longer needed, as we now handle both spend and potential reorg events in the main event loop while the batch is running.

Following this change, RegisterConfirmationsNtfn runs without a reorg channel, as it would only detect deep reorgs that undo the final confirmation - something we can't handle anyway. We can't track fully confirmed swaps indefinitely to guard against such rare reorgs; instead, we can mitigate the risk by increasing the required confirmation depth.

Also changed sweepbatcher to update feerate independently of AddSweep as well. AddSweep may not be called after getting the first confirmation, but feerate updates are still needed in case of reorg. Update test TestFeeRateGrows not to call AddSweep again and make sure feerate is updated itself.

Pull Request Checklist

  • Update release_notes.md if your PR contains major features, breaking changes or bugfixes

@starius starius force-pushed the spend-reorgs branch 2 times, most recently from 7763774 to c7f22b7 Compare July 9, 2025 05:33
@starius starius marked this pull request as ready for review July 9, 2025 05:48
@starius starius requested review from bhandras and hieblmi July 9, 2025 05:49
@starius starius marked this pull request as draft July 9, 2025 17:31
@starius starius marked this pull request as ready for review July 15, 2025 04:17
Copy link
Collaborator

@hieblmi hieblmi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM! I just have a question about the behavior of RegisterSpendNtfn in case a reorg happens.

@@ -979,23 +979,26 @@ func (b *batch) Run(ctx context.Context) error {
return fmt.Errorf("handleSpend error: %w", err)
}

case err := <-b.spendErrChan:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated to this, but since the Run method is quite long, could we move the code at the beginning for around initialDelay and initialDelayChan into a separate function?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

initialDelay, initialDelayChan and related variables in the beginning of Run are too connected to each other. In the current version they are all local and it is easy to follow the logic of waiting. How to factor out these variables without sacrificing readability?

I updated the last commit: factored out feerate updates to updateFeeRate method.

spendCtx, &primarySweep.outpoint, primarySweep.htlc.PkScript,
reorgChan := make(chan struct{}, 1)

spendChan, spendErrChan, err := b.chainNotifier.RegisterSpendNtfn(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will RegisterSpendNtfn keep spendChan and spendErrChan open and signal spends/errors after the reorg chan fired?
I assume yes, so this is a nice simplification of the spend-handling flow.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I added an itest for LND to check this: lightningnetwork/lnd#10083

}

// minimumSweepFeeRate determines minimum feerate for a sweep.
func minimumSweepFeeRate(ctx context.Context, customFeeRate FeeRateProvider,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great that you ensured that AddSweep and Run in the reorg case call the same fee rate estimates.

Copy link
Member

@bhandras bhandras left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🎉 (just one comment regarding conf target!)

@@ -10,7 +11,6 @@ import (
"github.com/lightninglabs/lndclient"
"github.com/lightningnetwork/lnd/chainntnfs"
"github.com/lightningnetwork/lnd/lnrpc/chainrpc"
"golang.org/x/net/context"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, nice catch!

case <-ctx.Done():
}
}()
b.spendChan = spendChan
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice simplification. I'd add a comment that this is safe to do as we're always calling monitorSpend from the runloop's goroutine.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I added the comment.

return minFeeRate, nil
}

if sweepConfTarget == 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LND's walletkit will fail with conftarget of 0 or 1. A way around would be override it to 2 if smaller?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this check in a separate commit. Note that this check is unlikely to fire, since this is already fixed in GetConfTargetAndFeeRate.

starius added 6 commits July 24, 2025 02:19
Since bb837a4 AddSweep loads sweeps so
one of possible errors is sql.ErrTxDone. Full error that is fixed:
> fetchSweeps failed: failed to load sweep 0000000000000000000101:1:
> failed to fetch sweep data for 010101000000:
> failed to fetch loop out for 010101000000:
> sql: transaction has already been committed or rolled back
The reorg channel is now passed to RegisterSpendNtfn, and waiting for spend
notifications remains active even after the transaction receives its first
confirmation. The dedicated goroutine previously used to wait for the spend
is no longer needed, as we now handle both spend and potential reorg events
in the main event loop while the batch is running.

Following this change, RegisterConfirmationsNtfn runs without a reorg channel,
as it would only detect deep reorgs that undo the final confirmation - something
we can't handle anyway. We can't track fully confirmed swaps indefinitely to
guard against such rare reorgs; instead, we can mitigate the risk by increasing
the required confirmation depth.
AddSweep may not be called after getting the first confirmation, but feerate
updates are still needed in case of reorg.

Update test TestFeeRateGrows not to call AddSweep again and make sure feerate
is updated itself.
This is already done in GetConfTargetAndFeeRate. The fix is just a double check.
@starius starius merged commit bce9b5d into lightninglabs:master Jul 24, 2025
4 checks passed
@starius starius deleted the spend-reorgs branch July 24, 2025 06:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants