Fix bandwidth related issues #1478


Merged: 10 commits merged into main on Apr 24, 2025

Conversation

@guggero (Member) commented Apr 17, 2025

Depends on #1462.

Fixes #1471.
Fixes #1275.
Fixes #1374.

@guggero (Member, Author) commented Apr 22, 2025

Fixed unit test, ready for review.

@guggero guggero marked this pull request as ready for review April 22, 2025 13:28
@guggero guggero requested review from GeorgeTsagk and ffranr April 22, 2025 14:13
@GeorgeTsagk GeorgeTsagk force-pushed the aux-bandwidth-htlcview branch from 8958945 to 3b5aace April 22, 2025 14:50
@GeorgeTsagk (Member) left a comment

great fixes, have some comments, but overall looking good!


// We also need to validate that the HTLC is actually the correct asset
// and arrived through the correct asset channel.
channels, err := s.cfg.LightningClient.ListChannels(ctx, true, false)
@GeorgeTsagk (Member) commented:

This feels a bit out of place to call here, but I guess it can serve for a while.

Could explore one of the following:
a1) Add a chanID filter to ListChannels to get a performance boost (LSPs with 100s of chans would suffer a bit here -- reminder: this is called for each incoming shard, and the number of shards is up to the payer).
a2) Otherwise, a more isolated and quicker approach would be to cache the lndclient.ChannelInfo (no LND changes).
b) Could add this strict assetID-set check over the channel in a different part of the flow, where the channel info is already present and querying is not needed. We could let this "leak" into the commitment/allocation phase, but handle it gracefully there? This is definitely more involved and requires LND/aux-interface changes.

Another member replied:

Hmm yeah, I was thinking something similar here. We don't actually store the custom channel data ourselves, so for now we must rely on lnd to deliver it to us. In other areas we pass it within the hook callbacks, but only for the channels related to a given context.

Another member replied:

IIUC, I think we can actually get rid of this extra RPC call:

  • With handleInvoiceAccept we're passed the actual invoice that we created.
  • This invoice has an scid in the hop hint.
  • We also have the circuit key here which contains the scid.

Can't we parse out the hop hint earlier in the pipeline, pass it in here, then assert that it matches the circuit key scid? We'd need to check whether the scid that's passed in is the alias or not...

Even with that though, we'd still need to obtain the custom data from a channel to ensure that it can actually carry a given HTLC.
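To make that concrete, here is a minimal sketch of the scid cross-check (all names are hypothetical, including resolveAlias, which stands in for whatever alias lookup lnd exposes; this is not actual tapd code):

```go
package sketch

// htlcMatchesInvoiceHint cross-checks the scid from the invoice's hop
// hint against the scid in the incoming HTLC's circuit key. The
// resolveAlias callback maps an scid alias to the real scid, since the
// circuit key may carry the alias.
func htlcMatchesInvoiceHint(hintSCID, circuitSCID uint64,
	resolveAlias func(uint64) (uint64, bool)) bool {

	// Direct match: the HTLC arrived over the hinted channel.
	if hintSCID == circuitSCID {
		return true
	}

	// The circuit key might reference the channel by its alias; map
	// it to the real scid before comparing.
	if realSCID, ok := resolveAlias(circuitSCID); ok {
		return realSCID == hintSCID
	}

	return false
}
```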

@guggero (Member, Author) replied:

> Even with that though, we'd still need to obtain the custom data from a channel to ensure that it can actually carry a given HTLC.

Yeah, we really want to make sure things come in through the correct channel. Otherwise our peer could trick us.

But I added the peer to the ListChannels query, which should already cut down the size of the response by quite a large amount. And then I also added caching as @GeorgeTsagk suggested.
I think that should be fine in terms of performance.
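As an illustration of the caching approach that landed here, a minimal sketch with hypothetical names and simplified types (not the actual tapd implementation): the per-peer ListChannels result is memoized, so repeated shards of the same payment don't each trigger an RPC.

```go
package sketch

import (
	"sync"
	"time"
)

// channelInfo is a simplified stand-in for the channel data needed to
// validate an incoming HTLC.
type channelInfo struct {
	ChannelID      uint64
	CustomChanData []byte
}

type cacheEntry struct {
	fetchedAt time.Time
	channels  []channelInfo
}

// chanCache memoizes per-peer channel listings for a short TTL.
type chanCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	entries map[string]cacheEntry // keyed by peer pubkey (hex)
}

func newChanCache(ttl time.Duration) *chanCache {
	return &chanCache{
		ttl:     ttl,
		entries: make(map[string]cacheEntry),
	}
}

// channelsForPeer returns the cached channels for a peer, refreshing
// them via fetch (e.g. a peer-filtered ListChannels call) once the
// cached entry expires.
func (c *chanCache) channelsForPeer(peer string,
	fetch func(string) ([]channelInfo, error)) ([]channelInfo, error) {

	c.mu.Lock()
	defer c.mu.Unlock()

	if e, ok := c.entries[peer]; ok && time.Since(e.fetchedAt) < c.ttl {
		return e.channels, nil
	}

	channels, err := fetch(peer)
	if err != nil {
		return nil, err
	}

	c.entries[peer] = cacheEntry{
		fetchedAt: time.Now(),
		channels:  channels,
	}

	return channels, nil
}
```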

return b.AssetID.Val
})...,
)
if !commitment.HasAllAssetIDs(htlcAssetIDs) {
A member commented:

Just to make sure: when using group keys, the sender will locally have an HTLC asset ID which is the x-coordinate of the group key. But by the point where the other peer of the channel receives the asset HTLC records, the asset coin selection has already happened, so we should be expecting only real asset IDs here. Correct?

@GeorgeTsagk (Member) replied on Apr 22, 2025:

Haven't run the itests against this yet, but they will catch it if that's not true.

@guggero (Member, Author) replied:

Yeah, you're right. I didn't consider this, so things actually failed here. Working on a fix right now.

Another member added:

Perhaps once we land this fix, we should commit to just including the group key in all relevant areas for HTLCs/RFQ? Otherwise we'll always need to keep in mind the little trick that sometimes an asset ID can actually be a group key. Transmitting the group key within these wire messages and TLV blobs will also save us from needing to do a db/universe lookup to verify that something that looks like a group key actually is one.

@guggero (Member, Author) replied:

Yeah, I can attempt that in a follow-up PR, though I think it might be quite involved. We'll see, I guess.
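For readers following the group-key discussion above, the "asset ID that is really a group key" trick looks roughly like this (a hypothetical helper, using the btcec package this stack already depends on):

```go
package sketch

import "github.com/btcsuite/btcd/btcec/v2"

// groupKeyAsAssetID illustrates the trick discussed above: the 32-byte
// x-coordinate of the group key occupies the slot where an asset ID
// normally lives (hypothetical helper, not tapd code).
func groupKeyAsAssetID(groupKey *btcec.PublicKey) [32]byte {
	var id [32]byte

	// SerializeCompressed returns 33 bytes: one parity byte followed
	// by the 32-byte x-coordinate.
	copy(id[:], groupKey.SerializeCompressed()[1:])

	return id
}
```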

@guggero guggero force-pushed the bandwidth-no-assets branch from 09c7f4f to ac55403 April 22, 2025 16:54
@guggero guggero changed the base branch from aux-bandwidth-htlcview to main April 22, 2025 16:54
@ffranr (Contributor) left a comment

Subject to George's comments, LGTM 👍

Could do with an itest in terminal, I suppose, to ensure that this fix remains effective. Perhaps there's already a PR for that?


@guggero guggero force-pushed the bandwidth-no-assets branch 2 times, most recently from 12194ad to ca6de1a April 23, 2025 13:30
guggero added 10 commits April 23, 2025 15:39
To avoid needing to call spew.Sdump() when the trace log level isn't even being used, we wrap the calls in a function closure instead. That way, the potentially CPU-intensive spew only happens when the trace log level is actually enabled.
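This is the classic lazy-formatting pattern; a minimal sketch of the closure idea (names are hypothetical, not the exact commit contents):

```go
package sketch

import (
	"fmt"

	"github.com/davecgh/go-spew/spew"
)

// logClosure defers expensive formatting until String() is called, so
// the work only happens if the logger actually formats its arguments.
type logClosure func() string

func (c logClosure) String() string {
	return c()
}

// spewLog wraps spew.Sdump in a closure; pass the result to a %v verb.
func spewLog(v any) fmt.Stringer {
	return logClosure(func() string {
		return spew.Sdump(v)
	})
}

// Usage (assuming a leveled logger):
//
//	log.Tracef("leaf: %v", spewLog(leaf))
//
// spew.Sdump only runs when trace logging is enabled and the argument
// is actually formatted.
```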
To be able to detect certain asset-related routing conditions, we'll want to commit the group key of assets into the funding blob of the channel.

To easily find out if all asset IDs from a set are actually committed to
a channel, we add helper functions for the different channel messages
that we can encounter in our subsystems (depending on whether we get the
message from a blob in a hook or as JSON over the RPC interface).
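Conceptually, such a helper boils down to a set-containment check; a minimal sketch with simplified, hypothetical types:

```go
package sketch

// hasAllAssetIDs reports whether every asset ID carried by the HTLC is
// present in the set committed to the channel (hypothetical helper,
// simplified from the real channel-message variants).
func hasAllAssetIDs(committed map[[32]byte]struct{},
	htlcAssetIDs [][32]byte) bool {

	for _, id := range htlcAssetIDs {
		if _, ok := committed[id]; !ok {
			return false
		}
	}

	return true
}
```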
We need to make sure an asset HTLC actually comes through the correct channel, the one that commits to that asset in the first place.

We move a block of code further down to where it's going to be used, so the next commit(s) will have an easier-to-digest diff.

This fixes the first part of the issue: we didn't tell lnd that we wanted to handle traffic for non-asset channels. But that's wrong, because lnd will then pick non-asset channels for asset HTLCs in some situations. So we explicitly need to tell lnd that there is no bandwidth in a non-asset channel whenever an asset HTLC should be forwarded (or sent).
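The resulting decision logic can be pictured as follows; a simplified, hypothetical sketch of the idea, not the real traffic-shaper interface:

```go
package sketch

// paymentBandwidth reports the bandwidth we tell lnd a channel has for
// a given HTLC: zero whenever an asset HTLC hits a channel that
// doesn't commit to the matching asset, so lnd never selects it
// (hypothetical signature, heavily simplified).
func paymentBandwidth(isAssetHTLC, chanHasMatchingAsset bool,
	assetBandwidth, linkBandwidth uint64) uint64 {

	switch {
	// Asset HTLC over a channel committing to the right asset: the
	// asset-derived bandwidth applies.
	case isAssetHTLC && chanHasMatchingAsset:
		return assetBandwidth

	// Asset HTLC over a channel without the right asset: zero, so
	// lnd skips this channel.
	case isAssetHTLC:
		return 0

	// Plain HTLC: defer to lnd's own view of the link bandwidth.
	default:
		return linkBandwidth
	}
}
```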
This is the third part of the fix: we need to make sure that we don't pick an asset channel that has the wrong type of assets when telling lnd what channel it can use.

To debug commitment issues, it's useful to log the exact asset leaf that is committed. To be able to easily decode (and compare) that commitment, we also add a unit test that outputs the leaf as JSON.

When re-anchoring a passive asset, we need to reset any time locks it previously had on it. Because the re-anchoring is a normal transfer with just a simple signature, we need to clear any previous restrictions.

The ReAnchorPassiveAssets query basically needs to mirror what the asset.CopySpendTemplate() method does.
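In spirit, the reset amounts to something like this (a minimal sketch with hypothetical, simplified types; the real asset.CopySpendTemplate() covers more fields):

```go
package sketch

// assetLeaf is a heavily simplified, hypothetical stand-in for an
// asset record that carries spend-path restrictions.
type assetLeaf struct {
	Amount       uint64
	ScriptKey    [33]byte
	LockTime     uint64
	RelativeLock uint64
}

// copySpendTemplate mirrors the behavior the commit describes: keep
// the economic fields, but clear restrictions that only applied to the
// asset's previous spend path.
func copySpendTemplate(a assetLeaf) assetLeaf {
	cp := a
	cp.LockTime = 0
	cp.RelativeLock = 0

	return cp
}
```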
@guggero guggero force-pushed the bandwidth-no-assets branch from ca6de1a to 8133f36 April 23, 2025 13:39
@coveralls commented Apr 23, 2025

Pull Request Test Coverage Report for Build 14619648845

Details

  • 73 of 290 (25.17%) changed or added relevant lines in 10 files are covered.
  • 20 unchanged lines in 9 files lost coverage.
  • Overall coverage increased (+0.06%) to 28.796%

Changes missing coverage:

File                                  Covered Lines  Changed/Added Lines  %
tapcfg/server.go                      0              1                    0.0%
tapdb/sqlc/transfers.sql.go           0              2                    0.0%
tapchannelmsg/custom_channel_data.go  1              6                    16.67%
tapsend/send.go                       0              6                    0.0%
rfqmsg/custom_channel_data.go         0              19                   0.0%
tapchannel/aux_invoice_manager.go     40             60                   66.67%
server.go                             0              27                   0.0%
tapchannelmsg/records.go              32             60                   53.33%
tapchannel/aux_funding_controller.go  0              48                   0.0%
tapchannel/aux_traffic_shaper.go      0              61                   0.0%
Files with coverage reduction:

File                               New Missed Lines  %
server.go                          1                 0.0%
tapchannel/aux_invoice_manager.go  1                 81.67%
address/mock.go                    2                 97.39%
asset/group_key.go                 2                 57.89%
commitment/tap.go                  2                 72.27%
tapgarden/planter.go               2                 60.85%
rfqmsg/records.go                  3                 64.11%
tapchannel/aux_leaf_signer.go      3                 43.43%
tapchannel/aux_traffic_shaper.go   4                 0.0%
Totals:

Change from base Build 14613178404: +0.06%
Covered Lines: 26734
Relevant Lines: 92838

💛 - Coveralls

@guggero (Member, Author) commented Apr 23, 2025

Ready for re-review. Fixed all bugs and also updated the corresponding integration tests.

The litd itests should also pass on the next retry.

@guggero guggero requested a review from GeorgeTsagk April 23, 2025 14:06
@GeorgeTsagk (Member) left a comment

LGTM 🐰

@guggero guggero merged commit 89a3e17 into main Apr 24, 2025
18 checks passed
@guggero guggero deleted the bandwidth-no-assets branch April 24, 2025 09:52