Fix bandwidth related issues #1478


Merged: 10 commits merged into main on Apr 24, 2025

Conversation

@guggero (Member) commented Apr 17, 2025

Depends on #1462.

Fixes #1471.
Fixes #1275.
Fixes #1374.

@guggero (Member, Author) commented Apr 22, 2025

Fixed unit test, ready for review.

@guggero guggero marked this pull request as ready for review April 22, 2025 13:28
@guggero guggero requested review from GeorgeTsagk and ffranr April 22, 2025 14:13
@GeorgeTsagk GeorgeTsagk force-pushed the aux-bandwidth-htlcview branch from 8958945 to 3b5aace April 22, 2025 14:50
@GeorgeTsagk (Member) left a comment

great fixes, have some comments, but overall looking good!


// We also need to validate that the HTLC is actually the correct asset
// and arrived through the correct asset channel.
channels, err := s.cfg.LightningClient.ListChannels(ctx, true, false)
@GeorgeTsagk (Member) commented:

This feels a bit out of place to call here, but I guess it can serve for a while.

Could explore one of the following:
a1) Add a chanID filter to ListChannels to get a performance boost (LSPs with 100s of chans would suffer a bit here -- reminder: this is called for each incoming shard, and the number of shards is up to the payer).
a2) Otherwise, a more isolated and quicker approach would be to cache the lndclient.ChannelInfo (no LND changes).
b) Could add this strict assetID-set check over the channel in a different part of the flow, where the channel info is already present and querying is not needed. We could let this "leak" into the commitment/allocation phase, but handle it gracefully there? This is definitely more involved and requires LND/aux-interface changes.

Another member replied:

Hmm yeah, I was thinking something similar here. We don't actually store the custom channel data ourselves, so for now we must rely on lnd to deliver it to us. In other areas we pass it within the hook callbacks, but only for the channels related to a given context.

Another member replied:

IIUC, I think we can actually get rid of this extra RPC call:

  • With handleInvoiceAccept we're passed the actual invoice that we created.
  • This invoice has an scid in the hop hint.
  • We also have the circuit key here which contains the scid.

Can't we parse out the hop hint earlier in the pipeline, pass it in here, then assert that it matches the circuit key scid? We'd need to check whether the scid that's passed in is the alias or not...

Even with that though, we'd still need to obtain the custom data from a channel to ensure that it can actually carry a given HTLC.
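To make that concrete, here is a minimal sketch of the scid cross-check (all names are hypothetical, including resolveAlias, which stands in for whatever alias lookup lnd exposes; this is not actual tapd code):

```go
package sketch

// htlcMatchesInvoiceHint cross-checks the scid from the invoice's hop
// hint against the scid in the incoming HTLC's circuit key. The
// resolveAlias callback maps an scid alias to the real scid, since the
// circuit key may carry the alias.
func htlcMatchesInvoiceHint(hintSCID, circuitSCID uint64,
	resolveAlias func(uint64) (uint64, bool)) bool {

	// Direct match: the HTLC arrived over the hinted channel.
	if hintSCID == circuitSCID {
		return true
	}

	// The circuit key might reference the channel by its alias; map
	// it to the real scid before comparing.
	if realSCID, ok := resolveAlias(circuitSCID); ok {
		return realSCID == hintSCID
	}

	return false
}
```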

@guggero (Member, Author) replied:

> Even with that though, we'd still need to obtain the custom data from a channel to ensure that it can actually carry a given HTLC.

Yeah, we really want to make sure things come in through the correct channel. Otherwise our peer could trick us.

But I added the peer to the ListChannels query, which should already cut down the size of the response by quite a large amount. And then I also added caching as @GeorgeTsagk suggested.
I think that should be fine in terms of performance.
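As an illustration of the caching approach that landed here, a minimal sketch with hypothetical names and simplified types (not the actual tapd implementation): the per-peer ListChannels result is memoized, so repeated shards of the same payment don't each trigger an RPC.

```go
package sketch

import (
	"sync"
	"time"
)

// channelInfo is a simplified stand-in for the channel data needed to
// validate an incoming HTLC.
type channelInfo struct {
	ChannelID      uint64
	CustomChanData []byte
}

type cacheEntry struct {
	fetchedAt time.Time
	channels  []channelInfo
}

// chanCache memoizes per-peer channel listings for a short TTL.
type chanCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	entries map[string]cacheEntry // keyed by peer pubkey (hex)
}

func newChanCache(ttl time.Duration) *chanCache {
	return &chanCache{
		ttl:     ttl,
		entries: make(map[string]cacheEntry),
	}
}

// channelsForPeer returns the cached channels for a peer, refreshing
// them via fetch (e.g. a peer-filtered ListChannels call) once the
// cached entry expires.
func (c *chanCache) channelsForPeer(peer string,
	fetch func(string) ([]channelInfo, error)) ([]channelInfo, error) {

	c.mu.Lock()
	defer c.mu.Unlock()

	if e, ok := c.entries[peer]; ok && time.Since(e.fetchedAt) < c.ttl {
		return e.channels, nil
	}

	channels, err := fetch(peer)
	if err != nil {
		return nil, err
	}

	c.entries[peer] = cacheEntry{
		fetchedAt: time.Now(),
		channels:  channels,
	}

	return channels, nil
}
```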

return b.AssetID.Val
})...,
)
if !commitment.HasAllAssetIDs(htlcAssetIDs) {
A member commented:

Just to make sure: when using group keys, the sender will locally have an HTLC asset ID which is the x-coordinate of the group key. But by the point where the other peer of the channel receives the asset HTLC records, the asset coin selection has already happened, so we should be expecting only real asset IDs here. Correct?

@GeorgeTsagk (Member) replied on Apr 22, 2025:

Haven't run the itests against this yet, but they will catch it if that's not true.

@guggero (Member, Author) replied:

Yeah, you're right. I didn't consider this, so things actually failed here. Working on a fix right now.

Another member added:

Perhaps once we land this fix, we should commit to just including the group key in all relevant areas for HTLCs/RFQ? Otherwise we'll always need to keep in mind the little trick that sometimes an asset ID can actually be a group key. Transmitting the group key within these wire messages and TLV blobs will also save us from needing to do a db/universe lookup to verify that something that looks like a group key actually is one.

@guggero (Member, Author) replied:

Yeah, I can attempt that in a follow-up PR, though I think it might be quite involved. We'll see, I guess.
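For readers following the group-key discussion above, the "asset ID that is really a group key" trick looks roughly like this (a hypothetical helper, using the btcec package this stack already depends on):

```go
package sketch

import "github.com/btcsuite/btcd/btcec/v2"

// groupKeyAsAssetID illustrates the trick discussed above: the 32-byte
// x-coordinate of the group key occupies the slot where an asset ID
// normally lives (hypothetical helper, not tapd code).
func groupKeyAsAssetID(groupKey *btcec.PublicKey) [32]byte {
	var id [32]byte

	// SerializeCompressed returns 33 bytes: one parity byte followed
	// by the 32-byte x-coordinate.
	copy(id[:], groupKey.SerializeCompressed()[1:])

	return id
}
```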

@guggero guggero force-pushed the bandwidth-no-assets branch from 09c7f4f to ac55403 April 22, 2025 16:54
@guggero guggero changed the base branch from aux-bandwidth-htlcview to main April 22, 2025 16:54
@ffranr (Contributor) left a comment

Subject to George's comments, LGTM 👍

Could do with an itest in terminal, I suppose, to ensure that this fix remains effective. Perhaps there's already a PR for that?


@guggero guggero force-pushed the bandwidth-no-assets branch 2 times, most recently from 12194ad to ca6de1a April 23, 2025 13:30
guggero added 10 commits April 23, 2025 15:39
To avoid needing to call spew.Sdump() when the trace log level isn't even being used, we wrap the calls in a function closure instead. That way, the potentially CPU-intensive spew only happens when the trace log level is actually enabled.
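This is the classic lazy-formatting pattern; a minimal sketch of the closure idea (names are hypothetical, not the exact commit contents):

```go
package sketch

import (
	"fmt"

	"github.com/davecgh/go-spew/spew"
)

// logClosure defers expensive formatting until String() is called, so
// the work only happens if the logger actually formats its arguments.
type logClosure func() string

func (c logClosure) String() string {
	return c()
}

// spewLog wraps spew.Sdump in a closure; pass the result to a %v verb.
func spewLog(v any) fmt.Stringer {
	return logClosure(func() string {
		return spew.Sdump(v)
	})
}

// Usage (assuming a leveled logger):
//
//	log.Tracef("leaf: %v", spewLog(leaf))
//
// spew.Sdump only runs when trace logging is enabled and the argument
// is actually formatted.
```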
To be able to detect certain asset-related routing conditions, we'll want to commit the group key of assets into the funding blob of the channel.

To easily find out if all asset IDs from a set are actually committed to
a channel, we add helper functions for the different channel messages
that we can encounter in our subsystems (depending on whether we get the
message from a blob in a hook or as JSON over the RPC interface).
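Conceptually, such a helper boils down to a set-containment check; a minimal sketch with simplified, hypothetical types:

```go
package sketch

// hasAllAssetIDs reports whether every asset ID carried by the HTLC is
// present in the set committed to the channel (hypothetical helper,
// simplified from the real channel-message variants).
func hasAllAssetIDs(committed map[[32]byte]struct{},
	htlcAssetIDs [][32]byte) bool {

	for _, id := range htlcAssetIDs {
		if _, ok := committed[id]; !ok {
			return false
		}
	}

	return true
}
```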
We need to make sure an asset HTLC actually comes through the correct channel, the one that commits to that asset in the first place.

We move a block of code further down to where it's going to be used, so the next commit(s) will have an easier-to-digest diff.

This fixes the first part of the issue: we didn't tell lnd that we wanted to handle traffic for non-asset channels. But that's wrong, because lnd will then pick non-asset channels for asset HTLCs in some situations. So we explicitly need to tell lnd that there is no bandwidth in a non-asset channel whenever an asset HTLC should be forwarded (or sent).
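The resulting decision logic can be pictured as follows; a simplified, hypothetical sketch of the idea, not the real traffic-shaper interface:

```go
package sketch

// paymentBandwidth reports the bandwidth we tell lnd a channel has for
// a given HTLC: zero whenever an asset HTLC hits a channel that
// doesn't commit to the matching asset, so lnd never selects it
// (hypothetical signature, heavily simplified).
func paymentBandwidth(isAssetHTLC, chanHasMatchingAsset bool,
	assetBandwidth, linkBandwidth uint64) uint64 {

	switch {
	// Asset HTLC over a channel committing to the right asset: the
	// asset-derived bandwidth applies.
	case isAssetHTLC && chanHasMatchingAsset:
		return assetBandwidth

	// Asset HTLC over a channel without the right asset: zero, so
	// lnd skips this channel.
	case isAssetHTLC:
		return 0

	// Plain HTLC: defer to lnd's own view of the link bandwidth.
	default:
		return linkBandwidth
	}
}
```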
This is the third part of the fix: we need to make sure that we don't pick an asset channel that has the wrong type of assets when telling lnd what channel it can use.

To debug commitment issues, it's useful to log the exact asset leaf that is committed. To be able to easily decode (and compare) that commitment, we also add a unit test that outputs the leaf as JSON.

When re-anchoring a passive asset, we need to reset any time locks it previously had on it. Because the re-anchoring is a normal transfer with just a simple signature, we need to clear any previous restrictions.

The ReAnchorPassiveAssets query basically needs to mirror what the asset.CopySpendTemplate() method does.
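In spirit, the reset amounts to something like this (a minimal sketch with hypothetical, simplified types; the real asset.CopySpendTemplate() covers more fields):

```go
package sketch

// assetLeaf is a heavily simplified, hypothetical stand-in for an
// asset record that carries spend-path restrictions.
type assetLeaf struct {
	Amount       uint64
	ScriptKey    [33]byte
	LockTime     uint64
	RelativeLock uint64
}

// copySpendTemplate mirrors the behavior the commit describes: keep
// the economic fields, but clear restrictions that only applied to the
// asset's previous spend path.
func copySpendTemplate(a assetLeaf) assetLeaf {
	cp := a
	cp.LockTime = 0
	cp.RelativeLock = 0

	return cp
}
```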
@guggero guggero force-pushed the bandwidth-no-assets branch from ca6de1a to 8133f36 April 23, 2025 13:39
@coveralls commented Apr 23, 2025

Pull Request Test Coverage Report for Build 14619648845

Details

  • 73 of 290 (25.17%) changed or added relevant lines in 10 files are covered.
  • 20 unchanged lines in 9 files lost coverage.
  • Overall coverage increased (+0.06%) to 28.796%

Changes missing coverage:

File                                  Covered Lines  Changed/Added Lines  %
tapcfg/server.go                      0              1                    0.0%
tapdb/sqlc/transfers.sql.go           0              2                    0.0%
tapchannelmsg/custom_channel_data.go  1              6                    16.67%
tapsend/send.go                       0              6                    0.0%
rfqmsg/custom_channel_data.go         0              19                   0.0%
tapchannel/aux_invoice_manager.go     40             60                   66.67%
server.go                             0              27                   0.0%
tapchannelmsg/records.go              32             60                   53.33%
tapchannel/aux_funding_controller.go  0              48                   0.0%
tapchannel/aux_traffic_shaper.go      0              61                   0.0%
Files with coverage reduction:

File                               New Missed Lines  %
server.go                          1                 0.0%
tapchannel/aux_invoice_manager.go  1                 81.67%
address/mock.go                    2                 97.39%
asset/group_key.go                 2                 57.89%
commitment/tap.go                  2                 72.27%
tapgarden/planter.go               2                 60.85%
rfqmsg/records.go                  3                 64.11%
tapchannel/aux_leaf_signer.go      3                 43.43%
tapchannel/aux_traffic_shaper.go   4                 0.0%
Totals:

Change from base Build 14613178404: +0.06%
Covered Lines: 26734
Relevant Lines: 92838

💛 - Coveralls

@guggero (Member, Author) commented Apr 23, 2025

Ready for re-review. Fixed all bugs and also updated the corresponding integration tests.

The litd itests should also pass on the next retry.

@guggero guggero requested a review from GeorgeTsagk April 23, 2025 14:06
@GeorgeTsagk (Member) left a comment

LGTM 🐰

@guggero guggero merged commit 89a3e17 into main Apr 24, 2025
18 checks passed
@guggero guggero deleted the bandwidth-no-assets branch April 24, 2025 09:52