Attribution data (feature 36/37) #1044
I've started implementing it in eclair. Do you have some test vectors so we can check that we are compatible?
I don't have test vectors yet, but I can produce them and will add them to this PR when ready. Capping the max hops at a lower number is fine by me, but do you have a scenario in mind where this would really make a difference? Or is it more generally that everything above 8 is wasteful?
Force-pushed from 4b48481 to 24b10d5
@thomash-acinq added a happy fat error test vector.
Force-pushed from 24b10d5 to 76dbf21
09-features.md (outdated)
@@ -41,6 +41,7 @@ The Context column decodes as follows:
| 20/21 | `option_anchor_outputs` | Anchor outputs | IN | `option_static_remotekey` | [BOLT #3](03-transactions.md) |
| 22/23 | `option_anchors_zero_fee_htlc_tx` | Anchor commitment type with zero fee HTLC transactions | IN | `option_static_remotekey` | [BOLT #3][bolt03-htlc-tx], [lightning-dev][ml-sighash-single-harmful] |
| 26/27 | `option_shutdown_anysegwit` | Future segwit versions allowed in `shutdown` | IN | | [BOLT #2][bolt02-shutdown] |
| 28/29 | `option_fat_error` | Can generate/relay fat errors in `update_fail_htlc` | IN | | [BOLT #4][bolt04-fat-errors] |
I think this big gap in the bits has emerged here because of tentative spec changes that may or may not make it in. I'm not sure why that is necessary; I thought the custom range is supposed to be used for unofficial extensions?
I can see that with unofficial features deployed in the wild, it is easier to keep the same bit when something becomes official. But is that worth creating the gap here? An alternative is to deploy unofficial features in the custom range first, and then later recognize both the official and the unofficial bit. Slightly more code, but the feature list remains clean.
Added fat error signaling to the PR.
Force-pushed from 76dbf21 to 2de919a
I've spent a lot of time trying to make the test vector pass and I've finally found what was wrong: the spec wording implies that we need to concatenate the parts in one order, but your code follows a different order. I think the order message + hop payloads + hmacs is more intuitive, as it matches the order of the fields in the packet.
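A minimal sketch of that ordering, assuming the `hmac` and `sha2` Rust crates; the function name and argument layout are illustrative, not taken from any of the implementations discussed here:

```rust
use hmac::{Hmac, Mac};
use sha2::Sha256;

/// Illustrative only: compute one hop's attribution HMAC over the data in
/// packet-field order: failure message, then hop payloads, then the HMACs
/// added by downstream hops. The result is truncated to 4 bytes, matching
/// the `sha256[..4]` field definition quoted later in this thread.
fn attribution_hmac(key: &[u8], message: &[u8], hop_payloads: &[u8], hmacs: &[u8]) -> [u8; 4] {
    let mut mac = Hmac::<Sha256>::new_from_slice(key).expect("HMAC accepts any key length");
    mac.update(message);      // 1. failure message
    mac.update(hop_payloads); // 2. hop payloads (e.g. hold times)
    mac.update(hmacs);        // 3. downstream HMACs
    let full = mac.finalize().into_bytes();
    let mut truncated = [0u8; 4];
    truncated.copy_from_slice(&full[..4]);
    truncated
}
```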
Oh great catch! Will produce a new vector.
Force-pushed from 2de919a to bcf022b
@thomash-acinq updated the vector.
Updated LND implementation with sender-picked fat error structure parameters: lightningnetwork/lnd#7139
Force-pushed from bcf022b to 6bf0729
I still believe in a future of the lightning network where failures aren't tolerated at all, and all routing nodes deliver excellent service. Then it's just a matter of measuring latency and going for the sub-10 ms ones.
Hold time filtering to avoid users doing things we think they shouldn't. OP_RETURN vibes 😂
I agree, but that's orthogonal to payment latency? I honestly don't understand why it's important to aim for minimal latency: is it for performance's sake, to boast about numbers? A payment that takes 500 ms to complete is almost indistinguishable from an instantaneous one for our brains, so why does it matter that it takes 50 ms instead of 500 ms if you have to sacrifice privacy for it?
For me it is two-fold:
Furthermore, I think that an instant payment with no observable latency at all is simply cool too.
Are we talking about introducing actual time delays here, or just bucketing the reported value of the hold time? If we're talking about further delaying the actual forwarding of the HTLC, then I'm very much against it. Keep in mind this is done per payment attempt, so if I need to try out 20 different routes this will accumulate quite fast, significantly hurting UX.

It seems the concern is that accurate reporting of hold times does not directly ruin the privacy of the nodes reporting them, but will eventually lead to intense competition around latency, contributing to further centralization of the network? If we're talking about hold times leaking information about the routing nodes (fingerprinting?), then this is already too noisy to produce consistent results. Every node has a different hardware and internet connectivity configuration, so I would expect even nodes running the same implementation (all LNDs or all LDKs) to show some entropy in their reports, and there exist better ways to fingerprint anyway.

I don't think we should pollute the protocol with this kind of safeguard/filter against more precise reports. If we assume that a user is willing to get their hands dirty to somehow signal lower hold times, they'll find a way. Also, let's not underestimate the power of implementation defaults: by having implementations default to scoring all hold times <=100ms the same, or reporting buckets of 100ms by default, most of the concerns seem to be eliminated. Of course, if we assume the majority of the network to be running custom software, then what's even the point of this discussion?
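(A trivial sketch of the bucketing default suggested above, assuming hold times are measured in milliseconds; the function name is illustrative:)

```rust
/// Illustrative only: quantize a measured hold time into 100 ms buckets
/// before reporting it, as a privacy-preserving implementation default.
fn bucketed_hold_time_ms(measured_ms: u32) -> u32 {
    (measured_ms / 100) * 100
}
```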
I'm very much against the intentional slowing down of HTLC resolution. Every millisecond an HTLC is unresolved, it poses a risk to the node runner. An HTLC gets stuck when a node goes offline after the HTLC passed that node, so if you intentionally slow down HTLC resolution, you also increase the number of force closures. I would consider the intentional holding of an HTLC for more than 10 ms a jamming attack.
That's not at all a valid conclusion. HTLCs get stuck and channels force-close because of bugs, not because nodes go temporarily offline or introduce batching/delays. That's completely unrelated. I 100% agree with you that all bugs that lead to stuck HTLCs or force-closed channels must be fixed and should be the highest priority for all implementations. But that has absolutely nothing to do with whether or not we should introduce delays or measure relay latency in buckets of 100ms.
I do agree that it is usually a bug, or sometimes negligence, that is the root cause of most force-closes. I also totally agree that there should be an easy way to measure relay latency; in fact, I do exactly this to generate the list you can find here on the “node speeds” tab. But saying that adding a relay delay would not have an impact on the number of force-closes is, in my opinion, wrong. If a channel with an active HTLC goes offline and does not come online again before the HTLC reaches its timeout, this will lead to a force-close. I send out more than 100,000 HTLCs each day.
Wrote up some thoughts on Delving that I think are worth considering when we think about latency measurements and privacy: https://delvingbitcoin.org/t/latency-and-privacy-in-lightning/1723.
I did a rough implementation in LDK of hold times for the success case. It turned out to be straightforward, mostly reusing code from the failure case. Test vectors are in #1261.
Logic all looks good, but I would like to restructure the Returning Errors section to follow the requirements/rationale structure that we have elsewhere in the spec; as is, the two are interspersed.
Also needs to refer to `option_attributable_failures` rather than "attributable failures" in a few places.
09-features.md (outdated)
@@ -46,6 +46,7 @@ The Context column decodes as follows:
| 26/27 | `option_shutdown_anysegwit` | Future segwit versions allowed in `shutdown` | IN | | [BOLT #2][bolt02-shutdown] |
| 28/29 | `option_dual_fund` | Use v2 of channel open, enables dual funding | IN | | [BOLT #2](02-peer-protocol.md) |
| 34/35 | `option_quiesce` | Support for `stfu` message | IN | | [BOLT #2][bolt02-quiescence] |
| 36/37 | `option_attributable_failure` | Can generate/relay attributable failures in `update_fail_htlc` | IN9 | | [BOLT #4][bolt04-attributable-failures] |
Doesn't need to be in invoices (9)?
It is indeed the question whether there are scenarios where senders would refuse a payment that pays to a node that doesn't support attribution data. If the recipient returned random data and their predecessor added attribution data, blame would still land on the final node pair.
Maybe attribution data is unnecessary for the final hop? 🤔 Of course, they'd still want to pass back something so as not to stand out as a recipient.
For hold times, they will probably always report zero anyway.
We don't have any instructions in this PR for how to react to the presence / absence of this feature bit in an invoice, so I think we can take it out?
Seems easy enough to come back and add signaling in invoices (+ handling instructions) if we want it in the future?
04-onion-routing.md (outdated)
@@ -579,6 +579,9 @@ should add a random delay before forwarding the error. Failures are likely to
be probing attempts and message timing may help the attacker infer its distance
to the final recipient.

Note that nodes on the blinded route return failures through `update_fail_malformed_htlc` and therefore do not and can
nit: note that nodes on -> note that nodes in?
Fixed. My language skills were insufficient to flag 'on the route' as incorrect.
04-onion-routing.md (outdated)
Upon receiving a return packet, each hop generates its `ammag`, generates the
pseudo-random byte stream, and applies the result to the return packet before
return-forwarding it.
When supported, the erring node should also populate the `attribution_data` field in `update_fail_htlc`, consisting of the following data:
I think that we can more formally specify this:

- if `option_attributable_failure` is advertised:
  - if `path_key` is not set in the incoming `update_add_htlc`:
    - MUST include `htlc_hold_times` in payload.
    - MUST include …
  - if …
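A minimal sketch of that gating, assuming (from the truncated requirements above) that hold times are only included when the incoming `update_add_htlc` carries no `path_key`; the function name is illustrative:

```rust
/// Illustrative only: a hop attaches `htlc_hold_times` when it advertises
/// `option_attributable_failure` and the incoming `update_add_htlc` had no
/// `path_key` (blinded hops fail via `update_fail_malformed_htlc` instead,
/// as noted in the diff earlier in this thread).
fn should_include_hold_times(attributable_failure: bool, path_key_present: bool) -> bool {
    attributable_failure && !path_key_present
}
```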
Added
04-onion-routing.md (outdated)
1. data:
    * [`20*u32`:`htlc_hold_times`]
    * [`210*sha256[..4]`:`truncated_hmacs`]
Duplicated here and in BOLT02 - perhaps just link to the BOLT02 description?
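For what it's worth, the sizes follow from the 20-hop maximum: one `u32` hold time per possible hop position, and 20 + 19 + … + 1 = 210 truncated HMACs, since each hop commits once for every possible distance to the erring node. A sketch of that layout (names and types are illustrative, not from any implementation):

```rust
/// Illustrative layout of the fixed-size attribution data payload, assuming
/// the 20-hop maximum used in the field definitions above.
const MAX_HOPS: usize = 20;
const TRUNCATED_HMAC_LEN: usize = 4; // sha256[..4]

struct AttributionData {
    /// One self-reported hold time per possible hop position.
    htlc_hold_times: [u32; MAX_HOPS],
    /// 20 + 19 + ... + 1 = 210 truncated HMACs: one per hop per possible
    /// distance to the erring node.
    truncated_hmacs: [[u8; TRUNCATED_HMAC_LEN]; MAX_HOPS * (MAX_HOPS + 1) / 2],
}
```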
In addition, each node locally stores data regarding its own sending peer in the
route, so it knows where to return-forward any eventual return packets.
The node generating the error message (_erring node_) builds a return packet

## Erring node
General comment on this section: I'm missing the requirements/rationale structure that we have elsewhere in the specification.
For example, the section for the erring node notes "The sender can use this information to score nodes on latency". To me this should be in the origin node section, or all contained in a single rationale at the end. Ditto with the game theory of guessing HMACs.
I see what you mean. I've added a commit trying to address this and improve it a little, but I found it difficult to capture this in requirements while still keeping the explanation.
I didn't see the start time and end time of the `htlc_hold_time` defined. I assume these are upon receiving the `update_add_htlc` and before sending the `update_fulfill_htlc`/`update_fail_htlc`, respectively?
Overall, I like the idea of `htlc_hold_time` but suspect it'll be gamed by routing nodes.
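Under that (assumed) interpretation, the measurement itself would be trivial; a sketch with illustrative names:

```rust
use std::time::Instant;

/// Illustrative only: per-HTLC bookkeeping for the self-reported hold time,
/// assuming it spans from receipt of `update_add_htlc` to just before
/// sending `update_fulfill_htlc`/`update_fail_htlc`.
struct HtlcTimer {
    received_at: Instant,
}

impl HtlcTimer {
    /// Stamp the HTLC when `update_add_htlc` is received.
    fn start() -> Self {
        Self { received_at: Instant::now() }
    }

    /// Read the hold time right before sending the resolution message.
    fn hold_time_ms(&self) -> u32 {
        self.received_at.elapsed().as_millis().min(u32::MAX as u128) as u32
    }
}
```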
A motivated sender can always pinpoint the actual forwarding delays by bisecting the route, and they always know how long the entire route took. You are correct though that it's intended mainly to work on a best-effort basis; the overall assumption is that this can be used to allow a "fast" node to distinguish itself and be rewarded for that.
Attribution data is added to both failed and fulfilled HTLCs (lightning/bolts#1044)
Failure attribution is important to properly penalize nodes after a payment failure occurs. The goal of the penalty is to give the next attempt a better chance at succeeding. In the happy failure flow, the sender is able to determine the origin of the failure and penalizes a single node or pair of nodes.
Unfortunately it is possible for nodes on the route to hide themselves. If they return random data as the failure message, the sender won't know where the failure happened.
This PR proposes a new failure message format that lets each node commit to the failure message. If one of the nodes corrupts the failure message, the sender will be able to identify that node.
For more information, see https://lists.linuxfoundation.org/pipermail/lightning-dev/2022-October/003723.html.
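A hedged sketch of that sender-side check, with the per-hop crypto abstracted behind closures (`unwrap_layer` and `verify_hmac` are stand-ins, not real APIs): the sender peels one obfuscation layer per hop and verifies that hop's HMAC commitment, and the first failed check localizes the corruption.

```rust
/// Illustrative only: walk the route from the first hop outward, peeling one
/// obfuscation layer per hop and checking the HMAC that hop committed to.
/// The first failed check blames that hop (or the pair spanning that link).
fn locate_corrupting_hop(
    num_hops: usize,
    mut unwrap_layer: impl FnMut(usize),        // strips hop i's obfuscation layer
    mut verify_hmac: impl FnMut(usize) -> bool, // checks hop i's truncated HMAC
) -> Option<usize> {
    for hop in 0..num_hops {
        unwrap_layer(hop);
        if !verify_hmac(hop) {
            return Some(hop);
        }
    }
    None // all HMACs valid: the failure is attributable as reported
}
```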
Furthermore, the HTLC fail and fulfill flows are extended to convey self-reported HTLC hold times to the sender.
Fail flow implementations
LND implementation: lightningnetwork/lnd#9888
LDK implementation: lightningdevkit/rust-lightning#2256
Eclair implementation: ACINQ/eclair#3065
CLN implementation: ElementsProject/lightning#8291
Fulfill flow implementations
LDK implementation: lightningdevkit/rust-lightning#3801
Eclair implementation: ACINQ/eclair#3100