Commit b63cdeb

sukunrt and MarcoPolo committed
add autonat v2 spec (#538)
--------- Co-authored-by: Marco Munizaga <[email protected]>
1 parent 68199b9 commit b63cdeb

7 files changed, +343 -3 lines changed

autonat/README.md

+7
@@ -0,0 +1,7 @@
1+
# NAT Discovery <!-- omit in toc -->
2+
> How we detect if we're behind a NAT.
3+
4+
5+
Specifications:
6+
- [autonat v1](autonat-v1.md)
7+
- [autonat v2](autonat-v2.md)

autonat/autonat-v1.md

-3
@@ -1,6 +1,3 @@
1-
# NAT Discovery <!-- omit in toc -->
2-
> How we detect if we're behind a NAT.
3-
41
| Lifecycle Stage | Maturity | Status | Latest Revision |
52
|-----------------|----------------|--------|-----------------|
63
| 3A | Recommendation | Active | r1, 2023-02-16 |
@@ -0,0 +1,16 @@
1+
@startuml
2+
participant Cli
3+
participant Srv
4+
5+
skinparam sequenceMessageAlign center
6+
skinparam defaultFontName monospaced
7+
8+
9+
== Amplification Attack Prevention ==
10+
11+
Cli -> Srv: [conn1: stream: dial] DialRequest:{nonce: 0xabcd, addrs: (addr1, addr2, addr3)}
12+
Srv -> Cli: [conn1: stream: dial] DialDataRequest:{addrIdx: 1, numBytes: 120k}
13+
Cli -> Srv: [conn1: stream: dial] DialDataResponse:{data: 4k bytes},DialDataResponse:{data: 4k bytes},...
14+
Srv -> Cli: [conn2: stream: dial-back]addr2 DialBack:{nonce: 0xabcd}
15+
Srv -> Cli: [conn1: stream: dial] DialResponse:{status: OK, addrIdx: 1, dialStatus: DIAL_STATUS_OK}
16+
@enduml

autonat/autonat-v2.md

+297
@@ -0,0 +1,297 @@
1+
# AutonatV2: spec
2+
3+
| Lifecycle Stage | Maturity | Status | Latest Revision |
4+
| --------------- | ------------- | ------ | --------------- |
5+
| 1A | Working Draft | Active | r2, 2023-04-15 |
6+
7+
Authors: [@sukunrt]
8+
9+
Interest Group: [@marten-seemann], [@marcopolo], [@mxinden]
10+
11+
[@sukunrt]: https://github.com/sukunrt
12+
[@marten-seemann]: https://github.com/marten-seemann
13+
[@mxinden]: https://github.com/mxinden
14+
[@marcopolo]: https://github.com/marcopolo
15+
16+
## Overview
17+
18+
A priori, a node cannot know if it is behind a NAT / firewall or if it is
19+
publicly reachable. Moreover, the node may be publicly reachable on some of its
20+
addresses and not on others. Knowing the reachability status of its addresses
21+
is crucial for proper network behavior: the node can avoid advertising
22+
unreachable addresses, reducing unnecessary connection attempts from other
23+
peers. If the node has no publicly accessible addresses, it may proactively
24+
improve its connectivity by locating a relay server, enabling other peers to
25+
connect through a relayed connection.
26+
27+
In `autonat v2`, the client sends a request with a priority-ordered list of addresses
28+
and a nonce. On receiving this request the server dials the first address in the
29+
list that it is capable of dialing and provides the nonce. Upon completion of
30+
the dial, the server responds to the client with the response containing the
31+
dial outcome.
32+
33+
As the server dials _exactly_ one address from the list, `autonat v2` allows
34+
nodes to determine reachability for individual addresses. Using `autonat v2`
35+
nodes can build an address pipeline where they can test individual addresses
36+
discovered by different sources like identify, upnp mappings, circuit addresses
37+
etc. for reachability. Having a priority-ordered list of addresses provides the
38+
ability to verify low priority addresses. Implementations can generate low
39+
priority address guesses and add them to requests for high priority addresses as
40+
a nice to have. This is especially helpful when introducing a new transport.
41+
Initially, such a transport will not be widely supported in the network.
42+
Requests for verifying such addresses can be reused to get information about
43+
other addresses.
44+
45+
The client can verify the server did successfully dial an address of the same
46+
transport as it reported in the response by checking the local address of the
47+
connection on which the nonce was received.
48+
49+
Compared to `autonat v1`, there are three major differences:
50+
51+
1. `autonat v1` allowed testing reachability for the node. `autonat v2` allows
52+
testing reachability for an individual address.
53+
2. `autonat v2` provides a mechanism for nodes to verify whether the peer
54+
actually successfully dialled an address.
55+
3. `autonat v2` provides a mechanism for nodes to dial an IP address different
56+
from the requesting node's observed IP address without risking amplification
57+
attacks. `autonat v1` disallowed such dials to prevent amplification attacks.
58+
59+
## AutoNAT V2 Protocol
60+
61+
![Autonat V2 Interaction](autonat-v2.svg)
62+
63+
A client node wishing to determine reachability of its addresses sends a
64+
`DialRequest` message to a server on a stream with protocol ID
65+
`/libp2p/autonat/2/dial-request`. Each `DialRequest` is sent on a new stream.
66+
67+
This `DialRequest` message has a list of addresses and a fixed64 `nonce`. The
68+
list is ordered in descending order of priority for verification. AutoNAT V2 is
69+
primarily for testing reachability on the public Internet. The client SHOULD NOT send any
70+
private address as defined in [RFC
71+
1918](https://datatracker.ietf.org/doc/html/rfc1918#section-3) in the list. The server SHOULD NOT dial any private address.
72+
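The private ranges referenced above can be checked with a few lines of code. The following is a minimal sketch, assuming addresses have already been reduced to plain IP strings (real implementations work on multiaddrs); the names `is_rfc1918` and `public_candidates` are hypothetical and not part of the spec.

```python
import ipaddress

# The three private IPv4 ranges listed in RFC 1918, section 3.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(ip: str) -> bool:
    """Return True if `ip` is an IPv4 address inside an RFC 1918 range."""
    addr = ipaddress.ip_address(ip)
    if addr.version != 4:
        # RFC 1918 only defines IPv4 ranges.
        return False
    return any(addr in net for net in RFC1918_NETWORKS)


def public_candidates(ips):
    """Hypothetical helper: drop RFC 1918 addresses, keeping priority order."""
    return [ip for ip in ips if not is_rfc1918(ip)]
```

A client could run its candidate list through `public_candidates` before building a `DialRequest`, and a server could apply `is_rfc1918` before selecting an address to dial.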
73+
Upon receiving this request, the server selects an address from the list to
74+
dial. The server SHOULD use the first address it is willing to dial. The server
75+
MUST NOT dial any address other than this one. If this selected address has an
76+
IP address different from the requesting node's observed IP address, the server
77+
initiates the Amplification attack prevention mechanism (see [Amplification
78+
Attack Prevention](#amplification-attack-prevention)). On completion, the
79+
server proceeds to the next step. If the selected address has the same IP
80+
address as the client's observed IP address, the server proceeds to the next step,
81+
skipping Amplification Attack Prevention steps.
82+
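The selection rule described above can be sketched as follows. This is an illustration under stated assumptions, not the reference implementation: addresses are simplified to IP strings, and `select_address` and the `can_dial` predicate (which stands in for "the server is willing and able to dial this address") are hypothetical names.

```python
import ipaddress
from typing import Callable, Optional, Sequence, Tuple


def select_address(
    addr_ips: Sequence[str],
    observed_ip: str,
    can_dial: Callable[[str], bool],
) -> Optional[Tuple[int, bool]]:
    """Pick the first address the server is willing to dial.

    Returns (addrIdx, needs_dial_data), or None if no address is dialable.
    needs_dial_data is True when the selected IP differs from the client's
    observed IP, i.e. when amplification attack prevention must run first.
    """
    observed = ipaddress.ip_address(observed_ip)
    for idx, ip in enumerate(addr_ips):
        if can_dial(ip):
            return idx, ipaddress.ip_address(ip) != observed
    return None
```

Returning only one index reflects the requirement that the server MUST NOT dial any address other than the selected one.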
83+
The server dials the selected address, opens a stream with Protocol ID
84+
`/libp2p/autonat/2/dial-back` and sends a `DialBack` message with the nonce
85+
received in the request. The client, on receiving this message, replies with
86+
a `DialBackResponse` message with the status set to `OK`. The client MUST
87+
close this stream after sending the response. The dial back response provides
88+
the server assurance that the message was delivered so that it can close the
89+
connection.
90+
91+
Upon completion of the dial back, the server sends a `DialResponse` message to
92+
the client node on the `/libp2p/autonat/2/dial-request` stream. The response
93+
contains `addrIdx`, the index of the address the server selected to dial, and
94+
`DialStatus`, a dial status indicating the outcome of the dial back. The
95+
`DialStatus` for an address is set according to [Requirements for
96+
DialStatus](#requirements-for-dialstatus). The response also contains an
97+
appropriate `ResponseStatus` set according to [Requirements For
98+
ResponseStatus](#requirements-for-responsestatus).
99+
100+
The client MUST check that the nonce received in the `DialBack` is the same as
101+
the nonce it sent in the `DialRequest`. If the nonce is different, it MUST
102+
discard this response.
103+
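The nonce check is small enough to show in full. The sketch below assumes the nonce is drawn from a cryptographically secure source, which the spec does not mandate; `new_nonce` and `dial_back_matches` are hypothetical names.

```python
import secrets


def new_nonce() -> int:
    """Generate a random 64-bit nonce, matching the fixed64 field width."""
    return secrets.randbits(64)


def dial_back_matches(sent_nonce: int, received_nonce: int) -> bool:
    """Return True only if the DialBack carries the nonce from the DialRequest."""
    return sent_nonce == received_nonce
```

A client would call `dial_back_matches` when a `DialBack` arrives and discard the result for that request whenever it returns `False`.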
104+
The server MUST close the stream after sending the response. The client MUST
105+
close the stream after receiving the response.
106+
107+
### Requirements for DialStatus
108+
109+
On receiving a `DialRequest`, the server first selects an address that it will
110+
dial.
111+
112+
If the server chooses not to dial any of the requested addresses, `ResponseStatus`
113+
is set to `E_DIAL_REFUSED`. The fields `addrIdx` and `DialStatus` are
114+
meaningless in this case. See [Requirements For
115+
ResponseStatus](#requirements-for-responsestatus).
116+
117+
If the server selects an address for dialing, `addrIdx` is set to the
118+
index (zero-based) of the address in the list and the `DialStatus` is set
119+
according to the following consideration:
120+
121+
If the server was unable to connect to the client on the selected address,
122+
`DialStatus` is set to `E_DIAL_ERROR`, indicating the selected address is not
123+
publicly reachable.
124+
125+
If the server was able to connect to the client on the selected address, but an
126+
error occurred while sending the nonce on the `/libp2p/autonat/2/dial-back`
127+
stream, `DialStatus` is set to `E_DIAL_BACK_ERROR`. This might happen in case of
128+
resource-limited situations on the client or server, or when either the client or
129+
the server is misconfigured.
130+
131+
If the server was able to connect to the client and successfully send the nonce on
132+
the `/libp2p/autonat/2/dial-back` stream, `DialStatus` is set to `OK`.
133+
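The cases above map onto a small decision function. The sketch below mirrors the enum values from the protobuf definitions in this document; the function name `build_response` and its boolean inputs are hypothetical, and the placeholder values returned in the refused case are an assumption, since the spec calls those fields meaningless.

```python
from enum import IntEnum
from typing import Optional, Tuple


class ResponseStatus(IntEnum):
    E_INTERNAL_ERROR = 0
    E_REQUEST_REJECTED = 100
    E_DIAL_REFUSED = 101
    OK = 200


class DialStatus(IntEnum):
    UNUSED = 0
    E_DIAL_ERROR = 100
    E_DIAL_BACK_ERROR = 101
    OK = 200


def build_response(
    addr_idx: Optional[int], connected: bool, nonce_sent: bool
) -> Tuple[ResponseStatus, int, DialStatus]:
    """Return (status, addrIdx, dialStatus) for a DialResponse."""
    if addr_idx is None:
        # No address selected: addrIdx and dialStatus are meaningless,
        # so this sketch fills them with zero values (an assumption).
        return ResponseStatus.E_DIAL_REFUSED, 0, DialStatus.UNUSED
    if not connected:
        return ResponseStatus.OK, addr_idx, DialStatus.E_DIAL_ERROR
    if not nonce_sent:
        return ResponseStatus.OK, addr_idx, DialStatus.E_DIAL_BACK_ERROR
    return ResponseStatus.OK, addr_idx, DialStatus.OK
```

Note that a failed dial still yields `ResponseStatus.OK`: the request itself was served, and only `DialStatus` reports the dial outcome.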
134+
### Requirements for ResponseStatus
135+
136+
The `ResponseStatus` sent by the server in the `DialResponse` message MUST be
137+
set according to the following requirements:
138+
139+
`E_REQUEST_REJECTED`: The server didn't serve the request because of rate
140+
limiting, resource limit reached or blacklisting.
141+
142+
`E_DIAL_REFUSED`: The server didn't dial back any address because it was
143+
incapable of dialing or unwilling to dial any of the requested addresses.
144+
145+
`E_INTERNAL_ERROR`: An error not classified within the above error codes occurred on the
146+
server preventing it from completing the request.
147+
148+
`OK`: The server completed the request successfully. A request is considered
149+
a success when the server selects an address to dial and dials it, successfully or unsuccessfully.
150+
151+
Implementations MUST discard responses with status codes they do not understand.
152+
153+
### Amplification Attack Prevention
154+
155+
![Interaction](autonat-v2-amplification-attack-prevention.svg)
156+
157+
When a client asks a server to dial an address that is not the client's observed
158+
IP address, the server asks the client to send some non-trivial amount of bytes
159+
as a cost to dial a different IP address. To make amplification attacks
160+
unattractive, servers SHOULD ask for 30k to 100k bytes. Since most handshakes
161+
cost less than 10k bytes in bandwidth, 30kB is sufficient to make attacks
162+
unattractive.
163+
164+
On receiving a `DialRequest`, the server selects the first address it is capable
165+
of dialing. If this selected address has an IP address different from the client's
166+
observed IP, the server sends a `DialDataRequest` message with the selected
167+
address's index (zero-based) and `numBytes` set to a sufficiently large value on
168+
the `/libp2p/autonat/2/dial-request` stream.
169+
170+
Upon receiving a `DialDataRequest` message, the client decides whether to accept
171+
or reject the cost of dial. If the client rejects the cost, the client resets
172+
the stream and the `DialRequest` is considered aborted. If the client accepts
173+
the cost, the client starts transferring `numBytes` bytes to the server. The
174+
client transfers these bytes wrapped in `DialDataResponse` protobufs where the
175+
`data` field in each individual protobuf is limited to 4096 bytes in length.
176+
This allows implementations to use a small buffer for reading and sending the
177+
data. Only the size of the `data` field of `DialDataResponse` protobufs is
178+
counted towards the bytes transferred. Once the server has received at least
179+
`numBytes` bytes, it proceeds to dial the selected address. Servers SHOULD allow
180+
the last `DialDataResponse` message received from the client to be larger than
181+
the minimum required amount. This allows clients to serialize their
182+
`DialDataResponse` message once and reuse it for all requests.
183+
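The transfer described above can be sketched from both sides. This is a simplified model that passes raw `data` payloads around instead of serialized `DialDataResponse` protobufs; `client_payloads` and `server_received_enough` are hypothetical names, and rejecting oversized payloads on the server is an assumption rather than a stated requirement.

```python
from typing import Iterable, Iterator

MAX_DATA_LEN = 4096  # upper bound on the `data` field of one DialDataResponse


def client_payloads(num_bytes: int) -> Iterator[bytes]:
    """Yield the same full-size payload until at least num_bytes are sent.

    Reusing one buffer means the last message may overshoot num_bytes,
    which the spec asks servers to tolerate.
    """
    chunk = bytes(MAX_DATA_LEN)
    sent = 0
    while sent < num_bytes:
        yield chunk
        sent += len(chunk)


def server_received_enough(payloads: Iterable[bytes], num_bytes: int) -> bool:
    """Count only `data` bytes and report whether the cost has been paid."""
    received = 0
    for data in payloads:
        if len(data) > MAX_DATA_LEN:
            # Assumption: treat an oversized payload as a protocol violation.
            return False
        received += len(data)
        if received >= num_bytes:
            return True
    return False
```

For a request of 30,000 bytes, the client in this sketch sends eight payloads of 4096 bytes each, 32,768 bytes in total, and the server proceeds to dial once the eighth arrives.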
184+
If an attacker asks a server to dial a victim node, the only benefit the
185+
attacker gets is forcing the server and the victim to do a cryptographic
186+
handshake which costs some bandwidth and compute. The attacker by itself can do
187+
a lot of handshakes with the victim without spending any compute by using the
188+
same key repeatedly. The only benefit of going via the server to do this attack
189+
is not spending bandwidth required for a handshake. So the prevention mechanism
190+
only focuses on bandwidth costs. There is a minor benefit of bypassing IP
191+
blocklists, but that's made unattractive by the fact that servers may ask 5x
192+
more data than the bandwidth cost of a handshake.
193+
194+
#### Related Work
195+
196+
UDP-based protocols, like QUIC and DNS-over-UDP, need to prevent similar amplification attacks caused by IP spoofing. To verify that received packets don't have a spoofed IP, the server sends a random token to the client, which echoes the token back. For example, in QUIC, an attacker can put the victim's IP address in the initial packet so that the victim is sent, and has to process, a much larger `ServerHello` packet. QUIC servers use a Retry packet containing a token to validate that the client can receive packets at the address it claims. See [QUIC Address Validation](https://datatracker.ietf.org/doc/html/rfc9000#name-address-validation) for details of the scheme.
197+
198+
## Implementation Suggestions
199+
200+
For any given address, client implementations SHOULD do the following:
201+
202+
- Periodically recheck reachability status.
203+
- Query multiple servers to determine reachability.
204+
205+
The suggested heuristic for implementations is to consider an address reachable
206+
if more than 3 servers report a successful dial and to consider an address
207+
unreachable if more than 3 servers report unsuccessful dials. Implementations
208+
are free to use different heuristics than this one.
209+
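The suggested heuristic can be written as a tiny classifier. The sketch below is one possible reading: the name `classify` and the `"unknown"` state are hypothetical, and checking successes before failures when both counts exceed the threshold is an assumption the spec does not address.

```python
def classify(successes: int, failures: int, threshold: int = 3) -> str:
    """Classify an address from per-server dial reports.

    "More than 3 servers" is read as a strict comparison against the
    threshold, so four matching reports are needed for a verdict.
    """
    if successes > threshold:
        return "reachable"
    if failures > threshold:
        return "unreachable"
    return "unknown"
```

With the default threshold, `classify(4, 0)` returns `"reachable"`, while `classify(3, 3)` stays `"unknown"` until more servers have been queried.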
210+
Servers SHOULD NOT reuse their listening port when making a dial back. In case
211+
the client has reused their listen port when dialing out to the server, not
212+
reusing the listen port for dial-back attempts prevents accidental hole punches. Clients
213+
SHOULD only rely on the nonce and not on the peerID for verifying the dial back
214+
as the server is free to use a separate peerID for the dial backs.
215+
216+
Servers SHOULD determine whether they have IPv6 and IPv4 connectivity. IPv4-only servers SHOULD refuse requests for dialing IPv6 addresses and IPv6-only
217+
servers SHOULD refuse requests for dialing IPv4 addresses.
218+
219+
## RPC Messages
220+
221+
All RPC messages sent over a stream are prefixed with the message length in
222+
bytes, encoded as an unsigned variable length integer as defined by the
223+
[multiformats unsigned-varint spec][uvarint-spec].
224+
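The length-prefix framing can be illustrated with a short encoder and decoder. This is a minimal sketch: it treats the message body as opaque bytes rather than a serialized protobuf, it does not enforce the maximum encoded size that the unsigned-varint spec defines, and the function names are hypothetical.

```python
from typing import Tuple


def encode_uvarint(n: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, least significant first."""
    if n < 0:
        raise ValueError("uvarint cannot encode negative numbers")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_uvarint(buf: bytes) -> Tuple[int, int]:
    """Return (value, number of bytes consumed) from the start of buf."""
    value = 0
    shift = 0
    for i, byte in enumerate(buf):
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, i + 1
        shift += 7
    raise ValueError("truncated uvarint")


def frame(payload: bytes) -> bytes:
    """Prefix a serialized message with its length."""
    return encode_uvarint(len(payload)) + payload


def unframe(buf: bytes) -> Tuple[bytes, bytes]:
    """Split buf into (first message body, remaining bytes)."""
    length, used = decode_uvarint(buf)
    end = used + length
    if len(buf) < end:
        raise ValueError("incomplete message")
    return buf[used:end], buf[end:]
```

For example, a 300-byte message is prefixed with the two bytes `0xAC 0x02`, and `unframe(frame(b"hello") + b"x")` separates the body `b"hello"` from the trailing byte.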
225+
All RPC messages on stream `/libp2p/autonat/2/dial-request` are of type
226+
`Message`. A `DialRequest` message is sent as a `Message` with the `msg` field
227+
set to `DialRequest`. `DialResponse` and `DialDataRequest` are handled
228+
similarly.
229+
230+
On stream `/libp2p/autonat/2/dial-back`, `DialBack` and `DialBackResponse` messages are sent
231+
directly.
232+
233+
```proto3
syntax = "proto3";

// Envelope for every message on the /libp2p/autonat/2/dial-request stream.
message Message {
    oneof msg {
        DialRequest dialRequest = 1;
        DialResponse dialResponse = 2;
        DialDataRequest dialDataRequest = 3;
        DialDataResponse dialDataResponse = 4;
    }
}

// Sent by the client: addresses in descending order of priority, plus a nonce.
message DialRequest {
    repeated bytes addrs = 1;
    fixed64 nonce = 2;
}

// Sent by the server to request numBytes of data before dialing addrs[addrIdx].
message DialDataRequest {
    uint32 addrIdx = 1;
    uint64 numBytes = 2;
}

// Outcome of the dial back to the selected address.
enum DialStatus {
    UNUSED = 0;
    E_DIAL_ERROR = 100;
    E_DIAL_BACK_ERROR = 101;
    OK = 200;
}

// Sent by the server once the dial back has completed or been refused.
message DialResponse {
    enum ResponseStatus {
        E_INTERNAL_ERROR = 0;
        E_REQUEST_REJECTED = 100;
        E_DIAL_REFUSED = 101;
        OK = 200;
    }

    ResponseStatus status = 1;
    uint32 addrIdx = 2;
    DialStatus dialStatus = 3;
}

// Sent by the client; the data field is limited to 4096 bytes per message.
message DialDataResponse {
    bytes data = 1;
}

// Sent by the server on the /libp2p/autonat/2/dial-back stream.
message DialBack {
    fixed64 nonce = 1;
}

// Sent by the client to acknowledge a DialBack.
message DialBackResponse {
    enum DialBackStatus {
        OK = 0;
    }

    DialBackStatus status = 1;
}
```
296+
297+
[uvarint-spec]: https://github.com/multiformats/unsigned-varint
