| 1 | +# AutonatV2: spec |
| 2 | + |
| 3 | +| Lifecycle Stage | Maturity | Status | Latest Revision | |
| 4 | +| --------------- | ------------- | ------ | --------------- | |
| 5 | +| 1A | Working Draft | Active | r2, 2023-04-15 | |
| 6 | + |
| 7 | +Authors: [@sukunrt] |
| 8 | + |
| 9 | +Interest Group: [@marten-seemann], [@marcopolo], [@mxinden] |
| 10 | + |
| 11 | +[@sukunrt]: https://github.com/sukunrt |
| 12 | +[@marten-seemann]: https://github.com/marten-seemann |
| 13 | +[@mxinden]: https://github.com/mxinden |
| 14 | +[@marcopolo]: https://github.com/marcopolo |
| 15 | + |
| 16 | +## Overview |
| 17 | + |
| 18 | +A priori, a node cannot know if it is behind a NAT / firewall or if it is |
| 19 | +publicly reachable. Moreover, the node may be publicly reachable on some of its |
| 20 | +addresses and not on others. Knowing the reachability status of its addresses |
| 21 | +is crucial for proper network behavior: the node can avoid advertising |
| 22 | +unreachable addresses, reducing unnecessary connection attempts from other |
| 23 | +peers. If the node has no publicly accessible addresses, it may proactively |
| 24 | +improve its connectivity by locating a relay server, enabling other peers to |
| 25 | +connect through a relayed connection. |
| 26 | + |
| 27 | +In `autonat v2`, the client sends a request with a priority-ordered list of |
| 28 | +addresses and a nonce. On receiving this request, the server dials the first |
| 29 | +address in the list that it is capable of dialing and provides the nonce. Upon |
| 30 | +completion of the dial, the server responds to the client with a response |
| 31 | +containing the dial outcome. |
| 32 | + |
| 33 | +As the server dials _exactly_ one address from the list, `autonat v2` allows |
| 34 | +nodes to determine reachability for individual addresses. Using `autonat v2`, |
| 35 | +nodes can build an address pipeline that tests individual addresses discovered |
| 36 | +by different sources, such as identify, UPnP mappings and circuit addresses, |
| 37 | +for reachability. Having a priority-ordered list of addresses provides the |
| 38 | +ability to verify low-priority addresses. Implementations can generate |
| 39 | +low-priority address guesses and add them to requests for high-priority |
| 40 | +addresses as a nice-to-have. This is especially helpful when introducing a new |
| 41 | +transport. Initially, such a transport will not be widely supported in the |
| 42 | +network. Requests for verifying such addresses can be reused to get information |
| 43 | +about other addresses. |
| 44 | + |
| 45 | +The client can verify that the server did successfully dial an address of the |
| 46 | +same transport as it reported in the response by checking the local address of |
| 47 | +the connection on which the nonce was received. |
| 48 | + |
| 49 | +Compared to `autonat v1`, there are three major differences: |
| 50 | + |
| 51 | +1. `autonat v1` allowed testing reachability for the node. `autonat v2` allows |
| 52 | + testing reachability for an individual address. |
| 53 | +2. `autonat v2` provides a mechanism for nodes to verify whether the peer |
| 54 | + actually successfully dialled an address. |
| 55 | +3. `autonat v2` provides a mechanism for nodes to dial an IP address different |
| 56 | + from the requesting node's observed IP address without risking amplification |
| 57 | + attacks. `autonat v1` disallowed such dials to prevent amplification attacks. |
| 58 | + |
| 59 | +## AutoNAT V2 Protocol |
| 60 | + |
| 61 | + |
| 62 | + |
| 63 | +A client node wishing to determine reachability of its addresses sends a |
| 64 | +`DialRequest` message to a server on a stream with protocol ID |
| 65 | +`/libp2p/autonat/2/dial-request`. Each `DialRequest` is sent on a new stream. |
| 66 | + |
| 67 | +This `DialRequest` message has a list of addresses and a fixed64 `nonce`. The |
| 68 | +list is ordered in descending order of priority for verification. AutoNAT V2 is |
| 69 | +primarily for testing reachability on the public Internet. The client SHOULD NOT |
| 70 | +send any private address as defined in [RFC 1918](https://datatracker.ietf.org/doc/html/rfc1918#section-3) |
| 71 | +in the list. The server SHOULD NOT dial any private address. |
| 72 | + |
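
As a rough illustration of the private-address rule above, a client could filter its candidates before building a `DialRequest`. This is a minimal sketch, not part of the spec: `dropPrivate` is a hypothetical helper, and it works on bare IP strings instead of full multiaddrs to stay self-contained. Note that Go's `netip.Addr.IsPrivate` also treats IPv6 unique local addresses (fc00::/7) as private, which goes beyond RFC 1918.

```go
package main

import "net/netip"

// dropPrivate returns the candidate IPs that may be placed in a DialRequest.
// Hypothetical helper: inputs that do not parse as IP addresses are dropped too.
func dropPrivate(candidates []string) []string {
	var public []string
	for _, c := range candidates {
		addr, err := netip.ParseAddr(c)
		if err != nil || addr.IsPrivate() {
			// Skip unparsable input and RFC 1918 (or IPv6 ULA) addresses.
			continue
		}
		public = append(public, c)
	}
	return public
}
```

A server could apply the same predicate before deciding which address it is willing to dial.
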
| 73 | +Upon receiving this request, the server selects an address from the list to |
| 74 | +dial. The server SHOULD use the first address it is willing to dial. The server |
| 75 | +MUST NOT dial any address other than this one. If this selected address has an |
| 76 | +IP address different from the requesting node's observed IP address, the server |
| 77 | +initiates the amplification attack prevention mechanism (see [Amplification |
| 78 | +Attack Prevention](#amplification-attack-prevention)). On completion, the |
| 79 | +server proceeds to the next step. If the selected address has the same IP |
| 80 | +address as the client's observed IP address, the server proceeds to the next |
| 81 | +step, skipping the amplification attack prevention steps. |
| 82 | + |
| 83 | +The server dials the selected address, opens a stream with Protocol ID |
| 84 | +`/libp2p/autonat/2/dial-back` and sends a `DialBack` message with the nonce |
| 85 | +received in the request. The client on receiving this message replies with |
| 86 | +a `DialBackResponse` message with the status set to `OK`. The client MUST |
| 87 | +close this stream after sending the response. The dial back response provides |
| 88 | +the server assurance that the message was delivered so that it can close the |
| 89 | +connection. |
| 90 | + |
| 91 | +Upon completion of the dial back, the server sends a `DialResponse` message to |
| 92 | +the client node on the `/libp2p/autonat/2/dial-request` stream. The response |
| 93 | +contains `addrIdx`, the index of the address the server selected to dial and |
| 94 | +`DialStatus`, a dial status indicating the outcome of the dial back. The |
| 95 | +`DialStatus` for an address is set according to [Requirements for |
| 96 | +DialStatus](#requirements-for-dialstatus). The response also contains an |
| 97 | +appropriate `ResponseStatus` set according to [Requirements For |
| 98 | +ResponseStatus](#requirements-for-responsestatus). |
| 99 | + |
| 100 | +The client MUST check that the nonce received in the `DialBack` is the same as |
| 101 | +the nonce it sent in the `DialRequest`. If the nonce is different, it MUST |
| 102 | +discard this response. |
| 103 | + |
| 104 | +The server MUST close the stream after sending the response. The client MUST |
| 105 | +close the stream after receiving the response. |
| 106 | + |
| 107 | +### Requirements for DialStatus |
| 108 | + |
| 109 | +On receiving a `DialRequest`, the server first selects an address that it will |
| 110 | +dial. |
| 111 | + |
| 112 | +If the server chooses not to dial any of the requested addresses, `ResponseStatus` |
| 113 | +is set to `E_DIAL_REFUSED`. The fields `addrIdx` and `DialStatus` are |
| 114 | +meaningless in this case. See [Requirements For |
| 115 | +ResponseStatus](#requirements-for-responsestatus). |
| 116 | + |
| 117 | +If the server selects an address for dialing, `addrIdx` is set to the |
| 118 | +zero-based index of the address in the list and the `DialStatus` is set |
| 119 | +according to the following consideration: |
| 120 | + |
| 121 | +If the server was unable to connect to the client on the selected address, |
| 122 | +`DialStatus` is set to `E_DIAL_ERROR`, indicating the selected address is not |
| 123 | +publicly reachable. |
| 124 | + |
| 125 | +If the server was able to connect to the client on the selected address, but an |
| 126 | +error occurred while sending the nonce on the `/libp2p/autonat/2/dial-back` |
| 127 | +stream, `DialStatus` is set to `E_DIAL_BACK_ERROR`. This might happen when the |
| 128 | +client or server is resource constrained, or when either the client or the |
| 129 | +server is misconfigured. |
| 130 | + |
| 131 | +If the server was able to connect to the client and successfully send the nonce |
| 132 | +on the `/libp2p/autonat/2/dial-back` stream, `DialStatus` is set to `OK`. |
| 133 | + |
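
The three `DialStatus` cases above reduce to a small decision function. The sketch below is only an illustration: the Go type and constant names are assumptions (generated protobuf code would name them differently), while the numeric values mirror the `DialStatus` enum in the RPC Messages section.

```go
package main

// DialStatus mirrors the numeric values of the protobuf DialStatus enum.
type DialStatus int32

const (
	DialStatusUnused    DialStatus = 0
	DialStatusDialError DialStatus = 100 // E_DIAL_ERROR
	DialStatusBackError DialStatus = 101 // E_DIAL_BACK_ERROR
	DialStatusOK        DialStatus = 200 // OK
)

// dialStatusFor maps the outcome of a dial back to a DialStatus.
// connected: the server established a connection on the selected address.
// nonceSent: the nonce was delivered on the dial-back stream.
func dialStatusFor(connected, nonceSent bool) DialStatus {
	switch {
	case !connected:
		return DialStatusDialError
	case !nonceSent:
		return DialStatusBackError
	default:
		return DialStatusOK
	}
}
```
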
| 134 | +### Requirements for ResponseStatus |
| 135 | + |
| 136 | +The `ResponseStatus` sent by the server in the `DialResponse` message MUST be |
| 137 | +set according to the following requirements: |
| 138 | + |
| 139 | +`E_REQUEST_REJECTED`: The server didn't serve the request because of rate |
| 140 | +limiting, resource limit reached or blacklisting. |
| 141 | + |
| 142 | +`E_DIAL_REFUSED`: The server didn't dial back any address because it was |
| 143 | +incapable of dialing or unwilling to dial any of the requested addresses. |
| 144 | + |
| 145 | +`E_INTERNAL_ERROR`: An error not classified under the above error codes occurred |
| 146 | +on the server, preventing it from completing the request. |
| 147 | + |
| 148 | +`OK`: The server completed the request successfully. A request is considered |
| 149 | +a success when the server selects an address to dial and dials it, successfully or unsuccessfully. |
| 150 | + |
| 151 | +Implementations MUST discard responses with status codes they do not understand. |
| 152 | + |
| 153 | +### Amplification Attack Prevention |
| 154 | + |
| 155 | + |
| 156 | + |
| 157 | +When a client asks a server to dial an address that is not the client's observed |
| 158 | +IP address, the server asks the client to send a non-trivial number of bytes |
| 159 | +as a cost to dial a different IP address. To make amplification attacks |
| 160 | +unattractive, servers SHOULD ask for 30k to 100k bytes. Since most handshakes |
| 161 | +cost less than 10k bytes in bandwidth, 30k bytes is sufficient to make attacks |
| 162 | +unattractive. |
| 163 | + |
| 164 | +On receiving a `DialRequest`, the server selects the first address it is capable |
| 165 | +of dialing. If this selected address has an IP address different from the |
| 166 | +client's observed IP address, the server sends a `DialDataRequest` message with |
| 167 | +the selected address's zero-based index and `numBytes` set to a sufficiently |
| 168 | +large value on the `/libp2p/autonat/2/dial-request` stream. |
| 169 | + |
| 170 | +Upon receiving a `DialDataRequest` message, the client decides whether to accept |
| 171 | +or reject the cost of dial. If the client rejects the cost, the client resets |
| 172 | +the stream and the `DialRequest` is considered aborted. If the client accepts |
| 173 | +the cost, the client starts transferring `numBytes` bytes to the server. The |
| 174 | +client transfers these bytes wrapped in `DialDataResponse` protobufs where the |
| 175 | +`data` field in each individual protobuf is limited to 4096 bytes in length. |
| 176 | +This allows implementations to use a small buffer for reading and sending the |
| 177 | +data. Only the size of the `data` field of `DialDataResponse` protobufs is |
| 178 | +counted towards the bytes transferred. Once the server has received at least |
| 179 | +`numBytes` bytes, it proceeds to dial the selected address. Servers SHOULD allow |
| 180 | +the last `DialDataResponse` message received from the client to be larger than |
| 181 | +the minimum required amount. This allows clients to serialize their |
| 182 | +`DialDataResponse` message once and reuse it for all requests. |
| 183 | + |
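
The byte accounting described above can be sketched from both sides. This is a minimal sketch, not a reference implementation: `messagesToSend` and `receiveDialData` are hypothetical names, the `next` callback stands in for reading the `data` field of the next `DialDataResponse`, and rejecting oversized chunks is an assumption about how a server might enforce the 4096-byte limit.

```go
package main

import "errors"

// maxDialDataChunk is the limit on the data field of one DialDataResponse.
const maxDialDataChunk = 4096

var errChunkTooLarge = errors.New("data field exceeds 4096 bytes")

// messagesToSend returns how many full-size DialDataResponse messages a
// client sends when it serializes one 4096-byte message and reuses it.
// The last message may push the total past numBytes.
func messagesToSend(numBytes uint64) uint64 {
	return (numBytes + maxDialDataChunk - 1) / maxDialDataChunk
}

// receiveDialData reads data fields until at least numBytes were received.
// Only the length of each data field counts towards the total.
func receiveDialData(next func() ([]byte, error), numBytes uint64) (uint64, error) {
	var got uint64
	for got < numBytes {
		data, err := next()
		if err != nil {
			return got, err
		}
		if len(data) > maxDialDataChunk {
			return got, errChunkTooLarge
		}
		got += uint64(len(data))
	}
	return got, nil
}
```

For a request of 30000 bytes, a client that reuses one 4096-byte message sends 8 messages, and the server ends up with 32768 bytes counted before it dials.
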
| 184 | +If an attacker asks a server to dial a victim node, the only benefit the |
| 185 | +attacker gets is forcing the server and the victim to do a cryptographic |
| 186 | +handshake which costs some bandwidth and compute. The attacker by itself can do |
| 187 | +a lot of handshakes with the victim without spending any compute by using the |
| 188 | +same key repeatedly. The only benefit of going via the server to do this attack |
| 189 | +is not spending bandwidth required for a handshake. So the prevention mechanism |
| 190 | +only focuses on bandwidth costs. There is a minor benefit of bypassing IP |
| 191 | +blocklists, but that's made unattractive by the fact that servers may ask for 5x |
| 192 | +more data than the bandwidth cost of a handshake. |
| 193 | + |
| 194 | +#### Related Work |
| 195 | + |
| 196 | +UDP-based protocols, like QUIC and DNS-over-UDP, need to prevent similar amplification attacks caused by IP spoofing. To verify that received packets don't have a spoofed source IP, the server sends a random token to the client, which echoes the token back. For example, in QUIC, an attacker can put the victim's IP in the initial packet to make the victim process a much larger `ServerHello` packet. QUIC servers use a Retry Packet containing a token to validate that the client can receive packets at the address it claims. See [QUIC Address Validation](https://datatracker.ietf.org/doc/html/rfc9000#name-address-validation) for details of the scheme. |
| 197 | + |
| 198 | +## Implementation Suggestions |
| 199 | + |
| 200 | +For any given address, client implementations SHOULD do the following: |
| 201 | + |
| 202 | +- Periodically recheck reachability status. |
| 203 | +- Query multiple servers to determine reachability. |
| 204 | + |
| 205 | +The suggested heuristic for implementations is to consider an address reachable |
| 206 | +if more than 3 servers report a successful dial and to consider an address |
| 207 | +unreachable if more than 3 servers report unsuccessful dials. Implementations |
| 208 | +are free to use heuristics different from this one. |
| 209 | + |
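
The suggested heuristic can be expressed as a small classifier. This is a sketch only: the `Reachability` type and `classify` function are hypothetical names. The text does not say what to do when both counts exceed the threshold, so checking successes first is an assumption of this sketch.

```go
package main

// Reachability is the client's current belief about one address.
type Reachability int

const (
	Unknown Reachability = iota
	Reachable
	Unreachable
)

// threshold follows the suggested heuristic: more than 3 reports are needed.
const threshold = 3

// classify turns per-address report counts from distinct servers into a
// reachability verdict. Successes are checked first (an assumption).
func classify(successes, failures int) Reachability {
	switch {
	case successes > threshold:
		return Reachable
	case failures > threshold:
		return Unreachable
	default:
		return Unknown
	}
}
```

With exactly 3 successful reports the address stays `Unknown`, and the client keeps querying more servers.
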
| 210 | +Servers SHOULD NOT reuse their listening port when making a dial back. If the |
| 211 | +client has reused its listen port when dialing out to the server, not reusing |
| 212 | +the listen port for dial back attempts prevents accidental hole punches. Clients |
| 213 | +SHOULD only rely on the nonce and not on the peerID for verifying the dial back |
| 214 | +as the server is free to use a separate peerID for the dial backs. |
| 215 | + |
| 216 | +Servers SHOULD determine whether they have IPv6 and IPv4 connectivity. IPv4-only servers SHOULD refuse requests for dialing IPv6 addresses and IPv6-only |
| 217 | +servers SHOULD refuse requests for dialing IPv4 addresses. |
| 218 | + |
| 219 | +## RPC Messages |
| 220 | + |
| 221 | +All RPC messages sent over a stream are prefixed with the message length in |
| 222 | +bytes, encoded as an unsigned variable length integer as defined by the |
| 223 | +[multiformats unsigned-varint spec][uvarint-spec]. |
| 224 | + |
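
To illustrate the length-prefix framing, the sketch below frames and unframes an already serialized message using Go's `encoding/binary` uvarint helpers, which produce the same base-128 encoding as the unsigned-varint spec for values in this range. The function names `frame` and `unframe` are hypothetical, and the payload is treated as opaque bytes instead of a real protobuf message.

```go
package main

import (
	"encoding/binary"
	"errors"
)

// frame prefixes a serialized message with its length as an unsigned varint.
func frame(payload []byte) []byte {
	var lenBuf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(lenBuf[:], uint64(len(payload)))
	out := make([]byte, 0, n+len(payload))
	out = append(out, lenBuf[:n]...)
	return append(out, payload...)
}

// unframe splits one length-prefixed message off the front of buf and
// returns the payload together with the remaining bytes.
func unframe(buf []byte) (payload, rest []byte, err error) {
	size, n := binary.Uvarint(buf)
	if n <= 0 {
		return nil, nil, errors.New("invalid length prefix")
	}
	if size > uint64(len(buf)-n) {
		return nil, nil, errors.New("truncated message")
	}
	end := n + int(size)
	return buf[n:end], buf[end:], nil
}
```

A 300-byte message is framed with the two prefix bytes `0xAC 0x02`. A stream reader would typically also cap the accepted length before allocating a buffer.
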
| 225 | +All RPC messages on stream `/libp2p/autonat/2/dial-request` are of type |
| 226 | +`Message`. A `DialRequest` message is sent as a `Message` with the `msg` field |
| 227 | +set to `DialRequest`. `DialResponse`, `DialDataRequest` and `DialDataResponse` |
| 228 | +are handled similarly. |
| 229 | + |
| 230 | +On stream `/libp2p/autonat/2/dial-back`, the `DialBack` and `DialBackResponse` |
| 231 | +messages are sent directly, without the `Message` wrapper. |
| 232 | + |
| 233 | +```proto3 |
| 234 | + |
| 235 | +message Message { |
| 236 | +    oneof msg { |
| 237 | +        DialRequest dialRequest = 1; |
| 238 | +        DialResponse dialResponse = 2; |
| 239 | +        DialDataRequest dialDataRequest = 3; |
| 240 | +        DialDataResponse dialDataResponse = 4; |
| 241 | +    } |
| 242 | +} |
| 243 | + |
| 244 | + |
| 245 | +message DialRequest { |
| 246 | +    repeated bytes addrs = 1; |
| 247 | +    fixed64 nonce = 2; |
| 248 | +} |
| 249 | + |
| 250 | + |
| 251 | +message DialDataRequest { |
| 252 | +    uint32 addrIdx = 1; |
| 253 | +    uint64 numBytes = 2; |
| 254 | +} |
| 255 | + |
| 256 | + |
| 257 | +enum DialStatus { |
| 258 | +    UNUSED = 0; |
| 259 | +    E_DIAL_ERROR = 100; |
| 260 | +    E_DIAL_BACK_ERROR = 101; |
| 261 | +    OK = 200; |
| 262 | +} |
| 263 | + |
| 264 | + |
| 265 | +message DialResponse { |
| 266 | +    enum ResponseStatus { |
| 267 | +        E_INTERNAL_ERROR = 0; |
| 268 | +        E_REQUEST_REJECTED = 100; |
| 269 | +        E_DIAL_REFUSED = 101; |
| 270 | +        OK = 200; |
| 271 | +    } |
| 272 | + |
| 273 | +    ResponseStatus status = 1; |
| 274 | +    uint32 addrIdx = 2; |
| 275 | +    DialStatus dialStatus = 3; |
| 276 | +} |
| 277 | + |
| 278 | + |
| 279 | +message DialDataResponse { |
| 280 | +    bytes data = 1; |
| 281 | +} |
| 282 | + |
| 283 | + |
| 284 | +message DialBack { |
| 285 | +    fixed64 nonce = 1; |
| 286 | +} |
| 287 | + |
| 288 | +message DialBackResponse { |
| 289 | +    enum DialBackStatus { |
| 290 | +        OK = 0; |
| 291 | +    } |
| 292 | + |
| 293 | +    DialBackStatus status = 1; |
| 294 | +} |
| 295 | +``` |
| 296 | + |
| 297 | +[uvarint-spec]: https://github.com/multiformats/unsigned-varint |