discv5: NAT traversal via Rendezvous protocol [WIP] #207

pipermerriam · 2022-05-05T18:50:02Z

This issue proposes a mechanism for NAT traversal via UDP hole punching.

This issue borrows from ethereum/portal-network-specs#144, which in turn borrows from https://blog.ipfs.io/2022-01-20-libp2p-hole-punching/

Participants

This mechanism involves communication between three nodes:

Initiator: a node that is behind a NAT, trying to establish a session with the "receiver" node
Receiver: a node that is behind a NAT
Rendezvous: a node that is able to communicate with both "initiator" and "receiver"

Detecting whether you are behind a NAT

Borrowed from: https://twurst.com/articles/stun-without-trust.html#org92b7214

A node in the network should maintain a set E which contains all of the (ip_address, port) values for outbound packets that have been sent by this node.

When receiving a packet, a node should check whether the packet's (ip_address, port) are contained in the set E.

If a node receives a packet such that the (ip_address, port) are not in E then the node is not behind a NAT
If a node does not receive any packet whose (ip_address, port) is not in E within a reasonable amount of time, then the node should assume that it is behind a NAT.

We suggest 2 minutes as a reasonable amount of time before determining that the node is behind a NAT.

For practical purposes, an LRU cache should be used to constrain the overall size of the set E
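A minimal sketch of the detection rule above, assuming a single-threaded node that feeds timestamps in explicitly. The names `NatDetector`, `NatStatus`, `record_outbound`, `record_inbound` and the default capacity of 1024 are hypothetical and not part of the proposal; only the 2 minute window and the LRU bound come from the text.

```python
from collections import OrderedDict
from enum import Enum
from typing import Tuple

Endpoint = Tuple[str, int]  # (ip_address, port)


class NatStatus(Enum):
    UNKNOWN = "unknown"
    NOT_BEHIND_NAT = "not_behind_nat"
    BEHIND_NAT = "behind_nat"


class NatDetector:
    """Tracks the set E of outbound endpoints in a bounded LRU cache."""

    def __init__(self, started_at: float, capacity: int = 1024, timeout: float = 120.0):
        self._sent: "OrderedDict[Endpoint, None]" = OrderedDict()
        self._capacity = capacity        # hypothetical bound on the size of E
        self._timeout = timeout          # 2 minutes, as suggested in the text
        self._started_at = started_at
        self._unsolicited_seen = False

    def record_outbound(self, endpoint: Endpoint) -> None:
        # Insert (or refresh) the endpoint, evicting the least recently used entry.
        self._sent[endpoint] = None
        self._sent.move_to_end(endpoint)
        if len(self._sent) > self._capacity:
            self._sent.popitem(last=False)

    def record_inbound(self, endpoint: Endpoint) -> None:
        if endpoint in self._sent:
            self._sent.move_to_end(endpoint)
        else:
            # A packet from an endpoint we never contacted reached us directly.
            self._unsolicited_seen = True

    def status(self, now: float) -> NatStatus:
        if self._unsolicited_seen:
            return NatStatus.NOT_BEHIND_NAT
        if now - self._started_at >= self._timeout:
            return NatStatus.BEHIND_NAT
        return NatStatus.UNKNOWN
```

One caveat of bounding E this way: a reply from an endpoint that has already been evicted would be counted as unsolicited, so the capacity should comfortably exceed the number of peers contacted within the detection window.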

Signalling whether you are behind a NAT

We define a new field in the ENR with the key "nat".

If the node does not know whether it is behind a NAT, this key should be omitted from the ENR
If the node is not behind a NAT, the value of this key should be set to 0
If the node wishes to signal that it is behind a NAT, the value of this key should be set to 1
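The three cases above could be applied to a record roughly as follows. This is only a sketch: it models the ENR as a plain dictionary, whereas a real ENR is a signed, RLP-encoded record, and the helper name `apply_nat_signal` is hypothetical.

```python
from typing import Any, Dict, Optional


def apply_nat_signal(enr_fields: Dict[str, Any], behind_nat: Optional[bool]) -> Dict[str, Any]:
    """Return a copy of the ENR key/value pairs with the "nat" key updated.

    behind_nat=None  -> status unknown, the key is omitted
    behind_nat=False -> "nat" is set to 0
    behind_nat=True  -> "nat" is set to 1
    """
    updated = dict(enr_fields)
    if behind_nat is None:
        updated.pop("nat", None)
    else:
        updated["nat"] = 1 if behind_nat else 0
    return updated
```

Any change to the record would also require bumping the sequence number and re-signing, which this sketch leaves out.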

Traversing the NAT

We define two new message types:

RELAYREQUEST
RELAYRESPONSE

# RELAYREQUEST
relay_request := SSZContainer(from_node_id: uint256, to_node_id: uint256)

# RELAYRESPONSE
relay_response := SSZContainer(response: uint8)
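To make the size of these payloads concrete, here is a hand-rolled sketch of how the two containers would serialize under SSZ, following the definitions above as written. In SSZ a container of fixed-size fields is the concatenation of its fields, and unsigned integers are little-endian, so a request is 64 bytes and a response is 1 byte. The class and method names are hypothetical, and discv5 message framing (message type, request id) is not covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelayRequest:
    from_node_id: int  # uint256
    to_node_id: int    # uint256

    def serialize(self) -> bytes:
        # Two fixed-size uint256 fields, each 32 bytes little-endian.
        return self.from_node_id.to_bytes(32, "little") + self.to_node_id.to_bytes(32, "little")

    @classmethod
    def deserialize(cls, data: bytes) -> "RelayRequest":
        if len(data) != 64:
            raise ValueError("RELAYREQUEST payload must be exactly 64 bytes")
        return cls(
            from_node_id=int.from_bytes(data[:32], "little"),
            to_node_id=int.from_bytes(data[32:], "little"),
        )


@dataclass(frozen=True)
class RelayResponse:
    response: int  # uint8

    def serialize(self) -> bytes:
        return self.response.to_bytes(1, "little")

    @classmethod
    def deserialize(cls, data: bytes) -> "RelayResponse":
        if len(data) != 1:
            raise ValueError("RELAYRESPONSE payload must be exactly 1 byte")
        return cls(response=data[0])
```

`int.to_bytes` raises `OverflowError` for values that do not fit, which doubles as a range check for both field types.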

The rendezvous protocol works as follows:

The "initiator" node learns about the "receiver" node through a FINDNODES/FOUNDNODES interaction with the "rendezvous" node.
The "initiator" sends a RELAYREQUEST to the "rendezvous" node with payload: {from_node_enr: initiator_enr, to_node_id: receiver_node_id}
The "rendezvous" node, upon receiving the RELAYREQUEST from the "initiator" node, sends the same RELAYREQUEST message to the "receiver" node.
The "receiver" node, upon receiving the RELAYREQUEST from the "rendezvous" node, responds with a RELAYRESPONSE with the payload {response: 1} to signal that they have accepted this request. They may alternately respond with {response: 0} if they wish to reject the request. The "receiver" node will also send a PING message to the "initiator" node (this triggers the receiver's NAT to allow and route incoming packets from the initiator's ip/port).
The "rendezvous" node, upon receiving the RELAYRESPONSE from the "receiver" node, accepting the request, will then send the same RELAYRESPONSE message to the "initiator".
The "initiator" node, upon receiving the RELAYRESPONSE accepting the connection, should then send a PING message to the "receiver" node (this triggers the initiator's NAT to allow and route incoming packets from the receiver's ip/port).
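The steps above can be traced with a small in-memory sketch that records who sends what to whom. It involves no networking, and the function name `rendezvous_flow` is hypothetical. The steps do not say what happens after a rejection, so the sketch assumes the flow simply stops there and that a rejecting receiver sends no PING.

```python
from typing import List, Tuple

Event = Tuple[str, str, str]  # (sender, recipient, message)

ACCEPT = 1
REJECT = 0


def rendezvous_flow(receiver_accepts: bool) -> List[Event]:
    """Return the ordered message trace for one hole-punching attempt."""
    trace: List[Event] = []

    # Step 1: the initiator learns about the receiver from the rendezvous node.
    trace.append(("initiator", "rendezvous", "FINDNODES"))
    trace.append(("rendezvous", "initiator", "FOUNDNODES"))

    # Steps 2 and 3: the request is relayed through the rendezvous node.
    trace.append(("initiator", "rendezvous", "RELAYREQUEST"))
    trace.append(("rendezvous", "receiver", "RELAYREQUEST"))

    # Step 4: the receiver answers the rendezvous node.
    response = ACCEPT if receiver_accepts else REJECT
    trace.append(("receiver", "rendezvous", f"RELAYRESPONSE({response})"))
    if not receiver_accepts:
        # Assumption: nothing further is defined for a rejected request.
        return trace

    # Step 4 (continued): this PING opens the receiver's NAT for the initiator.
    trace.append(("receiver", "initiator", "PING"))

    # Step 5: the acceptance is relayed back to the initiator.
    trace.append(("rendezvous", "initiator", f"RELAYRESPONSE({ACCEPT})"))

    # Step 6: this PING opens the initiator's NAT for the receiver.
    trace.append(("initiator", "receiver", "PING"))
    return trace
```

Printing `rendezvous_flow(True)` line by line gives a rough textual stand-in for a message flow diagram.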

TODO: diagram message flow... define edge cases like timeouts and how nodes should behave.

TODO- finish definition of the protocol and convert this to a PR towards the spec so that people can comment on individual lines.

emhane · 2022-07-08T14:54:55Z

So this means the PING from the "receiver" to the "initiator" is dropped, but it places an entry in the "receiver's" state table so that the "initiator's" PING to the "receiver" succeeds as long as it arrives within 30 seconds, the timeout of a UDP state table entry in many routers? In other words, the time it takes for the RELAYRESPONSE to reach the "initiator" should be less than 30 seconds? The WHOAREYOU challenge that the "receiver" sends in response then uses the state table entry that the "initiator's" PING places in its state table to finalise the hole punching?

AgeManning · 2022-07-12T08:56:07Z

Nice!

A few thoughts:

I'm a big fan of SSZ, we use it everywhere (in eth2 and lighthouse land) except discv5. Discv5 still uses RLP. I would suggest we stick to one or the other to avoid extra dependencies. Either use RLP here, or shift the other encodings in discv5 to SSZ.
In the relay_response, is there a reason it's a uint8 vs a bool? Do we have more than two responses?

@emhane - I agree. Typically the round-trip time for requests in discv5 is small, usually no longer than a few seconds, so hopping via an intermediary should take < 30s. I think the initial PING sent by the receiver sets up its IP/port mapping, allowing future packets from the initiator. This PING will probably get dropped if the initiator is itself behind a NAT, but will be received if it is not. In either case the initiator can then establish a handshake with the receiver. In our case, we will have to handle the case where one of our messages gets dropped (but this is implementation specific).
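The timing argument discussed above can be written down as a small check. This is a sketch only: the 30 second figure is the router timeout quoted in this thread rather than a guaranteed value, and both function names are hypothetical.

```python
from typing import Sequence


def initiator_ping_arrival(punched_at: float, hop_delays: Sequence[float]) -> float:
    """Time at which the initiator's PING reaches the receiver's NAT.

    hop_delays holds the one-way delays, in seconds, of the three hops that
    follow the receiver's own PING: receiver -> rendezvous (RELAYRESPONSE),
    rendezvous -> initiator (RELAYRESPONSE), initiator -> receiver (PING).
    """
    return punched_at + sum(hop_delays)


def hole_still_open(punched_at: float, arrives_at: float, mapping_timeout: float = 30.0) -> bool:
    """True if a packet arriving at arrives_at still finds the NAT mapping in place."""
    return 0.0 <= arrives_at - punched_at < mapping_timeout
```

With hops of a second each, `hole_still_open(0.0, initiator_ping_arrival(0.0, [1.0, 1.0, 1.0]))` holds with a wide margin, which matches the expectation that a relayed round trip stays well under the mapping timeout.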

emhane · 2022-07-25T20:08:00Z

What if the body of the RELAYREQUEST is changed to from_node_enr: Enr, to_node_id: NodeId? Sending the enr of the initiator in the body will supply the receiver with the information it needs to send the PING request to the initiator in step 4.

sambacha · 2022-09-15T11:03:06Z

Why not consider wireguard for this?

pipermerriam · 2022-09-15T16:01:29Z

> What if the body of the RELAYREQUEST is changed to from_node_enr: Enr, to_node_id: NodeId? Sending the enr of the initiator in the body will supply the receiver with the information it needs to send the PING request to the initiator in step 4.

Yes, this seems appropriate. I now see that without this, the "receiver" node will not necessarily have enough information to send the PING to the initiator, which would mean they would end up needing to do a lookup for them in the network to find their ENR. 👍

emhane · 2022-10-02T11:45:11Z

I have implemented your protocol outline @pipermerriam with the changes in @AgeManning 's comment above. Furthermore, I made the following changes:

{response: 2} is the body of the RELAYRESPONSE assembled by the rendezvous node in case the request to the receiver fails. This means a RELAYREQUEST always gets a response; if it does not, the initiator knows the rendezvous it chose is faulty.
The nat field contains the IP address and the nat field exists 'exclusive or' the ip field (this is @AgeManning 's good idea). The ip field has precedence over the nat field if a node messes up its ENR and sets both. If a node is behind an asymmetric NAT, the udp/udp6 field is used to indicate the port at which it can be hole-punched. If the node is behind a symmetric NAT the udp/udp6 field is removed.
A node, once it has learnt it is behind a NAT, is responsible for PINGing its peers often enough to keep the hole punched.
The NAT traversal protocol is a feature a node can choose to disable (the discv5 topics protocol is also a feature in my implementation for sigpi).
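The ENR rules in the list above (the nat field carries an IP address, ip takes precedence over nat, and a missing udp field means there is no port to punch) could be resolved along these lines. This is a sketch over a plain dictionary with IPv4 fields only; the function name `contact_endpoint` and the returned tuple shape are hypothetical and not taken from the implementation.

```python
from typing import Any, Dict, Optional, Tuple


def contact_endpoint(enr: Dict[str, Any]) -> Tuple[Optional[str], Optional[int], bool]:
    """Return (ip, udp_port, behind_nat) for a peer's ENR fields.

    udp_port is None when the udp field is absent, which the rules above
    use to mark a node behind a symmetric NAT.
    """
    port = enr.get("udp")
    if "ip" in enr:
        # ip wins even if a misconfigured record also carries nat.
        return enr["ip"], port, False
    if "nat" in enr:
        return enr["nat"], port, True
    return None, port, False
```

A caller would attempt the rendezvous protocol only when `behind_nat` is true and a port is present; the ip6/udp6 variants would follow the same pattern.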

emhane · 2022-10-03T13:13:56Z

I'm changing the to_node_id into to_node_enr in my implementation because otherwise a node has to store the enr of a peer that is potentially behind a NAT so that it knows where to send the hole-punch-ping upon a RELAYRESPONSE with body {response: 1}. The struct that would store these ENRs has no obvious capacity limit, so it is better if the hole-punch-ping is stateless. The RELAYREQUEST is still small in comparison to a NODES response.

pipermerriam mentioned this issue May 6, 2022

Rendezvous protocol ethereum/portal-network-specs#144

Closed

salwailiyas referenced this issue May 16, 2022

enr.md: add note about padding in v4 ID scheme

25e504b

emhane mentioned this issue Jul 21, 2022

NAT traversal via Rendezvous protocol sigp/discv5#133

Closed

acolytec3 mentioned this issue Jul 28, 2022

Rendezvous protocol support ChainSafe/discv5#199

Open

emhane mentioned this issue Mar 16, 2023

NAT traversal ethereum/trin#596

Closed

8 tasks

fjl mentioned this issue Mar 31, 2023

discv5: protocol version v5.2 #226

Open

12 tasks
