The SD-WAN/SASE/SSE set of mechanisms uses encapsulation to direct traffic to intermediate nodes in the network. This reduces the effective payload MTU available to end-user applications. Similarly, traffic between intermediate nodes in the solution is encapsulated for security, to carry metadata, and to direct traffic.
Inside the data centre the MTU is high enough (9K) to support arbitrarily long encapsulation. The path MTU between the end-user and the data centre, and the path MTU between data centres, is typically limited to 1500 bytes.
Today, the problem has been dealt with by a combination of:
- manual configuration of a smaller MTU at the end-user site
- TCP MSS clamping
- IP fragmentation
- Path MTU discovery (sending ICMP Packet Too Big messages back to the host)
These mechanisms all have undesired side effects. ICMP error messages are not robust and are often not acted upon by the application. TCP MSS clamping works fine, but only for TCP. IP fragmentation means the 5-tuple isn't available in every packet, which increases drop probability, affects ECMP, and requires stateful devices to do virtual reassembly.
Traffic between data centres, and between end-users and a data centre, is carried over point-to-point tunnels. The tunnel acts as a pseudo-wire. The idea is to provide a substrate within the tunnel that offers a higher MTU (at least 1500) than the underlying path MTU.
Think of the underlying path MTU as a fixed cell size, e.g. 1280 bytes. The idea is then to use a combination of packet coalescing and tunnel substrate segmentation and reassembly to fill these cells. The hypothesis is that doing so makes it possible to provide a higher inner MTU on the link without increasing the number of packets.
Given that the tunnel substrate runs between two endpoints, there is no issue with mis-identification of fragments, as there is with 'normal' IP fragmentation.
The approach is uniquely suited to VPP, where the mechanism, called "Packet Vectorisation", is run on the interface output vector.
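If the underlying path MTU is 1280 bytes, the tunnel overhead is 40 bytes, and the output vector consists of 5 packets of sizes {40, 40, 1500, 1500, 40}, then each outer packet carries up to 1240 bytes of inner data. The 3120 inner bytes are therefore sent as 3 packets whose sizes including tunnel encapsulation are {1280, 1280, 680} (counting the 40 bytes of overhead once per outer packet).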
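The cell-filling arithmetic can be checked with a few lines of Python. This is only a sketch: the function name and parameters are made up for illustration, the 40-byte overhead is assumed to be charged once per outer packet, and chunk headers are ignored.

```python
def outer_packet_sizes(inner_sizes, path_mtu=1280, overhead=40):
    """Sizes of the outer packets needed to carry the inner packets back to back.

    Assumption: `overhead` bytes are spent once per outer packet and
    per-chunk headers are not counted.
    """
    capacity = path_mtu - overhead        # inner bytes that fit in one cell
    total = sum(inner_sizes)              # inner packets treated as one byte stream
    full_cells, remainder = divmod(total, capacity)
    sizes = [path_mtu] * full_cells
    if remainder:
        sizes.append(remainder + overhead)
    return sizes


print(outer_packet_sizes([40, 40, 1500, 1500, 40]))
```

For the vector above this gives three outer packets for five inner ones. Sending each inner packet in its own tunnel packet and fragmenting the two 1500-byte ones would take seven.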
The tunnel sublayer is carried within a UDP packet. UDP is used to ensure ECMP load-balancing, with a fixed destination port and a per-vector randomized source port.
There is an initial tunnel sublayer header containing a 32-bit sequence number, and a flag indicating whether the very first chunk contains a fragment or not.
For each packet segment there is a chunk header containing a total length field.
All the chunks except the first and last one are expected to be full packets rather than fragments.
Whether a given inner chunk contains a full packet is determined by parsing the inner packet header, using the IPv4 Total Length or IPv6 Payload Length field (subject to change).
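To make the header description concrete, here is a small Python model of parsing one tunnel payload. The field widths beyond the 32-bit sequence number are not specified above, so the 8-bit flags field, the bit used for the flag, and the 16-bit chunk length are all assumptions made for this sketch, as are the function names.

```python
import struct

# Hypothetical wire layout (only the 32-bit sequence number comes from the text).
TUNNEL_HDR = struct.Struct("!IB")   # sequence number, flags
CHUNK_HDR = struct.Struct("!H")     # chunk length in bytes
FLAG_FIRST_IS_FRAGMENT = 0x01       # assumed bit assignment


def inner_packet_length(chunk):
    """Length the inner IP header claims for the whole packet, or None if unknown."""
    if len(chunk) >= 4 and chunk[0] >> 4 == 4:
        return struct.unpack_from("!H", chunk, 2)[0]        # IPv4 Total Length
    if len(chunk) >= 6 and chunk[0] >> 4 == 6:
        return 40 + struct.unpack_from("!H", chunk, 4)[0]   # IPv6 Payload Length + fixed header
    return None


def parse_payload(payload):
    """Split one tunnel UDP payload into (seq, first_is_fragment, [chunk, ...])."""
    seq, flags = TUNNEL_HDR.unpack_from(payload, 0)
    offset = TUNNEL_HDR.size
    chunks = []
    while offset < len(payload):
        if offset + CHUNK_HDR.size > len(payload):
            raise ValueError("truncated chunk header")
        (length,) = CHUNK_HDR.unpack_from(payload, offset)
        offset += CHUNK_HDR.size
        if length == 0 or offset + length > len(payload):
            raise ValueError("bad chunk length")
        chunks.append(payload[offset:offset + length])
        offset += length
    return seq, bool(flags & FLAG_FIRST_IS_FRAGMENT), chunks
```

A chunk is a full packet when `inner_packet_length(chunk) == len(chunk)`. Note that IPv6 has no total length field, so the sketch adds the 40-byte fixed header to the Payload Length.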
Implementation:
There are two packet paths: transmit and receive. The transmit path implements a tunnel interface similar to that of WireGuard, etc. Upon getting a buffer to send, it adds the chunk header and temporarily buffers the block in anticipation of appending additional packets. After processing of the current vector is done, any pending blocks get the encapsulation header and are sent out, so the added delay is minimal. Each tunnelled packet gets its own sequence number.
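The transmit logic can be modelled in Python roughly as follows. This is a sketch of the idea rather than the VPP code: `vectorise`, `max_payload` and the header layout (8-bit flags, 16-bit chunk length) are assumptions, and `max_payload` stands for the UDP payload size that fits the underlying path MTU.

```python
import struct

# Hypothetical wire layout (only the 32-bit sequence number comes from the text).
TUNNEL_HDR = struct.Struct("!IB")   # sequence number, flags
CHUNK_HDR = struct.Struct("!H")     # chunk length in bytes
FLAG_FIRST_IS_FRAGMENT = 0x01       # assumed bit assignment


def vectorise(packets, max_payload, first_seq=0):
    """Pack one vector of inner packets into tunnel payloads of at most max_payload bytes."""
    room_total = max_payload - TUNNEL_HDR.size
    if room_total <= CHUNK_HDR.size:
        raise ValueError("max_payload too small to carry any data")

    payloads = []
    body = bytearray()   # chunks collected for the block being built
    flags = 0
    seq = first_seq

    def flush():
        """Prepend the tunnel header to the pending block and queue it."""
        nonlocal body, flags, seq
        if body:
            payloads.append(TUNNEL_HDR.pack(seq & 0xFFFFFFFF, flags) + bytes(body))
            seq += 1
        body, flags = bytearray(), 0

    for pkt in packets:
        offset = 0
        while offset < len(pkt):
            room = room_total - len(body) - CHUNK_HDR.size
            if room <= 0:
                flush()                          # block is full, start a new one
                continue
            take = min(room, len(pkt) - offset)
            if offset > 0 and not body:
                flags |= FLAG_FIRST_IS_FRAGMENT  # block starts with a continuation
            body += CHUNK_HDR.pack(take) + pkt[offset:offset + take]
            offset += take
    flush()                                      # end of vector: send what is pending
    return payloads
```

Nothing is held back across vectors in this model: the final `flush()` sends the last, possibly short, block as soon as the vector has been processed.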
The RX path performs the reverse operation. The first step is to check the received sequence number against the last received sequence number. If they are not strictly adjacent, any fragment chunks pending reassembly are discarded, and the first chunk of the newly received packet is discarded as well. Otherwise a reassembly attempt is made. If there are remaining chunks in the packet and the reassembly is not successful, the unreassembled data is discarded and the reassembly state is reset.
After this, all subsequent inner chunks are decapsulated (with a check that they do form full packets). The first chunk that does not form a full upper-layer PDU is stored as a fragment in progress and terminates the processing.
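The receive steps can be sketched as a small Python state machine. The header layout (8-bit flags, 16-bit chunk length) and all names are assumptions made for illustration. The sketch also assumes that, on a sequence gap, the first chunk is dropped only when the header flag marks it as a fragment.

```python
import struct

# Hypothetical wire layout (only the 32-bit sequence number comes from the text).
TUNNEL_HDR = struct.Struct("!IB")   # sequence number, flags
CHUNK_HDR = struct.Struct("!H")     # chunk length in bytes
FLAG_FIRST_IS_FRAGMENT = 0x01       # assumed bit assignment


def claimed_length(data):
    """Total packet length claimed by the inner IP header, or None if unknown."""
    if len(data) >= 4 and data[0] >> 4 == 4:
        return struct.unpack_from("!H", data, 2)[0]        # IPv4 Total Length
    if len(data) >= 6 and data[0] >> 4 == 6:
        return 40 + struct.unpack_from("!H", data, 4)[0]   # IPv6 Payload Length + 40
    return None


def split_chunks(body):
    """Split the bytes after the tunnel header into chunks; raise on bad lengths."""
    chunks, offset = [], 0
    while offset < len(body):
        if offset + CHUNK_HDR.size > len(body):
            raise ValueError("truncated chunk header")
        (length,) = CHUNK_HDR.unpack_from(body, offset)
        offset += CHUNK_HDR.size
        if length == 0 or offset + length > len(body):
            raise ValueError("bad chunk length")
        chunks.append(bytes(body[offset:offset + length]))
        offset += length
    return chunks


class Receiver:
    def __init__(self):
        self.last_seq = None   # sequence number of the previous tunnel packet
        self.partial = b""     # fragment in progress

    def receive(self, payload):
        """Return the complete inner packets recovered from one tunnel payload."""
        seq, flags = TUNNEL_HDR.unpack_from(payload, 0)
        chunks = split_chunks(payload[TUNNEL_HDR.size:])
        adjacent = (self.last_seq is not None
                    and seq == ((self.last_seq + 1) & 0xFFFFFFFF))
        self.last_seq = seq
        pending, self.partial = self.partial, b""
        out = []
        if flags & FLAG_FIRST_IS_FRAGMENT and chunks:
            head, chunks = chunks[0], chunks[1:]
            if adjacent and pending:
                candidate = pending + head
                want = claimed_length(candidate)
                if want == len(candidate):
                    out.append(candidate)             # reassembly succeeded
                elif want is not None and len(candidate) < want and not chunks:
                    self.partial = candidate          # still incomplete, keep waiting
                # otherwise reassembly failed and the data is dropped
            # on a sequence gap both the pending data and this fragment are dropped
        for chunk in chunks:
            want = claimed_length(chunk)
            if want == len(chunk):
                out.append(chunk)
                continue
            if want is not None and len(chunk) < want:
                self.partial = chunk                  # fragment in progress
            break                                     # stop at the first non-full chunk
        return out
```

Malformed chunk lengths raise `ValueError` before any state is touched, which is one way to keep the negative cases easy to test.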
Unit testing:
Since the unit testing framework uses Scapy, testing a new custom packet format requires developing a new packet dissector / assembler for Scapy. The relevant documentation is at https://scapy.readthedocs.io/en/latest/build_dissect.html
The code for testing the TX path needs to parse the packets sent by the VPP data plane and verify the sanity of the packet format for various combinations and sizes of payloads. For the RX path, the testing code needs to create various combinations of tunnelled payloads in Scapy, inject them into the VPP data path, and verify that the decapsulation process behaves as expected. There also need to be negative tests, e.g. ones supplying incorrect chunk lengths, to ensure that the tunnel code is robust against accidentally malformed data.
Security & spoofing:
The presented protocol design assumes a trusted underlay, insofar as it does not attempt to verify whether a packet is spoofed. This is no different from any other "simple" tunnelling protocol such as GRE, VXLAN or Geneve. The protocol is specifically designed to be layered atop any other tunnelling protocol to augment the overall tunnelling properties.
References:
https://datatracker.ietf.org/doc/draft-templin-intarea-parcels/
https://datatracker.ietf.org/doc/html/draft-templin-6man-omni-interface
https://datatracker.ietf.org/doc/rfc9347/
Assignee
Unassigned
Reporter
Ole Trøan
Comments
No comments.
Original issue: https://jira.fd.io/browse/VPP-2112