Binary file removed meeting_slides/2018_05_17_P4_Memory_Use_Cases.pdf
Binary file not shown.
95 changes: 88 additions & 7 deletions telemetry/specs/INT.mdk
@@ -624,19 +624,69 @@ corruption at preceding hops.

## Header Location

We describe three encapsulation formats in this specification, covering
We describe five encapsulation formats in this specification, covering
different deployment scenarios, with and without network virtualization:

1. *INT over TCP/UDP* - A shim header is inserted following TCP/UDP
2. "INT over IPv6" - INT Headers are carried in the IPv6 packets as Hop-by-Hop option.
3. *INT over TCP/UDP* - A shim header is inserted following TCP/UDP
header. INT Headers are carried between this shim header and TCP/UDP payload.
This approach doesn’t rely on any tunneling/virtualization mechanism and is
versatile to apply INT to both native and virtualized traffic.
2. *INT over VXLAN* - VXLAN generic protocol extensions
4. *INT over VXLAN* - VXLAN generic protocol extensions
(draft-ietf-nvo3-vxlan-gpe) are used to carry INT Headers between
the VXLAN header and the encapsulated VXLAN payload.
3. *INT over Geneve* - Geneve is an extensible tunneling framework, allowing
5. *INT over Geneve* - Geneve is an extensible tunneling framework, allowing
Geneve options to be defined for INT Headers.

### INT over IPv6
Contributor

Should we add INT over IPv6 after the other three encaps? Especially because the text in the paragraph refers to scenarios where "INT over VXLAN or Geneve is not helpful".

Contributor Author

I followed what was done earlier. TCP/UDP was listed first and it referenced encaps. I just stuck to that. I am fine with changing the order.

I may be woefully out of date on IPv6 extension header behavior, but regarding the option '"INT over IPv6" - INT Headers are carried in the IPv6 packets as Hop-by-Hop option.', I had thought that switches in practice have to punt packets with an IPv6 Hop-by-Hop extension header to the slow path, e.g. software forwarding on a general purpose CPU.

I did a quick search and found that RFC 7045 (published Dec 2013) says this in Section 2.2 "Hop-by-Hop Options":

The IPv6 Hop-by-Hop Options header SHOULD be processed by
intermediate forwarding nodes as described in [RFC2460]. However, it
is to be expected that high-performance routers will either ignore it
or assign packets containing it to a slow processing path. Designers
planning to use a hop-by-hop option need to be aware of this likely
behaviour.

Is there really a desire to put INT data into a header that will likely result in slow path processing in the network?


INT in IPv6 can be supported by encapsulating the INT Metadata Header and
Metadata in the "option data" field of the Hop-by-Hop Options header. In order
for INT to work in IPv6 networks, INT must be explicitly enabled per interface
on every node within the INT domain. Unless a particular interface is explicitly
enabled (i.e. explicitly configured) for INT, a router MUST drop packets which
contain extension headers carrying INT Metadata Header and Metadata. This
ensures that INT data does not unintentionally get forwarded outside the
INT domain.

IPv6 Hop-by-Hop Option format for carrying INT Header and
Metadata:

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Option Type  |  Opt Data Len |Reserved (MBZ) |   INT TYPE    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<-+
|   Variable Option Data (INT Metadata Headers and Metadata)    |  |
.                                                               .  I
.                                                               .  N
.                                                               .  T
.                                                               .  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<-+

* Option Type: 8-bit identifier of the type of option.

001xxxxx 8-bit identifier of the type of option. xxxxx=TBD_IANA_INT_HOP_BY_HOP_OPTION_IPV6.
001xxxxx 8-bit identifier of the type of option. xxxxx=TBD_IANA_INT_DESTINATION_OPTION_IPV6.

Looking at the IANA registry, there are a total of 32 code points of which 17 have already been allocated. The registration procedure is IESG Approval, IETF Review or Standards Action. IOAM is asking for 4 code points, which seems unlikely to be granted. The chances for INT to get any code points are not high.

Contributor Author

I agree.

I see two options:

  1. Go with the single experimental use hop-by-hop-options codepoint (0b11110), and split the "Reserved (MBZ)" field into 8 bits reserved followed by 8 bits of INT type.
  2. Wait for IOAM to get a codepoint and use that. Split the "Reserved (MBZ)" field into 8 bits reserved followed by 8 bits of IOAM type. Assign relatively high IOAM type codepoints for INT hop-by-hop option and INT destination option.

I see another problem with the corresponding IETF IOAM IPv6 draft. The text says that "a router MUST drop packets which contain extension headers carrying IOAM data-fields", to "ensure that the IOAM data does not unintentionally get forwarded outside the IOAM domain." However, they asked for an Option Type codepoint starting with "00", which means when the option type is unrecognized, "skip over this option and continue processing the header". If the text is correct, then they should ask for any of the other codepoint prefixes "01" (discard the packet), "10" (discard and send ICMP parameter problem, code 2, back to the packet's source address), or "11" (discard and send ICMP only if the packet's destination address was not a multicast address).
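
For reference, the prefix semantics discussed above can be sketched in Python. This is only an illustration of the Option Type encoding from RFC 8200, Section 4.2 (two "action" bits, one "may change en route" bit, five remaining bits); the names `UNRECOGNIZED_ACTION` and `decode_option_type` are made up for this example and appear in neither the INT nor the IOAM text.

```python
# Illustrative sketch of the RFC 8200 Option Type bit layout.
# All names here are hypothetical helpers, not part of any spec.

UNRECOGNIZED_ACTION = {
    0b00: "skip over this option and continue processing the header",
    0b01: "discard the packet",
    0b10: "discard and send ICMP Parameter Problem, Code 2, to the source",
    0b11: "discard; send ICMP Parameter Problem, Code 2, only if the "
          "destination address was not multicast",
}


def decode_option_type(option_type: int) -> dict:
    """Split an 8-bit IPv6 Option Type into its three sub-fields."""
    if not 0 <= option_type <= 0xFF:
        raise ValueError("Option Type is an 8-bit field")
    return {
        # Highest-order two bits: action when the option is unrecognized.
        "action": UNRECOGNIZED_ACTION[option_type >> 6],
        # Third-highest bit: whether option data may change en route.
        "may_change_en_route": bool((option_type >> 5) & 1),
        # Remaining five bits of the option type.
        "low_bits": option_type & 0x1F,
    }


if __name__ == "__main__":
    # A "001" prefix with low bits 0b11110, as in the discussion above.
    print(decode_option_type(0b00111110))
    # The same low bits with a "01" prefix would request a discard instead.
    print(decode_option_type(0b01111110))
```

With a "00" prefix the decoded action is "skip", which is the mismatch with the "MUST drop" wording pointed out above; any of the other three prefixes decodes to a discard.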

Contributor Author

I will close the loop with the IETF and address this comment.

Whatever we do, two codepoints will not fly. At a minimum we would have to go with TBD_IANA_INT_OPTION_IPV6 (not distinguishing between INT hop-by-hop and INT destination), which would later get resolved to either the experimental hop-by-hop options codepoint or whatever IOAM has assigned. If we go with IOAM then the INT Type values might need to be shifted to avoid conflicts.

I also wonder if we should use xxx or yyy for the first 3 bits as well given the other open issue I stated above.


* Opt Data Len: 8-bit unsigned integer. Length of the Reserved and Option Data field of this
option, in octets.

* Reserved (MBZ): 16 bit field, must be filled with zeroes upon transmission and ignored upon

8 bit field

reception.

* Type: This field indicates the type of INT Metadata Header and Metadata following.

nit: The figure above labels the field as "INT TYPE". We should make this consistent, I guess with "INT Type"?

Two Type values are used: one for the hop-by-hop header type and the other for
the destination header type (See Section [#sec-int-header-types]).

* Variable Option Data: Variable length field. INT Metadata Header and Metadata, multiple of
four octets in length.

The INT IPv6 options defined here have alignment requirements. Specifically, they require 4n alignment.
This ensures that 4 octet fields of the INT metadata, such as Hop Latency, are aligned at a multiple-of-4
offset from the start of the Hop-by-Hop Options header. In INT v2.0, there are 4 octets in the
shim header and 12 octets in the fixed header. In order to maintain IPv6 extension header 8-octet
alignment...padding requirement TBD
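
As a rough illustration of the option layout described above, the following Python sketch packs the four header octets and appends the INT data. It is a sketch under stated assumptions, not a definitive encoding: the field widths follow the figure (four 8-bit fields, i.e. an 8-bit Reserved field), Opt Data Len is assumed to count every octet after the length field (Reserved, INT Type and the variable data), and `build_int_option` is a hypothetical helper name. The Option Type codepoint is still TBD, so the caller must supply one.

```python
import struct


def build_int_option(option_type: int, int_type: int, int_data: bytes) -> bytes:
    """Pack one INT option: Option Type, Opt Data Len, Reserved, INT Type, data.

    Assumptions (not confirmed by the spec text): Reserved is 8 bits wide as
    drawn in the figure, and Opt Data Len covers Reserved + INT Type + data.
    """
    if len(int_data) % 4:
        raise ValueError("INT data must be a multiple of four octets")
    opt_data_len = 2 + len(int_data)  # Reserved (1) + INT Type (1) + data
    if opt_data_len > 0xFF:
        raise ValueError("option does not fit the 8-bit Opt Data Len field")
    # "!BBBB": four unsigned octets in network byte order.
    header = struct.pack("!BBBB", option_type, opt_data_len, 0, int_type)
    return header + int_data


if __name__ == "__main__":
    # 0x3E and INT Type 1 are placeholder values chosen for illustration only.
    option = build_int_option(0x3E, 1, bytes(12))
    print(option.hex())
```

Because Opt Data Len is a single octet, the INT data carried in one option is capped at 252 octets under these assumptions (the largest multiple of four that keeps the length within 255).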

Contributor

We need to add some text to clarify the following regarding INT within IPv6:

  1. What is the use-case for INT as IPv6 destination option? INT has its own destination header type, meant for the INT sink. In general, INT sink and IPv6 destination may not be the same.
  2. How does an INT switch handle the padding following INT data? Each hop inserts Hop-ML worth of metadata which is a multiple of 4 bytes, but not necessarily a multiple of 8 bytes. What happens if each hop is inserting an odd multiple of 4 bytes, which is not a multiple of 8 bytes (say 20B)? In such a case, at hop 1, we need 4 bytes of padding. Hop 2 can remove the padding inserted by hop 1 and comply. Hop 3 can insert 4 bytes of padding again. Or do we say that each hop adds 4B of padding if HopML is an odd multiple of 4B? That would be wasteful.
  3. Regardless of what we do for issue #2 (Reporting physical and logical port ID in telemetry metadata), when HopML is an odd multiple of 4, each hop needs to push metadata at the top of the stack and do some manipulation (add/remove padding) at the end of the stack. So INT behavior is different here. In other encapsulations, each hop simply inserts at the head of the metadata stack.
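
The padding pattern in point 2 can be checked with a small Python sketch. It only does the arithmetic; `pad_octets` and `padding_per_hop` are hypothetical helpers, and `fixed_octets` (everything in the extension header other than the per-hop metadata) defaults to 0 on the assumption that the fixed part is already 8-octet aligned, which the draft text has not yet settled.

```python
def pad_octets(total_octets: int, alignment: int = 8) -> int:
    """Octets of padding needed to round total_octets up to the alignment."""
    return -total_octets % alignment


def padding_per_hop(hop_ml_words: int, hops: int, fixed_octets: int = 0) -> list:
    """Padding required after each of the first `hops` hops has pushed metadata.

    hop_ml_words is the per-hop metadata length in 4-octet words (Hop ML).
    fixed_octets is an assumed size for the non-metadata part of the header.
    """
    per_hop = hop_ml_words * 4
    return [pad_octets(fixed_octets + per_hop * n) for n in range(1, hops + 1)]


if __name__ == "__main__":
    # 20 octets per hop (Hop ML = 5), fixed part assumed 8-octet aligned.
    print(padding_per_hop(5, 4))                   # [4, 0, 4, 0]
    # Same metadata with a 20-octet fixed part: the pattern shifts by one hop.
    print(padding_per_hop(5, 3, fixed_octets=20))  # [0, 4, 0]
```

Either way, when Hop ML is odd the required padding alternates between 4 and 0 octets from hop to hop, so each hop has to touch the end of the stack as well as the head.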

Contributor Author

Let us discuss the padding issue in person.


### INT over TCP/UDP

In case the traffic being monitored is not encapsulated by any virtualization
@@ -940,7 +990,6 @@ INT Metadata Header and Metadata Stack:
`
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Ver = 2|Res|D|E|M| Reserved | Hop ML |RemainingHopCnt|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
@@ -1110,7 +1159,6 @@ For INT over Geneve it is 8 bytes subtracted from (length in Geneve tunnel
option header \* 4).



# Examples

This section shows example INT Headers with two hosts (Host1 and Host2),
@@ -1220,7 +1268,40 @@ INT Metadata Header and Metadata Stack, followed by TCP payload:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| TCP payload |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
`


## Example with INT over IPv6 using Hop-by-Hop option

The format of the IPv6 packet with Hop-by-Hop option for INT-MD
(Embedded Metadata) where no other Hop-by-Hop options are present
is shown below:

0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version| Traffic Class | Flow Label |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Payload Length | Nxt HDR = HbyH| Hop Limit |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| (Outer) Source IPv6 Address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| (Outer) Destination IPv6 Address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<-+
| Nxt HDR = IPv6| HbyH Ext Len | Padding|(MBZ) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Option Type | Opt Data Len |Reserved (MBZ) | INT TYPE |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<-+
|Ver = 2|Rep|D|E|M| Reserved | Hop ML |RemainingHopCnt| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| Instruction Bitmap | Domain Specific ID | I
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ N
| DS Flags | DS Instruction | T
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| Variable Option Data (INT DATA) | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<-+
| Payload Original Packet |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


## Example with INT over VXLAN GPE
