Restart retransmit timer on new data ACK #1018

simonborje · 2024-12-14T23:58:07Z

This is an attempt to address the issue described in #949, inspired by the patch and discussion in #953.

I don't want to step on anybody's toes - if you want to continue working on the original PR I don't mind 🙂.

Basically, what I've tried to do is to implement rfc6298 5.3 based on my understanding of that and how the code works. I added a test for the retransmission timer restart but I'm not sure if it is enough.

For testing I simulated a high latency link with 1000ms RTT and wrote a simple smoltcp client that sends a single byte every 100ms with Nagle disabled so that there is always outstanding data. Without the patch retransmissions keep happening, while the patched version only has a few initial retransmissions before they stop occurring. I don't have access to a real setup like the one in the original issue so I can't say for sure how much it helps in such situations.

codecov · 2024-12-15T00:00:28Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.83%. Comparing base (2ecd77f) to head (13774ff).
Report is 2 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1018      +/-   ##
==========================================
+ Coverage   80.81%   80.83%   +0.02%     
==========================================
  Files          81       81              
  Lines       28385    28419      +34     
==========================================
+ Hits        22939    22973      +34     
  Misses       5446     5446

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

HKalbasi · 2024-12-15T20:36:51Z

If you want to test it you can use mininet. It can simulate networks with various host and switches, connected with links with custom delay and bandwidth. I can share my mininet config if you are interested.

simonborje · 2024-12-16T19:19:23Z

If you want to test it you can use mininet. It can simulate networks with various host and switches, connected with links with custom delay and bandwidth. I can share my mininet config if you are interested.

That would be really helpful, thank you :)

tomDev5 · 2024-12-23T07:36:49Z

As @Dirbaio mentioned in the matrix channel, the removal of delay: delay * 2, looks a bit suspicious.
From what I gather, mostly from the RFC-based comments in the code, this part (line 2527) should double the delay:

        if !self.seq_to_transmit(cx) && repr.segment_len() > 0 && !self.timer.is_retransmit() {
            // RFC 6298: (5.1) If we've transmitted all data we could (and there was
            // something at all, data or flag, to transmit, not just an ACK), start the
            // retransmit timer if it is not already running.
            self.timer
                .set_for_retransmit(cx.now(), self.rtte.retransmission_timeout());
        }

While this part (line 1838) should not, as it specifically calls for restarting the timer on retransmission:

            // RFC 6298: (5.2) ACK of all outstanding data turn off the retransmit timer.
            // (5.3) ACK of new data in ESTABLISHED state restart the retransmit timer.
            (State::Established, TcpControl::None) => {
                if ack_all {
                    self.timer.set_for_idle(cx.now(), self.keep_alive);
                } else if ack_len > 0 {
                    self.timer
                        .set_for_retransmit(cx.now(), self.rtte.retransmission_timeout());
                }
            }

If I am correct, it should be an easy fix (maybe an reset_retransmission: bool parameter)
I don't have a setup to test that this still fixes the issue, @simonborje / @HKalbasi if you could post here / create a repository of a test that would be great

simonborje · 2024-12-23T10:45:33Z

I was a bit unsure about the delay: delay * 2 part and initially thought it might be responsible for something like exponential backoff, however when testing it I couldn't see that it actually did anything useful.

I see that there is now a new PR #1020 for fixing exponential backoff that modifies the same parts as this PR, so perhaps I should wait for that to land and then rebase this PR.

HKalbasi · 2024-12-24T11:40:47Z

Here is my mininet config:

import subprocess

from mininet.cli import CLI
from mininet.link import TCLink
from mininet.net import Mininet
from mininet.node import OVSController
from mininet.topo import Topo

LEFT_HOST_NAME: str = 'left_h'
RIGHT_HOST_NAME: str = 'right_h'

class HostInTheMiddleTopo(Topo):

    def build(self):

        left_host = self.addHost(LEFT_HOST_NAME)
        right_host = self.addHost(RIGHT_HOST_NAME)

        self.addLink(left_host, right_host, bw=100, delay="1ms")


def get_interface(host: str, index: int):
    return f"{host}-eth{index}"


def disable_offload(host, offload: str, index: int):
    host.cmd(f"ethtool -K {get_interface(host.name, index)} {offload} off")

def run():
    net = Mininet(topo=HostInTheMiddleTopo(),
                  controller=OVSController, link=TCLink)

    left_host = net[LEFT_HOST_NAME]
    right_host = net[RIGHT_HOST_NAME]

    for host in [left_host, right_host, middle_host]:
        for name in ["tx", "rx"]:
            disable_offload(host, name, 0)

    left_host.cmd("tcpdump -w left.pcap &")
    right_host.cmd("tcpdump -w right.pcap &")

    # left_host.cmd(f"nc -l 0.0.0.0 8000 > /dev/null &")
    # left_host.cmd(f"yes | nc -l 0.0.0.0 8000 &")
    # right_host.cmd("ifstat -i right_h-eth0 > speedd.txt &")
    # left_host.cmd(f"sudo RUST_LOG=trace ./target/release/smol-speed > out.txt 2> err.txt &")

    net.start()
    CLI(net)
    net.stop()

if __name__ == "__main__":
    run()

You can run it by python3 net.py. It will give you a cli that have access to two machines connected with a link with bandwidth 100 Mbit/s and delay 1ms. You can run bash commands on each machine, for example left_h ip a or right_h ./run_my_program_in_background &. To test smoltcp with it, you should use a smoltcp::phy::RawSocket("left_h-eth0, Medium::Ethernet) and you can connect to it from right host using netcat, both from the python code and in the cli. You can also add more hosts and switches to make more complex network topologies.

HKalbasi mentioned this pull request Dec 15, 2024

Reset retransmit timer on ack regardless of the current timer state #953

Closed

Restart retransmit timer on new data ACK

13774ff

Dirbaio force-pushed the retransmit_timer_patch branch from 546aba9 to 13774ff Compare December 27, 2024 17:56

Dirbaio enabled auto-merge December 27, 2024 17:56

Dirbaio added this pull request to the merge queue Dec 27, 2024

Merged via the queue into smoltcp-rs:main with commit 37a6e1c Dec 27, 2024
12 checks passed

simonborje deleted the retransmit_timer_patch branch December 27, 2024 20:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Restart retransmit timer on new data ACK #1018

Restart retransmit timer on new data ACK #1018

simonborje commented Dec 14, 2024

codecov bot commented Dec 15, 2024 •

edited

Loading

HKalbasi commented Dec 15, 2024

simonborje commented Dec 16, 2024

tomDev5 commented Dec 23, 2024

simonborje commented Dec 23, 2024

HKalbasi commented Dec 24, 2024

Restart retransmit timer on new data ACK #1018

Restart retransmit timer on new data ACK #1018

Conversation

simonborje commented Dec 14, 2024

codecov bot commented Dec 15, 2024 • edited Loading

Codecov Report

HKalbasi commented Dec 15, 2024

simonborje commented Dec 16, 2024

tomDev5 commented Dec 23, 2024

simonborje commented Dec 23, 2024

HKalbasi commented Dec 24, 2024

codecov bot commented Dec 15, 2024 •

edited

Loading