@karlri
Contributor

@karlri karlri commented Jun 21, 2025

This change seems to increase async wifi performance quite nicely. It helps with http response times and with ping times. I do not know of any adverse effects. I'm requesting that someone test this on xtensa. It would be very interesting to hear what results others get with this. If it seems promising we could refine this further. Think of this as a request for testing and comments on the approach for now. It could probably be improved if we could know:

  1. The size of the TX queue.
  2. Whether the interrupt that woke us from waiti needs to be serviced by the wifi task.

Test program:

controller
    .set_power_saving(esp_wifi::config::PowerSaveMode::None)
    .unwrap();
    
loop {
    let mut socket = TcpSocket::new(stack, &mut rx_buffer, &mut tx_buffer);
    let mut buf = [0; 512];

    let local_endpoint = (config.address.address(), 80);
    socket.accept(local_endpoint).await.unwrap();
    socket.read(&mut buf).await.unwrap();

    socket
        .write_all(b"HTTP/1.1 200 OK\r\nContent-Length: 6\r\nConnection: close\r\n\r\nhi m8\n")
        .await
        .unwrap();
    socket.flush().await.unwrap();
}

All tests were performed on an esp32s3 in STA mode, with 1 m between the AP and the esp32s3.

.cargo/config.toml esp32s3

[env]
ESP_WIFI_CONFIG_TICK_RATE_HZ = "1"
ESP_HAL_EMBASSY_CONFIG_LOW_POWER_WAIT = "true"
ESP_HAL_EMBASSY_CONFIG_LOW_POWER_WAIT_WIFI_PERF_OPT = "true"
ESP_WIFI_CONFIG_PHY_ENABLE_USB = "false"

Results:

64 bytes from 10.42.0.226: icmp_seq=2 ttl=64 time=0.640 ms
64 bytes from 10.42.0.226: icmp_seq=3 ttl=64 time=0.525 ms
64 bytes from 10.42.0.226: icmp_seq=4 ttl=64 time=0.737 ms
64 bytes from 10.42.0.226: icmp_seq=5 ttl=64 time=1.19 ms
64 bytes from 10.42.0.226: icmp_seq=6 ttl=64 time=1.15 ms
64 bytes from 10.42.0.226: icmp_seq=7 ttl=64 time=1.17 ms
64 bytes from 10.42.0.226: icmp_seq=8 ttl=64 time=0.654 ms
64 bytes from 10.42.0.226: icmp_seq=9 ttl=64 time=0.999 ms
64 bytes from 10.42.0.226: icmp_seq=10 ttl=64 time=0.591 ms

curl http://10.42.0.226  0,00s user 0,00s system 44% cpu 0,009 total
curl http://10.42.0.226  0,00s user 0,00s system 41% cpu 0,009 total
curl http://10.42.0.226  0,00s user 0,00s system 51% cpu 0,006 total
curl http://10.42.0.226  0,00s user 0,00s system 48% cpu 0,008 total

.cargo/config.toml esp32s3, 1 kHz tick rate, optimization disabled for comparison

[env]
ESP_WIFI_CONFIG_TICK_RATE_HZ = "1000"
ESP_HAL_EMBASSY_CONFIG_LOW_POWER_WAIT = "true"
ESP_HAL_EMBASSY_CONFIG_LOW_POWER_WAIT_WIFI_PERF_OPT = "false"
ESP_WIFI_CONFIG_PHY_ENABLE_USB = "false"

Results:

64 bytes from 10.42.0.226: icmp_seq=1 ttl=64 time=2.45 ms
64 bytes from 10.42.0.226: icmp_seq=2 ttl=64 time=1.85 ms
64 bytes from 10.42.0.226: icmp_seq=3 ttl=64 time=1.53 ms
64 bytes from 10.42.0.226: icmp_seq=4 ttl=64 time=1.25 ms
64 bytes from 10.42.0.226: icmp_seq=5 ttl=64 time=0.968 ms
64 bytes from 10.42.0.226: icmp_seq=6 ttl=64 time=0.851 ms
64 bytes from 10.42.0.226: icmp_seq=7 ttl=64 time=2.31 ms
64 bytes from 10.42.0.226: icmp_seq=8 ttl=64 time=1.46 ms
64 bytes from 10.42.0.226: icmp_seq=9 ttl=64 time=1.64 ms
64 bytes from 10.42.0.226: icmp_seq=10 ttl=64 time=9.85 ms
64 bytes from 10.42.0.226: icmp_seq=11 ttl=64 time=0.831 ms

curl http://10.42.0.226  0,00s user 0,00s system 33% cpu 0,010 total
curl http://10.42.0.226  0,00s user 0,00s system 32% cpu 0,011 total
curl http://10.42.0.226  0,00s user 0,00s system 30% cpu 0,011 total
curl http://10.42.0.226  0,00s user 0,00s system 36% cpu 0,010 total

Keep in mind that the curl timing includes loading the curl executable, so the improvement is bigger than it might seem.
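To put a rough number on the difference, the ping samples from both runs can be summarized with a mean and a median. The following is a minimal sketch in plain Rust meant to run on a host machine; the helper names `mean` and `median` and the constant names are made up for this illustration, and the values are copied from the two result listings above.

```rust
/// Arithmetic mean of a non-empty slice.
fn mean(samples: &[f64]) -> f64 {
    samples.iter().sum::<f64>() / samples.len() as f64
}

/// Median of a non-empty slice (sorts a copy, leaves the input untouched).
fn median(samples: &[f64]) -> f64 {
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).expect("ping times contain no NaN"));
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 0 {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    }
}

// Ping round-trip times in ms, optimization enabled (tick rate 1 Hz).
const OPT_ENABLED: [f64; 9] = [
    0.640, 0.525, 0.737, 1.19, 1.15, 1.17, 0.654, 0.999, 0.591,
];

// Ping round-trip times in ms, optimization disabled (tick rate 1000 Hz).
const OPT_DISABLED: [f64; 11] = [
    2.45, 1.85, 1.53, 1.25, 0.968, 0.851, 2.31, 1.46, 1.64, 9.85, 0.831,
];

fn main() {
    println!(
        "enabled:  mean {:.3} ms, median {:.3} ms",
        mean(&OPT_ENABLED),
        median(&OPT_ENABLED)
    );
    println!(
        "disabled: mean {:.3} ms, median {:.3} ms",
        mean(&OPT_DISABLED),
        median(&OPT_DISABLED)
    );
}
```

With these samples the mean drops from about 2.27 ms to about 0.85 ms, and the median from 1.53 ms to about 0.74 ms. The sample sizes are small (11 and 9 pings) and the single 9.85 ms outlier pulls the disabled mean up, so the median is probably the fairer comparison.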

Before going into low power sleep, yield to the wifi task.
This should cause it to send outbound packets before sleeping.
It should also cause inbound packets to be processed earlier
since if we wake from waiti because of a wifi interrupt we will
repoll without progress and then run the wifi stack which will
generate work for the executor.

@bugadani
Contributor

Interesting idea! I believe what you want can be implemented in user-code by creating a task that is always ready to run:

#[embassy_executor::task]
async fn force_run_esp_wifi() {
    loop {
        yield_task(); // force an esp-wifi context switch
        embassy_futures::yield_now().await; // keep the task ready to run
    }
}

@karlri
Contributor Author

karlri commented Jun 23, 2025

I think it would result in busy polling, since that future is always ready to run, if I understand correctly. That, again if I understand correctly, would prevent the executor from ever attempting to wait (low power sleep, waiting for an interrupt with the waiti instruction).
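The busy-polling concern can be illustrated with a toy model that runs on a host machine. This is only a sketch under simplifying assumptions: `Poll`, `count_sleeps`, and the loop structure are invented for the illustration and are not the esp-hal-embassy executor, and the sleep step merely stands in for `waiti`.

```rust
use std::collections::VecDeque;

/// What a toy task reports after being polled once (hypothetical type).
#[derive(Clone, Copy, PartialEq)]
enum Poll {
    /// The task re-queued itself, like a task that always yields and never blocks.
    ReadyAgain,
    /// The task waits for an external event (an interrupt on real hardware).
    Pending,
}

/// Runs `iterations` rounds of a simplified executor loop and returns how
/// often the loop reached the sleep step. No wake-ups are simulated.
fn count_sleeps(tasks: &[Poll], iterations: usize) -> usize {
    // Every task starts out ready, identified by its index.
    let mut ready: VecDeque<usize> = (0..tasks.len()).collect();
    let mut sleeps = 0;

    for _ in 0..iterations {
        // Poll everything that is currently ready, exactly once.
        let batch: Vec<usize> = ready.drain(..).collect();
        for id in batch {
            if tasks[id] == Poll::ReadyAgain {
                ready.push_back(id);
            }
        }

        // Only sleep when no task asked to run again.
        if ready.is_empty() {
            sleeps += 1; // stand-in for `waiti`
        }
    }
    sleeps
}

fn main() {
    let only_blocking = [Poll::Pending];
    let with_always_ready = [Poll::Pending, Poll::ReadyAgain];

    println!("sleeps without always-ready task: {}", count_sleeps(&only_blocking, 5));
    println!("sleeps with always-ready task:    {}", count_sleeps(&with_always_ready, 5));
}
```

In this model a single always-ready task is enough to keep the loop from ever reaching the sleep step, regardless of what the other tasks do.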

The key idea of my proposal is to improve both performance and power consumption by reducing the context switching rate when there is little I/O.

I believe there is no place a user can fix this using the public API, but again I might be wrong.

If there were an API like
executor.set_idle_hook(yield_task);
which would run the idle hook only if there is nothing useful to do at the moment, just before sleeping, then that could be used.

@bugadani
Contributor

if there is nothing useful to do at the moment

We don't know whether there is something to do or not. That information is encoded in the SIGNAL_WORK_THREAD_MODE flags. poll returns after going through the task queue once, regardless of whether any tasks were placed back into the ready queue.

executor.set_idle_hook(yield_task);

Because this would be called after every single poll, it would hurt performance a bit. Yielding to esp-wifi also hurts performance in the same way. The amount of work done for each poll should be kept to the absolute minimum. This isn't a big problem if users need to opt-in to the feature, but I'm hoping we can do something cheaper that benefits everyone.

}

#[cfg(low_power_wait_wifi_perf_opt)]
if cpu == 0 {
Collaborator

Don't know if it's a good idea to introduce more places where esp-wifi is hard coded to run on core 0.

@bugadani
Contributor

I'm thinking that if we added a fn run_with_idle_callback entry point to the executor (that would take the init fn, as well as another closure that would be called once), we can avoid adding cost to users who don't need the idle hook. We'd need an alternative wait_impl I think, but that shouldn't be too bad.
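A rough sketch of how two entry points could keep the hook opt-in, written as a host-side toy model. `ToyExecutor`, its fields, and both method signatures are hypothetical and do not reflect the actual esp-hal-embassy API; in particular, the sketch assumes the idle closure is invoked after each poll, just before waiting, because the executor cannot tell in advance whether work is pending.

```rust
/// Hypothetical toy executor, not the esp-hal-embassy type.
struct ToyExecutor {
    /// Number of simulated wake-ups still to come; stands in for interrupts.
    pending_wakes: u32,
    /// How many times the task queue was polled.
    polls: u32,
}

impl ToyExecutor {
    fn new(pending_wakes: u32) -> Self {
        Self { pending_wakes, polls: 0 }
    }

    /// One pass over the (imaginary) task queue.
    fn poll(&mut self) {
        self.polls += 1;
    }

    /// Stand-in for the wait implementation. Returns `false` once no more
    /// wake-ups will arrive, so that the demo terminates.
    fn wait(&mut self) -> bool {
        if self.pending_wakes == 0 {
            return false;
        }
        self.pending_wakes -= 1;
        true
    }

    /// Plain entry point: no hook, so no extra work per loop iteration.
    fn run(&mut self, init: impl FnOnce()) {
        init();
        loop {
            self.poll();
            if !self.wait() {
                break;
            }
        }
    }

    /// Opt-in entry point: `on_idle` runs after every poll, before waiting.
    fn run_with_idle_callback(&mut self, init: impl FnOnce(), mut on_idle: impl FnMut()) {
        init();
        loop {
            self.poll();
            on_idle();
            if !self.wait() {
                break;
            }
        }
    }
}

fn main() {
    let mut plain = ToyExecutor::new(3);
    plain.run(|| {});

    let mut hooked = ToyExecutor::new(3);
    let mut hook_calls = 0u32;
    hooked.run_with_idle_callback(|| {}, || hook_calls += 1);

    println!(
        "plain: {} polls; hooked: {} polls, {} hook calls",
        plain.polls, hooked.polls, hook_calls
    );
}
```

Because the closure type is a generic parameter of the second entry point only, callers of `run` pay nothing for the hook in this model, while callers of `run_with_idle_callback` pay one hook call per poll.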

@bugadani
Contributor

bugadani commented Jul 3, 2025

With #3737 merged, it should now be possible to implement this in user code. Maybe we should make the yield function public in esp-wifi, if the builtin scheduler is used, so that people aren't forced to reimplement it.

@MabezDev
Member

Closing in favour of using the hook mechanism implemented in #3737.

@MabezDev MabezDev closed this Jul 18, 2025