Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory corruption when saving network settings #2313

Open
4 tasks done
schlimmchen opened this issue Sep 28, 2024 · 32 comments
Open
4 tasks done

Memory corruption when saving network settings #2313

schlimmchen opened this issue Sep 28, 2024 · 32 comments
Labels
bug Something isn't working

Comments

@schlimmchen
Copy link
Contributor

What happened?

I noticed a bunch of ESP reboots after exceptions and started digging. One of them I could isolate and pin down to also occur in this project: When saving the network settings, there is some kind of memory corruption.

To Reproduce Bug

Save the network settings by clicking the save button in the web UI. Repeat until you observe a random crash.

Expected Behavior

Graceful application of network settings.

Install Method

Self-Compiled

What git-hash/version of OpenDTU?

3559007

What firmware variant (PIO Environment) are you using?

generic_esp32s3_usb

Relevant log/trace output

No response

Anything else?

Example 1
Nothing received, resend whole request
TX AlarmData 868.00 MHz --> 15 93 10 07 59 80 16 72 56 80 11 00 66 F8 73 EA 00 00 00 00 00 00 00 00 BB 5E 09 
Setting Hostname... done
Interrupt received
Configuring WiFi STA using existing credentials... done
Configuring WiFi STA DHCP IP... [ 27042][E][WiFiClient.cpp:275] connect(): socket error on fd 49, errno: 113, "Software caused connection abort"
done

assert failed: heap_caps_free heap_caps.c:381 (heap != NULL && "free() target pointer is outside heap areas")


Backtrace: 0x40377b32:0x3fcb8110 0x4037d1d1:0x3fcb8130 0x40383ddd:0x3fcb8150 0x40378382:0x3fcb8280 0x40383e0d:0x3fcb82a0 0x4203dfe5:0x3fcb82c0 0x4203dff5:0x3fcb82e0 0x4202bd4d:0x3fcb8300 0x4202e422:0x3fcb8320 0x42
013d4e:0x3fcb8340 0x42013d66:0x3fcb8360 0x42013d71:0x3fcb8380 0x4202d706:0x3fcb83a0 0x4202f67d:0x3fcb83c0 0x4202ba12:0x3fcb83e0 0x4202ba1f:0x3fcb8400 0x420c89ea:0x3fcb8420 0x420c8d79:0x3fcb8450 0x420c908f:0x3fcb84
80




ELF file SHA256: 664fedd146fc71fc
Example 2
Admin AP remaining seconds: 110 / 120
Setting Hostname... done
[114955][E][WiFiUdp.cpp:221] parsePacket(): could not receive daConfiguring WiFi STA using ta: 9
existing credentials... done
Configuring WiFi STA DHCP IP... done
Guru Meditation Error: Core  0 panic'ed (LoadProhibited). Exception was unhandled.

Core  0 register dump:
PC      : 0x4202bd48  PS      : 0x00060f30  A0      : 0x8202e425  A1      : 0x3fcb8360  
A2      : 0x3fcc3fe0  A3      : 0x00000000  A4      : 0x00000000  A5      : 0x3fcc21c8  
A6      : 0x3fcf0b24  A7      : 0x00000000  A8      : 0x8202bd5d  A9      : 0x3fcb8340  
A10     : 0x00000018  A11     : 0x3fcc322c  A12     : 0x00000000  A13     : 0x00000001  
A14     : 0x00060920  A15     : 0x00000001  SAR     : 0x00000005  EXCCAUSE: 0x0000001c  
EXCVADDR: 0x00000000  LBEG    : 0x400570e8  LEND    : 0x400570f3  LCOUNT  : 0xffffffff  


Backtrace: 0x4202bd45:0x3fcb8360 0x4202e422:0x3fcb8380 0x42013d4e:0x3fcb83a0 0x42013d66:0x3fcb83c0 0x42013d71:0x3fcb83e0 0x4202d706:0x3fcb8400 0x4202f67d:0x3fcb8420 0x4202ba12:0x3fcb8440 0x4202ba1f:0x3fcb8460 0x42
0c89ea:0x3fcb8480 0x420c8d79:0x3fcb84b0 0x420c908f:0x3fcb84e0




ELF file SHA256: 664fedd146fc71fc
Example 3
All missing
Nothing received, resend whole request
TX RealTimeRunData 868.00 MHz --> 15 93 10 07 59 80 16 72 56 80 0B 00 66 F8 74 A5 00 00 00 00 00 00 00 00 88 84 B2 
Setting Hostname... done
[ 17184][E][WiFiUdp.cpp:221] parsePacket(): could not receive daConfiguring WiFi STA using ta: 9
existing credentials... done
Configuring WiFi STA DHCP IP... [ 17195][E][WiFiClient.cpp:275] connect(): socket error on fd 49, errno: 113, "Software caused connection abort"
done

assert failed: tcp_update_rcv_ann_wnd IDF/components/lwip/lwip/src/core/tcp.c:951 (new_rcv_ann_wnd <= 0xffff)


Backtrace: 0x40377b32:0x3fcedb80 0x4037d1d1:0x3fcedba0 0x40383ddd:0x3fcedbc0 0x4205c85e:0x3fcedcf0 0x4205c8f2:0x3fcedd10 0x420c8135:0x3fcedd30 0x420594cd:0x3fcedd50




ELF file SHA256: 664fedd146fc71fc

Please confirm the following

  • I believe this issue is a bug that affects all users of OpenDTU, not something specific to my installation.
  • I have already searched for relevant existing issues and discussions before opening this report.
  • I have updated the title field above with a concise description.
  • I have double checked that my inverter does not contain a W in the model name (like HMS-xxxW) as they are not supported.
@schlimmchen schlimmchen added the bug Something isn't working label Sep 28, 2024
@schlimmchen
Copy link
Contributor Author

schlimmchen commented Sep 28, 2024

In a quick and dirty test I made NetworkSettings execute enableAdminMode() and applyConfig() from inside its loop() (to have these functions execute synchronously to the main loop()), but it seems the issue is not fully resolved:

Log
RX Period End
All missing
Nothing received, resend whole request
TX RealTimeRunData 868.00 MHz --> 15 93 10 07 59 80 16 72 56 80 0B 00 66 F8 76 F4 00 00 00 00 00 00 00 00 D1 D4 E8 
Disconnected from MQTT.
Disconnect reason:TCP_DISCONNECTED

assert failed: tcp_sent IDF/components/lwip/lwip/src/core/tcp.c:2131 (invalid socket state for sent callback)


Backtrace: 0x40377b32:0x3fcb8300 0x4037d1d1:0x3fcb8320 0x40383ddd:0x3fcb8340 0x4205c9e2:0x3fcb8470 0x420c8d5a:0x3fcb8490 0x420c90bf:0x3fcb84c0




ELF file SHA256: d80c092b282e219f

It seems I can't reproduce the issue when I wait for the device to print that it got it's IP address before hitting the "Save" button again, at least with the change I described in this comment.

Edit: I found this: #2298 (comment) which could be related.

@stefan123t
Copy link
Contributor

stefan123t commented Sep 28, 2024

@schlimmchen yes I have read that issue #2298 too. Basically your fix sounds good, but there seems to be an RF task running in parallel when I look at the Example Logs you provided. Here the IRQ subroutine may be triggered which could “interrupt” [sic] your inline code path.
Can we disable RF comms before we save the config, or is that part of the enableAdminMode() already ? Maybe we have to disable the IRQ before and flush the RF buffer afterwards ?
Maybe add a semaphore to enableAdminMode ?

@tbnobody
Copy link
Owner

It would be helpfull to show use the log output in vscode as it automatically integrates the exception parser and shows a proper stack trace with readable symbols.

@tbnobody
Copy link
Owner

When testing the migration to arduino core 3 I also realized that it immediatly crashes in WiFi.disconnect(true, true); didn't have time to debug this any further.

@stefan123t
Copy link
Contributor

@schlimmchen is this solved with your other fix from today that prevents allocating a m = new AuthenticationMiddleware(); for every Live API request or is this still an other issue ?

@tbnobody
Copy link
Owner

IMHO this is a different issue as it only occours when pressing save.

@stefan123t
Copy link
Contributor

see #2360

@schlimmchen
Copy link
Contributor Author

see #2360

Yeah, maybe... Maybe not? Maybe #2360 will only mask the issue?

I know I need to send backtraces... In case I can't reproduce this with #2360, I will instead close this, regardless of whether the root cause was fixed or concealed.

@schlimmchen
Copy link
Contributor Author

Guru Meditation Error: Core  0 panic'ed (LoadProhibited). Exception was unhandled.

Core  0 register dump:
PC      : 0x4202d894  PS      : 0x00060b30  A0      : 0x8202fd05  A1      : 0x3fcb8900
A2      : 0x3fcbbcd8  A3      : 0x00000000  A4      : 0x00000000  A5      : 0x3fcb9e54
A6      : 0x3fcf0b24  A7      : 0x00000000  A8      : 0x8202d8a9  A9      : 0x3fcb88e0
A10     : 0x00000018  A11     : 0x3fcba658  A12     : 0x00000000  A13     : 0x00000001
A14     : 0x00060520  A15     : 0x00000001  SAR     : 0x0000001d  EXCCAUSE: 0x0000001c
EXCVADDR: 0x00000000  LBEG    : 0x400570e8  LEND    : 0x400570f3  LCOUNT  : 0xffffffff

Backtrace: 0x4202d891:0x3fcb8900 0x4202fd02:0x3fcb8920 0x42027fda:0x3fcb8940 0x4202800a:0x3fcb8960 0x42028015:0x3fcb8980 0x4202efe6:0x3fcb89a0 0x42030cad:0x3fcb89c0 0x4202cb17:0x3fcb89e0 0x4202cb23:0x3fcb8a00 0x420cdbb2:0x3fcb8a20 0x420cdf41:0x3fcb8a50 0x420ce257:0x3fcb8a80

ELF file SHA256: 9fc0483beb1c1f20

0x4202d891: AsyncWebHeader::~AsyncWebHeader() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/ESPAsyncWebServer.h:137
 (inlined by) void __gnu_cxx::new_allocator<std::_List_node<AsyncWebHeader> >::destroy<AsyncWebHeader>(AsyncWebHeader*) at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/ext/new_allocator.h:140
 (inlined by) void std::allocator_traits<std::allocator<std::_List_node<AsyncWebHeader> > >::destroy<AsyncWebHeader>(std::allocator<std::_List_node<AsyncWebHeader> >&, AsyncWebHeader*) at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/alloc_traits.h:487
 (inlined by) std::__cxx11::_List_base<AsyncWebHeader, std::allocator<AsyncWebHeader> >::_M_clear() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/list.tcc:77
0x4202fd02: std::__cxx11::_List_base<AsyncWebHeader, std::allocator<AsyncWebHeader> >::~_List_base() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/stl_list.h:507
 (inlined by) std::__cxx11::list<AsyncWebHeader, std::allocator<AsyncWebHeader> >::~list() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/stl_list.h:835
 (inlined by) AsyncWebServerResponse::~AsyncWebServerResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/ESPAsyncWebServer.h:766
0x42027fda: AsyncAbstractResponse::~AsyncAbstractResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebResponseImpl.h:47
0x4202800a: AsyncJsonResponse::~AsyncJsonResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/AsyncJson.h:58
0x42028015: AsyncJsonResponse::~AsyncJsonResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/AsyncJson.h:58
0x4202efe6: AsyncWebServerRequest::~AsyncWebServerRequest() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:53 (discriminator 1)
0x42030cad: AsyncWebServer::_handleDisconnect(AsyncWebServerRequest*) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebServer.cpp:138 (discriminator 1)
0x4202cb17: AsyncWebServerRequest::_onDisconnect() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:204
0x4202cb23: std::_Function_handler<void (void*, AsyncClient*), AsyncWebServerRequest::AsyncWebServerRequest(AsyncWebServer*, AsyncClient*)::{lambda(void*, AsyncClient*)#3}>::_M_invoke(std::_Any_data const&, void*&&, AsyncClient*&&) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:41
 (inlined by) _M_invoke at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/std_function.h:297
0x420cdbb2: std::function<void (void*, AsyncClient*)>::operator()(void*, AsyncClient*) const at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/std_function.h:687
0x420cdf41: AsyncClient::_error(signed char) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:945
0x420ce257: AsyncClient::_s_error(void*, signed char) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:1391
 (inlined by) _handle_async_event at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:184
 (inlined by) _async_service_task at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:207

Nope... Got this on the first try using v24.10.15 including the write guard.

The backtrace seems to suggest that the Async TCP Server can't handle the underlying network connection breaking, or to be more specific: Once the connection breaks, handling the disconnect triggers some kind of issue.

assert failed: tlsf_free heap_tlsf.c:965 (!block_is_free(block) && "block already marked as free")

Backtrace: 0x40377dc6:0x3fcb8650 0x4037d241:0x3fcb8670 0x40383f55:0x3fcb8690 0x4038307d:0x3fcb87c0 0x40383a48:0x3fcb87e0 0x40383b76:0x3fcb8800 0x4037863d:0x3fcb8820 0x40383f85:0x3fcb8840 0x420ca799:0x3fcb8860 0x4202d8a6:0x3fcb8880 0x4202fd02:0x3fcb88a0 0x42027fda:0x3fcb88c0 0x4202800a:0x3fcb88e0 0x42028015:0x3fcb8900 0x4202efe6:0x3fcb8920 0x42030cad:0x3fcb8940 0x4202cb17:0x3fcb8960 0x4202cb23:0x3fcb8980 0x420cdbb2:0x3fcb89a0 0x420cdf41:0x3fcb89d0 0x420ce257:0x3fcb8a00

ELF file SHA256: 9fc0483beb1c1f20
0x40377dc6: panic_abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_system/panic.c:408
0x4037d241: esp_system_abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_system/esp_system.c:137
0x40383f55: __assert_func at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/newlib/assert.c:85
0x4038307d: block_merge_prev at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/heap_tlsf.c:343
 (inlined by) tlsf_free at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/heap_tlsf.c:967
0x40383a48: multi_heap_free_impl at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/multi_heap.c:212
0x40383b76: multi_heap_free at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/multi_heap_poisoning.c:266
0x4037863d: heap_caps_free at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/heap_caps.c:382
0x40383f85: free at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/newlib/heap.c:39
0x420ca799: operator delete(void*) at /builds/idf/crosstool-NG/.build/xtensa-esp32s3-elf/src/gcc/libstdc++-v3/libsupc++/del_op.cc:49
0x4202d8a6: __gnu_cxx::new_allocator<std::_List_node<AsyncWebHeader> >::deallocate(std::_List_node<AsyncWebHeader>*, unsigned int) at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/ext/new_allocator.h:125
 (inlined by) std::allocator_traits<std::allocator<std::_List_node<AsyncWebHeader> > >::deallocate(std::allocator<std::_List_node<AsyncWebHeader> >&, std::_List_node<AsyncWebHeader>*, unsigned int) at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/alloc_traits.h:462
 (inlined by) std::__cxx11::_List_base<AsyncWebHeader, std::allocator<AsyncWebHeader> >::_M_put_node(std::_List_node<AsyncWebHeader>*) at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/stl_list.h:454
 (inlined by) std::__cxx11::_List_base<AsyncWebHeader, std::allocator<AsyncWebHeader> >::_M_clear() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/list.tcc:81
0x4202fd02: std::__cxx11::_List_base<AsyncWebHeader, std::allocator<AsyncWebHeader> >::~_List_base() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/stl_list.h:507
 (inlined by) std::__cxx11::list<AsyncWebHeader, std::allocator<AsyncWebHeader> >::~list() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/stl_list.h:835
 (inlined by) AsyncWebServerResponse::~AsyncWebServerResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/ESPAsyncWebServer.h:766
0x42027fda: AsyncAbstractResponse::~AsyncAbstractResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebResponseImpl.h:47
0x4202800a: AsyncJsonResponse::~AsyncJsonResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/AsyncJson.h:58
0x42028015: AsyncJsonResponse::~AsyncJsonResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/AsyncJson.h:58
0x4202efe6: AsyncWebServerRequest::~AsyncWebServerRequest() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:53 (discriminator 1)
0x42030cad: AsyncWebServer::_handleDisconnect(AsyncWebServerRequest*) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebServer.cpp:138 (discriminator 1)
0x4202cb17: AsyncWebServerRequest::_onDisconnect() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:204
0x4202cb23: std::_Function_handler<void (void*, AsyncClient*), AsyncWebServerRequest::AsyncWebServerRequest(AsyncWebServer*, AsyncClient*)::{lambda(void*, AsyncClient*)#3}>::_M_invoke(std::_Any_data const&, void*&&, AsyncClient*&&) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:41
 (inlined by) _M_invoke at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/std_function.h:297
0x420cdbb2: std::function<void (void*, AsyncClient*)>::operator()(void*, AsyncClient*) const at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/std_function.h:687
0x420cdf41: AsyncClient::_error(signed char) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:945
0x420ce257: AsyncClient::_s_error(void*, signed char) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:1391
 (inlined by) _handle_async_event at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:184
 (inlined by) _async_service_task at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:207

Hm, similar context. Maybe a double free?

Guru Meditation Error: Core  0 panic'ed (LoadStoreError). Exception was unhandled.

Core  0 register dump:
PC      : 0x40383cd0  PS      : 0x00060133  A0      : 0x80383b5f  A1      : 0x3fcb87d0
A2      : 0x3fcade10  A3      : 0x00000001  A4      : 0x00000004  A5      : 0xbaad5678
A6      : 0x00060920  A7      : 0x00000001  A8      : 0x43b9e1ff  A9      : 0x00000003
A10     : 0x00000001  A11     : 0x3fcade08  A12     : 0xabba1234  A13     : 0xabba1234
A14     : 0x00060120  A15     : 0x00000001  SAR     : 0x00000015  EXCCAUSE: 0x00000003
EXCVADDR: 0x43b9e1ff  LBEG    : 0x400570e8  LEND    : 0x400570f3  LCOUNT  : 0xffffffff

Backtrace: 0x40383ccd:0x3fcb87d0 0x40383b5c:0x3fcb8800 0x4037863d:0x3fcb8820 0x40383f85:0x3fcb8840 0x420ca799:0x3fcb8860 0x4202d8a6:0x3fcb8880 0x4202fd02:0x3fcb88a0 0x42027fda:0x3fcb88c0 0x4202800a:0x3fcb88e0 0x42028015:0x3fcb8900 0x4202efe6:0x3fcb8920 0x42030cad:0x3fcb8940 0x4202cb17:0x3fcb8960 0x4202cb23:0x3fcb8980 0x420cdbb2:0x3fcb89a0 0x420cdf41:0x3fcb89d0 0x420ce257:0x3fcb8a00

ELF file SHA256: 9fc0483beb1c1f20
0x40383ccd: verify_allocated_region at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/multi_heap_poisoning.c:116
0x40383b5c: multi_heap_free at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/multi_heap_poisoning.c:258
0x4037863d: heap_caps_free at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/heap_caps.c:382
0x40383f85: free at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/newlib/heap.c:39
0x420ca799: operator delete(void*) at /builds/idf/crosstool-NG/.build/xtensa-esp32s3-elf/src/gcc/libstdc++-v3/libsupc++/del_op.cc:49
0x4202d8a6: __gnu_cxx::new_allocator<std::_List_node<AsyncWebHeader> >::deallocate(std::_List_node<AsyncWebHeader>*, unsigned int) at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/ext/new_allocator.h:125
 (inlined by) std::allocator_traits<std::allocator<std::_List_node<AsyncWebHeader> > >::deallocate(std::allocator<std::_List_node<AsyncWebHeader> >&, std::_List_node<AsyncWebHeader>*, unsigned int) at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/alloc_traits.h:462
 (inlined by) std::__cxx11::_List_base<AsyncWebHeader, std::allocator<AsyncWebHeader> >::_M_put_node(std::_List_node<AsyncWebHeader>*) at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/stl_list.h:454
 (inlined by) std::__cxx11::_List_base<AsyncWebHeader, std::allocator<AsyncWebHeader> >::_M_clear() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/list.tcc:81
0x4202fd02: std::__cxx11::_List_base<AsyncWebHeader, std::allocator<AsyncWebHeader> >::~_List_base() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/stl_list.h:507
 (inlined by) std::__cxx11::list<AsyncWebHeader, std::allocator<AsyncWebHeader> >::~list() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/stl_list.h:835
 (inlined by) AsyncWebServerResponse::~AsyncWebServerResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/ESPAsyncWebServer.h:766
0x42027fda: AsyncAbstractResponse::~AsyncAbstractResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebResponseImpl.h:47
0x4202800a: AsyncJsonResponse::~AsyncJsonResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/AsyncJson.h:58
0x42028015: AsyncJsonResponse::~AsyncJsonResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/AsyncJson.h:58
0x4202efe6: AsyncWebServerRequest::~AsyncWebServerRequest() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:53 (discriminator 1)
0x42030cad: AsyncWebServer::_handleDisconnect(AsyncWebServerRequest*) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebServer.cpp:138 (discriminator 1)
0x4202cb17: AsyncWebServerRequest::_onDisconnect() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:204
0x4202cb23: std::_Function_handler<void (void*, AsyncClient*), AsyncWebServerRequest::AsyncWebServerRequest(AsyncWebServer*, AsyncClient*)::{lambda(void*, AsyncClient*)#3}>::_M_invoke(std::_Any_data const&, void*&&, AsyncClient*&&) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:41
 (inlined by) _M_invoke at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/std_function.h:297
0x420cdbb2: std::function<void (void*, AsyncClient*)>::operator()(void*, AsyncClient*) const at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/std_function.h:687
0x420cdf41: AsyncClient::_error(signed char) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:945
0x420ce257: AsyncClient::_s_error(void*, signed char) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:1391
 (inlined by) _handle_async_event at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:184
 (inlined by) _async_service_task at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:207

One more for good measure. Again, similar context. Maybe @mathieucarbou is willing to take a look?

@stefan123t
Copy link
Contributor

stefan123t commented Oct 22, 2024

Yeah maybe same context, the plot thickens as they say:

espasyncwebserver.h:766
webresponseimpl.h:47
asyncjson.h:58
asyncjson.h:58
WebRequest.cpp:53 (discriminator 1)
WebServer.cpp:138 (discriminator 1)
webrequest.cpp:204

While the webrequest seems to cope with AsyncWebServer::_handleDisconnect() during AsyncWebServerRequest::_onDisconnect()

Can we provoke this situation somehow?

I assume maybe something like a WLAN disconnect / Application Level Firewall or a tcpdump may tell us a bit more what happens on the network and where in the code / process this breaks.

@mathieucarbou
Copy link

mathieucarbou commented Oct 22, 2024

One more for good measure. Again, similar context. Maybe @mathieucarbou is willing to take a look?

I'm sorry I won't be of a great help on this at the moment (too few info)... Having a reproductible use case outside of the app would help.

Although we see AsyncTCP and ESPAsycnWS in the stack following user interaction it does not mean that the issue is there, but the normal processing of ESPAsyncWS could be impacted by other things running.

There are a lof of similar ones reported in https://github.com/espressif/esp-idf.

Questions, in the process to try isolate the cause:

  • is it a normal esp32dev board ?
  • you have made sure to not have heavy tasks running concurrently ?
  • no timers, interrupt handlers or code in running from iram ?
  • no psram ?
  • no heavy task allocating and deallocating on heap frequently ?
  • you checked the heap ?
  • how exactly this "save" work ? can you point to the code ? this is a normal upload handler saving a json config on disk ?
  • also to test: if this is fs related, same issue could happen during an OTA update then ? Or you are disabling some running task during an OTA to not disrupt it ?

@mathieucarbou
Copy link

Guru Meditation Error: Core  0 panic'ed (LoadProhibited). Exception was unhandled.

Core  0 register dump:
PC      : 0x4202d894  PS      : 0x00060b30  A0      : 0x8202fd05  A1      : 0x3fcb8900
A2      : 0x3fcbbcd8  A3      : 0x00000000  A4      : 0x00000000  A5      : 0x3fcb9e54
A6      : 0x3fcf0b24  A7      : 0x00000000  A8      : 0x8202d8a9  A9      : 0x3fcb88e0
A10     : 0x00000018  A11     : 0x3fcba658  A12     : 0x00000000  A13     : 0x00000001
A14     : 0x00060520  A15     : 0x00000001  SAR     : 0x0000001d  EXCCAUSE: 0x0000001c
EXCVADDR: 0x00000000  LBEG    : 0x400570e8  LEND    : 0x400570f3  LCOUNT  : 0xffffffff

Backtrace: 0x4202d891:0x3fcb8900 0x4202fd02:0x3fcb8920 0x42027fda:0x3fcb8940 0x4202800a:0x3fcb8960 0x42028015:0x3fcb8980 0x4202efe6:0x3fcb89a0 0x42030cad:0x3fcb89c0 0x4202cb17:0x3fcb89e0 0x4202cb23:0x3fcb8a00 0x420cdbb2:0x3fcb8a20 0x420cdf41:0x3fcb8a50 0x420ce257:0x3fcb8a80

ELF file SHA256: 9fc0483beb1c1f20

0x4202d891: AsyncWebHeader::~AsyncWebHeader() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/ESPAsyncWebServer.h:137
 (inlined by) void __gnu_cxx::new_allocator<std::_List_node<AsyncWebHeader> >::destroy<AsyncWebHeader>(AsyncWebHeader*) at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/ext/new_allocator.h:140
 (inlined by) void std::allocator_traits<std::allocator<std::_List_node<AsyncWebHeader> > >::destroy<AsyncWebHeader>(std::allocator<std::_List_node<AsyncWebHeader> >&, AsyncWebHeader*) at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/alloc_traits.h:487
 (inlined by) std::__cxx11::_List_base<AsyncWebHeader, std::allocator<AsyncWebHeader> >::_M_clear() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/list.tcc:77
0x4202fd02: std::__cxx11::_List_base<AsyncWebHeader, std::allocator<AsyncWebHeader> >::~_List_base() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/stl_list.h:507
 (inlined by) std::__cxx11::list<AsyncWebHeader, std::allocator<AsyncWebHeader> >::~list() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/stl_list.h:835
 (inlined by) AsyncWebServerResponse::~AsyncWebServerResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/ESPAsyncWebServer.h:766
0x42027fda: AsyncAbstractResponse::~AsyncAbstractResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebResponseImpl.h:47
0x4202800a: AsyncJsonResponse::~AsyncJsonResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/AsyncJson.h:58
0x42028015: AsyncJsonResponse::~AsyncJsonResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/AsyncJson.h:58
0x4202efe6: AsyncWebServerRequest::~AsyncWebServerRequest() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:53 (discriminator 1)
0x42030cad: AsyncWebServer::_handleDisconnect(AsyncWebServerRequest*) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebServer.cpp:138 (discriminator 1)
0x4202cb17: AsyncWebServerRequest::_onDisconnect() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:204
0x4202cb23: std::_Function_handler<void (void*, AsyncClient*), AsyncWebServerRequest::AsyncWebServerRequest(AsyncWebServer*, AsyncClient*)::{lambda(void*, AsyncClient*)#3}>::_M_invoke(std::_Any_data const&, void*&&, AsyncClient*&&) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:41
 (inlined by) _M_invoke at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/std_function.h:297
0x420cdbb2: std::function<void (void*, AsyncClient*)>::operator()(void*, AsyncClient*) const at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/std_function.h:687
0x420cdf41: AsyncClient::_error(signed char) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:945
0x420ce257: AsyncClient::_s_error(void*, signed char) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:1391
 (inlined by) _handle_async_event at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:184
 (inlined by) _async_service_task at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:207

Nope... Got this on the first try using v24.10.15 including the write guard.

The backtrace seems to suggest that the Async TCP Server can't handle the underlying network connection breaking, or to be more specific: Once the connection breaks, handling the disconnect triggers some kind of issue.

assert failed: tlsf_free heap_tlsf.c:965 (!block_is_free(block) && "block already marked as free")

Backtrace: 0x40377dc6:0x3fcb8650 0x4037d241:0x3fcb8670 0x40383f55:0x3fcb8690 0x4038307d:0x3fcb87c0 0x40383a48:0x3fcb87e0 0x40383b76:0x3fcb8800 0x4037863d:0x3fcb8820 0x40383f85:0x3fcb8840 0x420ca799:0x3fcb8860 0x4202d8a6:0x3fcb8880 0x4202fd02:0x3fcb88a0 0x42027fda:0x3fcb88c0 0x4202800a:0x3fcb88e0 0x42028015:0x3fcb8900 0x4202efe6:0x3fcb8920 0x42030cad:0x3fcb8940 0x4202cb17:0x3fcb8960 0x4202cb23:0x3fcb8980 0x420cdbb2:0x3fcb89a0 0x420cdf41:0x3fcb89d0 0x420ce257:0x3fcb8a00

ELF file SHA256: 9fc0483beb1c1f20
0x40377dc6: panic_abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_system/panic.c:408
0x4037d241: esp_system_abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_system/esp_system.c:137
0x40383f55: __assert_func at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/newlib/assert.c:85
0x4038307d: block_merge_prev at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/heap_tlsf.c:343
 (inlined by) tlsf_free at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/heap_tlsf.c:967
0x40383a48: multi_heap_free_impl at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/multi_heap.c:212
0x40383b76: multi_heap_free at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/multi_heap_poisoning.c:266
0x4037863d: heap_caps_free at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/heap_caps.c:382
0x40383f85: free at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/newlib/heap.c:39
0x420ca799: operator delete(void*) at /builds/idf/crosstool-NG/.build/xtensa-esp32s3-elf/src/gcc/libstdc++-v3/libsupc++/del_op.cc:49
0x4202d8a6: __gnu_cxx::new_allocator<std::_List_node<AsyncWebHeader> >::deallocate(std::_List_node<AsyncWebHeader>*, unsigned int) at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/ext/new_allocator.h:125
 (inlined by) std::allocator_traits<std::allocator<std::_List_node<AsyncWebHeader> > >::deallocate(std::allocator<std::_List_node<AsyncWebHeader> >&, std::_List_node<AsyncWebHeader>*, unsigned int) at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/alloc_traits.h:462
 (inlined by) std::__cxx11::_List_base<AsyncWebHeader, std::allocator<AsyncWebHeader> >::_M_put_node(std::_List_node<AsyncWebHeader>*) at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/stl_list.h:454
 (inlined by) std::__cxx11::_List_base<AsyncWebHeader, std::allocator<AsyncWebHeader> >::_M_clear() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/list.tcc:81
0x4202fd02: std::__cxx11::_List_base<AsyncWebHeader, std::allocator<AsyncWebHeader> >::~_List_base() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/stl_list.h:507
 (inlined by) std::__cxx11::list<AsyncWebHeader, std::allocator<AsyncWebHeader> >::~list() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/stl_list.h:835
 (inlined by) AsyncWebServerResponse::~AsyncWebServerResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/ESPAsyncWebServer.h:766
0x42027fda: AsyncAbstractResponse::~AsyncAbstractResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebResponseImpl.h:47
0x4202800a: AsyncJsonResponse::~AsyncJsonResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/AsyncJson.h:58
0x42028015: AsyncJsonResponse::~AsyncJsonResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/AsyncJson.h:58
0x4202efe6: AsyncWebServerRequest::~AsyncWebServerRequest() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:53 (discriminator 1)
0x42030cad: AsyncWebServer::_handleDisconnect(AsyncWebServerRequest*) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebServer.cpp:138 (discriminator 1)
0x4202cb17: AsyncWebServerRequest::_onDisconnect() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:204
0x4202cb23: std::_Function_handler<void (void*, AsyncClient*), AsyncWebServerRequest::AsyncWebServerRequest(AsyncWebServer*, AsyncClient*)::{lambda(void*, AsyncClient*)#3}>::_M_invoke(std::_Any_data const&, void*&&, AsyncClient*&&) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:41
 (inlined by) _M_invoke at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/std_function.h:297
0x420cdbb2: std::function<void (void*, AsyncClient*)>::operator()(void*, AsyncClient*) const at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/std_function.h:687
0x420cdf41: AsyncClient::_error(signed char) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:945
0x420ce257: AsyncClient::_s_error(void*, signed char) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:1391
 (inlined by) _handle_async_event at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:184
 (inlined by) _async_service_task at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:207

Hm, similar context. Maybe a double free?

Guru Meditation Error: Core  0 panic'ed (LoadStoreError). Exception was unhandled.

Core  0 register dump:
PC      : 0x40383cd0  PS      : 0x00060133  A0      : 0x80383b5f  A1      : 0x3fcb87d0
A2      : 0x3fcade10  A3      : 0x00000001  A4      : 0x00000004  A5      : 0xbaad5678
A6      : 0x00060920  A7      : 0x00000001  A8      : 0x43b9e1ff  A9      : 0x00000003
A10     : 0x00000001  A11     : 0x3fcade08  A12     : 0xabba1234  A13     : 0xabba1234
A14     : 0x00060120  A15     : 0x00000001  SAR     : 0x00000015  EXCCAUSE: 0x00000003
EXCVADDR: 0x43b9e1ff  LBEG    : 0x400570e8  LEND    : 0x400570f3  LCOUNT  : 0xffffffff

Backtrace: 0x40383ccd:0x3fcb87d0 0x40383b5c:0x3fcb8800 0x4037863d:0x3fcb8820 0x40383f85:0x3fcb8840 0x420ca799:0x3fcb8860 0x4202d8a6:0x3fcb8880 0x4202fd02:0x3fcb88a0 0x42027fda:0x3fcb88c0 0x4202800a:0x3fcb88e0 0x42028015:0x3fcb8900 0x4202efe6:0x3fcb8920 0x42030cad:0x3fcb8940 0x4202cb17:0x3fcb8960 0x4202cb23:0x3fcb8980 0x420cdbb2:0x3fcb89a0 0x420cdf41:0x3fcb89d0 0x420ce257:0x3fcb8a00

ELF file SHA256: 9fc0483beb1c1f20
0x40383ccd: verify_allocated_region at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/multi_heap_poisoning.c:116
0x40383b5c: multi_heap_free at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/multi_heap_poisoning.c:258
0x4037863d: heap_caps_free at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/heap_caps.c:382
0x40383f85: free at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/newlib/heap.c:39
0x420ca799: operator delete(void*) at /builds/idf/crosstool-NG/.build/xtensa-esp32s3-elf/src/gcc/libstdc++-v3/libsupc++/del_op.cc:49
0x4202d8a6: __gnu_cxx::new_allocator<std::_List_node<AsyncWebHeader> >::deallocate(std::_List_node<AsyncWebHeader>*, unsigned int) at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/ext/new_allocator.h:125
 (inlined by) std::allocator_traits<std::allocator<std::_List_node<AsyncWebHeader> > >::deallocate(std::allocator<std::_List_node<AsyncWebHeader> >&, std::_List_node<AsyncWebHeader>*, unsigned int) at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/alloc_traits.h:462
 (inlined by) std::__cxx11::_List_base<AsyncWebHeader, std::allocator<AsyncWebHeader> >::_M_put_node(std::_List_node<AsyncWebHeader>*) at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/stl_list.h:454
 (inlined by) std::__cxx11::_List_base<AsyncWebHeader, std::allocator<AsyncWebHeader> >::_M_clear() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/list.tcc:81
0x4202fd02: std::__cxx11::_List_base<AsyncWebHeader, std::allocator<AsyncWebHeader> >::~_List_base() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/stl_list.h:507
 (inlined by) std::__cxx11::list<AsyncWebHeader, std::allocator<AsyncWebHeader> >::~list() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/stl_list.h:835
 (inlined by) AsyncWebServerResponse::~AsyncWebServerResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/ESPAsyncWebServer.h:766
0x42027fda: AsyncAbstractResponse::~AsyncAbstractResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebResponseImpl.h:47
0x4202800a: AsyncJsonResponse::~AsyncJsonResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/AsyncJson.h:58
0x42028015: AsyncJsonResponse::~AsyncJsonResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/AsyncJson.h:58
0x4202efe6: AsyncWebServerRequest::~AsyncWebServerRequest() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:53 (discriminator 1)
0x42030cad: AsyncWebServer::_handleDisconnect(AsyncWebServerRequest*) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebServer.cpp:138 (discriminator 1)
0x4202cb17: AsyncWebServerRequest::_onDisconnect() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:204
0x4202cb23: std::_Function_handler<void (void*, AsyncClient*), AsyncWebServerRequest::AsyncWebServerRequest(AsyncWebServer*, AsyncClient*)::{lambda(void*, AsyncClient*)#3}>::_M_invoke(std::_Any_data const&, void*&&, AsyncClient*&&) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:41
 (inlined by) _M_invoke at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/std_function.h:297
0x420cdbb2: std::function<void (void*, AsyncClient*)>::operator()(void*, AsyncClient*) const at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/std_function.h:687
0x420cdf41: AsyncClient::_error(signed char) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:945
0x420ce257: AsyncClient::_s_error(void*, signed char) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:1391
 (inlined by) _handle_async_event at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:184
 (inlined by) _async_service_task at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:207

One more for good measure. Again, similar context. Maybe @mathieucarbou is willing to take a look?

These stack traces are nearly the same:

  • lwip layer sends LWIP_TCP_ERROR event

  • which triggers AsyncTCP error handler

  • which triggers ESPAsyncWS disconnect handler, which:

    1. delete the request (in _handleDisconnect())
    2. then delete the client (in the onDisconnect cb)

    The failure always happens when freeing the response attached to the request:

AsyncWebServerRequest::~AsyncWebServerRequest() {
  _headers.clear();

  _pathParams.clear();

  if (_response != NULL) {
    delete _response;
  }

For that to happen, the created response pointer (in this case AsyncJsonResponse ) has to be sent already (request->send(response);), because this is the step that is attaching the response pointer to he request object.

So the request->send(response); was called, then the lwip error occurred.

This is important to note that request->send(response); does not send the request on the network!, because the middleware chain continues to be processed (some middleware can act on the response headers or decide to replace the response attached to a request). The commit on the network is done once the middleware chain is finished.

This is important to consider because some code like this is brittle and relied probably on the fact that the response was sent and received by the browser, which is not the case:

In WebApi_config.cpp:

    WebApi.sendJsonResponse(request, response, __FUNCTION__, __LINE__);

    Utils::removeAllFiles();
    RestartHelper.triggerRestart();

The 2 list lines will be executed, and the request will be sent once the middleware chain and request handler have finished.

Another one:

    WebApi.sendJsonResponse(request, response, __FUNCTION__, __LINE__);

    Battery.updateSettings();
    MqttHandleBatteryHass.forceUpdate();

    // potentially make SoC thresholds auto-discoverable
    MqttHandlePowerLimiterHass.forceUpdate();

Here, the last lines will execute BEFORE the request is sent.

All the things executed AFTER a response is attached to a request have to be carefully written in order to not have some impacts on any pointer the response could still reference.

In the case of a json response, in some use cases, ArduinoJson won't do a copy but just point to the existing pointers (but this is not the issue we saw).

In the issue we saw above, the lwip layer fails (wifi disconnect or something else) which triggers the response deletion.

  • It would be helpful to put some logs in the OpendTU handlers to know which handler created the response which fails to be deleted

  • It also would be helpful to put some logs in the response destructor to check whether it is called twice and from who if possible.

  • It would be helpful to put some logs before delete is called on a response. There is only some places where a response is deleted:

  • In void AsyncWebServerRequest::send(AsyncWebServerResponse* response) { (when a response is replaced by another one)

  • In AsyncWebServerRequest::~AsyncWebServerRequest() {

  • In these 2 places:

void AsyncWebServerRequest::_onPoll() {
  // os_printf("p\n");
  if (_response != NULL && _client != NULL && _client->canSend()) {
    if (!_response->_finished()) {
      _response->_ack(this, 0, 0);
    } else {
      AsyncWebServerResponse* r = _response;
      _response = NULL;
      delete r;

      _client->close();
    }
  }
}

void AsyncWebServerRequest::_onAck(size_t len, uint32_t time) {
  // os_printf("a:%u:%u\n", len, time);
  if (_response != NULL) {
    if (!_response->_finished()) {
      _response->_ack(this, len, time);
    } else if (_response->_finished()) {
      AsyncWebServerResponse* r = _response;
      _response = NULL;
      delete r;

      _client->close();
    }
  }
}
``

I suspect this might be the issue...

@mathieucarbou
Copy link

mathieucarbou commented Oct 23, 2024

@schlimmchen :

In WebRequest.cpp: could you please add, just after the delete: _response = NULL;

AsyncWebServerRequest::~AsyncWebServerRequest() {
  _headers.clear();

  _pathParams.clear();

  if (_response != NULL) {
    delete _response;
    _response = NULL;
  }

or (better):

  AsyncWebServerResponse* r = _response
  _response = NULL;
  delete r;

To be sure the response is not used by any of the callbacks above.

@schlimmchen
Copy link
Contributor Author

Thanks for looking into this. I see you spent quite some of your time, thanks!

I could not yet fully understand your longer comment. I'll re-read it again later. What I do understand is that one has to be careful when sending a response (actually queuing sending a response) but executing code in the same context, which runs before the response was actually sent. I know about the example you gave, where the ESP is restarted. I did not dare to question it, but indeed it seems that sending the response and restarting the ESP (be it with a delay or not) is something of a race. What we would actually like to do is wait for the response to be sent, then trigger the reboot.

The reason I asked whether you would want to have a look is that I suspect that you are interested in making ESPAsyncWebServer rebust against the issue we are looking at, even if we are using it in a questionable manner. Assuming that something can be done in the lib...

The changes you proposed unfortunatly don't prevent the issue from being triggered.

[SysCORRUPT HEAP: Bad tail at 0x44862726. Expected 0xbaad5678 got 0x00000000

assert failed: multi_heap_free multi_heap_poisoning.c:259 (head != NULL)

Backtrace: 0x403780f2:0x3fcbf7f0 0x4037d651:0x3fcbf810 0x4038441d:0x3fcbf830 0x40384035:0x3fcbf960 0x40378969:0x3fcbf980 0x4038444d:0x3fcbf9a0 0x421063e1:0x3fcbf9c0 0x4205d64e:0x3fcbf9e0 0x4205fa3e:0x3fcbfa00 0x42057a1e:0x3fcbfa20 0x42057a4e:0x3fcbfa40 0x42057a59:0x3fcbfa60 0x4205ed7a:0x3fcbfa80 0x420609e9:0x3fcbfaa0 0x4205c8bf:0x3fcbfac0 0x4205c8cb:0x3fcbfae0 0x42109af6:0x3fcbfb00 0x42109e85:0x3fcbfb30 0x4210a19b:0x3fcbfb60

ELF file SHA256: 5c3b1417a0d70e10
0x403780f2: panic_abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_system/panic.c:408
0x4037d651: esp_system_abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_system/esp_system.c:137
0x4038441d: __assert_func at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/newlib/assert.c:85
0x40384035: multi_heap_free at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/multi_heap_poisoning.c:259 (discriminator 1)
0x40378969: heap_caps_free at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/heap_caps.c:382
0x4038444d: free at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/newlib/heap.c:39
0x421063e1: operator delete(void*) at /builds/idf/crosstool-NG/.build/xtensa-esp32s3-elf/src/gcc/libstdc++-v3/libsupc++/del_op.cc:49
0x4205d64e: __gnu_cxx::new_allocator<std::_List_node<AsyncWebHeader> >::deallocate(std::_List_node<AsyncWebHeader>*, unsigned int) at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/ext/new_allocator.h:125
 (inlined by) std::allocator_traits<std::allocator<std::_List_node<AsyncWebHeader> > >::deallocate(std::allocator<std::_List_node<AsyncWebHeader> >&, std::_List_node<AsyncWebHeader>*, unsigned int) at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/alloc_traits.h:462
 (inlined by) std::__cxx11::_List_base<AsyncWebHeader, std::allocator<AsyncWebHeader> >::_M_put_node(std::_List_node<AsyncWebHeader>*) at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/stl_list.h:454
 (inlined by) std::__cxx11::_List_base<AsyncWebHeader, std::allocator<AsyncWebHeader> >::_M_clear() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/list.tcc:81
0x4205fa3e: std::__cxx11::_List_base<AsyncWebHeader, std::allocator<AsyncWebHeader> >::~_List_base() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/stl_list.h:507
 (inlined by) std::__cxx11::list<AsyncWebHeader, std::allocator<AsyncWebHeader> >::~list() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/stl_list.h:835
 (inlined by) AsyncWebServerResponse::~AsyncWebServerResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/ESPAsyncWebServer.h:766
0x42057a1e: AsyncAbstractResponse::~AsyncAbstractResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebResponseImpl.h:47
0x42057a4e: AsyncJsonResponse::~AsyncJsonResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/AsyncJson.h:58
0x42057a59: AsyncJsonResponse::~AsyncJsonResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/AsyncJson.h:58
0x4205ed7a: AsyncWebServerRequest::~AsyncWebServerRequest() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:55
0x420609e9: AsyncWebServer::_handleDisconnect(AsyncWebServerRequest*) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebServer.cpp:138 (discriminator 1)
0x4205c8bf: AsyncWebServerRequest::_onDisconnect() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:206
0x4205c8cb: std::_Function_handler<void (void*, AsyncClient*), AsyncWebServerRequest::AsyncWebServerRequest(AsyncWebServer*, AsyncClient*)::{lambda(void*, AsyncClient*)#3}>::_M_invoke(std::_Any_data const&, void*&&, AsyncClient*&&) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:41
 (inlined by) _M_invoke at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/std_function.h:297
0x42109af6: std::function<void (void*, AsyncClient*)>::operator()(void*, AsyncClient*) const at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/std_function.h:687
0x42109e85: AsyncClient::_error(signed char) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:945
0x4210a19b: AsyncClient::_s_error(void*, signed char) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:1391
 (inlined by) _handle_async_event at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:184
 (inlined by) _async_service_task at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:207
Guru Meditation Error: Core  1 panic'ed (LoadProhibited). Exception was unhandled.

Core  1 register dump:
PC      : 0x4205d63c  PS      : 0x00060630  A0      : 0x8205fa45  A1      : 0x3fcbfa30
A2      : 0x3fcc1ce0  A3      : 0x00000000  A4      : 0x00000000  A5      : 0x3fcc1ff8
A6      : 0x3fcf0b24  A7      : 0x00000000  A8      : 0x8205d651  A9      : 0x3fcbfa10
A10     : 0x00000018  A11     : 0x3fcc11d4  A12     : 0x00000000  A13     : 0x00000001
A14     : 0x00060020  A15     : 0x00000001  SAR     : 0x00000015  EXCCAUSE: 0x0000001c
EXCVADDR: 0x00000000  LBEG    : 0x400570e8  LEND    : 0x400570f3  LCOUNT  : 0xffffffff

Backtrace: 0x4205d639:0x3fcbfa30 0x4205fa42:0x3fcbfa50 0x42057a1e:0x3fcbfa70 0x42057a4e:0x3fcbfa90 0x42057a59:0x3fcbfab0 0x4205ed7a:0x3fcbfad0 0x420609ed:0x3fcbfaf0 0x4205c8bf:0x3fcbfb10 0x4205c8cb:0x3fcbfb30 0x42109afa:0x3fcbfb50 0x42109e89:0x3fcbfb80 0x4210a19f:0x3fcbfbb0

ELF file SHA256: 4fbad0611da5a1f1
0x4205d639: AsyncWebHeader::~AsyncWebHeader() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/ESPAsyncWebServer.h:137
 (inlined by) void __gnu_cxx::new_allocator<std::_List_node<AsyncWebHeader> >::destroy<AsyncWebHeader>(AsyncWebHeader*) at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/ext/new_allocator.h:140
 (inlined by) void std::allocator_traits<std::allocator<std::_List_node<AsyncWebHeader> > >::destroy<AsyncWebHeader>(std::allocator<std::_List_node<AsyncWebHeader> >&, AsyncWebHeader*) at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/alloc_traits.h:487
 (inlined by) std::__cxx11::_List_base<AsyncWebHeader, std::allocator<AsyncWebHeader> >::_M_clear() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/list.tcc:77
0x4205fa42: std::__cxx11::_List_base<AsyncWebHeader, std::allocator<AsyncWebHeader> >::~_List_base() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/stl_list.h:507
 (inlined by) std::__cxx11::list<AsyncWebHeader, std::allocator<AsyncWebHeader> >::~list() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/stl_list.h:835
 (inlined by) AsyncWebServerResponse::~AsyncWebServerResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/ESPAsyncWebServer.h:766
0x42057a1e: AsyncAbstractResponse::~AsyncAbstractResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebResponseImpl.h:47
0x42057a4e: AsyncJsonResponse::~AsyncJsonResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/AsyncJson.h:58
0x42057a59: AsyncJsonResponse::~AsyncJsonResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/AsyncJson.h:58
0x4205ed7a: AsyncWebServerRequest::~AsyncWebServerRequest() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:53 (discriminator 1)
0x420609ed: AsyncWebServer::_handleDisconnect(AsyncWebServerRequest*) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebServer.cpp:138 (discriminator 1)
0x4205c8bf: AsyncWebServerRequest::_onDisconnect() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:207
0x4205c8cb: std::_Function_handler<void (void*, AsyncClient*), AsyncWebServerRequest::AsyncWebServerRequest(AsyncWebServer*, AsyncClient*)::{lambda(void*, AsyncClient*)#3}>::_M_invoke(std::_Any_data const&, void*&&, AsyncClient*&&) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:41
 (inlined by) _M_invoke at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/std_function.h:297
0x42109afa: std::function<void (void*, AsyncClient*)>::operator()(void*, AsyncClient*) const at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/std_function.h:687
0x42109e89: AsyncClient::_error(signed char) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:945
0x4210a19f: AsyncClient::_s_error(void*, signed char) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:1391
 (inlined by) _handle_async_event at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:184
 (inlined by) _async_service_task at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:207

For the record: I edited the file .pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp in-place and re-compiled, which gave me:

[...]
Building in release mode
Compiling .pio/build/generic_esp32s3_usb/lib487/ESPAsyncWebServer/WebRequest.cpp.o
Archiving .pio/build/generic_esp32s3_usb/lib487/libESPAsyncWebServer.a
Indexing .pio/build/generic_esp32s3_usb/lib487/libESPAsyncWebServer.a
Linking .pio/build/generic_esp32s3_usb/firmware.elf
Retrieving maximum program size .pio/build/generic_esp32s3_usb/firmware.elf
[...]

I assume that's okay for a quick check?

@mathieucarbou
Copy link

I assume that's okay for a quick check?

Yes! That will do.

In AsyncWebServer::_handleDisconnect: could you please add some log_e() or ESP_LOGE calls to log some info about the request object: method, path for example. To know which request it is, from which handler.

Except if you know it already ?

@mathieucarbou
Copy link

I have released mathieucarbou/ESPAsyncWebServer @ 3.3.19 which includes a bunch of cleanup around file hierarchy and virtual destructors but pretty sure it won't impact the issue you saw.

Let mw know when you'll have more logs to pinpoint what causes the issue :-)

@stefan123t
Copy link
Contributor

@mathieucarbou thanks for your pointers to look closely at, this is my little bed-time crime story 🔍 🧐 🎩 for the evening

I am still trying to understand what exactly happens,
when you make that copy of the pointer to the _response object,
set it to NULL and then delete the reference ?

  AsyncWebServerResponse* r = _response
  _response = NULL;
  delete r;

You do that exactly the same way in the following three locations:

But before you send it you do something else:

void AsyncWebServerRequest::send(AsyncWebServerResponse* response) {
  if (_sent)
    return;
  if (_response)
    delete _response;
  _response = response;
  if (_response == NULL) {
    _client->close(true);
    _onDisconnect();
    _sent = true;
    return;
  }
  if (!_response->_sourceValid())
    send(500);
}

I also noticed that it always complains about being unable to free the heap memory for a std::list item of type _List_base<AsyncWebHeader> _headers when trying to call the List's destructor ::~list() of the AsyncWebServerResponse.

    std::list<AsyncWebHeader> _headers;

Though I found only explicit code for clearing the memory of such a std::list<AsyncWebHeader> _headers in the AsyncWebServerRequest.

    // Remove a header from the request.
    // It will free the memory and prevent the header to be seen during request processing.
    bool removeHeader(const char* name);
    // Remove all request headers.
    void removeHeaders() { _headers.clear(); }

Could it be that we are left with a dangling pointer to this / actually no valid AsyncWebHeader list through the above NULL and delete operations ?

@mathieucarbou
Copy link

mathieucarbou commented Oct 23, 2024

set it to NULL and then delete the reference ?

we are in the destructor, so the idea is to set the ref to the response to null ASAP in case we have some code elsewhere that could still see this pointer (this is the case for the 2 other callbacks).

Then, once this is done, we can trigger the object deletion (which can take some time), but at least during this time the response in the request will be null.

You do that exactly the same way in the following three locations:

Yes, this is to have the pointer set to null asap, then free after.

void AsyncWebServerRequest::send(AsyncWebServerResponse* response) {
  if (_sent)
    return;
  if (_response)
    delete _response;
  _response = response;
  if (_response == NULL) {
    _client->close(true);
    _onDisconnect();
    _sent = true;
    return;
  }
  if (!_response->_sourceValid())
    send(500);
}

This code is just a feature to swap the response by another one. You are not using that. A middleware could decide to change a response that was set by a handler. So if a response was set, we delete it, then set the new one. This operation happens during the middleware chain processing (just after the handler and before the requests is sent on the network). So it is not the cause of the issue here.

I also noticed that it always complains about being unable to free the heap memory for a std::list item of type _List_base _headers when trying to call the List's destructor ::~list() of the AsyncWebServerResponse.

That is exactly what I also find strange...
Deleting the response object is a long process (as you see in the call stack), so I suspect that somewhere another deletion is triggered... That's my only explanation for now because this list: std::list<AsyncWebHeader> _headers; is really a dumb list, as dumb as having std::list<String> _headers;. AsyncWebHeader is just a holder object with 2 strings inside.

Could it be that we are left with a dangling pointer to this / actually no valid AsyncWebHeader list through the above NULL and delete operations ?

Not in the case of this list: AsyncWebHeader does not need to any destructor because this is a holder object of 2 strings and the list is storing objects by copying the value in a new instance held in the node (which is freed at node destruction).

But the way a linked list work, is by pointing to the next structure, so when the object is freed from memory, each node are freed and this is a longer operation, compared to just remove an array from memory if we got a vector instead. The issue with a vector is that it requires reallocation.

@schlimmchen should try to add some logs to know what is being executed (which request when it fails and also log before the delete calls on a response), also, point to the new version :-) We will have more info.

It is possible that when the lwip layer sends an error (following network issue, com broken, etc) then there is a concurrency issue happening.

@tbnobody
Copy link
Owner

@schlimmchen could you maybe test this patch?

diff --git a/src/NetworkSettings.cpp b/src/NetworkSettings.cpp
index f545ac37..3067281f 100644
--- a/src/NetworkSettings.cpp
+++ b/src/NetworkSettings.cpp
@@ -28,8 +28,6 @@ void NetworkSettingsClass::init(Scheduler& scheduler)
     WiFi.setScanMethod(WIFI_ALL_CHANNEL_SCAN);
     WiFi.setSortMethod(WIFI_CONNECT_AP_BY_SIGNAL);

-    WiFi.disconnect(true, true);
-
     WiFi.onEvent(std::bind(&NetworkSettingsClass::NetworkEvent, this, _1, _2));

     if (PinMapping.isValidW5500Config()) {
@@ -101,7 +99,6 @@ void NetworkSettingsClass::NetworkEvent(const WiFiEvent_t event, WiFiEventInfo_t
         MessageOutput.printf("WiFi disconnected: %" PRIu8 "\r\n", info.wifi_sta_disconnected.reason);
         if (_networkMode == network_mode::WiFi) {
             MessageOutput.println("Try reconnecting");
-            WiFi.disconnect(true, false);
             WiFi.begin();
             raiseEvent(network_event::NETWORK_DISCONNECTED);
         }

It reverts #2117 ....... at least for me, the reconnect works more reliable without these two lines and I also don't get an exception any more when changing network settings.

@schlimmchen
Copy link
Contributor Author

Interesting. One click of the button and the ESP32 restarted. So I guess this is still reproducible. Second try needed a couple more clicks, but still very easy.

After applying the patch, I got these:

assert failed: heap_caps_free heap_caps.c:381 (heap != NULL && "free() target pointer is outside heap areas")
---
assert failed: multi_heap_free multi_heap_poisoning.c:259 (head != NULL)
---
assert failed: tcp_update_rcv_ann_wnd IDF/components/lwip/lwip/src/core/tcp.c:951 (new_rcv_ann_wnd <= 0xffff)

Looks to me like the issue persists despite the patch...

I only clicked the Save button repeatedly in the network settings UI. Hardware: OpenDTU Fusion with PoE Add-On and wired Ethernet in use. Software: OpenDTU Development build. Should I retry without wired Ethernet and with vanilla OpenDTU?

@mathieucarbou
Copy link

Not again 😱 !

@schlimmchen : would you be able to add the complete stack traces again and point me to the current project and code again ?

Did you verify in the pio build that you didn't have some duplicate asynctcp libs (following the organization change to ESP32Async) ?

@schlimmchen
Copy link
Contributor Author

I can do more tests and provide full backtraces, using the upstream project's code. Not today, but eventually.

Did you verify in the pio build that you didn't have some duplicate asynctcp libs (following the organization change to ESP32Async) ?

Errr... I am still on the "old" libs. Thomas did not publish the updated platformio.ini yet. And I don't know what needs to be done, to be honest. Can you help me out? I only see this:

lib_deps =
    mathieucarbou/ESPAsyncWebServer @ 3.6.0

Update this to the lib at the org and that is it? Do I need to clean something? I would expect that platformio does the rest...

@mathieucarbou
Copy link

Now it is:

lib_deps =
    ESP32Async/ESPAsyncWebServer @ 3.6.0

It is just that with the transitive resolving, I wanted to know if you happen to have some duplicates of AsyncTCP lib in the libdeps folder ;-)

@tbnobody
Copy link
Owner

tbnobody commented Jan 23, 2025

To be honest, I am not even sure if this issue is related to the webserver or asynctcp.... I applied the patch mentioned above and did not get any stack traces any more. could also be a random behavior on my side.

Edit: Ok you are right, when pressing the save button a lot of times, the crash happens again

23:31:21.276 > assert failed: tcp_update_rcv_ann_wnd IDF/components/lwip/lwip/src/core/tcp.c:951 (new_rcv_ann_wnd <= 0xffff)

23:31:21.285 > 

23:31:21.285 > 

23:31:21.285 > Backtrace: 0x40083bf5:0x3ffb2db0 0x4008e5a5:0x3ffb2dd0 0x40093e95:0x3ffb2df0 0x40113c32:0x3ffb2f20 0x40113ce0:0x3ffb2f40 0x4016f8f6:0x3ffb2f60 0x4010e938:0x3ffb2f80

23:31:21.504 >   #0  0x40083bf5 in panic_abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_system/panic.c:408
23:31:21.504 >   #1  0x4008e5a5 in esp_system_abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_system/esp_system.c:137
23:31:21.504 >   #2  0x40093e95 in __assert_func at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/newlib/assert.c:85
23:31:21.504 >   #3  0x40113c32 in tcp_update_rcv_ann_wnd at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/lwip/lwip/src/core/tcp.c:951

23:31:21.504 >       (inlined by) tcp_update_rcv_ann_wnd at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/lwip/lwip/src/core/tcp.c:931
23:31:21.504 >   #4  0x40113ce0 in tcp_recved at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/lwip/lwip/src/core/tcp.c:991
23:31:21.504 >   #5  0x4016f8f6 in _tcp_recved_api(tcpip_api_call_data*) at .pio/libdeps/generic_esp32/AsyncTCP/src/AsyncTCP.cpp:563
23:31:21.504 >   #6  0x4010e938 in tcpip_thread_handle_msg at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/lwip/lwip/src/api/tcpip.c:172

23:31:21.504 >       (inlined by) tcpip_thread at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/lwip/lwip/src/api/tcpip.c:154

@mathieucarbou
Copy link

mathieucarbou commented Jan 23, 2025

Edit: Ok you are right, when pressing the save button a lot of times, the crash happens again

When you say that: are you actually triggering concurrent ajax requests ? Or the JS layer makes sure they are sent in sequence and wait for the first one to return before issuing the second one ?

I am asking that because if you have a long-running operation somewhere, could the second request start to execute and when building the json payload, have an effect on the first one ?

It's like you are missing some stack trace - there is no lines around the app code.

@tbnobody
Copy link
Owner

Now I waited ~10sec before pressing the button again (after this time the function has definetly completed its job). Got a little bit different stack trace

23:38:50.919 > Backtrace: 0x40083bf5:0x3ffd7070 0x4008e5a5:0x3ffd7090 0x40093e95:0x3ffd70b0 0x40093a91:0x3ffd71e0 0x40084165:0x3ffd7200 0x40093ec5:0x3ffd7220 0x400f40a5:0x3ffd7240 0x400f40b5:0x3ffd7260 0x400e98c5:0x3ffd7280 0x400e9916:0x3ffd72a0 0x400e9946:0x3ffd72c0 0x400e9951:0x3ffd72e0 0x400edce2:0x3ffd7300 0x400ef681:0x3ffd7320 0x400eba2f:0x3ffd7340 0x400eba3b:0x3ffd7360 0x40170236:0x3ffd7380 0x401705c1:0x3ffd73b0 0x4017084b:0x3ffd73e0

23:38:51.523 >   #0  0x40083bf5 in panic_abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_system/panic.c:408
23:38:51.523 >   #1  0x4008e5a5 in esp_system_abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_system/esp_system.c:137
23:38:51.523 >   #2  0x40093e95 in __assert_func at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/newlib/assert.c:85
23:38:51.523 >   #3  0x40093a91 in multi_heap_free at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/multi_heap_poisoning.c:259 (discriminator 1)
23:38:51.523 >   #4  0x40084165 in heap_caps_free at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/heap_caps.c:382
23:38:51.523 >   #5  0x40093ec5 in free at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/newlib/heap.c:39
23:38:51.523 >   #6  0x400f40a5 in String::invalidate() at C:/Users/Thomas/.platformio/packages/framework-arduinoespressif32/cores/esp32/WString.cpp:158
23:38:51.523 >   #7  0x400f40b5 in String::~String() at C:/Users/Thomas/.platformio/packages/framework-arduinoespressif32/cores/esp32/WString.cpp:142
23:38:51.523 >   #8  0x400e98c5 in AsyncWebHeader::~AsyncWebHeader() at .pio/libdeps/generic_esp32/ESPAsyncWebServer/src/ESPAsyncWebServer.h:143

23:38:51.523 >       (inlined by) void __gnu_cxx::new_allocator<std::_List_node<AsyncWebHeader> >::destroy<AsyncWebHeader>(AsyncWebHeader*) at c:\users\thomas\.platformio\packages\[email protected]+2021r2-patch5\xtensa-esp32-elf\include\c++\8.4.0\ext/new_allocator.h:140

23:38:51.523 >       (inlined by) void std::allocator_traits<std::allocator<std::_List_node<AsyncWebHeader> > >::destroy<AsyncWebHeader>(std::allocator<std::_List_node<AsyncWebHeader> >&, AsyncWebHeader*) at c:\users\thomas\.platformio\packages\[email protected]+2021r2-patch5\xtensa-esp32-elf\include\c++\8.4.0\bits/alloc_traits.h:487

23:38:51.523 >       (inlined by) std::__cxx11::_List_base<AsyncWebHeader, std::allocator<AsyncWebHeader> >::_M_clear() at c:\users\thomas\.platformio\packages\[email protected]+2021r2-patch5\xtensa-esp32-elf\include\c++\8.4.0\bits/list.tcc:77

23:38:51.523 >       (inlined by) std::__cxx11::_List_base<AsyncWebHeader, std::allocator<AsyncWebHeader> >::~_List_base() at c:\users\thomas\.platformio\packages\[email protected]+2021r2-patch5\xtensa-esp32-elf\include\c++\8.4.0\bits/stl_list.h:507

23:38:51.523 >       (inlined by) std::__cxx11::list<AsyncWebHeader, std::allocator<AsyncWebHeader> >::~list() at c:\users\thomas\.platformio\packages\[email protected]+2021r2-patch5\xtensa-esp32-elf\include\c++\8.4.0\bits/stl_list.h:835

23:38:51.523 >       (inlined by) AsyncWebServerResponse::~AsyncWebServerResponse() at .pio/libdeps/generic_esp32/ESPAsyncWebServer/src/ESPAsyncWebServer.h:784
23:38:51.523 >   #9  0x400e9916 in AsyncAbstractResponse::~AsyncAbstractResponse() at .pio/libdeps/generic_esp32/ESPAsyncWebServer/src/WebResponseImpl.h:70
23:38:51.523 >   #10 0x400e9946 in AsyncJsonResponse::~AsyncJsonResponse() at .pio/libdeps/generic_esp32/ESPAsyncWebServer/src/AsyncJson.h:58
23:38:51.523 >   #11 0x400e9951 in AsyncJsonResponse::~AsyncJsonResponse() at .pio/libdeps/generic_esp32/ESPAsyncWebServer/src/AsyncJson.h:58
23:38:51.523 >   #12 0x400edce2 in AsyncWebServerRequest::~AsyncWebServerRequest() at .pio/libdeps/generic_esp32/ESPAsyncWebServer/src/WebRequest.cpp:54 (discriminator 1)
23:38:51.523 >   #13 0x400ef681 in AsyncWebServer::_handleDisconnect(AsyncWebServerRequest*) at .pio/libdeps/generic_esp32/ESPAsyncWebServer/src/WebServer.cpp:137 (discriminator 1)
23:38:51.523 >   #14 0x400eba2f in AsyncWebServerRequest::_onDisconnect() at .pio/libdeps/generic_esp32/ESPAsyncWebServer/src/WebRequest.cpp:225
23:38:51.523 >   #15 0x400eba3b in std::_Function_handler<void (void*, AsyncClient*), AsyncWebServerRequest::AsyncWebServerRequest(AsyncWebServer*, AsyncClient*)::{lambda(void*, AsyncClient*)#3}>::_M_invoke(std::_Any_data const&, void*&&, AsyncClient*&&) at .pio/libdeps/generic_esp32/ESPAsyncWebServer/src/WebRequest.cpp:41

23:38:51.523 >       (inlined by) _M_invoke at c:\users\thomas\.platformio\packages\[email protected]+2021r2-patch5\xtensa-esp32-elf\include\c++\8.4.0\bits/std_function.h:297
23:38:51.523 >   #16 0x40170236 in std::function<void (void*, AsyncClient*)>::operator()(void*, AsyncClient*) const at c:\users\thomas\.platformio\packages\[email protected]+2021r2-patch5\xtensa-esp32-elf\include\c++\8.4.0\bits/std_function.h:687
23:38:51.523 >   #17 0x401705c1 in AsyncClient::_error(signed char) at .pio/libdeps/generic_esp32/AsyncTCP/src/AsyncTCP.cpp:1039
23:38:51.523 >   #18 0x4017084b in AsyncClient::_s_error(void*, signed char) at .pio/libdeps/generic_esp32/AsyncTCP/src/AsyncTCP.cpp:1503

23:38:51.523 >       (inlined by) _handle_async_event at .pio/libdeps/generic_esp32/AsyncTCP/src/AsyncTCP.cpp:284

23:38:51.523 >       (inlined by) _async_service_task at .pio/libdeps/generic_esp32/AsyncTCP/src/AsyncTCP.cpp:307

@tbnobody
Copy link
Owner

Now with time stamps.... Every time you see WiFi got ip: 192.168.20.146 the async callback should be finished....

23:43:47.970 > Setting Hostname... done
23:43:47.990 > [296282][E][WiFiUdp.cpp:221] parsePacket(): could not receive data: 128
23:43:47.995 > Configuring WiFi STA using existing credentials... done
23:43:48.001 > Configuring WiFi STA DHCP IP... done
23:43:48.089 > WiFi got ip: 192.168.20.146
23:43:57.732 > Admin AP remaining seconds: 10 / 180
23:43:58.145 > [306438][E][WiFiUdp.cpp:221] parsePacket(): could not receive data: 128
23:43:58.150 > Setting Hostname... done
23:43:58.166 > [306458][E][WiFiUdp.cpp:221] parsePacket(): could not receive data: 9
23:43:58.171 > Configuring WiFi STA using existing credentials... done
23:43:58.177 > Configuring WiFi STA DHCP IP... done
23:43:58.312 > WiFi got ip: 192.168.20.146
23:44:07.741 > Admin AP remaining seconds: 10 / 180
23:44:10.876 > [319169][E][WiFiUdp.cpp:221] parsePacket(): could not receive data: 128
23:44:10.882 > Setting Hostname... done
23:44:10.889 > [319182][E][WiFiUdp.cpp:221] parsePacket(): could not receive data: 128
23:44:10.895 > Configuring WiFi STA using existing credentials... done
23:44:10.901 > Configuring WiFi STA DHCP IP... done
23:44:11.009 > WiFi got ip: 192.168.20.146
23:44:18.841 > [327133][E][WiFiUdp.cpp:221] parsePacket(): could not receive data: 128
23:44:18.846 > Setting Hostname... done
23:44:18.858 > [327152][E][WiFiUdp.cpp:221] parsePacket(): could not receive data: 128
23:44:18.865 > Configuring WiFi STA using existing credentials... done
23:44:18.870 > Configuring WiFi STA DHCP IP... done
23:44:19.003 > WiFi got ip: 192.168.20.146
23:44:24.734 > [333026][E][WiFiUdp.cpp:221] parsePacket(): could not receive data: 128
23:44:24.740 > Setting Hostname... done
23:44:24.753 > [333045][E][WiFiUdp.cpp:221] parsePacket(): could not receive data: 128
23:44:24.759 > Configuring WiFi STA using existing credentials... done
23:44:24.764 > Configuring WiFi STA DHCP IP... done
23:44:24.862 > WiFi got ip: 192.168.20.146
23:44:31.778 > [340071][E][WiFiUdp.cpp:221] parsePacket(): could not receive data: 128
23:44:31.785 > Setting Hostname... done
23:44:31.798 > [340091][E][WiFiUdp.cpp:221] parsePacket(): could not receive data: 128
23:44:31.804 > Configuring WiFi STA using existing credentials... done
23:44:31.809 > Configuring WiFi STA DHCP IP... done
23:44:31.919 > WiFi got ip: 192.168.20.146
23:44:38.474 > [346766][E][WiFiUdp.cpp:221] parsePacket(): could not receive data: 128
23:44:38.480 > Setting Hostname... done
23:44:38.489 > [346781][E][WiFiUdp.cpp:221] parsePacket(): could not receive data: 128
23:44:38.495 > Configuring WiFi STA using existing credentials... done
23:44:38.500 > Configuring WiFi STA DHCP IP... done
23:44:38.519 >
23:44:38.520 > assert failed: tcp_update_rcv_ann_wnd IDF/components/lwip/lwip/src/core/tcp.c:951 (new_rcv_ann_wnd <= 0xffff)
23:44:38.528 >
23:44:38.528 >
23:44:38.528 > Backtrace: 0x40083bf5:0x3ffb2db0 0x4008e5a5:0x3ffb2dd0 0x40093e95:0x3ffb2df0 0x40113c32:0x3ffb2f20 0x40113ce0:0x3ffb2f40 0x4016f8f6:0x3ffb2f60 0x4010e938:0x3ffb2f80
23:44:38.722 >   #0  0x40083bf5 in panic_abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_system/panic.c:408    
23:44:38.722 >   #1  0x4008e5a5 in esp_system_abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_system/esp_system.c:137
23:44:38.722 >   #2  0x40093e95 in __assert_func at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/newlib/assert.c:85      
23:44:38.722 >   #3  0x40113c32 in tcp_update_rcv_ann_wnd at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/lwip/lwip/src/core/tcp.c:951
23:44:38.722 >       (inlined by) tcp_update_rcv_ann_wnd at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/lwip/lwip/src/core/tcp.c:931
23:44:38.722 >   #4  0x40113ce0 in tcp_recved at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/lwip/lwip/src/core/tcp.c:991
23:44:38.722 >   #5  0x4016f8f6 in _tcp_recved_api(tcpip_api_call_data*) at .pio/libdeps/generic_esp32/AsyncTCP/src/AsyncTCP.cpp:563
23:44:38.722 >   #6  0x4010e938 in tcpip_thread_handle_msg at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/lwip/lwip/src/api/tcpip.c:172
23:44:38.722 >       (inlined by) tcpip_thread at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/lwip/lwip/src/api/tcpip.c:154
23:44:38.722 >
23:44:38.722 >
23:44:38.722 >

@mathieucarbou
Copy link

I don't understand... Why there are so many network interface resets with the same IP ?

@tbnobody
Copy link
Owner

tbnobody commented Jan 23, 2025

Every time I press the save button the wifi parameters will be applied again (ssid, password), the wifi mode will be updated (when saving the config, the internal access point will be activated) and the ip configuration will be applied (dhcp or static)
Therefor also a new dhcp lease will be obtained.

Tomorrow I will check what happens if I execute this network specific stuff in the main loop.

@mathieucarbou
Copy link

mathieucarbou commented Jan 23, 2025

Before resetting the network layer, are you making sure to stop all network services, server included, and start them back again ? Or, easier, just save and restart the ESP ?

@mathieucarbou
Copy link

I am able to reproduce also with the code from main branch...

@mathieucarbou
Copy link

mathieucarbou commented Jan 24, 2025

I found the issue and how to fix it.

I think it is a combination of several problems:

  1. It's a good practise to stop the network services before reconfigure the interfaces, which is what's done here. So calling _server.end(); and _server.begin();
  2. There is a conflict of ownership for the network reconfiguration: it is both handled by the loop in NetworkSettings.cpp and the calls to reconfigure at the end of WebApi_network.cpp
  3. a network interface reconfiguration is triggerred while within the context of an async_request (so a server socket is binded, plus a client pcb is allocated for the request), so the request client was erroring when the request was sent back and at the same time the network reconfiguration was happening.

Possible refactorings:

  1. Do not call any network reconfiguration from within the context of a request: let it end first => can be moved in the NetworkSettings.cpp loop
  2. add calls to server end() and begin()

Something like that (patch for main), but to be improved because I have statically exposed things:

diff --git a/include/NetworkSettings.h b/include/NetworkSettings.h
index 90d3962b..43effc31 100644
--- a/include/NetworkSettings.h
+++ b/include/NetworkSettings.h
@@ -56,6 +56,8 @@ public:
 
     bool onEvent(DtuNetworkEventCb cbEvent, const network_event event = network_event::NETWORK_EVENT_MAX);
     void raiseEvent(const network_event event);
+    
+    static uint32_t reconfigure;
 
 private:
     void loop();
diff --git a/include/WebApi.h b/include/WebApi.h
index 6e85bafd..6deb15f1 100644
--- a/include/WebApi.h
+++ b/include/WebApi.h
@@ -33,6 +33,9 @@ public:
     void init(Scheduler& scheduler);
     void reload();
 
+    void beginServer() { _server.begin(); }
+    void endServer() { _server.end(); }
+
     static bool checkCredentials(AsyncWebServerRequest* request);
     static bool checkCredentialsReadonly(AsyncWebServerRequest* request);
 
diff --git a/src/MqttSettings.cpp b/src/MqttSettings.cpp
index 39e76e22..83106c52 100644
--- a/src/MqttSettings.cpp
+++ b/src/MqttSettings.cpp
@@ -6,6 +6,8 @@
 #include "Configuration.h"
 #include "MessageOutput.h"
 
+#include "WebApi.h"
+
 MqttSettingsClass::MqttSettingsClass()
 {
 }
@@ -16,6 +18,7 @@ void MqttSettingsClass::NetworkEvent(network_event event)
     case network_event::NETWORK_GOT_IP:
         MessageOutput.println("Network connected");
         performConnect();
+        WebApi.beginServer();
         break;
     case network_event::NETWORK_DISCONNECTED:
         MessageOutput.println("Network lost connection");
diff --git a/src/NetworkSettings.cpp b/src/NetworkSettings.cpp
index f545ac37..a0fe5e81 100644
--- a/src/NetworkSettings.cpp
+++ b/src/NetworkSettings.cpp
@@ -12,6 +12,10 @@
 #include <ESPmDNS.h>
 #include <ETH.h>
 
+#include "WebApi.h"
+
+uint32_t NetworkSettingsClass::reconfigure = 0;
+
 NetworkSettingsClass::NetworkSettingsClass()
     : _loopTask(TASK_IMMEDIATE, TASK_FOREVER, std::bind(&NetworkSettingsClass::loop, this))
     , _apIp(192, 168, 4, 1)
@@ -205,6 +209,12 @@ String NetworkSettingsClass::getApName() const
 
 void NetworkSettingsClass::loop()
 {
+    if(reconfigure && reconfigure < millis()) {
+        reconfigure = 0;
+        enableAdminMode();
+        applyConfig();
+    }
+
     if (_ethConnected) {
         if (_networkMode != network_mode::Ethernet) {
             // Do stuff when switching to Ethernet mode
@@ -278,6 +288,8 @@ void NetworkSettingsClass::loop()
 
 void NetworkSettingsClass::applyConfig()
 {
+    WebApi.endServer();
+
     setHostname();
     if (!strcmp(Configuration.get().WiFi.Ssid, "")) {
         return;
diff --git a/src/WebApi_network.cpp b/src/WebApi_network.cpp
index 51db32e4..4c0b4881 100644
--- a/src/WebApi_network.cpp
+++ b/src/WebApi_network.cpp
@@ -204,6 +204,6 @@ void WebApiNetworkClass::onNetworkAdminPost(AsyncWebServerRequest* request)
 
     WebApi.sendJsonResponse(request, response, __FUNCTION__, __LINE__);
 
-    NetworkSettings.enableAdminMode();
-    NetworkSettings.applyConfig();
+    // delay reconfigure in 500ms
+    NetworkSettings.reconfigure = millis() + 500;
 }

Note: a UI spinner would be required though.

But frankly, reconfiguring important things like the network should IMO trigger an ESP reboot for safety...

Even with such refactoring, we can see that the code is doing weird things because it goes again through the captive portal: on mac, when clicking on save, it goes in AP_STA mode and the captive portal reopens, while at the same time the DHCP triggers.

If already connected to WiFi, we should stay in WiFi mode I think, and AP mode should only spawned after a timeout.

I know how these things are complicated: I've written a network manager handling AP / CP / WiFi / ETH here: https://github.com/mathieucarbou/MycilaESPConnect and it is hard to have something configurable while the app keeps running.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants