Network stack never recovers without a hard reset after receiving closely-spaced larger UDP packets #2899
Comments
As I already wrote in the other thread, this is most probably not solely UDP-related, as I'm getting the same behavior even with TCP. Packet cadence is the only thing that seems to matter here. The problem looks to be related to packet reception / buffering (memory leak?), as the free heap size drops by a non-trivial amount when this permanent "choke" happens. A few pointers: Oh yeah, and the issue would be best moved over to the ESP-IDF repo, as the source of the problem most probably lies there. |
Yes, you're right. I should move this to ESP-IDF. Yeah, I noticed the fragmentation and surmised in #2871 that this might be the area where some memory corruption occurs. (Or, hrm... maybe I didn't, but I meant to.) |
I just filed this bug: espressif/esp-idf#3646 |
@r1dd1ck Do you feel like weighing in to the esp-idf bug with things you've tried? I can't be the only one seeing these issues... |
So the commenter at espressif/esp-idf#3646 is suggesting that the iperf example handles large bandwidth just fine, and that it's possible the Arduino sdkconfig options need to be tweaked. |
@ssilverman But the point is, that it should not be possible to |
@ssilverman Your test works fine after reverting this commit or by changing your test https://gist.github.com/negativekelvin/877d8f285e61583956ee3d0f8c8072bf/revisions |
Changing my test to use the copy instead of the reference, per that gist change, results in this:
As well, looking at the source my PlatformIO install is using: AsyncUDP.cpp has code similar to before that commit, where the |
Hmm, well, it is discussed here: #2685. Your code runs forever for me with that change, using the latest Arduino as an IDF component. |
I'm running the 1.0.2 core from the Arduino IDE. Which core and IDE are you using?
Before reading the following, I'd like you to try this: Use the latest Arduino 1.8.9 with the 1.0.2 ESP32 core. Or PlatformIO with the latest ESP32 core. Then change the
And I'm not sure what you mean by "latest Arduino as idf component"? Are you saying you built your own core and are running it from PlatformIO/Arduino, or are you saying you built arduino-esp32 and are running it independently from somewhere else? Help me out here so that we can figure this out. "It works for me" doesn't help solve the main issue, which is that the network stack crashes with the publicly-released code.
I've tried both changes. That
More specifically, what are the specific steps you're doing and I'm not, such that you see it crashing, then you make a change, then you rebuild something (I'm not sure what), then you run it, and then it works? What are these steps? Knowing them will lead more constructively to solving this. If it runs for you and not for me, clearly we're doing something different. I'd like to figure out what that is; this is the point of this bug report, to find out why this is failing on the publicly-released version and to find a fix.
The bottom line is that the network stack completely crashes.
Having said all that, thank you for your time in helping track this down. :) |
My setup is this
As stated in #2685, there is a relationship between the commit and copy/reference. When I first tried your code, it was freezing because the pbufs never got freed. This is because my codebase included that commit. When I made the change, it ran forever with no problems. You may be experiencing some other issue based on your version of the codebase. |
Thank you for those getting started links. I'm 100% certain that all the public release stuff (Arduino 1.8.9 plus 1.0.2 of the ESP32 core; latest PlatformIO plus latest ESP32 core downloaded via the IDE) experiences a network stack freeze. That's why I filed this bug. If it's since been fixed, I'll try to confirm with the latest |
@ssilverman if you want to use the latest arduino-esp32 in PIO you can use this in your platformio.ini file: and it will pull the tip of this repo. |
I can confirm that the staging version of
Here's the bash script I'm using to send 1400-byte packets 20 times a second:
while true; do echo -n $(printf '.%.0s' {1..1400}) > /dev/udp/192.168.1.9/8000; sleep 0.05; done
Simply run this for a little while (changing the IP address, of course, and running with the companion ESP32 program above), shut it down, and then the ESP32 is unresponsive to the network until a hard reset.
I'm really confused. I tried on several different ESP32s and with several different network setups. This can't be just me. Is nobody else seeing this (well, hardly anybody)? |
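For anyone trying to reproduce this with different packet sizes or rates, the one-liner above can be wrapped in a couple of functions. This is only a sketch: make_payload and send_packets are names made up here, not something from the thread, and the address and port in the usage comment are the same placeholders as in the one-liner. It assumes bash, since the /dev/udp/HOST/PORT redirection is a bash feature.

```shell
#!/usr/bin/env bash
# Sketch of a parameterized UDP sender. Function names are illustrative only.

# Print SIZE '.' characters with no trailing newline.
make_payload() {
  local size=$1
  printf '%*s' "$size" '' | tr ' ' '.'
}

# Send COUNT packets of SIZE bytes to HOST:PORT, sleeping INTERVAL seconds
# between packets. Uses bash's /dev/udp redirection, like the one-liner.
send_packets() {
  local host=$1 port=$2 size=$3 interval=$4 count=$5
  local payload i=0
  payload=$(make_payload "$size")
  while [ "$i" -lt "$count" ]; do
    printf '%s' "$payload" > "/dev/udp/$host/$port"
    sleep "$interval"
    i=$((i + 1))
  done
}

# Example (placeholder address): 80 packets of 1400 bytes, 20 per second.
# send_packets 192.168.1.9 8000 1400 0.05 80
```

Using a bounded count instead of an endless loop makes it easier to note roughly how many packets it takes before the device stops responding.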
@ssilverman |
Yeah it is definitely an sdkconfig problem. Use the sdkconfig from Arduino = freeze, use the default esp-idf settings = ok. I cannot figure out which setting is the problem though. |
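One way to narrow down which setting matters is to diff the two configurations. A rough sketch, assuming both are ordinary sdkconfig files with CONFIG_... lines; diff_sdkconfig is a hypothetical helper name and the file paths in the usage comment are placeholders:

```shell
# List the CONFIG_ lines that differ between two sdkconfig files.
# Lines only in the first file are prefixed with '<', lines only in the
# second with '>'. "# CONFIG_X is not set" lines are compared as well.
diff_sdkconfig() {
  local a_sorted b_sorted
  a_sorted=$(mktemp)
  b_sorted=$(mktemp)
  grep -E '^(# )?CONFIG_' "$1" | sort > "$a_sorted"
  grep -E '^(# )?CONFIG_' "$2" | sort > "$b_sorted"
  # diff and grep exit non-zero in normal cases; do not treat that as failure.
  diff "$a_sorted" "$b_sorted" | grep '^[<>]' || true
  rm -f "$a_sorted" "$b_sorted"
}

# Example (placeholder paths):
# diff_sdkconfig arduino/sdkconfig esp-idf-default/sdkconfig
```

The output can be long, so filtering it for WIFI or LWIP options first is probably a sensible starting point.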
Thank you both! |
I think it is this: CONFIG_ESP32_WIFI_RX_BA_WIN is not supposed to be higher than ESP32_WIFI_STATIC_RX_BUFFER_NUM which is set to 8 |
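A quick way to check a given sdkconfig for this condition is sketched below. It assumes the two options appear as CONFIG_ESP32_WIFI_RX_BA_WIN=<n> and CONFIG_ESP32_WIFI_STATIC_RX_BUFFER_NUM=<n> lines in the file; the helper names sdkconfig_value and check_rx_ba_win are made up for illustration.

```shell
# Print the integer value of KEY from an sdkconfig-style file (KEY=VALUE lines).
sdkconfig_value() {
  sed -n "s/^$2=\([0-9][0-9]*\)\$/\1/p" "$1" | head -n 1
}

# Return 0 if the RX block-ack window is no larger than the static RX buffer
# count, 1 if it is larger, 2 if either setting is missing from the file.
check_rx_ba_win() {
  local file=$1 ba_win rx_bufs
  ba_win=$(sdkconfig_value "$file" CONFIG_ESP32_WIFI_RX_BA_WIN)
  rx_bufs=$(sdkconfig_value "$file" CONFIG_ESP32_WIFI_STATIC_RX_BUFFER_NUM)
  if [ -z "$ba_win" ] || [ -z "$rx_bufs" ]; then
    echo "could not find both settings in $file" >&2
    return 2
  fi
  if [ "$ba_win" -gt "$rx_bufs" ]; then
    echo "RX_BA_WIN ($ba_win) exceeds STATIC_RX_BUFFER_NUM ($rx_bufs)"
    return 1
  fi
  echo "RX_BA_WIN ($ba_win) is within STATIC_RX_BUFFER_NUM ($rx_bufs)"
}

# Example (placeholder path):
# check_rx_ba_win path/to/sdkconfig
```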
ping @me-no-dev can you update sdkconfig to reset |
@negativekelvin and it does not "freeze" ... which gives? 🙄 Meanwhile, after some excessive stress testing, I can confirm that compiling with the ESP-IDF defaults really does "fix" the freezing WiFi stack issue 👍 The |
I was also having this basic problem. In my case I was sending a lot of data over TCP on port 23 (pretending to be telnet) and it would lock things up if I ramped the data rate up too high. I could see a WINDOW error in Wireshark saying that the TCP window was exhausted. With the fix from this issue it no longer does that. So, it does appear to be a reasonable fix. I would also like to see it merged. |
Could this also be part of the unexplained lockup in ESPAsyncWebServer? |
Wroom32 - Arduino IDE - 1.03-rc1: Found a temporary fix: |
[STALE_SET] This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions. |
[STALE_DEL] This stale issue has been automatically closed. Thank you for your contributions. |
Hardware:
Board: Adafruit ESP32 Feather
Core Installation version: Core v1.0.2 in Arduino 1.8.9, [email protected] and [email protected] in PlatformIO
IDE name: Arduino v1.8.9 and PlatformIO v3.6.7 in VSCode v1.35.1
Flash Frequency: 80MHz
PSRAM enabled: Don't know
Upload Speed: 921600
Computer OS: Mac OSX v10.14.5
Description:
The network stack never recovers if it receives just tens of UDP packets larger than 1024 bytes. This detail got lost in the discussion of #2871, so I'm creating this new bug.
I see this problem almost every time I send somewhere between 20 and 80 larger UDP packets in non-softAP mode. I see it only rarely in softAP mode. Sometimes, if I repeatedly run the program, see the network stack crash, and then hard reset the device, I won't see the network stack failure and it will recover from seeing a bunch of large packets. But only sometimes.
To reiterate: The network stack does not recover after receiving a bunch of large packets. Some people see that it recovers just fine if the packets stop, but I never see this. Yes, the device stops seeing packets if it's flooded with them, and yes, some people will see the network stack recover after a short period, but I never do. A hard reset is required.
To continuously send closely-spaced packets, this Bash script is useful:
The 1400 is the UDP size; I find that 1025 and greater causes the problem for me. Also, change the IP address and sleep (in seconds) to play with different network loads.
The crux is this: I see onPacket never called again once a bunch of large packets are received, even if no more packets are sent. I've tried on three different ESP32 Feathers and three different network setups for non-softAP mode.
The effect: The device becomes permanently unusable on the network, without a hard reset, if it sees lots of larger UDP packets in a row.
Sketch: