Wifi randomly pulls a second 169 IP, causing all internet connections to fail post-boot #29
Comments
To fix the problem, I need to be able to reproduce it. 99% of the time I use a wired connection, so it's hard for me to work on this issue. I'm not very deep into Linux specifics, so don't treat me as a master here :) |
There is a kind of race condition in the boot sequence. I was trying to fix it when I was working on Bluetooth, but it seems impossible to fix, especially when more USB devices are connected. You may try to play with the order of the /etc/init.d/* scripts; maybe it will help. |
It's hard to really see what's going on without access to the buildroot scripts. These don't seem to be public. Do you know where these are located for the project? |
It occurs for me too with this adapter --> https://www.amazon.com/gp/product/B08D72GSMS/ combined with this router+ap --> https://www.amazon.com/dp/B08DTF7KGC/ref=twister_B09P4Q7JK4 I just have to run |
There's multiple things going on:
1a) for some users the wlan0 device is available when dhcpcd first starts. It negotiates a lease but can't write any state to disk. Then for some reason it also receives a udev 'add' event for wlan0. Due to the fact there's no lease state written it tries to refresh/rediscover a dhcp lease. For some users this fails (I suspect the router is applying "protection"). When it fails dhcpcd falls back to a self assigned IP and deletes the route/dns for the 'good' lease. Or at least inserts a higher priority route to nowhere.
I unfortunately cannot debug this one because my wlan0 interface is not available when dhcpcd launches, so it only tries to get a lease once due to udev add event. I've seen Drakonas' logs and their wlan0 device is detected a full 2 seconds before mine so I suspect this is just due to variations in USB setups (hub, other devices etc).
Either /var/db/dhcpcd needs to be writeable or dhcpcd needs to use a different database directory. You can symlink /var/db/dhcpcd to /media/fat/dhcpcd and it will work. Or you could recompile dhcpcd and set DBDIR to /media/fat/dhcpcd (configure --dbdir=/media/fat/dhcpcd)
If you change /etc/network/interfaces so the line like 'iface wlan0 inet dhcp' is instead 'iface wlan0 inet manual' the ifup script won't try to invoke a dhcp client, but will still invoke the pre-up scripts for wpa_supplicant. Then dhcpcd will handle the dhcp lease when it runs. |
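The two workarounds described in this comment could be scripted roughly as follows. This is a sketch, not a tested fix: the `ROOT` prefix and both function names are made up for illustration, so the steps can be tried against a copy of the filesystem before touching a live image. The paths and the `inet dhcp` / `inet manual` wording are the ones quoted in the comment.

```shell
#!/bin/sh
# Hypothetical prefix: leave empty to act on the live system, or point it
# at a scratch copy of the root filesystem for a dry run.
ROOT="${ROOT:-}"

# Make /var/db/dhcpcd a symlink to writable storage so dhcpcd can save
# lease state (target path taken from the comment above).
link_lease_dir() {
    mkdir -p "$ROOT/media/fat/dhcpcd" "$ROOT/var/db"
    rm -rf "$ROOT/var/db/dhcpcd"
    ln -s /media/fat/dhcpcd "$ROOT/var/db/dhcpcd"
}

# Rewrite 'iface wlan0 inet dhcp' to 'iface wlan0 inet manual' so ifup
# does not start its own DHCP client; all other lines are left untouched.
set_wlan0_manual() {
    f="$ROOT/etc/network/interfaces"
    sed 's/^\([[:space:]]*iface[[:space:]]\{1,\}wlan0[[:space:]]\{1,\}inet[[:space:]]\{1,\}\)dhcp[[:space:]]*$/\1manual/' "$f" > "$f.new" &&
        mv "$f.new" "$f"
}
```

Both functions only act when called, so the file can be sourced and each step run separately. Whether either step is sufficient on its own is exactly what the rest of this thread is trying to establish.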
Nothing to do with buildroot. Boot scripts are in image and can be read/tweaked. |
I have what seems to be the same or at least a very similar problem, outlined in greater detail in this thread on the MiSTer FPGA forums: https://misterfpga.org/viewtopic.php?p=58198#p58198
Essentially two leases are allocated to the MiSTer on Wi-Fi (haven't tested whether this happens wired as well). Same MAC address, but one is registered without a hostname and a vendor ID of "udhcp", the other with the "MiSTer" hostname but no vendor ID (from dhcpcd). When the device comes up, there is a brief moment of connectivity, followed by 10-20 seconds of disruption, and then connectivity again. You can't see both addresses with
Looking at the DHCP datagrams it's more clear -- the udhcpc requests come in with the MAC address (DUID) as the client identifier, but the dhcpcd request comes in with the MAC address (DUID) PLUS an IAID as the client identifier. In the eyes of the DHCP server, these each require a unique IP address despite having the same base MAC address.
For testing/as a workaround, I changed the following option from
# Use the hardware address of the interface for the Client ID.
clientid
# or
# Use the same DUID + IAID as set in DHCPv6 for DHCPv4 ClientID as per RFC4361.
# Some non-RFC compliant DHCP servers do not reply with this set.
# In this case, comment out duid and enable clientid above.
#duid
So given the behavior I think we're running into the same thing here. I'm under the impression that when the choice was made to transition to dhcpcd from udhcpc, the latter wasn't fully disabled in the base image and the two DHCP clients are conflicting and causing issues -- so +1 to sticking to one or the other regardless of any other fixes.
Separately, if people are still running into new IP addresses every startup/polluting DHCP pools even after disabling one of the two DHCP clients, then the config change above should take care of that problem since the MAC address shouldn't be changing every boot. There's really no reason to include IAID as part of the client identifier for the case of MiSTer as far as I can tell (though it shouldn't be cycling with every boot anyway), and it is typically omitted for compatibility purposes for IPv4 anyway. |
To clarify, in what file is this change supposed to occur? I assume /etc/dhcpcd.conf |
Yep, that's the one. Sorry I forgot to include that here. |
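For anyone wanting to try the client identifier change discussed above, a sketch of the edit to /etc/dhcpcd.conf follows. It assumes the stock file still has `#clientid` commented out and a bare `duid` line, each alone on its own line; `use_clientid` is a hypothetical helper name, and the file path can be overridden to test on a copy first.

```shell
#!/bin/sh
# Enable 'clientid' and comment out 'duid' in a dhcpcd.conf.
# Assumes each option sits alone on its own line, as in the stock file.
# Comment lines that merely mention the options are not modified.
use_clientid() {
    conf="${1:-/etc/dhcpcd.conf}"
    sed -e 's/^#[[:space:]]*clientid[[:space:]]*$/clientid/' \
        -e 's/^duid[[:space:]]*$/#duid/' "$conf" > "$conf.new" &&
        mv "$conf.new" "$conf"
}
```

Calling `use_clientid /tmp/dhcpcd.conf` on a copy and diffing the result against the original is a cheap way to confirm that only those two lines changed.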
That's because ifup/ifdown are hardcoded to use udhcpc. Try renaming /usr/sbin/udhcpc to /usr/sbin/_udhcpc. |
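The rename suggested here can be wrapped so it is easy to undo. This is only a sketch: `disable_udhcpc` is a hypothetical name, and the default path is the one given in this comment (other comments in this thread mention /sbin/udhcpc, so the path is a parameter). It also assumes the filesystem holding the file is writable at the time.

```shell
#!/bin/sh
# Rename a udhcpc file or symlink to _udhcpc in the same directory so
# ifup/ifdown can no longer find it. Undo by renaming it back with mv.
disable_udhcpc() {
    bin="${1:-/usr/sbin/udhcpc}"
    if [ ! -e "$bin" ] && [ ! -L "$bin" ]; then
        echo "not found: $bin" >&2
        return 1
    fi
    mv "$bin" "$(dirname "$bin")/_$(basename "$bin")"
}
```

Renaming rather than deleting keeps the original available, which matters here because several people in the thread switch back and forth between configurations while testing.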
Why is this then? It seems, based on this, that two separate DHCP clients can (or maybe always?) load on startup, given the right scenario. Does Busybox handle both for the MiSTer setup? Pretty sure that fixing this should really warrant a change having everything hardcoded for one DHCP client, instead of requiring dirty workarounds. Also, please see Zakk's statements a couple months ago in this issue. There is more to the issue than two DHCP daemons. @Akuma-Git |
Because udhcpc, ifup, ifdown are busybox components.
Correct
Idk, afaict the dhcpcd package is unnecessary
Correct, this is due to configuration errors resulting in some fighting between:
|
I'm curious what sparked this change as it seems like it was an intentional choice to switch to dhcpcd. Possibly @sorgelig can provide more context, if there was originally an issue with udhcpc that can be addressed here. |
because udhcpc didn't work well. |
Via wired Ethernet without the dnsmasq DHCP logs
dnsmasq DHCP lease
Datagram contents
Active IP addresses
Via wired Ethernet with the dnsmasq DHCP logs
dnsmasq DHCP lease
Datagram contents
Active IP addresses
Regardless of which configuration I'm only seeing one set of requests show up from @Drakonas while this certainly isn't a long-term fix, does the change to |
There are at least two 'multiple lease' problems:
The log linked in the initial post seems to indicate the same IAID is used for both requests. Unfortunately I can't reproduce the issue here (my wlan0 is not available when dhcpcd starts up, so it only reacts to the udev event) so I can't see if there's anything different about the request. dhcpcd needs a way to write lease files, even on startup. I'm not sure if it is feasible to move the startup of dhcpcd so it always starts after the filesystem is remounted rw |
So I haven't been able to repro this one on my end, but I wonder if it'd be good enough to instead write the DHCP lease state to ephemeral storage |
If you already have a working config (both ethernet and wifi), then please put the modified files here and I will include them in the next linux release. |
problem is related to 2 dhcp clients running at the same time and it's not a kernel problem but userspace linux image do: |
I believe the fix suggested by @gkrzystek is similar to the proposed change here. I, for one, think this should be revisited. In short, the boot script in the linux image doesn't actually bring down the interfaces and kill the dhcp client. From my understanding, they are left running and the boot process starts again, attempting to grab a new lease with the previous one still active. Addressing this should fix this issue, I expect. The proposed change also unmounts the filesystem. I can neither confirm nor deny that this is necessary. |
Sorry to get back so late on this, but the issue is not resolved by changing to clientid alone and leaving /usr/sbin/udhcpc in place. Disabling udhcpc (renaming it) has consistently fixed the issue for me over wifi, and I've been using ethernet without issue for months. I should mention that the udhcpc issue does not affect everyone, but that's because some routers will not get confused by the duplicate lease attempt, and handle it properly. Good routers will not actually see this issue at large. But cheap or poorly made ones (especially those provided by Internet Providers, which some force you to use) will get confused and hand off a wrong IP or fail to give the lease, and it's easily reproducible. I am using one of these routers, sadly. @sorgelig Does this give enough information that ethernet is unaffected, and that getting rid of udhcpc should be looked into? I can do more testing if you'd like. |
so, all i need to do is to remove udhcpc and problem solved? |
@Drakonas the statement about "good" routers slightly misses the point. All routers are just another Linux. I tested most DHCP server implementations, and by default most of them assign a lease based on the combination of client ID + MAC. So it's not that the routers are bad, but that our linux distro is badly set up. |
what we can do here is:
|
Simplest solution for everyone affected who wish to test |
I'm not sure option 1 here works as-is. When adding the
|
Have you rebooted? Note: you should not have both eth + wifi connected. There is a small chance that on your system wifi starts before dhcpcd ... which overwrites resolv.conf. IMHO we should go with dhcpcd as is, globally, and just reconfigure the wpa_supplicant hook to not call udhcpc... |
Found the ULTIMATE simple solution: switch all dhcp to dhcpcd (so revert dhcpcd.conf to default please) to
Explanation: the network startup script starts udhcpc when the interface is set to dhcp (we do not wish to do that). As the dhcpcd daemon listens, it picks up the interface and configures it... boom, magic ;) root@MiSTer:~>cat /etc/resolv.conf |
Ahh nice, I like the simplicity of this approach. The proposed change to |
Need more feedback. If it works for others, then I will add it. |
@Drakonas there is no magic here , dhcpcd daemon will work for all. (assuming your linux is up to date) |
I said 'With this "inet manual" fix and no others'. I had reverted all other changes to default prior to testing including any dhcp config and executable renames, but I'll do it again to get you this information. Ethernet is not affected by any of these changes, and I've never had trouble with ethernet regardless of using everything default or not. It's just wifi that is affected. The following is only with the inet manual change in /etc/network/interfaces. The /etc/dhcpcd.conf is default, and /usr/sbin/udhcpc exists. I've removed screenshot previews because this post would be astronomically long with them. With eth0 only As you can see from the wlan0 screenshot, with the inet manual fix alone, dhcpcd still attempts to get another lease with a 169 address. wlan0 - ip a This allows connections between my machine and the MiSTer (albeit hostname-relationships do not work), but the MiSTer scripts cannot get an internet connection. wlan0 - cat /etc/resolv.conf Now, if I rename /usr/sbin/udhcpc to /usr/sbin/_udhcpc, but leave this inet manual fix intact, scripts still do not get an internet connection. dhcpcd still obtains a second ip with 169 address: I should mention that my initial attempt to run inet manual with no udhcpc was met with it finally only grabbing one IP address. However, as I know this issue is related to certain modems getting confused, I turned the MiSTer off, unplugged the wlan adapter, plugged in ethernet, and then turned it on. It grabbed a new IP. I rebooted once more as-is. Still fine. Then I turned it off as-is, unplugged ethernet and plugged in wlan adapter, and now it gets a 169 address again. So this shows my initial experience with it eventually working after a few reboots but the issue will return again later. My assumption for this working after a few reboots eventually is the modem stops getting confused. So you have to force the MiSTer to change IP's, then the issue returns when you next try wifi again. 
Now, if I leave /usr/sbin/_udhcpc renamed (disabled) and revert /etc/network/interfaces to defaults (inet dhcp), everything works. See below: And now I will repeat the exact same wlan0 -> ethernet + reboot twice -> wlan0 to prove it will work first time wlan0 pulls a new IP (I'm writing this before doing it, to show how confident I am that removing udhcpc is all that is needed to fix this issue): TL;DR. Just remove udhcpc. That's all that's needed. There's no reason to get anymore complicated. |
@Drakonas add noipv4ll to your /etc/dhcpcd.conf, which will prevent it from bringing up link-local addresses (169.254.x.x). However, this will only solve the "no internet" problem. Can you please examine the output from
Also, what I would suggest is to put wpa_supplicant into debug mode and see whether it reports rapid re-connections. Thanks for the help with the investigation. |
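A small sketch of adding that option without duplicating it on repeated runs. `add_noipv4ll` is a hypothetical helper name and the path is a parameter so it can be tried on a copy. As the comment itself says, this only suppresses the 169.254.x.x fallback address; it does not explain why the second lease attempt happens.

```shell
#!/bin/sh
# Append 'noipv4ll' to a dhcpcd.conf unless the option is already there.
# The leading newline guards against a file that lacks a final newline.
add_noipv4ll() {
    conf="${1:-/etc/dhcpcd.conf}"
    grep -q '^[[:space:]]*noipv4ll[[:space:]]*$' "$conf" 2>/dev/null ||
        printf '\nnoipv4ll\n' >> "$conf"
}
```

Running it twice leaves a single `noipv4ll` line, so it is safe to include in a script that gets re-applied after an image update.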
note for @sorgelig |
@gkrzystek as stated here, this is the cause. Please read the thread before saying my wifi 6 router being 3 meters away from my MiSTer is the issue. I have been calm, but after that post I am trying my best to be civil. Lol. I have spent months testing this and replacing my modem was already something I tried. The problem still wasn't fixed. I am open to suggestions, but I propose we try to figure out what is causing wlan0 to be visible sometimes when the MiSTer launches, while having @sorgelig move ahead with removing udhcpc/ifup/ifdown, as they're all hardcoded to use udhcpc in BusyBox. This will fix a number of people's issues, but not all (as in problem 2 in the quoted post above). So, in regards to what we already know, I have a new theory, and I can do more testing for this @zakk4223. From what I have found recently, the hard reboot script for MiSTer does not fully reboot and does not bring interfaces down or the dhcp client, but the boot script is relaunched. Could that be the cause of some people having multiple leases? I'm wondering if some people thought rebooting from the MiSTer menu and power cycling it was the same thing, but my recent findings have shown they are not. Looking further up in this thread you'll find a link to someone proposing a script change for the cold reboot script. I am not sure if this is necessary to address the issue, but I am wondering if cold rebooting might result in a different boot process that is worth testing. I can do some later on. I am heading to bed lol. |
@Drakonas
i fully understand your frustration mate , and i am really interested in finding where problem is. |
I deleted my original post. I was too hasty to respond and for that I apologize. |
@Drakonas no hard feelings |
I can put the effort in and am willing to. I will report back tomorrow. It is 6AM here and I must sleep. Thank you for being understanding. I have anger issues and it's been hard work getting them under control for the past few years. |
and about reboot from menu: |
My ssh connection isn't cut even when I power cycle manually, which doesn't make sense to me. It should lose connection. During all my tests with eth0 only and wlan0 only, the ssh connection was never cut when I turned off my mister. I use splitter to USB hub and de10-nano, with analog io. |
Huh, that would be magic (sorry for my sarcasm), or your hub is feeding power to the system somehow. Reset power by unpowering both the de10 and the hub. |
Nevermind, if I wait long enough it is lost. I am tired. |
I have added the I will check back later on. |
@sorgelig can you please rebuild linux image with change i proposed? |
@Drakonas please download, unpack and replace linux.img and zImage_dtb in your sd card in linux folder. |
@sorgelig thanks for the quick response. The changes we supplied solve the primary problem of 2 DHCP clients handling wlan0, which (at least for some users) was causing problems. I do not expect more problems to occur because of those changes: one is cosmetic, and the rtl drivers have a bigger user base in the aircrack-ng community than we have ;) |
One more change that improves how dhcpcd behaves
This will remove dhcpcd's complaints about lease file writes and slightly improve network startup at boot. |
Just following up on this -- the most recent updates published seem to be working great on my end. Thanks for that. The multiple DHCP lease address issue persists however because of the default DUID configuration in dhcpcd.conf that I mentioned previously. With this set to DUID a new address is assigned to the MiSTer every restart for both wired and wireless connections, instead of using an existing DHCP lease. This also causes significant delays reconnecting to the MiSTer over the network due to local DNS caches. This should be set to use client ID based on my experience, as some routers struggle with DUID in the first place and others will be wasteful with leases in a scope due to the way the MiSTer appears as a new device every time it is restarted. When set to client ID with a DUID-aware DHCP server, the MiSTer should retrieve the same leases upon restart. |
- With DUID, new DHCP leases would be issued upon reboot when paired with DHCP servers that respect DUID, instead of reusing the existing DHCP lease.
- The MiSTer installation process now generates a random hardware address, so DUID is unnecessary. Including DUID results in wasted leases within a DHCP scope, and can also lead to DNS resolution issues when accessing the device.
- More details
  - MiSTer-devel/Linux-Kernel_MiSTer#29 (comment)
  - MiSTer-devel/Linux-Kernel_MiSTer#29 (comment)
  - https://misterfpga.org/viewtopic.php?p=74311#p74311
Over the last week a few of us have been hammering at what is causing these weird wifi-related failures on boot at random. When they occur, the MiSTer gets an IP, even gets the time when you don't have an RTC board, and SAMBA/SSH connections work, but anything attempting to get an internet connection after that fails. Even
nslookup google.com 8.8.8.8
fails with a timeout to the DNS.
What we have found are the following:
udevadm trigger /sys/class/net/wlan0 --action add
rm /sbin/udhcpc
causes ifup to use dhcpcd, and then the rc script starts another copy. It seems the priority for the startup script is to load udhcpc before dhcpcd. It would probably be best to only use dhcpcd. If udhcpc is wanted instead, eth0 does not load since it is not in the interfaces config file, so it would need to be added there.
Possible methods to fix (doing multiple is not a bad idea):
/usr/share/dhcpcd/hooks/10-wpa_supplicant
), but it is broken with the MiSTer's wpa_supplicant implementation. wpa_supplicant would need to be fixed. This would allow using dhcpcd.conf to address anything related to network issues. And have ifupdown just do loopback.
Any thoughts as to what may be directly causing this is welcome. I'd like to get to the bottom of this, as various users besides me have reported this happening at random with their MiSTer. We are using the latest Mr. Fusion images as far as I know.
I have attached my syslog for review.
/var/log/messages
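For others trying to confirm they are hitting the same symptom (a second 169.254.x.x address on wlan0 after boot), a check along these lines may help. It is a sketch that assumes the `ip` on the image supports one-line output (`-o`); `has_link_local` is a hypothetical helper name.

```shell
#!/bin/sh
# Read 'ip -o -4 addr show' output on stdin; succeed if any listed IPv4
# address falls in the link-local range 169.254.0.0/16.
has_link_local() {
    awk '$3 == "inet" && $4 ~ /^169\.254\./ { found = 1 } END { exit !found }'
}

# Example (interface name from this report):
#   ip -o -4 addr show dev wlan0 | has_link_local && echo "wlan0 has a 169.254.x.x address"
```

Because the function only reads stdin, saved output from an affected boot can be piped through it later without needing the device at hand.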