Fix #48 segfault with k70 rgb in bios mode #89
|
I tried this patch now. This is what happens when changing the switch from position "1" to BIOS mode: And the keyboard is working again in 46 seconds. Keep in mind that this particular keyboard on this platform always has this initialization delay. Switching back to "1" makes the keyboard stop working. Only the following output is shown: Now I have to unplug and plug it again to get it working: And it is working again in just under a minute. |
|
This is now what I get during bootup (tried to filter all relevant). |
|
@hevanaa please upload the logs as files. |
|
Yes, sorry, fixed. |
|
@hevanaa sorry, your 3rd and 4th links are dead: |
|
Works over here. I guess it's the problems on Amazon S3. |
|
@hevanaa Regarding logfile 1: I don't know why the daemon does not tell the OS to claim the interfaces for its HID driver again. So far, so (relatively) good. |
|
@hevanaa Regarding the second logfile: Are there really no more lines after OK, after 17 seconds - whatever the kernel has done in between... If the KB is really not detected, I understand why the KB is dead. But this is Corsair firmware + kernel functionality and has nothing to do with ckb. |
|
@hevanaa ok, files 3 + 4 worked now. |
|
Yes, this time there were no more lines. The log continues in the third file (I think I only removed a few lines from some Gnome plugin). |
|
Curious, do you both have the power USB cable also plugged in?
What if you only plug the data one on a USB3 port and leave the power
one unplugged?
I think this might be relevant to the keyboard's behaviour.
|
|
Yes, the power cable is plugged. I will try without and on USB3. |
|
@hevanaa File 3 starts with a normal init. Because it begins 2 minutes after the last file ends, I guess there is no message about unplugging usb 3-6. What I can see when I debug the USB communication with Wireshark is that the kernel tries a GET_CONFIG three times; the device answers each time, but the kernel treats the answer as invalid (this must happen on the lowest protocol layer, because Wireshark itself marks it as invalid). After these timeouts the normal startup process starts: New to me is the last block: What does it mean? The daemon tries to read the LED profiles stored on the KB. The protocol for this is fixed, but it changed on my KB between firmware v204 and v205. You have v204 on your KB. Is there a newer one you can upgrade to? Even if the hardware could not be read, the KB runs afterwards. |
|
@frickler24 Quirks don't work on @hevanaa's installation because usbhid
is built in to the kernel, and is not a module. At least that's what I
remember.
I don't have any Fedora machines to test custom kernels on,
unfortunately.
|
|
@tatokis Thank you for that info! |
|
According to https://bugzilla.redhat.com/show_bug.cgi?id=907221 it
worked in the past for other people though.
It _should_ work, but unfortunately I do not have any experience with
Fedora, and it doesn't seem to do anything in this case.
Regarding the firmware version, the Lux seems to be on 2.04
unfortunately.
|
|
I feel stupid. I did mess around with different USB ports in the beginning, but apparently nothing worked well before I updated the firmware. Now it's booting fast with only the data cable plugged into a USB3 port! Thanks for that @tatokis! Uploading the same logs as above... |
|
@hevanaa last file (fresh boot): This is a receive request to the device and the device does not send an answer or the kernel does not give the request to the device. Even if the -110 comes from the device and errno is set to timeout, there is no time delay! |
|
Switching from 1 to bios mode: And back from bios to 1: Worked! |
|
And a new boot log (filtered). Booting really fast now: |
|
Does the same behaviour keep up if you leave the data cable in USB3,
but also plug in the power one as well?
Asking to see if this issue is related to the USB Host controller, or
if it is indeed the power cable causing the keyboard to act differently.
|
|
So, the issue lies somewhere around the second USB cable being connected? I know now that on Strafe RGB the second cable works just as a throughput (a throughput? don't know how it's called) for a USB device, the keyboard itself has a USB input on its surface (so that you can plug a flash drive directly in the keyboard). And I also remember having some problems almost a year ago with two cables being plugged at the same time. I should return to this at some point, actually. |
|
@light2yellow Unfortunately that is different. All that cable does is
pass through the power and data to the rear USB port on your keyboard.
If you have a multimeter and a usb connector, you could test it with
the continuity setting on it.
Our keyboards do not have such a feature, and instead use the extra USB
cable for power.
The microcontroller inside the keyboard is aware of it, and knows when
the power cable is plugged in or not. This can be observed if you only
plug the power cable and not the data cable. After some time, the
keyboard will go into demo mode, and start going through different
preprogrammed patterns.
|
|
Interesting... My keyboard no longer goes into demo mode if I do that.
I am positive that this was the behaviour in the past, so the only
thing I can think of is that they changed it in a firmware update.
@frickler24, We shouldn't really have to support old firmware versions.
We already tell people to upgrade their firmwares before reporting
issues.
|
|
@tatokis I agree with your last point. Because of that, and because some of the information in the second logfile sent by @hevanaa is not correct, I have just prepared a PR. Let me have a look at the third file. Then we can decide what to do with the FW hack. |
|
Yes, the firmware bug is still there (2.04 has the new functionality, not the old one!): Because I cannot revert my FW to 2.04, I cannot say whether 2.04 on a K95RGB also has the new function. So my recommendation is: let's merge #96 into testing so @hevanaa can test it. |
|
It works fine with the power cable plugged into a USB3 port, too. So the kernel timeout seems to be an issue with the USB2 port that I used before. I also left out some rows from the last boot log above. Here is the correct one: |
|
To be honest, I expected quite the contrary - that USB3 would be the one causing trouble (if there were any at all). Could this be a hardware issue? Excuse me if I didn't understand the problem fully. |
|
@frickler24
Can we explicitly clarify the status of this PR? If it's ready to be merged, let's merge it. I am able to test (but haven't done this yet) that:
The same clarification request applies to the accompanying issue #48 - from what I see we need @tatokis to try? It also applies to #96 and its accompanying issue mentioned in your last comment here. From what I see, the reason why they are all in flux (and other PRs as well) is that nobody wrote what else must be done or whether it is working as expected, and because most of this thread was about solving a kernel timeout and not much about the PR. Sorry, I'm a bit lost. Regarding this particular issue, it seems I should put the kb into BIOS mode and report whether the daemon crashes or not. If I'm mistaken, please correct me. Also, just for your information:
|
9e8bd5c to 0e84888
|
@light2yellow Thanks for your comments. Today my kernel regularly issued errors, so I looked deeper at the fix again. So far it has handled only the case where a device either does not give proper information about its endpoints or is in BIOS mode. In various other situations (also visible in the logs given in this issue by @hevanaa), errors occur which are due to poor implementation of the communication between OS and firmware (these errors also occur when the daemon is not running). Today I have extended the fix to cover two situations which appear frequently in the logfiles. Then I repeated some of the tests which I had already commented on above. Especially switching between the poll rates leads again and again to exactly such "hangers" in the bus protocol. For me the daemon at least runs better with the patch than without. Regarding the testing possibilities you have: I will rebase my branch onto branch testing if it changes. |
|
Installed, stopped the daemon, put the kb into BIOS mode and launched the daemon manually. |
|
Same as above, but after the disconnect of |
|
OK, but why did you get the endless loop in the first case? |
When switching a K70RGB or K95RGB to BIOS mode, only one USB channel is provided by the KB driver. Because usb_linux.c leaves the last channel unused in all other modes, zero channels are initialized, which makes the daemon crash with a SIGSEGV. While debugging this, the KB sometimes received data which stopped it from working completely. Only disconnecting and reconnecting helped; no software reset via the switch was successful in this state. When a KB in this state was connected to the daemon, 0 channels were detected. These two special conditions (0 and 1 EP for an RGB device) now get special treatment in the code.
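The two special cases described above can be illustrated with a minimal sketch. The enum, the function names, and the chosen actions are hypothetical and are not taken from usb_linux.c; the sketch only assumes that a 0-endpoint device needs a reset and that a 1-endpoint RGB device is in BIOS mode and should be left to the OS.

```c
#include <assert.h>

/* Hypothetical action names for illustration; not the daemon's real identifiers. */
enum ep_action {
    EP_ACTION_RESET,       /* 0 endpoints: the device reported nothing usable */
    EP_ACTION_LEAVE_TO_OS, /* 1 endpoint on an RGB device: BIOS mode */
    EP_ACTION_INIT         /* normal case */
};

/* Decide how to treat a device based on its reported endpoint count. */
enum ep_action classify_endpoints(int ep_count, int is_rgb)
{
    assert(ep_count >= 0);
    if (ep_count == 0)
        return EP_ACTION_RESET;
    if (is_rgb && ep_count == 1)
        return EP_ACTION_LEAVE_TO_OS;
    return EP_ACTION_INIT;
}

/* Number of endpoints to initialize when the last one is left unused.
 * With a single endpoint this yields 0, which is the situation that
 * led to the crash described above. */
int endpoints_to_init(int ep_count)
{
    assert(ep_count >= 0);
    return ep_count > 0 ? ep_count - 1 : 0;
}
```

Checking the endpoint count before any per-endpoint setup keeps the "zero channels to initialize" case from ever reaching code that expects at least one.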
Changing the operating modes for the RGB keyboards that provide this feature (such as the K70RGB or the K95RGB) will result in massive disruptions to the daemon and the keyboard itself. A test matrix is stored in the corresponding issue #48. The settings changed here refer to the timeout before sending a USB message. The timeout was previously 10ms and has been doubled. Minor corrections and the wrapping of the debug output in a #define DEBUG were also committed here.
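As a rough illustration of such a pre-send delay, the sketch below converts a millisecond value into a `struct timespec` and sleeps with `nanosleep`. The macro and function names are hypothetical; only the 10ms value and the doubling come from the commit message above, and the daemon's real delay code may look different.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

#define OLD_DELAY_MS 10L                /* previous value, per the commit message */
#define NEW_DELAY_MS (2 * OLD_DELAY_MS) /* doubled */

/* Convert a non-negative millisecond count into a timespec. */
struct timespec ms_to_timespec(long ms)
{
    assert(ms >= 0);
    struct timespec ts;
    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    return ts;
}

/* Sleep for the given number of milliseconds before a send.
 * Returns 0 on success, -1 if the sleep was interrupted or failed. */
int delay_before_send(long ms)
{
    struct timespec ts = ms_to_timespec(ms);
    return nanosleep(&ts, NULL);
}
```

Keeping the value in one macro makes it easy to try other delays while observing whether timing problems show up elsewhere.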
In several cases we have seen error messages in the logfiles like
    USBDEVFS_CONTROL failed cmd ckb-daemon rqt 33 rq 9 len 64 ret -110
    [W] _start_dev (device.c:24): Unable to load firmware version/poll rate
    [E] os_usbsend (via firmware.c:15): Connection timed out
or similar. This happens if the device is in a state where the communication between Linux and the device is corrupted. E.g. with a K95RGB one may provoke the behavior by moving the poll-rate switch, e.g. to BIOS mode. This fix adds some USB resets for the device if either the ioctls for getting info give bad return values or the device does not provide usable information about endpoints, version numbers etc.
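The reset-on-failure idea can be sketched as a bounded retry loop. Everything here is an assumption for illustration: the function names, the callback types, and the fake device are hypothetical, and the limit of five resets simply mirrors the five try_usbreset() attempts mentioned in this thread. The `ret -110` in the log corresponds to `-ETIMEDOUT` on common Linux platforms.

```c
#include <assert.h>
#include <errno.h>

#define MAX_RESETS 5 /* assumption: mirrors the five reset attempts mentioned in the thread */

typedef int (*usb_op_fn)(void *ctx);    /* returns 0 or a negative errno value */
typedef int (*usb_reset_fn)(void *ctx); /* returns 0 on success, negative on failure */

/* True if a return value is the timeout seen in the logs. */
int is_timeout(int ret) { return ret == -ETIMEDOUT; }

/* Run op; after each failure, reset the device and try again, at most MAX_RESETS times. */
int run_with_resets(usb_op_fn op, usb_reset_fn reset, void *ctx)
{
    assert(op && reset);
    int ret = op(ctx);
    for (int attempt = 0; ret < 0 && attempt < MAX_RESETS; attempt++) {
        if (reset(ctx) < 0)
            break; /* the reset itself failed; give up */
        ret = op(ctx);
    }
    return ret;
}

/* Fake device for demonstration: times out a fixed number of times, then succeeds. */
struct fake_dev { int failures_left; int resets; };

int fake_op(void *ctx)
{
    struct fake_dev *d = ctx;
    if (d->failures_left > 0) {
        d->failures_left--;
        return -ETIMEDOUT;
    }
    return 0;
}

int fake_reset(void *ctx)
{
    struct fake_dev *d = ctx;
    d->resets++;
    return 0;
}

/* Usage example: number of resets issued for a device that times out `failures` times. */
int demo_resets_used(int failures)
{
    struct fake_dev d = { failures, 0 };
    (void)run_with_resets(fake_op, fake_reset, &d);
    return d.resets;
}
```

Bounding the number of resets matters because a device stuck in a bad state would otherwise keep the daemon in an endless loop; once the limit is hit, the caller can disconnect and leave the device to the OS.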
0e84888 to a2b6508
|
OK, both log outputs are correct. The first one terminates after 5 try_usbreset() and does a disconnect from ckb1. With this disconnect the OS HID driver manages the device again in BIOS mode. |
|
Yes, thank you. |
This reverts commit f2ddc50.
... and improve the behavior when modifying modes for RGB keyboards via the modes switch.
Replaces #64 which was based on master:
This pull request avoids the most important crashes of the daemon when the polling switches on the RGB keyboards are changed. Unfortunately, the cause of the error is the interaction of the keyboard firmware, the Linux USB driver and the ckb-daemon.
I have tried different test combinations with my K95RGB and M65RGB mouse. The result can be found here: Test.matrix.pdf.
In short, even without the ckb-daemon you can make the USB driver or even the keyboard itself crash by moving the state switches. The daemon only adds to this, so that the user is completely confused and the cause of the error can no longer be identified.
What I found was a hint in the USB driver about the delay before sending a message: 5ms was the standard. Because I had a lot of trouble with the 8ms setting for my K95RGB, I tried doubling this delay. It works a lot better, though not perfectly. My suggestion is that we use the increased setting and observe whether we get a timing problem elsewhere.
If you are interested: I have several logfiles of the USB communication that can be viewed with Wireshark or vusb-analyzer.
The following info may be helpful for our users in the README.md file.
But first please check with your HW (K70xxx and others) whether that change works for you.
The most reliable way to switch the mode is by following that order:
If after 15 seconds the keyboard remains dark or simply does not react, the daemon should be killed, the keyboard unplugged and, after 15 seconds, replugged. The times may vary depending on your system: the USB driver needs this time to detect the reconnection of the USB device.
Please try killing the daemon gently first with "sudo pkill ckb-daemon", because this way it does some cleanup. Only if the process won't stop (there is a situation with a loop between kernel and daemon), kill it forcibly and then clean up manually. Example for Linux:
sudo pkill -9 ckb-daemon; sudo rm -rf /dev/input/ckb* # be careful with this statement!!!
If you do not clean up and do a lot of testing, you will have too many devices in the directory (max 10 allowed).
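To check whether stale nodes are piling up before resorting to the cleanup above, a small helper can count the matching paths. This is only a sketch: the function names are hypothetical, the limit of 10 is the figure quoted above, and the pattern `/dev/input/ckb*` is taken from the cleanup command.

```c
#include <assert.h>
#include <glob.h>
#include <stddef.h>

#define MAX_CKB_NODES 10 /* limit quoted in the text above */

/* Count paths matching a shell pattern; returns -1 on a glob error. */
int count_matching_paths(const char *pattern)
{
    assert(pattern != NULL);
    glob_t g;
    int rc = glob(pattern, 0, NULL, &g);
    if (rc == GLOB_NOMATCH)
        return 0;
    if (rc != 0)
        return -1;
    int n = (int)g.gl_pathc;
    globfree(&g);
    return n;
}

/* True once the number of device nodes has reached the limit. */
int node_limit_reached(int count)
{
    return count >= MAX_CKB_NODES;
}
```

Calling `count_matching_paths("/dev/input/ckb*")` and passing the result to `node_limit_reached` gives a quick indication of whether a manual cleanup is due.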
references #48