| 1 | +--- |
| 2 | +title: Things you really should know about Windows Input, but would rather not - Raw Mouse edition |
| 3 | +date: 2024-09-16 19:30:00 +0100 |
| 4 | +categories: [Programming, Game Development] |
| 5 | +tags: [programming, input, windows, mouse, win32, rawinput, xinput, polling] |
| 6 | +author: petert |
| 7 | +img_path: /assets/2024-09-16-Windows-Input |
| 8 | +--- |
| 9 | + |
| 10 | +# Motivation |
| 11 | + |
| 12 | +When you are developing a game for PC -- or porting one -- then you will have to deal with user input, generally |
| 13 | +from three distinct categories of sources: **mouse**, **keyboard**, and **gamepads**. |
| 14 | + |
| 15 | +At first, it could reasonably be assumed that mouse and keyboard should be the simplest parts of this to deal with, but |
| 16 | +in reality, they are not -- at least if we are talking about Windows. In fact, ***several extremely popular AAA games |
| 17 | +ship with severe mouse input issues when specific high-end mice are used***, and some popular engines have issues that |
| 18 | +are still extant. |
| 19 | + |
| 20 | +In this article we'll explore a few reasons why that is the case, and end up with a solution that works but is still |
| 21 | +unsatisfactory. I assume that there is a whole other level of complexity involved in properly dealing with accessories |
| 22 | +like steering wheels, flight sticks, and so on in simulators, but so far I have never had the pleasure of working on a game |
| 23 | +that required this, and this article will not cover those types of input devices. |
| 24 | + |
| 25 | +> If you are already an expert on game input on Windows, please skip directly to [our current |
| 26 | +> solution](#our-current-solution) and tell us how to do it better, because I hope fervently that it's not the best! |
| 27 | +{: .prompt-warning } |
| 28 | + |
| 29 | +> While the vast majority of this article is about mouse input, we also recently discovered something really interesting |
| 30 | +> about XInput performance that we will share [towards the end of the article](#a-note-on-xinput). |
| 31 | +{: .prompt-tip } |
| 32 | + |
| 33 | +# Background -- Raw Input |
| 34 | + |
| 35 | +In Windows, there are many ways to receive input from the user. The most traditional one is to receive Windows messages, |
| 36 | +which are sent to your application's message queue. This is how you receive keyboard and mouse input in a typical Windows |
| 37 | +application. However, this method has a few drawbacks when it comes to games. |
| 38 | + |
| 39 | +The most notable of these drawbacks is that you cannot use the message queue to receive precise and unaltered mouse input, |
| 40 | +which is particularly important for any game where the mouse is used to control a 3D camera. Traditional input is meant to |
| 41 | +control a cursor, and the system will apply acceleration and other transformations to the input before it reaches your |
| 42 | +application, and also won't give you sub-pixel precision. |
| 43 | + |
| 44 | +> If your game only has cursor-based input, e.g. a strategy game or point-and-click adventure, then you can probably |
| 45 | +> get away with ignoring everything mouse-related in this article and just blissfully use standard Windows messages. |
| 46 | +{: .prompt-info } |
| 47 | + |
| 48 | +The solution to this problem is to use the [Raw Input API](https://learn.microsoft.com/en-us/windows/win32/inputdev/using-raw-input), |
| 49 | +which allows you to receive input from devices like mice and keyboards in a raw, unaltered form. This is the API that |
| 50 | +most games use to receive input from the mouse, and the linked article provides a good overview of how to use it, which |
| 51 | +I will not repeat here. |
| 52 | + |
| 53 | +So, why the somewhat whiny undertone of this article? Oh, we're just getting started. |
| 54 | + |
| 55 | +{: width="90%" #fancyimg } |
| 56 | +_A Razer Viper mouse with 8k polling rate -- I assume that all of these people stare in bewilderment as to why some of their |
| 57 | +games drop by 100 FPS when they use it (Image source: Razer)_ |
| 58 | + |
| 59 | +## Using Raw Input |
| 60 | + |
| 61 | +If you are familiar with the Raw Input API, or just read the linked documentation, then you might believe that I'm just |
| 62 | +getting at the importance of using buffered input rather than processing individual events, but really, that wouldn't |
| 63 | +be too bad or worth an article. The real problem is that it is not nearly as simple as that -- in fact, as far as I can |
| 64 | +tell, **there is no good general way to do this at all**. |
| 65 | + |
| 66 | +Let's step back a bit -- for those who have not done this before, there are two ways to receive raw input from a device: |
| 67 | + |
| 68 | +1. Using **standard reads** from the device, which is the most straightforward way to do it. This basically just involves |
| 69 | + receiving additional messages of type `WM_INPUT` in your message queue, which you can then process. |
| 70 | + |
| 71 | +2. Using **buffered reads**, where you access all extant raw input events at once by calling `GetRawInputBuffer`. |
| 72 | + |
| 73 | +As you might surmise, the latter method is designed to be more performant, as message handling of individual events using |
| 74 | +the message queue is not particularly efficient. |
| 75 | + |
| 76 | +Now, actually doing this, and doing it correctly, is not as easy as it should be -- or maybe I just missed something. |
| 77 | +As far as I can tell, to prevent problems related to "losing" messages that occur at specific points in time, while |
| 78 | +only processing raw input in batched form, you need to do something like the following: |
| 79 | + |
```cpp
// Assumes <windows.h> and <limits> are included, and that g_window,
// processRawInput() and runOneFrame() are provided by the game.
processRawInput(); // this does the whole `GetRawInputBuffer` thing

// peek all messages *except* WM_INPUT
// except when we don't have focus, then we peek all messages so we wake up consistently
MSG msg{};
auto peekNotInput = [&] {
    if (!g_window->hasFocus()) {
        return PeekMessage(&msg, NULL, 0, 0, PM_REMOVE);
    }
    // first try everything below WM_INPUT ...
    auto ret = PeekMessage(&msg, NULL, 0, WM_INPUT - 1, PM_REMOVE);
    if (!ret) {
        // ... then everything above it. The parentheses around `max` keep the
        // `max` macro from <windows.h> out of the way if NOMINMAX is not defined.
        ret = PeekMessage(&msg, NULL, WM_INPUT + 1, (std::numeric_limits<UINT>::max)(), PM_REMOVE);
    }
    return ret;
};

while (peekNotInput()) {
    TranslateMessage(&msg);
    DispatchMessage(&msg);
}

runOneFrame(); // this is where the game logic is
```
| 104 | + |
| 105 | +As shown in the code snippet above, you need to peek all messages except `WM_INPUT` to ensure that you don't lose any |
| 106 | +messages that occur between the times you are processing batched raw input and "normal" messages. This is not made |
| 107 | +particularly clear in the documentation, and it's also not made particularly easy by the API, but a few extra lines |
| 108 | +of code can solve the problem. |
| 109 | + |
| 110 | +All of this *still* wouldn't be a big deal, just the normal amount of scuff you expect when working on an operating |
| 111 | +system which has a few decades of backwards compatibility to maintain. So, let's get to the real problem. |
| 112 | + |
| 113 | +## The Real Problem |
| 114 | + |
| 115 | +Let's assume you did all this correctly, and are now receiving raw input from the mouse, in a buffered way, as |
| 116 | +suggested. You might think that you are done, but you are not. In fact, you are *still* just getting started. |
| 117 | + |
| 118 | +{: width="90%" #fancyimg } |
| 119 | +_Comparison between no mouse movement (upper part) and mouse movement (lower part), everything else equal_ |
| 120 | + |
| 121 | +What you see above is a comparison of the frametime performance chart of a game, in the exact same scene. |
| 122 | +The only difference is that in the lower part, the mouse is being vigorously shaken about -- and not just any mouse, |
| 123 | +but a high-end one with a polling rate of 8 kHz. As you can see, just moving the mouse around destroys performance, |
| 124 | +dropping from being consistently pegged at the soft FPS cap (around 360 FPS) to ~133 FPS and becoming completely |
| 125 | +unstable. ***Just by vigorous mouse movement.*** |
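
To put the polling rate in perspective, a quick back-of-the-envelope calculation helps. The helper below is purely illustrative (the name `reportsPerFrame` is made up for this sketch) and simply divides the device's report rate by the frame rate, assuming the mouse is moving the whole time:

```cpp
// Rough upper bound on the number of device reports that arrive during one
// frame while the mouse is in constant motion.
constexpr double reportsPerFrame(double pollingRateHz, double framesPerSecond) {
    return pollingRateHz / framesPerSecond;
}
```

At 8000 Hz and the 360 FPS cap this comes out at roughly 22 reports per frame, and at 133 FPS at roughly 60. A 1000 Hz mouse at 360 FPS stays below 3.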
| 126 | + |
| 127 | +Now you might think "Aha, he included this to show how important it is to use batched processing!". Sadly not: what you |
| 128 | +see above *is, in fact, the performance of the game when using batched raw input processing*. Let's talk about why this |
| 129 | +is the case, and what to do about it. |
| 130 | + |
| 131 | +## The Bane of Legacy Input |
| 132 | + |
| 133 | +To make a long story short, the problem is so-called **"legacy input"**. When you initialize raw input for a device using |
| 134 | +`RegisterRawInputDevices`, you can specify the `RIDEV_NOLEGACY` flag. This flag prevents the system from generating "legacy" |
| 135 | +messages, such as `WM_MOUSEMOVE`. And there we have our problem: if you don't specify this flag, then the system will |
| 136 | +generate both raw input messages and legacy messages, and the latter will still clutter your message queue. |
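
For reference, a minimal sketch of what such a registration might look like. The helper name `registerRawMouse` and the two constants are made up for illustration; the usage page and usage values are the standard HID ones for a generic mouse, and the Windows-specific part is guarded so the snippet also compiles elsewhere:

```cpp
#include <cstdint>

// HID usage values identifying a generic mouse.
constexpr std::uint16_t kUsagePageGenericDesktop = 0x01;
constexpr std::uint16_t kUsageMouse = 0x02;

#ifdef _WIN32
#include <windows.h>

// Hypothetical helper: registers the mouse for raw input, targeting `hwnd`.
// With `suppressLegacy` set, RIDEV_NOLEGACY asks the system not to generate
// legacy messages such as WM_MOUSEMOVE for this device.
bool registerRawMouse(HWND hwnd, bool suppressLegacy) {
    RAWINPUTDEVICE rid{};
    rid.usUsagePage = kUsagePageGenericDesktop;
    rid.usUsage = kUsageMouse;
    rid.dwFlags = suppressLegacy ? RIDEV_NOLEGACY : 0;
    rid.hwndTarget = hwnd;
    return RegisterRawInputDevices(&rid, 1, sizeof(rid)) != FALSE;
}
#endif
```

Calling it with `suppressLegacy = false` keeps both message streams alive, which is the configuration that produced the frametime chart above.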
| 137 | + |
| 138 | +So once again, why am I whining about this? Just disable legacy input, right? Indeed, that completely solves the |
| 139 | +performance issue -- as long as you do everything else correctly as outlined above of course. |
| 140 | + |
| 141 | +And then you congratulate yourself on a job well done, and move on to the next task. A few days later after the build is |
| 142 | +pushed to beta testers, you get a bug report that the game window can no longer be moved around. And then you realize |
| 143 | +that you just disabled the system's ability to move the window around, because that is done using legacy input. |
| 144 | + |
| 145 | +> Disabling legacy input disables any form of input interaction that you would normally expect to be handled by the system. |
| 146 | +{: .prompt-danger } |
| 147 | + |
| 148 | +So what can we do about this? Here is a short list of things I considered, or even fully implemented, and which either |
| 149 | +don't work, cannot actually be done, or are just silly in terms of complexity: |
| 150 | + |
| 151 | +1. **Use a separate message-only window and thread for input processing**. This seemed like a good solution, so I went |
| 152 | + through the trouble of implementing it. It basically involves creating an entirely separate invisible window and |
| 153 | + registering raw input with it rather than the main window. A bit of a hassle, but it seemed like it would |
| 154 | + resolve the issue and do it "right". No luck: the system will still generate high-frequency legacy messages for the |
| 155 | + main window, even if the raw input device is registered with the other window. |
| 156 | + |
| 157 | + > Raw input affects the entire process, even though the API takes a window handle. |
| 158 | + {: .prompt-warning } |
| 159 | + |
| 160 | +2. **Only disable legacy input in fullscreen modes**. This would at least solve the problem for the vast majority of |
| 161 | + users, but it's not possible, as far as I can tell. You seemingly **cannot** switch between legacy and raw input |
| 162 | + once you've enabled it. You might think `RIDEV_REMOVE` would help, but that completely removes all input that the |
| 163 | + device generates, including both legacy and raw input. |
| 164 | + |
| 165 | + > You can't switch between legacy and raw input once you've enabled it. |
| 166 | + {: .prompt-warning } |
| 167 | + |
| 168 | +3. **Use a separate process to provide raw input**. This is a bit of a silly idea, but it's one that I can think |
| 169 | + of that would actually work. You could create a separate process that provides raw input to the main process, and |
| 170 | + then use some form of IPC to communicate the input. This would be a massive hassle, and I really don't want to support |
| 171 | + something like that, but I'm pretty sure it would work. |
| 172 | + |
| 173 | +4. **Disable legacy input, create your own legacy input events at low frequency**. Another one in the category of "silly |
| 174 | + ideas that probably could work", but there are a *lot* of legacy messages and this would be another support nightmare. |
| 175 | + |
| 176 | +5. **Move everything else out of the thread that does the main message queue processing**. This is something I would probably |
| 177 | + try if I was doing greenfield development, but it's a massive change to make to an existing codebase, as in our porting |
| 178 | + use case. And it would still mean that this one thread is spending tons of time uselessly going through input messages. |
| 179 | + |
| 180 | +So options 1 and 2 would be somewhat viable, but the former doesn't actually work, and the latter is not possible. The others |
| 181 | +are, in my opinion, too silly to explore for actually shipping a game, or infeasible for a porting project. |
| 182 | + |
| 183 | +So perhaps now you can see both why there are AAA games shipping on PC which break with 8 kHz mice, and why I'm *just a bit* |
| 184 | +frustrated with the situation. So what are we actually doing? |
| 185 | + |
| 186 | +# Our Current Solution |
| 187 | + |
| 188 | +Our current solution is very dumb and seems like it shouldn't work, or should at least have some severe repercussions, but so far it |
| 189 | +does seem to work fine and not have any issues. It's a bit of a hack, but it's the best we've got so far. |
| 190 | + |
| 191 | +This solution involves **keeping legacy input enabled**, but using **batched raw input for actual game input**. And then the stupid |
| 192 | +trick: **prevent performance collapse by just not processing more than `N` message queue events per frame.** |
| 193 | + |
| 194 | +We are currently working with `N=5`, but that's a somewhat random choice. When I tried this I had a lot of concerns: |
| 195 | +what if tons of messages build up? What if the window becomes unresponsive? I didn't worry about the game input itself, |
| 196 | +because we rapidly and with very low latency get all the buffered raw input events, but the window interactions might |
| 197 | +become unresponsive due to message buildup. |
| 198 | + |
| 199 | +After quite a lot of testing with an 8 kHz mouse, none of that seems to happen, even if you really try to make it happen. |
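
A minimal sketch of that cap, written against stand-in types so it can be run outside of Windows. `Message`, `pumpBounded` and the two callbacks are hypothetical names: `peek` plays the role of the `peekNotInput` lambda from earlier, and `dispatch` the role of `TranslateMessage` plus `DispatchMessage`. It illustrates the idea and is not the shipped code:

```cpp
#include <cstddef>
#include <deque>
#include <functional>

// Stand-in for a Win32 MSG; only an id is needed for this sketch.
struct Message {
    unsigned id;
};

// Dispatches at most `maxPerFrame` messages and returns how many were handled.
// `peek` must return false once the queue is empty.
std::size_t pumpBounded(const std::function<bool(Message&)>& peek,
                        const std::function<void(const Message&)>& dispatch,
                        std::size_t maxPerFrame) {
    std::size_t handled = 0;
    Message msg{};
    // Check the budget *before* peeking: a peek removes the message from the
    // queue, so peeking without dispatching would silently drop it.
    while (handled < maxPerFrame && peek(msg)) {
        dispatch(msg);
        ++handled;
    }
    return handled;
}
```

With `maxPerFrame = 5`, a backlog of 12 messages drains over three frames (5, 5, then 2), while anything left over simply waits in the queue for the next frame.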
| 200 | + |
| 201 | +So, that's where we are: an entirely unsatisfactory solution that seems to work fine, and provides 8k raw input without |
| 202 | +performance collapse and without affecting "legacy" windows interactions. If you know how to actually do this properly, |
| 203 | +please write a comment to this post, or failing that send an email, stop me in the street and tell me, or even send a carrier |
| 204 | +pigeon. I would be very grateful. |
| 205 | + |
| 206 | +# A Note on XInput |
| 207 | + |
| 208 | +This is completely unrelated to the rest of the article, other than being about input, but I found it interesting and it might |
| 209 | +be new to some. When you use the XInput API to work with gamepads, you might think there is very little you can do wrong. |
| 210 | +It's an exceedingly simple API, and mostly you just use `XInputGetState`. However, the documentation has this curious note, |
| 211 | +which is quite easy to miss: |
| 212 | + |
| 213 | +> For performance reasons, don't call XInputGetState for an 'empty' user slot every frame. We recommend that you space |
| 214 | +> out checks for new controllers every few seconds instead. |
| 215 | + |
| 216 | +This is not an empty phrase: we have observed performance losses of 10-15% in extremely CPU-limited cases just by calling |
| 217 | +`XInputGetState` for all controllers every frame while none are connected! |
| 218 | + |
| 219 | +I have no idea why the API would be designed like this, and not have some internal event-based tracking that makes calls |
| 220 | +for disconnected controller slots essentially free, but there you have it. You actually have to implement your own |
| 221 | +fallback mechanism to avoid this performance hit, since there's no alternative API (at least in pure XInput) to tell you |
| 222 | +whether a controller is connected. |
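
One possible shape for such a fallback, as a hedged sketch: connected slots are polled every frame, while empty slots are only re-probed after a retry interval. `GamepadSlots` and its `Probe` callback are hypothetical; in a real game the callback would wrap `XInputGetState` and return `false` when it reports `ERROR_DEVICE_NOT_CONNECTED`. Time is passed in explicitly so the logic can be exercised without a controller:

```cpp
#include <array>
#include <chrono>
#include <functional>
#include <utility>

class GamepadSlots {
public:
    using Clock = std::chrono::steady_clock;
    // Returns true if the controller in the given slot answered the poll.
    using Probe = std::function<bool(int)>;
    static constexpr int kSlotCount = 4; // XInput exposes four user slots

    GamepadSlots(Probe probe, Clock::duration retryInterval)
        : probe_(std::move(probe)), retryInterval_(retryInterval) {}

    // Call once per frame with the current time.
    void update(Clock::time_point now) {
        for (int i = 0; i < kSlotCount; ++i) {
            Slot& slot = slots_[i];
            // Skip empty slots until their next scheduled probe.
            if (!slot.connected && now < slot.nextProbe) continue;
            slot.connected = probe_(i);
            if (!slot.connected) slot.nextProbe = now + retryInterval_;
        }
    }

    bool connected(int i) const { return slots_[i].connected; }

private:
    struct Slot {
        bool connected = false;
        Clock::time_point nextProbe{};
    };

    Probe probe_;
    Clock::duration retryInterval_;
    std::array<Slot, kSlotCount> slots_{};
};
```

Note that this sketch still probes all empty slots on the same frame, so it does nothing about the periodic hitch; staggering the slots or moving the probing to another thread would be needed for that.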
| 223 | + |
| 224 | +This is another area where an existing API is quite unsatisfactory -- you generally want to |
| 225 | +avoid things that make every Nth frame take longer than the ones surrounding it, so you'd need to move all that to |
| 226 | +another thread. But it's still much easier to deal with than the raw mouse input / high polling rate issue. |
| 227 | + |
| 228 | +# Conclusion |
| 229 | + |
| 230 | +Game input in Windows ain't great. I hope this article saves someone the time it took me to delve into this rabbit hole, |
| 231 | +and as I said above I'd love to hear from you if you know how to do this truly **properly**. |
| 232 | + |
| 233 | +*And we haven't even gone into keyboard layouts yet!* Are you a QWERTZ user and have you ever wondered why some games have |
| 234 | +default bindings that have actions on e.g. `Z`, `X` and `C`, which makes no sense on your input device? But that's a story |
| 235 | +for another day. |