Speeding up Grim - Adding SIMD support #1841
base: 2.0
That's a good idea |
How do people feel about using my existing Vector implementation? (Yes, I know it's slow; this is more about the interfaces and how it works.) Is everyone fine with using the factory to create new instances, together with static imports? I know I could probably speed it up a little by eliminating the casting, but that would require treating people on scalar as second-class citizens. I figured that trade-off would be worth it, since people still run Xeon v1/v2s for Minecraft servers and the oldest machines with support for AVX-256 for doubles are Haswell chips from 2013 (Intel Core i7-4770K & Intel Xeon E5-2697 v3). Key differences below (the rest are mostly just replacing existing usage of Bukkit Vectors):
EDIT: This would likely supersede #1765 |
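For readers following along, here is a minimal sketch of the "interface plus factory" shape described above. It is not the PR's actual code: `Vec3`, `ScalarVec3`, and `Vec3Factory.vec` are hypothetical names, and only a scalar backend is shown. The point is that callers depend only on the interface and the factory method, so a SIMD-backed implementation could be returned from the same factory later without touching call sites.

```java
// Hypothetical names throughout; these are not Grim's real classes.
interface Vec3 {
    double x();
    double y();
    double z();
    Vec3 add(Vec3 other);
    double dot(Vec3 other);
}

// Plain scalar backend: always available, no special JVM flags needed.
final class ScalarVec3 implements Vec3 {
    private final double x, y, z;

    ScalarVec3(double x, double y, double z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    public double x() { return x; }
    public double y() { return y; }
    public double z() { return z; }

    public Vec3 add(Vec3 o) {
        // Uses only the interface accessors, so no cast to a concrete type.
        return new ScalarVec3(x + o.x(), y + o.y(), z + o.z());
    }

    public double dot(Vec3 o) {
        return x * o.x() + y * o.y() + z * o.z();
    }
}

final class Vec3Factory {
    private Vec3Factory() {}

    // Single creation point. A SIMD-backed Vec3 could be returned here
    // instead; this sketch only ever returns the scalar implementation.
    static Vec3 vec(double x, double y, double z) {
        return new ScalarVec3(x, y, z);
    }
}
```

In a real project with packages, call sites would statically import the factory method (for example `import static some.pkg.Vec3Factory.vec;`, where the package name is made up) and then write `vec(1, 2, 3).add(vec(4, 5, 6))`.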
# Conflicts:
#	src/main/java/ac/grim/grimac/predictionengine/MovementCheckRunner.java
#	src/main/java/ac/grim/grimac/predictionengine/predictions/PredictionEngine.java
#	src/main/java/ac/grim/grimac/utils/nmsutil/JumpPower.java
8d49f2a to 1f2b697
It seems that, as of Java 21, Project Panama no longer lets you trade safety for performance by calling a method directly through its memory address. Somewhat embarrassingly, that means all of the implementations here (with the exception of the SIMD cross product on Project Panama) will probably be slower than scalar unless we batch them. The overhead is just too high when the operations themselves are measured in nanoseconds. When I have time, I'll probably write a NalimVector3D implementation just to prove we can beat the JVM if we try hard enough, and post benchmarks afterwards. |
As you can see, for very simple/trivial functions like these, the overhead of using these techniques is often not worth it. Even using Nalim with SIMD, the cost of calling the C function is too high. For vectors, batching is needed for it to be worth it. In any case, this is not important; the goal of this draft is to add the right tooling to the project to take advantage of SIMD and low-overhead native code calls. I'll continue to update this draft with example code and project setup to make it easy to understand how to take advantage of these features for real performance gains. |
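To illustrate what "batching" means here, the sketch below computes many dot products in one call over a structure-of-arrays layout, so any fixed per-call cost is paid once per batch rather than once per vector. This is an assumption-laden illustration, not code from the PR: `BatchedDot` and its parameter layout are hypothetical, and the loop is plain scalar Java. A SIMD or native backend would replace the loop body and process several elements per step.

```java
// Hypothetical helper; not part of Grim.
final class BatchedDot {
    private BatchedDot() {}

    /**
     * Computes out[i] = (ax[i], ay[i], az[i]) . (bx[i], by[i], bz[i])
     * for every i in [0, out.length).
     *
     * All input arrays must be at least out.length long. Storing each
     * component in its own array keeps memory access contiguous, which is
     * the layout a vectorized backend would want.
     */
    static void dot(double[] ax, double[] ay, double[] az,
                    double[] bx, double[] by, double[] bz,
                    double[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = ax[i] * bx[i] + ay[i] * by[i] + az[i] * bz[i];
        }
    }
}
```

Whether this beats per-vector scalar calls in practice depends on batch size and on the backend, so it would need benchmarking before drawing conclusions.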
seems gimmicky idk, you can optimize Grim's performance considerably before resorting to SIMD operations, it's not like we are doing image manipulation here, which is what SIMD is great at. |
As our codebase grows with more checks, performance has and will become increasingly important.
This code provides the basic structure necessary to run operations with SIMD and to fall back on scalar operations when SIMD is not available, at no runtime performance cost in the fallback case. Java 18 is the minimum requirement.
There are many parts of the code that can, or may have to, be changed in order to take full advantage of SIMD. This PR draft does not do that.
If there's interest, I'll consider also updating this draft to include Project Panama native code setup. We can include or download a small native library at runtime, and our fallback order would look something like Native Code (probably SIMD) -> Java SIMD -> Scalar
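The fallback order described above could be decided once at startup, roughly as in the following sketch. The class, enum, and method names are hypothetical, and the detection logic is an assumption: it treats the incubating Vector API as usable when the `jdk.incubator.vector` module is present in the boot layer (which normally requires starting the JVM with `--add-modules jdk.incubator.vector`), and treats the native backend as usable when `System.loadLibrary` succeeds.

```java
// Hypothetical backend selection; not the PR's actual implementation.
final class BackendSelector {
    enum Backend { NATIVE, JAVA_SIMD, SCALAR }

    private BackendSelector() {}

    // Pure decision function: Native -> Java SIMD -> Scalar.
    static Backend choose(boolean nativeLoaded, boolean vectorModulePresent) {
        if (nativeLoaded) return Backend.NATIVE;
        if (vectorModulePresent) return Backend.JAVA_SIMD;
        return Backend.SCALAR;
    }

    // True only if the incubator module was resolved at JVM startup.
    static boolean vectorModulePresent() {
        return ModuleLayer.boot().findModule("jdk.incubator.vector").isPresent();
    }

    // Attempts to load a native library by name; never throws on failure.
    static boolean tryLoadNative(String libName) {
        try {
            System.loadLibrary(libName);
            return true;
        } catch (UnsatisfiedLinkError | SecurityException e) {
            return false;
        }
    }

    // Convenience wrapper combining the probes above.
    static Backend detect(String nativeLibName) {
        return choose(tryLoadNative(nativeLibName), vectorModulePresent());
    }
}
```

Keeping `choose` separate from the probing methods makes the ordering easy to unit test without a native library or special JVM flags.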