Speeding up Grim - Adding SIMD support #1841

Axionize · 2024-11-30T05:36:58Z

As our codebase grows with more checks, performance has become, and will continue to become, increasingly important.

This code provides the basic structure needed to run operations with SIMD and to fall back on scalar operations when SIMD is not available, with no runtime performance cost in the fallback case. Java 18 is the minimum requirement.

Many parts of the code can, and may have to, be changed in order to take full advantage of SIMD. This draft PR does not do that.

If there's interest, I'll consider also updating this draft to include a Project Panama native code setup. We could bundle or download a small native library at runtime, and our fallback order would look something like: Native Code (probably SIMD) -> Java SIMD -> Scalar
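The fallback order described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `VectorBackend`, `ScalarBackend`, `BackendSelector`, and the reflectively loaded class name `SimdBackend` are all hypothetical names, and the native step is a placeholder that always fails. The only JDK facts relied on are that `jdk.incubator.vector` appears in the boot module layer only when the JVM was started with it resolved, and that a missing native library surfaces as a `LinkageError` subclass. The choice is made once and stored in a `static final` field, which is one way to keep the selection itself off the hot path.

```java
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical minimal backend interface; the PR's real API may differ. */
interface VectorBackend {
    String name();
    double[] cross(double[] a, double[] b);
}

/** Plain-Java implementation that is always available. */
final class ScalarBackend implements VectorBackend {
    public String name() { return "scalar"; }

    public double[] cross(double[] a, double[] b) {
        return new double[] {
            a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]
        };
    }
}

final class BackendSelector {
    private BackendSelector() {}

    /** Chosen once at class initialization: native, then Java SIMD, then scalar. */
    static final VectorBackend ACTIVE = select(List.<Supplier<VectorBackend>>of(
            BackendSelector::tryNative,
            BackendSelector::trySimd));

    /** True only if the JVM was started with the incubating Vector API module resolved. */
    static boolean vectorApiPresent() {
        return ModuleLayer.boot().findModule("jdk.incubator.vector").isPresent();
    }

    /** Placeholder: this sketch ships no native library, so loading always fails. */
    static VectorBackend tryNative() {
        throw new UnsatisfiedLinkError("no native library in this sketch");
    }

    /** Loads a hypothetical SIMD class by name so this class never links against the Vector API. */
    static VectorBackend trySimd() {
        if (!vectorApiPresent()) return null;
        try {
            return (VectorBackend) Class.forName("SimdBackend")
                    .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            return null; // class missing or not instantiable: treat as unavailable
        }
    }

    /** Returns the first candidate that loads, or the scalar backend if none do. */
    static VectorBackend select(List<Supplier<VectorBackend>> candidates) {
        for (Supplier<VectorBackend> candidate : candidates) {
            try {
                VectorBackend backend = candidate.get();
                if (backend != null) return backend;
            } catch (RuntimeException | LinkageError unavailable) {
                // This backend cannot be used on this JVM; try the next one.
            }
        }
        return new ScalarBackend();
    }
}
```

Callers would go through `BackendSelector.ACTIVE` and never name a concrete implementation, so a server on an older JVM or CPU would simply end up on the scalar path.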

AoElite · 2024-11-30T06:37:36Z

That's a good idea

Axionize · 2024-12-05T08:38:05Z

How do people feel about using my existing Vector implementation (yes, I know it's slow; it's more about the interfaces and how it works)? Is everyone fine with using the factory to create new instances, and with static imports?

I know I could probably speed it up a little by eliminating the casting, but that would require treating people on scalar as second-class citizens.
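To make the trade-off concrete, here is one plausible reading of where the casting comes from. This is a sketch with hypothetical names (`Vec3`, `ScalarVec3`, `PackedVec3`, `Vec3s.vec`), not the PR's actual classes. `PackedVec3` is only a stand-in for a SIMD-backed vector: it stores its components in one array the way lanes would be stored, and it has to cast its argument to reach that storage directly.

```java
/** Hypothetical common interface; names are illustrative, not the PR's API. */
interface Vec3 {
    double x();
    double y();
    double z();
    Vec3 add(Vec3 other);
}

/** Scalar implementation: works with any Vec3 through the accessors, no cast needed. */
final class ScalarVec3 implements Vec3 {
    private final double x, y, z;

    ScalarVec3(double x, double y, double z) { this.x = x; this.y = y; this.z = z; }

    public double x() { return x; }
    public double y() { return y; }
    public double z() { return z; }

    public Vec3 add(Vec3 o) {
        return new ScalarVec3(x + o.x(), y + o.y(), z + o.z());
    }
}

/** Stand-in for a SIMD-backed vector: components packed as x, y, z, padding. */
final class PackedVec3 implements Vec3 {
    private final double[] lanes;

    PackedVec3(double x, double y, double z) { this(new double[] {x, y, z, 0.0}); }

    private PackedVec3(double[] lanes) { this.lanes = lanes; }

    public double x() { return lanes[0]; }
    public double y() { return lanes[1]; }
    public double z() { return lanes[2]; }

    public Vec3 add(Vec3 o) {
        if (o instanceof PackedVec3) {
            // Fast path: the cast gives direct access to the other operand's packed storage.
            double[] other = ((PackedVec3) o).lanes;
            double[] sum = new double[4];
            for (int i = 0; i < 4; i++) sum[i] = lanes[i] + other[i];
            return new PackedVec3(sum);
        }
        // Mixed operands: fall back to the accessors.
        return new PackedVec3(lanes[0] + o.x(), lanes[1] + o.y(), lanes[2] + o.z());
    }
}

/** Factory meant to be used via static import, e.g. `import static Vec3s.vec;`. */
final class Vec3s {
    private Vec3s() {}

    static Vec3 vec(double x, double y, double z) {
        return new ScalarVec3(x, y, z); // a real factory would pick the implementation at startup
    }
}
```

Dropping the `instanceof` check and cast would mean every method takes the concrete packed type, which is the point at which a scalar-only user could no longer share the same API.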

I figured it would be worth it since people still run Xeon V1/V2s for Minecraft servers and the oldest machines with support for AVX-256 for doubles are Haswell chips from 2013 (Intel Core i7-4770K & Intel Xeon E5-2697 v3)

Key differences below: (rest are mostly just replacing existing usage of Bukkit Vectors)

EDIT: This would likely supersede #1765

# Conflicts:
# src/main/java/ac/grim/grimac/predictionengine/MovementCheckRunner.java
# src/main/java/ac/grim/grimac/predictionengine/predictions/PredictionEngine.java
# src/main/java/ac/grim/grimac/utils/nmsutil/JumpPower.java

Axionize · 2024-12-05T14:49:37Z

It seems that, as of Java 21, Project Panama no longer lets you trade safety for performance by calling a method directly through its memory address. Somewhat embarrassingly, this means that all of the implementations here (with the exception of the SIMD cross product on Project Panama) will probably be slower than Scalar unless we batch them. The overhead is just too high when the operations themselves are measured in nanoseconds.

When I have time, I'll probably write a NalimVector3D implementation just to prove we can beat the JVM if we try hard enough, and post benchmarks.

Axionize · 2024-12-05T19:08:41Z

Benchmark                                      Mode  Cnt   Score    Error  Units
NalimVector3DBenchmark.benchmarkCrossProduct   avgt    3   5.622 ±  0.409  ns/op
SIMDVector3DBenchmark.benchmarkCrossProduct    avgt    3  19.063 ± 57.292  ns/op
ScalarVector3DBenchmark.benchmarkCrossProduct  avgt    3   5.466 ±  1.033  ns/op

As you can see, for very simple/trivial functions like these, the overhead of using these techniques is often not worth it. Even using Nalim with SIMD, the cost of calling the C function is too high. For vectors, batching is needed for it to be worth it.

In any case, this is not important; the goal of this draft is to add the right tooling to the project to take advantage of SIMD and low-overhead native code calls. I'll continue to update this draft with example code and project setup to make it easy to understand how to take advantage of these features for real performance gains.
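A rough sketch of what batching could look like, written in plain Java so it runs anywhere. The class name `BatchCross` and the structure-of-arrays layout are assumptions for illustration, not something taken from the PR. The idea is that one call handles `n` cross products, so a fixed per-call cost `c` (a native transition, for example) is spread across the batch: the per-vector cost becomes roughly `c / n + w`, where `w` is the actual work per vector.

```java
/** Hypothetical batched cross product over a structure-of-arrays layout. */
final class BatchCross {
    private BatchCross() {}

    /**
     * Computes out[i] = a[i] x b[i] for i in [0, n).
     * Each component lives in its own array, so consecutive iterations touch
     * consecutive memory, which is the shape a SIMD or native kernel would want.
     */
    static void cross(double[] ax, double[] ay, double[] az,
                      double[] bx, double[] by, double[] bz,
                      double[] ox, double[] oy, double[] oz, int n) {
        for (int i = 0; i < n; i++) {
            ox[i] = ay[i] * bz[i] - az[i] * by[i];
            oy[i] = az[i] * bx[i] - ax[i] * bz[i];
            oz[i] = ax[i] * by[i] - ay[i] * bx[i];
        }
    }
}
```

A SIMD or native implementation could replace the loop body behind the same signature, and callers would pay the call overhead once per batch instead of once per vector.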

SergioK29 · 2024-12-16T20:32:50Z

Seems gimmicky, idk. You can optimize Grim's performance considerably before resorting to SIMD operations; it's not like we are doing image manipulation here, which is what SIMD is great at.

Axionize added 5 commits November 30, 2024 00:24

Create initial project structure proof of concept for adding SIMD support

3545644

Clean up Gradle configuration

52a611b

Build with Java 21 toolchain to prevent requiring downloading JDK 18 for contributors

4f2fe70

Example SIMD code

c6cabe7

Java 8 support for main sources -> Java 17

1bb0bf7

Axionize marked this pull request as draft November 30, 2024 05:37

Axionize added 4 commits December 5, 2024 03:39

Expand API definitions

baba4d6

Implement Factory, interface, fallback, and Scalar and SIMD methods

4e5e202

Replace Bukkit Vector with new Vector impl

d5847f7

Axionize force-pushed the simd-upstream branch from 8d49f2a to 1f2b697 Compare December 5, 2024 08:54

Axionize added 9 commits December 5, 2024 04:44

JMH first attempt

ab86e65

JMH Working

6b2153f

Remove debug messages

0072bfd

Fix running with Vector API

d1288bf

Fix cloning vector

2337386

Fix implementation error

72b1ec2

Stash changes

f84321b

Java 21 source

d0ff246

Working native code

61a9bae

Axionize added 4 commits December 5, 2024 11:30

Nalim working tests

85ed18e

migrate to using natives and poinetrs

82bb743

Incomplete Nalim Impl

e1281c2

Incomplete Nalim Impl

513e42c
