Integer hashing functions that compile to optimal assembly #37

hoxxep · 2025-09-26T13:46:26Z

@Nicoshev totally up to you whether you want to include these or not. I'm also happy to adjust docs or rename things per your preference. I wrote them for the godbolt example, so figured it's worth at least having a PR to refer back to if someone needs it in future.

I assume we'll leave it up to the end user to write their own integer-tuple hash functions, as they should simply be using rapidhash_to_le_* and building a byte array to then be hashed with a constant length.
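For anyone following along, here is a minimal sketch of the byte-array approach described above: each field of an integer tuple is written in little-endian order into a fixed-size buffer, which can then be handed to a byte-oriented hash with a constant length. The `example_*` names and the key layout are hypothetical and exist only for illustration; they are not the functions added in this PR.

```c
#include <stdint.h>

/* Hypothetical helper: store a 64-bit value as 8 little-endian bytes.
   Shifting is independent of the host byte order. */
static void example_store_le64(uint8_t *out, uint64_t v) {
    for (int i = 0; i < 8; i++) {
        out[i] = (uint8_t)(v >> (8 * i));
    }
}

/* Hypothetical helper: store a 32-bit value as 4 little-endian bytes. */
static void example_store_le32(uint8_t *out, uint32_t v) {
    for (int i = 0; i < 4; i++) {
        out[i] = (uint8_t)(v >> (8 * i));
    }
}

/* Hypothetical integer-tuple key used only in this sketch. */
typedef struct {
    uint64_t id;
    uint32_t shard;
} example_key;

/* The serialised size is a compile-time constant, so the hash call
   that follows can be specialised for this length. */
enum { EXAMPLE_KEY_BYTES = 12 };

/* Serialise field by field (not via memcpy of the struct), so padding
   bytes and host endianness never leak into the hashed bytes. */
static void example_key_to_bytes(const example_key *k,
                                 uint8_t out[EXAMPLE_KEY_BYTES]) {
    example_store_le64(out, k->id);
    example_store_le32(out + 8, k->shard);
}
```

The resulting buffer would then be passed to the byte hash together with `EXAMPLE_KEY_BYTES` as the length, which gives the same hash for the same tuple on big- and little-endian hosts.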

Related: Fixed size (integer data) variants of rapidhashNano #36, with thanks to @EvanBalster for the suggestion.

hoxxep · 2025-09-26T13:51:44Z

Actually, this needs some more preprocessor logic per rapid_read64, and then we can simplify the rapid_read definitions.

EvanBalster · 2025-09-26T21:56:08Z

While I don't think this is an inelegant solution, I am a bit leery of relying on the optimizer to consolidate two byteswaps down to a no-op. I'm putting together an alternative take on this that uses a new _internal function accepting a pair of uint64_t instead of a buffer.

EDIT: I made a pull request with my alternative approach, linked just below. Although I feel a bit silly "solving a solved problem", dissecting how rapidhash works on these small inputs was a fun exercise.
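To make the double byte swap concrete, here is a rough sketch of the buffer-based path: the integer is converted to little-endian, stored in a buffer, and read back with a little-endian read. On a big-endian host both conversions are byte swaps that cancel out, and the concern above is whether the compiler reliably removes them. All `example_*` names are hypothetical and the endianness probe is an assumption of this sketch; the real header uses preprocessor logic instead.

```c
#include <stdint.h>
#include <string.h>

/* Portable 64-bit byte swap written with shifts and masks. */
static uint64_t example_bswap64(uint64_t v) {
    return (v << 56)
         | ((v & 0x000000000000FF00ULL) << 40)
         | ((v & 0x0000000000FF0000ULL) << 24)
         | ((v & 0x00000000FF000000ULL) << 8)
         | ((v >> 8)  & 0x00000000FF000000ULL)
         | ((v >> 24) & 0x0000000000FF0000ULL)
         | ((v >> 40) & 0x000000000000FF00ULL)
         | (v >> 56);
}

/* Runtime endianness probe (assumption: used here only to keep the
   sketch free of compiler-specific macros). */
static int example_is_little_endian(void) {
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Convert a host-order value to little-endian representation. */
static uint64_t example_to_le64(uint64_t v) {
    return example_is_little_endian() ? v : example_bswap64(v);
}

/* Read 8 bytes as a little-endian integer. */
static uint64_t example_read_le64(const uint8_t *p) {
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return example_to_le64(v); /* the swap is its own inverse */
}

/* Buffer path: to-LE, store, read back. On big-endian hosts this is
   bswap(bswap(v)), which equals v but must be proven by the optimizer. */
static uint64_t example_round_trip(uint64_t v) {
    uint64_t le = example_to_le64(v);
    uint8_t buf[8];
    memcpy(buf, &le, sizeof le);
    return example_read_le64(buf);
}
```

An internal function that accepts the integers directly, as proposed above, skips the round trip entirely, so there is nothing for the optimizer to cancel.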

Nicoshev · 2025-09-29T13:52:27Z

@hoxxep This PR is not bad. Concerns are:

1. As you said, most modern compilers already optimize the little-endian case.
2. It is too much code for just reading and writing variables.
3. Most hash maps use the identity function when the key is an integer. The important cases are strings and byte streams.

hoxxep · 2025-09-29T14:16:17Z

In response to 1, 2 and 3:

1. Compilers should generate optimal code on both big- and little-endian platforms; on big-endian platforms the double byte-swap should be easy for the compiler to prove redundant and optimise away.
2. I'll shorten the docstrings if that helps? I think it makes rapid_read easier to understand, as the big/little-endian code is clearly encapsulated, and it makes the portable to-little-endian logic reusable by the user (useful when they are hashing integers or more complex types).
3. True, but if the low bits have low entropy there's value in hashing, such as a nanoseconds field that is rounded to microsecond precision (macOS timestamps etc). Bloom filters and HyperLogLog are other examples where hash quality matters.
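A small sketch of the low-entropy point: nanosecond timestamps rounded to microseconds are all multiples of 1000, so their low three bits are always zero, and a power-of-two table indexed by the raw value can only ever reach one slot in eight. The mixer below is the well-known splitmix64 finaliser, used purely as a stand-in for "some integer hash"; it is not rapidhash, and the `example_*` names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Identity "hash", as used by many hash maps for integer keys. */
static uint64_t example_identity(uint64_t x) {
    return x;
}

/* Stand-in mixer (splitmix64 finaliser), NOT rapidhash. */
static uint64_t example_mix64(uint64_t x) {
    x ^= x >> 30;
    x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27;
    x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

/* Insert 1024 timestamps spaced 1000 ns apart into a 1024-slot
   power-of-two table and count how many distinct slots are hit. */
static size_t example_buckets_used(uint64_t (*hash)(uint64_t)) {
    enum { SLOTS = 1024 };
    unsigned char used[SLOTS] = {0};
    size_t distinct = 0;
    for (uint64_t k = 0; k < SLOTS; k++) {
        uint64_t timestamp_ns = k * 1000u; /* microsecond precision */
        size_t slot = (size_t)(hash(timestamp_ns) & (SLOTS - 1));
        if (!used[slot]) {
            used[slot] = 1;
            distinct++;
        }
    }
    return distinct;
}
```

With `example_identity` the keys land in exactly 128 of the 1024 slots, because every key is a multiple of 8. With the mixer the keys spread over far more slots, which is the property Bloom filters and HyperLogLog also depend on.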

I have no skin in the game here though, totally understand if it's not deemed necessary.

EvanBalster · 2025-09-30T23:10:44Z

> Most hash maps use the identity function when the key is an integer. The important cases are strings and byte streams.

Identity hashing is unsuitable for many use cases, with and without hashmaps.

A concrete example: in procedural generation we often use a multi-dimensional coordinate packed into an integer, which may differ by just a single bit between neighboring cells. This makes the avalanche effect highly desirable: hashes of contiguous coordinates are used as a pseudorandom noise function. These coordinates will often differ from their neighbors by power-of-two increments, which will produce many collisions in an identity hashmap.
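As an illustration of the procedural-generation case, the sketch below packs a 2D cell coordinate into one 64-bit integer and derives a per-cell noise value from its hash. The packing layout, the seed handling, and the `example_*` names are assumptions made for this example, and the mixer is again the splitmix64 finaliser standing in for whichever integer hash is actually used.

```c
#include <stdint.h>

/* Pack a cell coordinate: x in the low 32 bits, y in the high 32 bits.
   Moving one cell along y changes the key by a multiple of 2^32. */
static uint64_t example_pack_xy(uint32_t x, uint32_t y) {
    return ((uint64_t)y << 32) | x;
}

/* Stand-in mixer (splitmix64 finaliser), NOT rapidhash. Each step is
   invertible, so distinct inputs always give distinct outputs. */
static uint64_t example_mix64(uint64_t x) {
    x ^= x >> 30;
    x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27;
    x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

/* Hash of a cell; the seed is simply XORed into the key (assumption). */
static uint64_t example_cell_hash(uint32_t x, uint32_t y, uint64_t seed) {
    return example_mix64(example_pack_xy(x, y) ^ seed);
}

/* Map the top 53 bits of the hash to a double in [0, 1). */
static double example_cell_noise(uint32_t x, uint32_t y, uint64_t seed) {
    uint64_t h = example_cell_hash(x, y, seed);
    return (double)(h >> 11) * (1.0 / 9007199254740992.0); /* 2^-53 */
}
```

With an identity hash and a table of 256 slots, every cell in a column (`x` fixed, `y` varying) maps to the same slot, since `example_pack_xy(x, y) & 0xFF` depends only on `x`. Mixing first lets the high bits influence the slot index and gives neighboring cells unrelated noise values.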

Integer hashing functions that compile to optimal assembly

629c176

hoxxep mentioned this pull request Sep 26, 2025

Fixed size (integer data) variants of rapidhashNano #36

Open

hoxxep force-pushed the integer-hash-functions branch from 5cd114c to 4150b43 Compare September 26, 2025 14:00

EvanBalster mentioned this pull request Sep 26, 2025

Integer hashing functions that use simplified subroutines #38

Open

hoxxep force-pushed the integer-hash-functions branch from 4150b43 to c2a86d0 Compare September 29, 2025 14:17

Make rapid_read use the rapid_to_le methods

9d742d7

hoxxep force-pushed the integer-hash-functions branch from c2a86d0 to 9d742d7 Compare September 29, 2025 14:19


3 participants
