Integer hashing functions that use simplified subroutines #38

EvanBalster · 2025-09-26T23:11:38Z

Adds integer hashes whose results are invariant with respect to the system endianness. On a little-endian system, these behave identical to the existing rapidhash functions; on a big-endian system, they omit the byteswap operation.

new function(s)	description
`rapidhashInt_internal`	hash logic for 32-, 64- and 128-bit integers.
`rapidhashShort_internal`	hash logic for 8- and 16-bit integers.
`rapidhashInt8(key)`, `rapidhashInt8_withSeed(key, seed)`	hash one byte.
`rapidhashInt16(key)`, `rapidhashInt16_withSeed(key, seed)`	endian-invariant integer hash.
`rapidhashInt32(key)`, `rapidhashInt32_withSeed(key, seed)`	endian-invariant integer hash.
`rapidhashInt64(key)`, `rapidhashInt64_withSeed(key, seed)`	endian-invariant integer hash.
`rapidhashInt128(lsb, msb)`, `rapidhashInt128_withSeed(lsb, msb, seed)`	endian-invariant integer hash, fashioned after `rapid_mum`.

This is an alternative approach to @hoxxep's lovely PR #37 which was written in response to my issue #36. Their implementation and mine can be expected to produce the same machine code when compiled with full optimizations.

My approach was to write simplified special-case internal functions rather than relying on the optimizer's ability to make clever simplifications (e.g. cancelling out two byteswaps with a memcpy in between). This results in simpler, easier-to-follow source code and potentially smaller code when compiled without optimization — at the cost of repeating the logic for small hashing jobs.

EvanBalster · 2025-09-26T23:23:07Z

Here is the quick-and-dirty C++ code I used to check that my hash functions produce identical results to rapidhashNano on little-endian systems. Tested on AMD64. These tests will fail on big-endian architectures, but that's very much the point.

int main()
{
    uint64_t seeds[4] = {0, 0xDEADBEEF4C47, 0x123456789ABCDEF, 0xC15C7D2FA43F24F1};

    for (int s = 0; s < 4; ++s)
    {
        const uint64_t seed = seeds[s];
        uint64_t fails;

        std::cout << "Using seed " << seed << std::endl;


        // 8-bit test
        std::cout << "  Compare Int8" << std::endl;
        fails = 0;
        for (unsigned i = 0; i < 256; ++i)
        {
            uint8_t key = i;
            if (rapidhashNano_withSeed(&key,1,seed) != rapidhashInt8_withSeed(key,seed)) ++fails;
        }
        if (fails) std::cout << "\tMismatches: " << fails << std::endl;
        else       std::cout << "\tSUCCESS" << std::endl;

        

        // 16-bit test
        std::cout << "  Compare Int16" << std::endl;
        fails = 0;
        for (unsigned i = 0; i < 65536; ++i)
        {
            uint16_t key = i;
            if (rapidhashNano_withSeed(&key,2,seed) != rapidhashInt16_withSeed(key,seed)) ++fails;
        }
        if (fails) std::cout << "\tMismatches: " << fails << std::endl;
        else       std::cout << "\tSUCCESS" << std::endl;

        

        // 32-bit test
        std::cout << "  Compare Int32" << std::endl;
        fails = 0;
        for (uint64_t i = 0; i < 0x1000000ull; ++i)
        {
            uint32_t key = i;
            if (rapidhashNano_withSeed(&key,4,seed) != rapidhashInt32_withSeed(key,seed)) ++fails;
        }
        if (fails) std::cout << "\tMismatches: " << fails << std::endl;
        else       std::cout << "\tSUCCESS" << std::endl;


        // 64-bit test
        std::cout << "  Compare Int64" << std::endl;
        fails = 0;
        for (uint64_t i = 0; i < 0x1000000ull; ++i)
        {
            uint64_t key = i;
            if (rapidhashNano_withSeed(&key,8,seed) != rapidhashInt64_withSeed(key,seed)) ++fails;
        }
        if (fails) std::cout << "\tMismatches: " << fails << std::endl;
        else       std::cout << "\tSUCCESS" << std::endl;


        // 128-bit test
        std::cout << "  Compare Int128" << std::endl;
        fails = 0;
        for (uint64_t b = 0; b < 0x100ull; ++b)
            for (uint64_t i = 0; i < 0x100000ull; ++i)
        {
            uint64_t key[2] = {i,b};
            if (rapidhashNano_withSeed(key,16,seed) != rapidhashInt128_withSeed(i,b,seed)) ++fails;
        }
        if (fails) std::cout << "\tMismatches: " << fails << std::endl;
        else       std::cout << "\tSUCCESS" << std::endl;
    }
}

Nicoshev · 2025-09-29T13:48:33Z

@EvanBalster

The problem with this PR is that hashing logic gets duplicated across two different functions.

It is more difficult to maintain, given that changing one place requires the same change in the other one.

EvanBalster · 2025-09-30T22:50:48Z

I was following a perceived precedent — the header already contains three copies of the hashing logic in question, and it's easy enough to validate that the resulting behavior is identical. De-duplicating the existing code would restructure the whole header so I assume that's a non-starter.

My motive here was to extract some code that is friendly to dumb compilers and casual human readers. As a compromise, it would be simple to combine rapidhashInt_internal and rapidhashShort_internal into one function branching on len < 4.

System-independent integer hashes: Int8, Int16, Int32, Int64, Int128

dce50a4

EvanBalster changed the title ~~Integer hashing functions that use simplified code~~ Integer hashing functions that use simplified subroutines Sep 26, 2025

EvanBalster mentioned this pull request Sep 27, 2025

Fixed size (integer data) variants of rapidhashNano #36

Open

Merge branch 'master' into integers

b366eb5

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Integer hashing functions that use simplified subroutines #38

Integer hashing functions that use simplified subroutines #38

Uh oh!

EvanBalster commented Sep 26, 2025 •

edited

Loading

Uh oh!

EvanBalster commented Sep 26, 2025

Uh oh!

Nicoshev commented Sep 29, 2025

Uh oh!

EvanBalster commented Sep 30, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Integer hashing functions that use simplified subroutines #38

Are you sure you want to change the base?

Integer hashing functions that use simplified subroutines #38

Uh oh!

Conversation

EvanBalster commented Sep 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

EvanBalster commented Sep 26, 2025

Uh oh!

Nicoshev commented Sep 29, 2025

Uh oh!

EvanBalster commented Sep 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

EvanBalster commented Sep 26, 2025 •

edited

Loading

EvanBalster commented Sep 30, 2025 •

edited

Loading