This repository was archived by the owner on Jan 26, 2024. It is now read-only.

Implement float/double atomicMin/Max in terms of integer atomics #65

@FreddieWitherden


Currently, HIP implements atomicMin/Max for single- and double-precision floating-point values as compare-and-swap (CAS) loops. However, in fast-math scenarios on architectures with hardware support for signed/unsigned integer atomicMin/Max, a better implementation is possible: a single integer atomic on the value's bit pattern, with no retry loop. As per https://stackoverflow.com/a/72461459, for single precision:

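// Relies on the IEEE-754 layout: for floats with the sign bit clear, the bit
// pattern orders like a signed int; for floats with the sign bit set, a larger
// unsigned bit pattern means a more negative value. Mixed signs also work: a
// negative stored value is a negative signed int (kept by signed min) and a
// large unsigned int (kept by unsigned max). Both return the previous value.
__device__ __forceinline__ float atomicMinFloat(float* addr, float value) {
    // Sign bit clear: signed integer min. Sign bit set: unsigned integer max.
    return !signbit(value)
        ? __int_as_float(atomicMin((int*)addr, __float_as_int(value)))
        : __uint_as_float(atomicMax((unsigned int*)addr, __float_as_uint(value)));
}

__device__ __forceinline__ float atomicMaxFloat(float* addr, float value) {
    // Sign bit clear: signed integer max. Sign bit set: unsigned integer min.
    return !signbit(value)
        ? __int_as_float(atomicMax((int*)addr, __float_as_int(value)))
        : __uint_as_float(atomicMin((unsigned int*)addr, __float_as_uint(value)));
}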

Even better implementations are possible on NVIDIA GPUs using Opportunistic Warp-level Programming: each thread first checks whether other active threads in its warp target the same addr and, if so, the reduction is performed at warp level first, so that only one thread per distinct address issues the atomic. Under contention, this greatly reduces the number of read-modify-write (RMW) operations that leave the core. I suspect a similar idea carries over to AMD GPUs.
