
Description
Currently, HIP implements atomicMin/Max for single and double precision floating-point values as CAS loops. However, in fast-math scenarios, on architectures with hardware support for signed/unsigned integer atomicMin/Max, a better implementation is possible. As per https://stackoverflow.com/a/72461459, for single precision:
__device__ __forceinline__ float atomicMinFloat(float* addr, float value) {
    float old;
    old = !signbit(value) ? __int_as_float(atomicMin((int*)addr, __float_as_int(value))) :
                            __uint_as_float(atomicMax((unsigned int*)addr, __float_as_uint(value)));
    return old;
}

__device__ __forceinline__ float atomicMaxFloat(float* addr, float value) {
    float old;
    old = !signbit(value) ? __int_as_float(atomicMax((int*)addr, __float_as_int(value))) :
                            __uint_as_float(atomicMin((unsigned int*)addr, __float_as_uint(value)));
    return old;
}
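
Since double precision is mentioned above as well, a possible analogue (my sketch, not taken from the linked answer) would apply the same bit-pattern ordering trick with 64-bit integer atomics. This assumes the target provides signed and unsigned 64-bit integer atomicMin/Max, and it shares the single-precision version's fast-math caveats around NaN and -0.0:

__device__ __forceinline__ double atomicMinDouble(double* addr, double value) {
    double old;
    // Non-negative doubles order like signed 64-bit integers; negative doubles
    // order in reverse like unsigned 64-bit integers, hence min -> unsigned max.
    old = !signbit(value)
        ? __longlong_as_double(atomicMin((long long*)addr,
                                         __double_as_longlong(value)))
        : __longlong_as_double((long long)atomicMax((unsigned long long*)addr,
                                         (unsigned long long)__double_as_longlong(value)));
    return old;
}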
Still better implementations are possible on NVIDIA using Opportunistic Warp-level Programming, wherein one first checks whether any other active threads in the warp target the same addr and, if so, performs the reduction at the warp level before issuing the atomic. This greatly cuts down the number of RMW operations that leave the core when there is contention; a sketch of the idea follows. I suspect a similar idea can carry over to AMD GPUs.
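
For illustration, here is a minimal sketch of that warp-aggregation idea on NVIDIA, assuming Volta or newer (for __match_any_sync), one-dimensional thread blocks, and the atomicMinFloat helper above; warpAggregatedAtomicMinFloat is an illustrative name, not an existing API:

__device__ float warpAggregatedAtomicMinFloat(float* addr, float value) {
    unsigned mask  = __activemask();
    // Group the active lanes by the address they intend to update (Volta+ intrinsic).
    unsigned peers = __match_any_sync(mask, (unsigned long long)addr);
    int lane   = threadIdx.x & 31;   // assumes 1-D blocks sized in multiples of 32
    int leader = __ffs(peers) - 1;   // lowest-numbered lane in the peer group

    // Reduce 'value' across the peer group; every peer walks the same bitmask,
    // so the __shfl_sync calls stay convergent within the group.
    float v = value;
    for (unsigned remaining = peers; remaining; remaining &= remaining - 1) {
        int src = __ffs(remaining) - 1;
        v = fminf(v, __shfl_sync(peers, value, src));
    }

    // Only the leader issues the RMW that leaves the core.
    float old = 0.0f;
    if (lane == leader)
        old = atomicMinFloat(addr, v);
    // Broadcast the returned 'old' so every caller in the group gets a result.
    return __shfl_sync(peers, old, leader);
}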