[BUG] int8 Unary Reductions w/ axis set producing wrong results (on GPU) #1270

@krasow

Software versions

System info:
Python : 3.13.11 | packaged by conda-forge | (main, Jan 26 2026, 23:57:06) [GCC 14.3.0]
Platform : Linux-5.15.0-174-generic-x86_64-with-glibc2.35
GPU driver : 580.105.08
GPU devices :
GPU 0 : NVIDIA A30X

Package versions:
legion : legion-25.12.0-49-g27b2c7fec-dirty (commit: 27b2c7fec5979298aca6fae935e8d857b58004eb)
legate : 26.05.00.dev0
cupynumeric : 26.05.00.dev+6.g2b1bfee8
numpy : 2.3.5
scipy : 1.16.3
numba : (failed to detect)

Legate build configuration:
build_type : Release
use_openmp : True
use_cuda : True
networks : ucx
conduit :
configure_options : --LEGATE_ARCH=arch-conda;--with-python;--with-cc=/tmp/conda-croot/legate/_build_env/bin/x86_64-conda-linux-gnu-cc;--with-cxx=/tmp/conda-croot/legate/_build_env/bin/x86_64-conda-linux-gnu-c++;--build-march=haswell;--cmake-generator=Ninja;--with-openmp;--with-cuda;--build-type=release;--with-ucx

Package details:
cuda-version : cuda-version-13.1-h2ff5cdb_3 (conda-forge)
legate : legate-26.5.0.dev0-pypi_0 (pypi)
cupynumeric : cupynumeric-26.05.00.dev6-cuda13_py313_gpu_g2b1bfee8_6 (legate-nightly)

Jupyter notebook / Jupyter Lab version

No response

Expected behavior

When running int8 unary reductions along a given axis on a GPU, the results are incorrect: cupynumeric fails to find the proper minimum along that axis. This does not occur on a CPU-only build. Maximum can fail as well. I have not tested other unary reductions with an axis set.

This occurs on GPU builds of both 25.10 (the prebuilt binaries for cuNumeric.jl) and 26.05 (the nightly build described in the legate-issue environment information above).

Running Int8 unary reductions without axis kwarg works and returns correct results.
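To make the with-axis versus without-axis comparison concrete, here is a minimal sketch that counts mismatches against NumPy for `min` and `max` with `axis=None`, `axis=0`, and `axis=1`. The helper name `compare_reductions` and the fixed seed are my own additions for illustration, not part of the original report, and the `ImportError` fallback to NumPy is there only so the sketch runs on a machine without cupynumeric.

```python
import numpy as onp

try:
    import cupynumeric as np  # library under test
except ImportError:
    np = onp  # assumption: fall back to NumPy so the sketch still runs


def compare_reductions(host, xp):
    """Count mismatches against NumPy for each (reduction, axis) pair."""
    dev = xp.array(host)
    report = {}
    for name in ("min", "max"):
        for axis in (None, 0, 1):
            ref = getattr(host, name)(axis=axis)
            got = onp.asarray(getattr(dev, name)(axis=axis))
            report[(name, axis)] = int((got != ref).sum())
    return report


onp.random.seed(0)  # fixed seed so a failing run can be repeated
host = onp.random.randint(-128, 128, size=(256, 256), dtype=onp.int8)
for key, count in compare_reductions(host, np).items():
    print(key, "mismatches:", count)
```

Based on the behavior described above, the `axis=None` rows would be expected to report zero mismatches on an affected GPU build, while the `axis=0` rows would not.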

The reproducer below is expected to print "mismatches: 0".

Observed behavior

In our reproducer, we expect no mismatches between numpy and cupynumeric.

Our reproducer creates a 256x256 array in both numpy and cupynumeric. The cupynumeric array is constructed from the numpy array, ensuring that both libraries operate on the same data.

Then we do a unary minimum reduction on axis=0, expecting the minimum value of each column. This should produce 256 separate results. Testing multiple results was necessary because cupynumeric's reported minimum is sometimes correct. Roughly half of the 256 results differ between numpy and cupynumeric.

The output below was produced on an A30X GPU; however, the same failure occurred on a 5060 GPU as well.

>>> ref
array([-128, -128, -127, -128, -128, -128, -128, -128, -126, -122, -128,
       -128, -128, -128, -125, -127, -127, -128, -128, -127, -128, -126,
       -127, -128, -128, -128, -128, -127, -128, -128, -128, -128, -128,
       -127, -128, -128, -128, -128, -128, -128, -128, -128, -128, -127,
       -125, -128, -125, -128, -128, -128, -127, -128, -123, -128, -128,
       -128, -127, -128, -125, -127, -128, -128, -128, -127, -128, -127,
       -126, -127, -126, -128, -128, -127, -127, -127, -127, -128, -128,
       -128, -128, -127, -128, -128, -127, -128, -128, -128, -128, -127,
       -125, -127, -128, -128, -127, -126, -128, -128, -127, -128, -124,
       -128, -128, -128, -128, -127, -126, -128, -128, -127, -128, -127,
       -125, -128, -128, -128, -127, -128, -128, -125, -128, -128, -128,
       -126, -128, -127, -128, -128, -128, -128, -127, -126, -127, -128,
       -127, -127, -125, -128, -128, -125, -127, -128, -128, -125, -127,
       -126, -128, -128, -128, -127, -127, -128, -126, -128, -128, -128,
       -128, -126, -126, -127, -128, -127, -126, -127, -127, -127, -128,
       -127, -128, -128, -127, -128, -127, -127, -128, -127, -128, -127,
       -126, -126, -127, -128, -128, -127, -127, -128, -128, -126, -127,
       -128, -128, -128, -128, -128, -128, -126, -128, -128, -128, -128,
       -128, -128, -128, -127, -128, -127, -128, -128, -128, -128, -128,
       -127, -128, -128, -128, -125, -128, -128, -127, -128, -128, -126,
       -128, -128, -127, -127, -126, -128, -125, -128, -125, -128, -128,
       -128, -128, -128, -126, -126, -128, -128, -126, -128, -128, -127,
       -127, -126, -128, -128, -128, -125, -127, -126, -127, -127, -127,
       -128, -128, -125], dtype=int8)
>>> got
array([-128, -126, -123, -126, -128, -128, -127, -126, -126, -121, -110,
       -127, -128, -127, -122, -124, -127, -119, -127, -126, -128, -114,
       -126, -124, -128, -127, -125, -122, -128, -118, -128, -128, -128,
       -123, -122, -119, -128, -124, -122, -117, -128, -112, -126, -121,
       -125, -120, -124, -128, -128, -128, -120, -128, -123, -120, -128,
       -122, -127, -121, -116,  -94, -128, -119, -113, -122, -128, -127,
       -125, -125, -126, -121, -112, -127, -127, -127, -126, -127, -128,
       -123, -113, -109, -128, -121, -125, -127, -128, -127, -127, -120,
       -125,  -93, -118, -108, -127, -126, -128, -123, -127, -122,   -1,
         -1, -128, -124,   -1,   -1, -126, -124,   -1,   -1, -128, -127,
         -1,   -1, -128, -115,   -1,   -1, -128, -107,   -1,   -1, -128,
       -125,   -1,   -1, -128, -106, -127, -117, -127, -114, -111, -124,
       -127, -114, -117, -120, -128, -112, -120, -125, -128, -125, -119,
       -126, -128, -119, -122, -124, -127, -125, -124, -125, -128, -122,
       -123, -112, -126, -125, -125, -121, -126, -125, -117, -114, -128,
        -94, -118, -122, -127, -110, -118, -123, -128, -114,  -96, -125,
       -126, -115, -126, -124, -128, -110, -124, -128, -128, -124, -126,
       -116, -128, -113, -128, -124, -128, -124, -128,   -1, -128, -122,
       -128,   -1, -128, -125, -119,   -1, -128, -126, -128,   -1, -128,
       -110, -126,   -1, -128, -124, -125,   -1, -127, -104, -125,   -1,
       -128, -122, -127,   -1, -126, -105, -125, -127, -125, -124, -128,
         -1, -128, -127, -116, -126, -128, -126, -126,   -1, -128, -110,
       -121,   -1, -128, -128, -125, -123, -127, -126, -124, -123, -127,
       -127, -126, -125], dtype=int8)
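To see which columns are affected rather than just how many, the two arrays can be compared element-wise. The sketch below uses only the first 11 entries of `ref` and `got` from the output above, so it is an illustration of the technique and says nothing about the rest of the array. The variable name `bad` is my own.

```python
import numpy as onp

# First 11 entries of `ref` and `got`, copied from the output above.
ref = onp.array([-128, -128, -127, -128, -128, -128, -128, -128, -126, -122, -128],
                dtype=onp.int8)
got = onp.array([-128, -126, -123, -126, -128, -128, -127, -126, -126, -121, -110],
                dtype=onp.int8)

# Indices of the columns where the two results disagree.
bad = onp.flatnonzero(got != ref)
print("mismatching columns:", bad.tolist())
print("mismatches:", bad.size)

# Widen to int16 before subtracting so the difference cannot wrap around in int8.
delta = got.astype(onp.int16) - ref.astype(onp.int16)
print("got - ref at those columns:", delta[bad].tolist())

# In this excerpt the reported minimum is never below the true minimum.
print("got >= ref everywhere:", bool((got >= ref).all()))
```

In this excerpt every wrong value is larger than the reference. If that holds for the full array, it would suggest that some elements along the axis are not being visited, though that is only a guess from the output.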

Example code or instructions

import numpy as onp
import cupynumeric as np

host = onp.random.randint(-128, 128, size=(256, 256), dtype=onp.int8)
a = np.array(host)
got = onp.asarray(a.min(axis=0))
ref = host.min(axis=0)
print("mismatches:", int((got != ref).sum()))
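A possible follow-up for narrowing the problem down is to repeat the same `min(axis=0)` check over a few dtypes and array sizes. This is a sketch under my own assumptions: the helper `sweep`, the chosen dtypes and sizes, and the seed are hypothetical and were not part of the original report. The values stay in the int8 range for every dtype so that only the element type changes between runs.

```python
import numpy as onp

try:
    import cupynumeric as np  # library under test
except ImportError:
    np = onp  # assumption: fall back to NumPy so the sketch still runs


def sweep(xp, dtypes=(onp.int8, onp.int16, onp.int32), sizes=(16, 64, 256), seed=0):
    """Count min(axis=0) mismatches against NumPy per (dtype, size)."""
    rng = onp.random.RandomState(seed)
    results = {}
    for dtype in dtypes:
        for n in sizes:
            # Same value range for every dtype; only the storage type differs.
            host = rng.randint(-128, 128, size=(n, n)).astype(dtype)
            got = onp.asarray(xp.array(host).min(axis=0))
            ref = host.min(axis=0)
            results[(onp.dtype(dtype).name, n)] = int((got != ref).sum())
    return results


for (dtype_name, n), count in sweep(np).items():
    print(f"{dtype_name:>5} {n:>4}x{n:<4} mismatches: {count}")
```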

Stack traceback or browser console output

No response
