@pawel-tarasiuk-quantumz commented Sep 17, 2025

This allows sorting arrays of more than 2^30 elements.

In merge_sort.jl, half_size_group is currently converted to Int32. As a result, sorting arrays with more than 2^30 elements raises a DivideError.
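One plausible way an Int32 group size could lead to a DivideError is wraparound when the value is doubled past 2^30. The snippet below is a standalone illustration of that arithmetic only; it is not the code from merge_sort.jl, and the variable names are hypothetical.

```julia
# Standalone illustration (not the actual merge_sort.jl code):
# doubling an Int32 value wraps around once the result exceeds typemax(Int32).
half = Int32(2^30)
doubled = half * Int32(2)            # wraps to typemin(Int32)
doubled_again = doubled * Int32(2)   # wraps to 0

# Integer division by a value that has wrapped to zero throws DivideError.
threw = try
    div(Int32(1), doubled_again)
    false
catch err
    err isa DivideError
end

# The same doubling in Int64 stays exact.
wide = Int64(2^30) * 2
```

If the group size follows a similar doubling pattern, any input longer than 2^30 elements would push it through this wraparound, which would be consistent with the error reported below.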

Expected behavior: arrays should be sortable up to the limits of available memory.

The proposed fix skips the Int32 conversion when the number of elements exceeds 2^30, so smaller arrays keep the existing Int32 path.
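A minimal sketch of the idea, assuming the group size is doubled until it covers the whole array. The helpers `group_size_type` and `count_doublings` are hypothetical names used for illustration and are not part of AcceleratedKernels; the actual change in merge_sort.jl may be structured differently.

```julia
# Hypothetical helper: choose the integer type for group-size arithmetic.
# Int32 is kept for inputs of up to 2^30 elements; larger inputs use Int64.
group_size_type(n::Integer) = n > 2^30 ? Int64 : Int32

# Hypothetical loop: count how many doublings are needed until the
# group size covers all n elements, starting from `start`.
function count_doublings(n::Integer, start::Integer)
    T = group_size_type(n)
    half = T(start)
    steps = 0
    while half < n
        half *= T(2)   # stays in T; cannot wrap for the chosen type
        steps += 1
    end
    return steps
end
```

With this choice, the Int32 path only ever doubles values below 2^30, so the result stays within Int32 range, while larger inputs do the same arithmetic in Int64.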

MWE (works with proposed changes):

using AcceleratedKernels
using CUDA

A = CUDA.rand(2^30 + 1)
AcceleratedKernels.sort!(A)

@show issorted(Vector(A))

Current result:

ERROR: LoadError: DivideError: integer division error
Stacktrace:
  [1] div
    @ ./int.jl:295 [inlined]
  [2] div
    @ ./div.jl:345 [inlined]
  [3] div
    @ ./div.jl:49 [inlined]
  [4] merge_sort!(v::CuArray{…}, backend::CUDABackend; lt::Function, by::Function, rev::Nothing, order::Base.Order.ForwardOrdering, block_size::Int64, temp::Nothing)
    @ AcceleratedKernels .../AcceleratedKernels.jl/src/sort/merge_sort.jl:177
  [5] merge_sort!
    @ .../AcceleratedKernels.jl/src/sort/merge_sort.jl:139 [inlined]
  [6] #_sort_impl!#49
    @ .../AcceleratedKernels.jl/src/sort/sort.jl:100 [inlined]
  [7] _sort_impl!
    @ .../AcceleratedKernels.jl/src/sort/sort.jl:81 [inlined]
  [8] #sort!#48
    @ .../AcceleratedKernels.jl/src/sort/sort.jl:74 [inlined]
  [9] sort!
    @ .../AcceleratedKernels.jl/src/sort/sort.jl:70 [inlined]
 [10] sort!(v::CuArray{Float32, 1, CUDA.DeviceMemory})
    @ AcceleratedKernels .../AcceleratedKernels.jl/src/sort/sort.jl:70
 [11] top-level scope
    @ .../mwe.jl:5
 [12] include(fname::String)
    @ Main ./sysimg.jl:38
 [13] top-level scope
    @ REPL[1]:1
in expression starting at .../mwe.jl:5
Some type information was truncated. Use `show(err)` to see complete types.
