CUDA tasking using default alignment constraint for all Input(s) and Output(s) #130

PR #119 leverages our CUDA.jl tasking system. 

However, [`launch(...)`](https://github.com/JuliaLegate/cuNumeric.jl/blob/6ae32dcc63153fec6346720ecf0a713201b8734e/src/cuda/cuda_ptx_task.jl#L75) calls [`Legate.default_alignment(...)`](https://github.com/JuliaLegate/Legate.jl/blob/9a78a694cd4bf399973e0883f74d6610262e7842/src/api/tasks.jl#L56-L71) on all inputs and outputs.

This is fine for standard elementwise operations like:
```julia
a .+ b
```

However, it is inefficient for stencil computations where inputs are shifted views of the same array.

For example, consider a 6×6 grid split across two tasks, where task 0 owns the top half and task 1 owns the bottom half:
```julia
grid   = zeros(6, 6)
center = @view grid[2:5, 2:5]   # 4×4 interior
south  = @view grid[3:6, 2:5]   # same shape, shifted down by 1 row
```
With `default_alignment`, `center` and `south` are split at the same local row boundaries, which map to these rows of `grid`:
```julia
# task 0
center_t0 = grid[2:3, 2:5]
south_t0  = grid[3:4, 2:5]

# task 1
center_t1 = grid[4:5, 2:5]
south_t1  = grid[5:6, 2:5]
```

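This is a problem for stencil computations because each shifted view reaches into rows owned by a neighboring partition: task 0's `south` tile includes row 4 of `grid`, which belongs to task 1's `center` tile, so that row has to be fetched as halo data. For a Jacobi stencil using all four neighbors, this creates halo copies for north, south, east, and west every iteration.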
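The row arithmetic behind this overlap can be sketched in plain Julia. This is only an illustration: `split_rows` and `to_grid_rows` are hypothetical helpers, not part of Legate.jl or cuNumeric.jl, and the sketch assumes rows are split into contiguous, near-equal chunks.

```julia
# Hypothetical helper: split rows 1:n into `ntasks` contiguous, near-equal tiles.
function split_rows(n::Int, ntasks::Int)
    base, extra = divrem(n, ntasks)
    tiles = UnitRange{Int}[]
    start = 1
    for t in 1:ntasks
        len = base + (t <= extra ? 1 : 0)
        push!(tiles, start:(start + len - 1))
        start += len
    end
    return tiles
end

# Hypothetical helper: map a tile given in a view's local rows to rows of the parent grid.
to_grid_rows(tile::UnitRange{Int}, view_rows::UnitRange{Int}) =
    (first(view_rows) + first(tile) - 1):(first(view_rows) + last(tile) - 1)

center_rows = 2:5   # rows of `grid` covered by `center`
south_rows  = 3:6   # rows of `grid` covered by `south`

# Aligned partitioning: both views are cut at the same local boundaries.
local_tiles  = split_rows(length(center_rows), 2)
center_tiles = [to_grid_rows(t, center_rows) for t in local_tiles]
south_tiles  = [to_grid_rows(t, south_rows) for t in local_tiles]

# Rows that task 0 reads through `south` but that task 1 owns through `center`.
halo_rows = intersect(south_tiles[1], center_tiles[2])
```

Here `halo_rows` is the part of `grid` that has to move between tasks whenever `center` is updated.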

Instead, this should be represented with a bloat constraint:
```julia
bloat(source=center, bloat=south, low=0, high=1)
```

This gives each task the necessary overlap in its physical instance:
```julia
# task 0
center_t0 = grid[2:3, 2:5]
south_t0  = grid[3:5, 2:5]

# task 1
center_t1 = grid[4:5, 2:5]
south_t1  = grid[4:6, 2:5]
```
Now the required halo rows are included by construction. The overlap is handled once during partitioning rather than copied every iteration.
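As a rough model of what a bloat constraint does to tile bounds, the sketch below extends each source tile by fixed low and high offsets and clips the result to the store's extent. `bloat_tile` is a hypothetical helper written for illustration, not a Legate.jl function, and the clipping behavior is an assumption based on the Legate partitioning documentation. The 8-row extent is a generic example, separate from the 6×6 grid above.

```julia
# Hypothetical helper: extend `tile` by `low` rows toward lower indices and
# `high` rows toward higher indices, clipped to the store's full `extent`.
function bloat_tile(tile::UnitRange{Int}, low::Int, high::Int, extent::UnitRange{Int})
    lo = max(first(tile) - low, first(extent))
    hi = min(last(tile) + high, last(extent))
    return lo:hi
end

extent       = 1:8          # rows of the full store
source_tiles = [1:4, 5:8]   # non-overlapping tiles of the source partition

# Bloat every source tile by one row on each side.
bloated = [bloat_tile(t, 1, 1, extent) for t in source_tiles]

# Rows now held by both tasks: the overlap established at partitioning time.
shared = intersect(bloated[1], bloated[2])
```

The source tiles stay disjoint while the bloated tiles share the boundary rows, which is the overlap a stencil task needs to read its neighbors.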

For example, with 1000 Jacobi iterations over a 1000×1000 grid and four neighbors:
- `default_alignment`: pays the halo-copy cost 4000 times (1000 iterations × 4 neighbors)
- `bloat`: pays the overlap cost once, at partitioning time
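The counts in this estimate follow from simple arithmetic, sketched below. `halo_copy_count` is a hypothetical helper, and the model assumes exactly one halo copy per neighbor per iteration under aligned partitioning.

```julia
# Hypothetical helper: halo copies under aligned partitioning,
# assuming one copy per neighbor direction per iteration.
halo_copy_count(iterations::Int, neighbors::Int) = iterations * neighbors

iterations = 1000
neighbors  = 4   # north, south, east, west

aligned_copies = halo_copy_count(iterations, neighbors)
bloat_setups   = 1   # overlap established once at partitioning time, per the estimate above
```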

So we likely need a way for CUDA.jl tasks to specify Legate partitioning constraints other than `default_alignment`.

See the [Legate partitioning documentation](https://nv-legate.github.io/legate/api/cpp/generated/group/group__partitioning.html) for details on the available constraints.
