amacbride/hip-torch-nl

hip_torch_nl

PyTorch C++ extension for GPU-accelerated neighbor list computation on AMD hardware (HIP/ROCm). Unlike a standalone HIP library, this extension shares PyTorch's HIP runtime, allocator, and stream, so the neighbor list can be composed with other PyTorch GPU ops in the same process without context conflicts.

Requirements

  • ROCm 5.0+ (tested with 6.0 and 7.0)
  • PyTorch built with ROCm support
  • A C++17 toolchain and hipcc on PATH (provided by ROCm)

Installation

# Override the GFX version the ROCm runtime reports, so that a gfx1102 card
# (example: RX 7600) is treated as gfx1100
export HSA_OVERRIDE_GFX_VERSION=11.0.0

pip install -e .

The build relies on PyTorch's CUDAExtension machinery, which routes through hipcc when ROCm is detected. setup.py looks for /opt/rocm, /opt/rocm-7.0.1, and /opt/rocm-6.0.0 and points CUDA_HOME at whichever exists.
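The lookup described above could be sketched roughly as follows. This is not the repository's setup.py: find_rocm_home and configure_cuda_home are hypothetical helpers, and taking the first existing directory in the listed order is an assumption, since the text only says "whichever exists".

```python
import os
from typing import Iterable, Optional

# Candidate install prefixes named in the text above.
ROCM_CANDIDATES = ("/opt/rocm", "/opt/rocm-7.0.1", "/opt/rocm-6.0.0")


def find_rocm_home(candidates: Iterable[str] = ROCM_CANDIDATES) -> Optional[str]:
    """Return the first candidate that is an existing directory, else None.

    Hypothetical helper; the first-match ordering is an assumption.
    """
    for path in candidates:
        if os.path.isdir(path):
            return path
    return None


def configure_cuda_home(env=None, candidates=ROCM_CANDIDATES):
    """Point CUDA_HOME at the detected ROCm prefix, if one is found."""
    env = os.environ if env is None else env
    home = find_rocm_home(candidates)
    if home is not None:
        env["CUDA_HOME"] = home
    return home
```

If ROCm lives somewhere else, exporting CUDA_HOME by hand before running pip is the obvious workaround to try, though whether setup.py honors a pre-set value is not stated here.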

Usage

import torch
from hip_torch_nl import hip_torch_nl, HIP_TORCH_NL_AVAILABLE

assert HIP_TORCH_NL_AVAILABLE, "extension not built"

positions = torch.rand(1000, 3, device="cuda") * 10.0
cell = torch.eye(3, device="cuda") * 10.0
pbc = torch.tensor([True, True, True], device="cuda")
cutoff = torch.tensor(3.0)

mapping, shifts = hip_torch_nl(positions, cell, pbc, cutoff)
# mapping: (2, n_pairs) int64 — directed pairs (i, j) with i != j or shift != 0
# shifts:  (n_pairs, 3)         — integer cell shifts S such that
#                                 D = pos[j] - pos[i] + S @ cell
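
To show how the two outputs fit together, here is a plain-Python sketch that applies the formula from the comment above to turn pairs and shifts into displacement vectors and distances. pair_vectors is a hypothetical helper working on nested lists, and the example shifts are hand-written values chosen to satisfy the formula, not output from the extension.

```python
import math


def pair_vectors(positions, cell, mapping, shifts):
    """Return (D, r) per pair with D = pos[j] - pos[i] + S @ cell.

    positions: list of [x, y, z]; cell: 3 row vectors;
    mapping: [i_list, j_list]; shifts: list of integer [s0, s1, s2].
    """
    out = []
    for i, j, s in zip(mapping[0], mapping[1], shifts):
        # S @ cell with a row-vector cell: sum over k of s[k] * cell[k]
        offset = [sum(s[k] * cell[k][c] for k in range(3)) for c in range(3)]
        d = [positions[j][c] - positions[i][c] + offset[c] for c in range(3)]
        out.append((d, math.sqrt(sum(x * x for x in d))))
    return out


# Two atoms near opposite faces of a 10 x 10 x 10 box.
positions = [[0.5, 5.0, 5.0], [9.5, 5.0, 5.0]]
cell = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]]
mapping = [[0, 1], [1, 0]]        # directed pairs 0->1 and 1->0
shifts = [[-1, 0, 0], [1, 0, 0]]  # assumed shifts selecting the nearest image

for d, r in pair_vectors(positions, cell, mapping, shifts):
    print(d, r)
```

On the GPU the same computation would normally be a single tensor expression, for example positions[j] - positions[i] + shifts.to(positions.dtype) @ cell with i, j = mapping.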

API

hip_torch_nl(
    positions, cell, pbc, cutoff,
    sort_id=False,
    compatible_mode=True,
    algorithm="auto",
)
  • positions: (n_atoms, 3) float tensor on GPU.
  • cell: (3, 3) row-vector cell matrix on GPU (row i is the i-th cell vector).
  • pbc: (3,) bool tensor on GPU.
  • cutoff: scalar tensor or float.
  • sort_id: if True, sort the returned pairs by their first index.
  • compatible_mode: if True (default), filter results to exactly match torch_sim.neighbors.standard_nl. Set to False for the raw kernel output (and to remove the torch_sim dependency at call time).
  • algorithm: "auto" (default), "direct"/"v1", or "cell_list"/"v2". "auto" picks cell_list above 15 000 atoms.

The convenience wrappers hip_torch_nl_v1 and hip_torch_nl_v2 force the respective algorithm.
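
The selection rule for the algorithm argument can be pictured with the sketch below. resolve_algorithm is a hypothetical helper, not the extension's dispatch code; it assumes that "above 15 000" is a strict comparison, that "auto" falls back to the direct kernel otherwise, and that unknown names raise ValueError.

```python
# From the text: "auto" picks cell_list above this atom count.
AUTO_CELL_LIST_THRESHOLD = 15_000

# Accepted spellings, mapped to a canonical name.
_ALIASES = {
    "direct": "direct",
    "v1": "direct",
    "cell_list": "cell_list",
    "v2": "cell_list",
}


def resolve_algorithm(n_atoms: int, algorithm: str = "auto") -> str:
    """Map an algorithm argument to "direct" or "cell_list" (hypothetical helper)."""
    if algorithm == "auto":
        # Assumption: strictly more than the threshold selects the cell list.
        return "cell_list" if n_atoms > AUTO_CELL_LIST_THRESHOLD else "direct"
    try:
        return _ALIASES[algorithm]
    except KeyError:
        raise ValueError(f"unknown algorithm: {algorithm!r}") from None
```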

Algorithms

| Variant | Complexity | Memory | Practical limit (8 GB VRAM) |
|---|---|---|---|
| V1 (direct) | O(n²) brute force, MIC | high; pairs buffer scales with n² | ~16k atoms |
| V2 (cell_list) | O(n) cell list, MIC | int32 pair indices, density-based estimation | ~37k atoms |
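
To illustrate why a cell list scales roughly linearly at fixed density, the sketch below bins atoms into cells no smaller than the cutoff and compares each atom only with atoms in adjacent cells. It is a CPU, pure-Python illustration for a fully periodic cubic box, not the HIP kernel; the function name, the strict "distance < cutoff" test, and the restriction to cubic boxes are all assumptions made for brevity.

```python
import itertools
import math
from collections import defaultdict


def cell_list_pairs(positions, box, cutoff):
    """Directed pairs (i, j) within cutoff in a periodic cubic box of edge `box`.

    Assumes cutoff < box / 2 so the minimum image is unique.
    """
    if not 0 < cutoff < box / 2:
        raise ValueError("cutoff must be positive and below half the box edge")
    n_cells = max(1, int(box // cutoff))  # cells per dimension, edge >= cutoff
    edge = box / n_cells

    # Bin each atom by the cell containing its wrapped position.
    bins = defaultdict(list)
    for idx, p in enumerate(positions):
        key = tuple(int((x % box) / edge) % n_cells for x in p)
        bins[key].append(idx)

    pairs = set()
    for key, members in bins.items():
        # Unique neighbouring cells (wrapping can repeat cells when n_cells < 3).
        neighbours = {
            tuple((key[c] + off[c]) % n_cells for c in range(3))
            for off in itertools.product((-1, 0, 1), repeat=3)
        }
        for nkey in neighbours:
            for i in members:
                for j in bins.get(nkey, ()):
                    if i == j:
                        continue
                    # Minimum-image displacement along each axis.
                    d = [positions[j][c] - positions[i][c] for c in range(3)]
                    d = [x - box * math.floor(x / box + 0.5) for x in d]
                    if math.sqrt(sum(x * x for x in d)) < cutoff:
                        pairs.add((i, j))
    return pairs
```

Each atom is tested against at most 27 cells' worth of candidates, so the work grows with n times the average cell occupancy instead of n².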

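Both algorithms apply the minimum image convention (MIC). They produce identical pair sets when cutoff is smaller than half the smallest cell height. Above that limit an atom can lie within the cutoff of more than one periodic image of the same neighbor, which MIC cannot represent, so neither implementation is appropriate.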
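
The half-height condition can be checked before calling the extension. The sketch below computes the perpendicular heights of a row-vector cell as volume divided by the area of the opposite face; cell_heights and mic_is_valid are hypothetical helpers, and the check treats all three directions as periodic.

```python
import math


def _cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def cell_heights(cell):
    """Perpendicular heights of a row-vector cell: volume / opposite face area."""
    a, b, c = cell
    volume = abs(sum(x * y for x, y in zip(a, _cross(b, c))))
    # Face normals opposite to a, b, and c respectively.
    faces = (_cross(b, c), _cross(c, a), _cross(a, b))
    return [volume / math.sqrt(sum(x * x for x in f)) for f in faces]


def mic_is_valid(cell, cutoff):
    """True when cutoff is below half the smallest cell height."""
    return cutoff < 0.5 * min(cell_heights(cell))


cubic = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]]
sheared = [[10.0, 0.0, 0.0], [5.0, 10.0, 0.0], [0.0, 0.0, 10.0]]
print(mic_is_valid(cubic, 3.0), mic_is_valid(sheared, 4.5))
```

For a sheared cell the heights are shorter than the vector lengths, so comparing the cutoff against vector lengths alone would be too permissive.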

Repository layout

hip_torch_nl/
├── __init__.py                     # Python interface
└── csrc/
    ├── hip_neighborlist.cpp        # pybind11 bindings + algorithm dispatch
    └── hip_neighborlist_kernel.cu  # HIP kernels (V1 brute force, V2 cell list)
tests/                              # pytest correctness suite
benchmarks/run_benchmarks.py        # timing harness

Tests

pip install -e ".[test]"
pytest

The suite verifies output against a vectorized brute-force MIC reference implemented in tests/conftest.py. It covers:

  • random positions under full, partial, and zero PBC
  • V1 vs V2 agreement
  • FCC nearest-neighbor coordination number (12)
  • pair-list symmetry, sort_id, and dtype preservation
  • input validation (CPU tensors, bad shapes, missing extension)

Tests skip cleanly if the extension is not built or no HIP/CUDA device is available.
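
For readers who want to see what a brute-force MIC reference and the FCC coordination check look like, here is a small stand-alone sketch. It is not the code in tests/conftest.py: it uses plain loops instead of vectorized tensor ops, handles only orthorhombic boxes, and assumes a strict "distance < cutoff" test. The names brute_force_nl and fcc_supercell are hypothetical.

```python
import itertools
import math


def brute_force_nl(positions, box, pbc, cutoff):
    """Directed pairs and integer shifts for an orthorhombic box (edge lengths `box`).

    Shifts follow D = pos[j] - pos[i] + S * box along each axis.
    """
    mapping, shifts = [], []
    for i, pi in enumerate(positions):
        for j, pj in enumerate(positions):
            if i == j:
                continue
            # Nearest-image shift along periodic axes, zero elsewhere.
            s = [-math.floor((pj[c] - pi[c]) / box[c] + 0.5) if pbc[c] else 0
                 for c in range(3)]
            d = [pj[c] - pi[c] + s[c] * box[c] for c in range(3)]
            if math.sqrt(sum(x * x for x in d)) < cutoff:
                mapping.append((i, j))
                shifts.append(s)
    return mapping, shifts


def fcc_supercell(a, reps):
    """Positions of an FCC lattice with constant `a`, repeated `reps` times per axis."""
    basis = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.0), (0.5, 0.0, 0.5), (0.0, 0.5, 0.5)]
    return [[(n[c] + b[c]) * a for c in range(3)]
            for n in itertools.product(range(reps), repeat=3)
            for b in basis]


a, reps = 4.0, 3
positions = fcc_supercell(a, reps)
box = [a * reps] * 3
# Nearest neighbours sit at a / sqrt(2); second neighbours at a.
mapping, shifts = brute_force_nl(positions, box, [True, True, True], cutoff=0.8 * a)

coordination = [0] * len(positions)
for i, _ in mapping:
    coordination[i] += 1
print(set(coordination))  # prints {12}
```

The cutoff of 0.8 a lies between the first and second neighbor shells and stays well below half the box edge, so every atom should report exactly 12 neighbors.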

Benchmarks

python -m benchmarks.run_benchmarks --sizes 1000 4000 16000
python -m benchmarks.run_benchmarks --include-reference  # also time standard_nl

The script sweeps system sizes, runs V1, V2, and the auto selector, and reports the median and best wall-clock time for each. Pass --include-reference to also time torch_sim.neighbors.standard_nl on CPU as a baseline. Use --cutoff and --density to control the test geometry; the cutoff must stay below half the edge length of the cubic box (the script enforces this).
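
The geometry constraint and the median/best timing can be sketched as below. This is not benchmarks/run_benchmarks.py: the helper names are hypothetical, and density is assumed to mean atoms per unit volume.

```python
import statistics
import time


def cubic_box_edge(n_atoms: int, density: float) -> float:
    """Edge length of a cubic box holding n_atoms at `density` atoms per unit volume."""
    return (n_atoms / density) ** (1.0 / 3.0)


def check_cutoff(n_atoms: int, density: float, cutoff: float) -> float:
    """Raise if the cutoff is not below half the box edge; return the edge."""
    edge = cubic_box_edge(n_atoms, density)
    if cutoff >= 0.5 * edge:
        raise ValueError(
            f"cutoff {cutoff} must be < {0.5 * edge:.3f} (half the box edge)"
        )
    return edge


def time_call(fn, repeats: int = 5, warmup: int = 1):
    """Return (median, best) wall-clock seconds over `repeats` calls of fn()."""
    for _ in range(warmup):
        fn()  # warm-up runs are not timed
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples), min(samples)
```

When timing GPU work, fn should end with torch.cuda.synchronize(), because kernel launches are asynchronous and the clock would otherwise stop before the kernel finishes.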

License

BSD-3-Clause
