A GPU-accelerated implementation of the Projected Gauss-Seidel (PGS) method for solving constrained linear systems with support for sparse matrices and multi-GPU execution.
Supports both NVIDIA CUDA and AMD ROCm/HIP backends.
- Fast GPU implementation of the Projected Gauss-Seidel method
- Dual backend support: NVIDIA CUDA and AMD ROCm/HIP (experimental)
- Support for sparse matrices (CSR format)
- Multi-GPU execution for large problems
- DLPack integration for seamless interoperability with deep learning frameworks
- Python bindings with JAX integration
- SOR (Successive Over-Relaxation) support through relaxation parameter
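
For readers unfamiliar with the method, the sketch below shows what one Projected Gauss-Seidel solve looks like on a CSR matrix, with the relaxation parameter acting as the SOR factor. It is a pure-Python illustration and not this library's API: the function name `pgs_csr`, its arguments (`indptr`, `indices`, `data`, `lo`, `hi`, `omega`, `iters`, `tol`), and the box-constraint form `lo <= x <= hi` are all assumptions chosen for the example.

```python
def pgs_csr(indptr, indices, data, b, lo, hi, omega=1.0, iters=100, tol=1e-10):
    """Illustrative Projected Gauss-Seidel with SOR relaxation (hypothetical helper).

    Approximately solves A x = b subject to lo[i] <= x[i] <= hi[i],
    where A is given in CSR form (indptr, indices, data).
    """
    n = len(b)
    # Start from zero, projected onto the feasible box.
    x = [min(max(0.0, lo[i]), hi[i]) for i in range(n)]
    for _ in range(iters):
        max_delta = 0.0
        for i in range(n):
            diag = 0.0
            off = 0.0
            # Walk the stored entries of row i.
            for k in range(indptr[i], indptr[i + 1]):
                j = indices[k]
                if j == i:
                    diag += data[k]
                else:
                    off += data[k] * x[j]  # uses already-updated x[j] for j < i
            if diag == 0.0:
                raise ZeroDivisionError(f"row {i} has a zero diagonal entry")
            gs = (b[i] - off) / diag                # plain Gauss-Seidel update
            xi = (1.0 - omega) * x[i] + omega * gs  # SOR blend (omega = 1 is plain GS)
            xi = min(max(xi, lo[i]), hi[i])         # projection onto [lo, hi]
            max_delta = max(max_delta, abs(xi - x[i]))
            x[i] = xi
        if max_delta < tol:
            break
    return x


# 3x3 tridiagonal matrix [[4, -1, 0], [-1, 4, -1], [0, -1, 4]] in CSR form.
indptr = [0, 2, 5, 7]
indices = [0, 1, 0, 1, 2, 1, 2]
data = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]
b = [3.0, 2.0, 3.0]
inf = float("inf")

# No active bounds: the iteration behaves like ordinary Gauss-Seidel.
x_free = pgs_csr(indptr, indices, data, b, [-inf] * 3, [inf] * 3)

# Upper bound of 0.5 on the middle unknown: that component is clamped.
x_box = pgs_csr(indptr, indices, data, b, [0.0] * 3, [inf, 0.5, inf])
```

With no active bounds the iterate approaches the solution of the linear system; once a bound becomes active, the clamped component stays on the bound and the remaining unknowns adjust around it. Values of `omega` between 0 and 2 are the usual range for SOR on symmetric positive-definite matrices.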
For NVIDIA GPUs:
- CUDA Toolkit 11.0 or later
- CMake 3.18 or later
- A C++14 compatible compiler
- Python 3.10 or later (for Python bindings)
- JAX (for JAX integration)
For AMD GPUs (Experimental):
- ROCm 5.0 or later with HIP
- rocSPARSE library
- CMake 3.18 or later
- A C++14 compatible compiler (hipcc)
- Python 3.10 or later (for Python bindings)
- JAX (for JAX integration)
See docs/ROCM.md for detailed ROCm setup instructions.
- Clone the repository:
git clone https://github.com/flferretti/cupgs.git
cd cupgs
- Create a build directory and run CMake:
mkdir -p build && cd build
cmake -DCMAKE_INSTALL_PREFIX=<install_prefix> ..
make -j
make install
- Install the Python package:
pip install -e .
- Build with ROCm support:
mkdir -p build && cd build
cmake -DPGS_USE_ROCM=ON \
-DCMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
-DCMAKE_INSTALL_PREFIX=<install_prefix> \
..
make -j
make install
- Install the Python package:
export PGS_USE_ROCM=1
pip install -e .
See docs/ROCM.md for more details on ROCm support.
- Install the Python package:
pip install -e .
Check the examples directory for more detailed usage examples:
- examples/benchmark.py: Performance benchmarking script
- examples/jax_example.py: Example of using the solver with JAX
- examples/poisson_cuda_example.cu: Example solving the 2D Poisson equation with CUDA
This project is licensed under the GPLv3 License - see the LICENSE file for details.