GPU-acceleration routines for DifferentialEquations.jl and the broader SciML scientific machine learning ecosystem
Updated Aug 4, 2025 - Julia
Efficient implementations of Merge Sort and Bitonic Sort algorithms using CUDA for GPU parallel processing, resulting in accelerated sorting of large arrays. Includes both CPU and GPU versions, along with a performance comparison.
Optimized Parallel Sum program demonstrating CPU vs GPU performance
Scaling U-Net in TensorFlow
An introduction to automatic experiment parallelization
Comprehensive machine learning benchmarking framework for AMD MI300X GPUs on Dell PowerEdge XE9680 hardware. Supports both inference (vLLM) and training workloads with containerized test suites, hardware monitoring, and analysis tools for performance, power efficiency, and scalability research across the complete ML pipeline.