Updated CMAKE_MODULE_PATH for HIP #516

Open: wants to merge 1 commit into master

mhilgema-amd

ROCm 6.x installs its CMake modules under ${ROCM_PATH}/lib/cmake. This fixes the following CMake configure error:

-- Found HIP: 6.2.41134
-- HIP PATH: /opt/COE_modules/rocm/rocm-6.2.2
CMake Error at CMakeLists.txt:380 (hip_add_library):
  Unknown CMake command "hip_add_library".
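
For anyone hitting the same error: hip_add_library is a macro defined by the FindHIP.cmake module, so find_package(HIP) in module mode can only provide it if the directory holding that file is on CMAKE_MODULE_PATH. The following is a minimal sketch for locating candidate directories on a given install. The helper name find_hip_module_dirs is hypothetical, and defaulting to ROCM_PATH, then /opt/rocm, is an assumption.

```shell
#!/bin/bash

# Hypothetical helper: print every directory under a ROCm root that contains
# FindHIP.cmake (the module defining hip_add_library). Each result is a
# candidate value for CMAKE_MODULE_PATH.
# Assumption: the root defaults to $ROCM_PATH, then /opt/rocm.
find_hip_module_dirs() {
  local root="${1:-${ROCM_PATH:-/opt/rocm}}"
  # -L follows symlinks, since /opt/rocm is often a link to a versioned install
  find -L "$root" -name FindHIP.cmake -type f 2>/dev/null \
    | while IFS= read -r f; do dirname "$f"; done \
    | sort -u
}

# Usage: find_hip_module_dirs /opt/COE_modules/rocm/rocm-6.2.2
```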

@TysonRayJones
Member

Hi Martin,
Thanks very much for the contribution! The CMake build has been overhauled on the v4 branch, which is currently our de facto develop branch, although (afaik) we haven't yet tested HIP deployment. Would you be able to check whether this patch is still necessary there, and if so, prepare a PR to the v4 branch? We will eventually merge v4 -> develop -> master.
Thanks very much!

@otbrown

otbrown commented Jan 31, 2025

My own attempts to reinstate HIP support are currently bashing up against incompatibilities in the complex datatypes, possibly related to this issue 😅

@TysonRayJones
Member

> My own attempts to reinstate HIP support are currently bashing up against incompatibilities in the complex datatypes, possibly related to this issue 😅

Ah bother! I could try to avoid the arithmetic overloads unsupported by HIP, or add additional compiler guards for HIP as necessary. I am arranging to get access to an AMD machine - I'll report back after trying your current build!

@otbrown
Copy link

otbrown commented Feb 5, 2025

Sounds good -- I was working on it here!

@TysonRayJones
Member

To my dismay, I have totally failed to gain access to a HIP-compatible AMD GPU - both DiRAC and ARC have only NVIDIA GPUs, and there's nobody in Oxford I can reach with an AMD machine! 😤 Alas, I cannot test HIP compilation.

If these cu_qcomp *= and += operator overloads are indeed the issue, we could always forgo them. There are 23 uses of cu_qcomp *= and 7 uses of cu_qcomp += in gpu_kernels.cuh, which we could change to x = x * ... and x = x + ... respectively.
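
The proposed rewrite can be mechanised for the simple cases. This is a sketch, assuming each compound assignment sits alone on one line, ends in a semicolon, and has a plain (optionally subscripted) left-hand side; the helper name rewrite_compound_ops is hypothetical. The right-hand side is parenthesised because x *= a + b means x = x * (a + b), not x * a + b.

```shell
#!/bin/bash

# Hypothetical helper: turn `x *= y;` into `x = x * (y);` and `x += y;` into
# `x = x + (y);`, reading stdin and writing stdout.
# Assumption: one statement per line, no trailing comment; any line that does
# not match exactly is printed unchanged so it can be fixed by hand.
rewrite_compound_ops() {
  # \1 indentation, \2 left-hand side, \4 operator, \5 right-hand side
  sed -E 's/^([[:space:]]*)([A-Za-z_][A-Za-z0-9_.]*(\[[^]]*])?)[[:space:]]*([*+])=[[:space:]]*(.*);[[:space:]]*$/\1\2 = \2 \4 (\5);/'
}
```

For example, rewrite_compound_ops < gpu_kernels.cuh > gpu_kernels.new.cuh followed by a diff lets each change be reviewed by hand. The sketch rewrites every matching statement regardless of type, which is harmless for built-in types but worth checking in the diff.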

I'll fork your cmake-amd branch with the change - could you then test whether it compiles? If that is the only barrier to HIP working, then we can decide if it's a tolerable evil. If there are other blockers, we could defer HIP support to v4.1.

@TysonRayJones
Member

Made this PR to try compiling on AMD - although making it was definitely heartbreaking 😢

@otbrown

otbrown commented Feb 27, 2025

@TysonRayJones Bad news first: still doesn't compile 😢 I've attached the rather lengthy compiler error report. Looks like a mix of being unable to disambiguate the right constructor, and still finding some inline operator overloads. Might just be a case of needing to check COMPILE_HIP=0 in some places where COMPILE_CUDA=1?
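
One way to find those places is to list the preprocessor conditionals that test COMPILE_CUDA without also mentioning COMPILE_HIP. This is a sketch that assumes the macros are tested on #if, #ifdef, #ifndef or #elif lines; the helper name list_cuda_only_guards and its source-directory argument are hypothetical.

```shell
#!/bin/bash

# Hypothetical helper: print file:line:text for every conditional that tests
# COMPILE_CUDA but does not mention COMPILE_HIP, as candidates for a HIP guard.
# Assumption: the guards are written as #if/#ifdef/#ifndef/#elif lines.
list_cuda_only_guards() {
  local src="$1"
  grep -rnE '^[[:space:]]*#[[:space:]]*(if|ifdef|ifndef|elif)[[:space:](].*COMPILE_CUDA' "$src" \
    | grep -v 'COMPILE_HIP'
}

# Usage: list_cuda_only_guards path/to/quest/src
```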

HIP-compile-error.txt

The good news: you can access ARCHER2's AMD GPUs! If you take the ARCHER2 driving test, you can get access for a year. If you want to go that route and don't have a current UK academic affiliation, just let me know, and I can chat to the training manager to make sure your account still gets approved. I can also send you my modules and cmake scripts to get you up and running faster! If you don't have time, I'm happy to keep jumping on and recompiling myself!

@otbrown

otbrown commented Feb 27, 2025

Oh, also I made a small modification to my cmake-amd branch to prevent the annoying redefinition of COMPILE_CUDA!

@TysonRayJones
Member

Drat! Some of those errors I can fix, but others I need to experiment with. Getting access to ARCHER2 would be great, and I'll happily take the test! I believe I have "academic visitor" status at Oxford (which Simon Benjamin can corroborate), but I no longer have access to an Oxford institutional email address (only my EPFL one, [firstname].[lastname]@epfl.ch). I'll take the test as soon as you give the green light.

@otbrown

otbrown commented Feb 27, 2025

Have at it! Clair knows to look out for your email address 😁 GitHub (probably wisely) won't let me upload bash scripts here, but it will let me include them as code blocks.

modules.sh, source this to load all the right modules:

#!/bin/bash

module load PrgEnv-gnu
module load rocm
module load craype-accel-amd-gfx90a
module load craype-x86-milan
module load cmake

cmake.sh, run this to configure a QuEST build for AMD GPUs on ARCHER2 (or try to, anyway):

#!/bin/bash

VERBOSE_LIB_NAME=OFF
ENABLE_MULTITHREADING=ON
ENABLE_DISTRIBUTION=OFF
ENABLE_TESTING=OFF
ENABLE_EXAMPLES=OFF
ENABLE_HIP=ON
HIP_ARCH=gfx90a

cmake -B build -G "Unix Makefiles" \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_CXX_COMPILER=CC \
  -DCMAKE_C_COMPILER=cc \
  -DVERBOSE_LIB_NAME=${VERBOSE_LIB_NAME} \
  -DENABLE_TESTING=${ENABLE_TESTING} \
  -DBUILD_EXAMPLES=${ENABLE_EXAMPLES} \
  -DENABLE_MULTITHREADING=${ENABLE_MULTITHREADING} \
  -DENABLE_DISTRIBUTION=${ENABLE_DISTRIBUTION} \
  -DENABLE_HIP=${ENABLE_HIP} \
  -DCMAKE_HIP_ARCHITECTURES=${HIP_ARCH}

If you get as far as trying to run it, you can find example job submission scripts here!

@TysonRayJones
Member

TysonRayJones commented Mar 4, 2025

@mhilgema-amd I've been experimenting with your proposed change in v4 and I've become very confused by it. My testing seems to indicate that ROCm 6.3.3 places libamdhip64.so in /opt/rocm/lib/ (rather than /opt/rocm/hip/lib/), which is incompatible with your setting CMAKE_MODULE_PATH = ${HIP_PATH}/lib/cmake/hip, regardless of whether HIP_PATH is set as /opt/rocm/ or /opt/rocm/hip/.

I am using GitHub Actions to install hipcc 6.3.3 into a clean ubuntu-24.04 VM, as per the main docs:

sudo apt install "linux-headers-$(uname -r)" "linux-modules-extra-$(uname -r)"
sudo apt install python3-setuptools python3-wheel
sudo usermod -a -G render,video $USER
wget https://repo.radeon.com/amdgpu-install/6.3.3/ubuntu/noble/amdgpu-install_6.3.60303-1_all.deb
sudo apt install ./amdgpu-install_6.3.60303-1_all.deb
sudo apt update
sudo apt install amdgpu-dkms rocm

This creates

/opt/rocm/lib/libamdhip64.so
/opt/rocm/cmake

but does not create /opt/rocm/hip/. The result compiles fine with

CMAKE_MODULE_PATH = "/opt/rocm/cmake"

but fails to compile (hip::amdhip64 not found) with your PR's revision of:

CMAKE_MODULE_PATH = "/opt/rocm/lib/cmake/hip"
CMAKE_MODULE_PATH = "/opt/rocm/hip/lib/cmake/hip" # doesn't exist

Have I missed something? My testing below indicates that the cmake files in .../lib/cmake/hip (rather than .../cmake) are only compatible with the shared libraries in /opt/rocm/hip/lib (when they exist), rather than those in /opt/rocm/lib. Was it expected that the shared libraries would move into the /hip/ subfolder with ROCm 6.x?

I've been testing on ARCHER2 with hipcc 5.2. Its installation (somewhat irksomely) contains libamdhip64.so in both locations:

/opt/rocm/lib/libamdhip64.so
/opt/rocm/hip/lib/libamdhip64.so

and cmake files (whose directory we must set as CMAKE_MODULE_PATH) in four locations:

/opt/rocm/cmake              # discovers /opt/rocm/lib/libamdhip64.so
/opt/rocm/lib/cmake/hip      # fails
/opt/rocm/hip/cmake          # fails
/opt/rocm/hip/lib/cmake/hip  # discovers /opt/rocm/hip/lib/libamdhip64.so

I can only ever get HIP to compile with the first and last of these. This seems consistent with what I'm seeing in my GitHub Action, which does not create the /opt/rocm/hip subdirectory and so cannot use the lib/cmake/hip cmake files.
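
The hand-made inventory of the ARCHER2 install can be reproduced with a short script, which may make it easier to compare ROCm versions side by side. This is a sketch: the helper name inventory_hip is hypothetical, and restricting the listing to directories ending in cmake or cmake/hip is an assumption chosen to match the four paths listed.

```shell
#!/bin/bash

# Hypothetical helper: list every libamdhip64.so under a ROCm root, then every
# directory ending in /cmake or /cmake/hip that holds at least one .cmake file.
# Assumption: the root defaults to $ROCM_PATH, then /opt/rocm.
inventory_hip() {
  local root="${1:-${ROCM_PATH:-/opt/rocm}}"
  echo "libamdhip64.so:"
  # -L follows symlinks, so links such as /opt/rocm -> /opt/rocm-x.y.z are searched
  find -L "$root" -name 'libamdhip64.so' 2>/dev/null | sort -u
  echo "cmake directories:"
  find -L "$root" -name '*.cmake' -type f 2>/dev/null \
    | while IFS= read -r f; do dirname "$f"; done \
    | sort -u \
    | grep -E '/cmake(/hip)?$'
}

# Usage: inventory_hip /opt/rocm
```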

Is there some overarching understanding I'm missing? Getting things working was an incredible chore; the locations of the relevant files seem to be a bit of a mess, which has just cost me an entire night of sleep 😅
