pytorch README.md path to rocm updated #1
|
Some more notes about compiling on Windows. These may have been addressed in another branch already, but I am including them here so I don't lose them.
Step 0: Prep venv
It is highly recommended to use a virtual environment unless in a throw-away
Step 1: Preparing sources
then launch a new shell (this is to enable long paths, otherwise you will hit the error "
Step 2: Install Deps
Python deps:
Step 3: Setup and Build
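For anyone wondering whether the long-path setting matters for their checkout, here is a rough sketch that scans a source tree for paths that would not fit in the classic 260-character Windows limit (`MAX_PATH`). The function name `find_long_paths` is made up for this note, and it only measures path lengths; it does not check or change any system or git setting.

```python
import os
from pathlib import Path

MAX_PATH = 260  # classic Win32 limit, counting the terminating NUL


def find_long_paths(root, limit=MAX_PATH):
    """Yield (length, path) for files under root whose full path
    would not fit in `limit` characters (NUL terminator included)."""
    root = Path(root).resolve()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # +1 for the terminating NUL that the Win32 API needs
            if len(full) + 1 > limit:
                yield len(full), full


if __name__ == "__main__":
    # Scan the current directory and print the longest offenders first.
    for length, path in sorted(find_long_paths("."), reverse=True)[:20]:
        print(length, path)
```

If this prints anything for your source directory, a build without long-path support is likely to trip over those files.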
Note: added, which probably indicates that this is not the correct branch to be using, as I'm sure you would have addressed the issue at the source already. |
|
... for completeness, here is
And then...
Building the wheel... (will be saved to external-builds/src/dist)
Building torchaudio and vision
Anyway, you'll need to get rid of all traces of HIP and suchlike; I suggest:
Then, assuming you have HIP installed, either remove it from your
You need to check out the branches that match the PyTorch you just built. To double-check that it is there and working:
If this is... 2.6.0a0 (which it is for me, because I'm building from the wrong
You should now have the following wheels. I'm leaving my absolute path in (again) just for clarity.
Now to check if it all works
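A minimal sanity check along these lines might look like the sketch below. It is not the original snippet from this comment; the helper names `parse_torch_version` and `describe_torch` are invented for illustration. It only uses `torch.__version__`, `torch.version.hip`, `torch.cuda.is_available()` and `torch.cuda.get_device_name()`, and it returns `{"installed": False}` rather than crashing when torch cannot be imported.

```python
import importlib
import re


def parse_torch_version(version):
    """Split a version such as '2.6.0a0+gitabc' into (major, minor, patch)."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if m is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in m.groups())


def describe_torch():
    """Collect basic facts about the installed torch build, if any."""
    try:
        torch = importlib.import_module("torch")
    except (ImportError, OSError):
        # OSError covers a wheel whose native libraries fail to load.
        return {"installed": False}
    info = {
        "installed": True,
        "version": str(torch.__version__),
        # A HIP version string on ROCm builds, None otherwise.
        "hip": getattr(torch.version, "hip", None),
        "gpu_available": torch.cuda.is_available(),
    }
    if info["gpu_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    print(describe_torch())
```

The major and minor numbers from `parse_torch_version` are what you would compare against the torchaudio and torchvision branches you check out.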
Epic fail
Help!!!
I manually nopped a few checks so ComfyUI would run, but something is NQR (not quite right). dev.type == 'hip', btw. |
|
Update: I cherry-picked some scripts and am trying again with those. That seems to give me the 2.7.0 install scripts, but with the latest patches. |
|
I would highly recommend you build from TheRock main branch. These forks are not really maintained and you won't get much support from here. Follow the instructions to build pytorch from there. |
|
Hi @jammm ... I'll check it out. I feel I got close with torch 2.7.0 using 5d61dbb#diff-625c093a3a6309b1e5adae73696da1c7c693ffffc65812f3d57924ee07ff641b but apparently I didn't cherry-pick all the patches and ended up without aotriton compiled in. I am guessing this fork was all about the gfx1151? lshqqytiger has apparently built some stand-alone wheels for older AMDs, so maybe he will sort it all out and I can go back to being lazy. I would rather like gfx1030 and gfx1100 support in one wheel. :) |
|
yeah mainly for gfx1151. Though technically it could support other archs. |
|
Totally, and it supported the gfx1100 great. Its only failings were that it didn't have LAPACK for some CPU tensor stuff used for automatic masking in Wan2GP (easily fixed), and that it got really sulky about large Conv3Ds (which I believe is something of a traditional AMD thing involving lengths or widths above 512, and easily solved by using VAE Tiled Decoding). That and CPU Text Encoding being really slow without Triton to compile CPU code. However, having just read ROCm#409, I can see how you have all been working super hard, and how much everyone has contributed. It's also great to see AMD and non-AMD (and ex-AMD) people getting together to make community development possible. btw, I also find boost weird... but there are wonderful things in there. Though those mostly end up being mainlined into the C++ standard or extracted into things like Again, excellent work. I released guides on /r/comfyui for using the wheels on ComfyUI and Wan2GP, but there are only a handful of AMD regulars in there. |
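To illustrate the tiling idea mentioned above: the sketch below splits one spatial dimension into overlapping spans no longer than a given tile size, which is the bookkeeping a tiled decode needs before blending the pieces back together. This is not ComfyUI's or Wan2GP's implementation; `tile_spans`, the 512 tile size (taken from the "above 512" guess in the comment) and the 64-element overlap are all assumptions for illustration.

```python
def tile_spans(length, tile=512, overlap=64):
    """Return (start, stop) pairs that cover range(length) with spans of at
    most `tile` elements, each overlapping its neighbour by `overlap`."""
    if length <= 0:
        raise ValueError("length must be positive")
    if tile <= overlap:
        raise ValueError("tile must be larger than overlap")
    if length <= tile:
        return [(0, length)]
    step = tile - overlap
    spans = []
    start = 0
    while True:
        stop = min(start + tile, length)
        spans.append((start, stop))
        if stop == length:
            break
        start += step
    return spans


if __name__ == "__main__":
    # A 1280-wide dimension becomes three overlapping tiles.
    print(tile_spans(1280))
```

Each span stays at or below the tile size, so no single convolution call sees a dimension larger than 512.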
|
We (the collective community) did some more work on this, and we have had a working 6.5 for gfx1100 and gfx1030 for a while now, with flash_attention, sage_attention, and triton. The sordid history of that development is at patientx/ComfyUI-Zluda#170 (comment) and the resultant automated script is at https://github.com/user-attachments/files/21155122/patientx-native-rocm-3.zip Not sure if any of that would be of interest to you or scottt. I also took some time to expand ADLX's Pybind demo into a more conventional Python module https://pypi.org/project/ADLXPybind/ that can build itself from an
That was a necessary step to create https://pypi.org/project/pynvml-amd-windows/ which is a drop-in replacement (actually, it hijacks the pynvml module name) for
As I say, I have no idea if any of this is of interest or use to you, but I figure we owe you and scottt for your work and for inspiring the rest of us to try a little harder to contribute. Hopefully some of it is of some use to you. Obviously we also owe "other" scott and the rest of the "TheRock" crew an incalculable debt, and those at AMD who are working with the community on the ROCm project, but I'm not sure they would have any use for our small efforts. And if you would be open to it, I might want to have a chat about whether it would be practical for me to attempt to port nunchaku (inference engine for 4-bit neural networks quantized with SVDQuant) to HIP. |
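Since the package takes over the `pynvml` module name, code written against stock pynvml should in principle run unchanged. The sketch below uses only calls that exist in the regular pynvml bindings (`nvmlInit`, `nvmlDeviceGetCount`, `nvmlDeviceGetHandleByIndex`, `nvmlDeviceGetMemoryInfo`, `nvmlShutdown`); whether the AMD replacement implements each of them is an assumption, not something verified here. `gpu_memory_report` is a hypothetical helper name.

```python
def gpu_memory_report():
    """Return a list of per-GPU memory figures via the pynvml API, or None
    if no pynvml-compatible module is installed or it fails to initialise."""
    try:
        import pynvml
    except ImportError:
        return None
    try:
        pynvml.nvmlInit()
    except Exception:
        # No driver or no supported GPU: treat as "nothing to report".
        return None
    try:
        report = []
        for index in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(index)
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
            report.append({"index": index, "total": mem.total, "used": mem.used})
        return report
    finally:
        pynvml.nvmlShutdown()


if __name__ == "__main__":
    print(gpu_memory_report())
```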
|
oh man, thanks a lot for the v6.5.0rc-pytorch-gfx110x wheels, it has been quite a decent speed up but most importantly fixed most of my vae decode issues and can fully use the vram on my 7900xt. |
|
@Ginxchan If you have a 7900xt (which I assume is basically the same as the 7900xtx), then you'll probably (read: definitely) get faster results with ZLUDA + Triton + sageattention, and with less memory use. Though this wheel is very handy for nodes that don't work under ZLUDA (well, I've only found one, which was a stem-maker for songs, but I'm sure there are more). Note: you can also install Triton and sageattention for this native wheel, but it will still not be as efficient. No idea why. |
Running these recently added Python unit tests on CI will help encourage good development practices (see also ROCm#750).

I just noticed older tests already running here: https://github.com/ROCm/TheRock/blob/13ef7021af1f183e9344ec177ccb79c16426385e/.github/workflows/build_linux_packages.yml#L120-L123

Sample logs from https://github.com/ROCm/TheRock/actions/runs/15339038605/job/43221678538#step:12:12:

```
Run ctest --test-dir build --output-on-failure
Internal ctest changing into directory: /__w/TheRock/TheRock/build
Test project /__w/TheRock/TheRock/build
Start 1: build_tools_fileset_tool_test
1/25 Test #1: build_tools_fileset_tool_test ......................... Passed 0.30 sec
Start 2: build_tools_artifacts_test
2/25 Test #2: build_tools_artifacts_test ............................ Passed 0.06 sec
Start 3: therock-validate-shared-lib-librocm-openblas.so
3/25 Test #3: therock-validate-shared-lib-librocm-openblas.so ....... Passed 0.04 sec
Start 4: therock-validate-shared-lib-libamd.so
4/25 Test #4: therock-validate-shared-lib-libamd.so ................. Passed 0.03 sec
...
```

We'll probably want to run the Python unit tests much earlier in the build, but this is better than not running them anywhere. We could also run these via pytest instead of ctest.
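As a small aid for reading logs like the sample above, the following sketch pulls the per-test result lines out of ctest output so runs can be compared or slow tests spotted. The regular expression is written against the `Passed` lines shown in the sample; the assumption that other statuses appear in the same position (for example prefixed with `***`) is mine, and `parse_ctest_log` is a made-up helper name.

```python
import re

# Matches lines such as:
#   1/25 Test #1: build_tools_fileset_tool_test ........ Passed 0.30 sec
RESULT_LINE = re.compile(
    r"^(\d+)/(\d+)\s+Test\s+#(\d+):\s+(\S+)\s+\.+\s*(.+?)\s+([\d.]+)\s+sec"
)


def parse_ctest_log(text):
    """Return one dict per ctest result line found in `text`."""
    results = []
    for line in text.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match is None:
            continue  # "Start N: ..." lines and other noise
        _pos, _total, number, name, status, seconds = match.groups()
        results.append({
            "number": int(number),
            "name": name,
            "status": status,
            "seconds": float(seconds),
        })
    return results


if __name__ == "__main__":
    sample = """
    Start 1: build_tools_fileset_tool_test
    1/25 Test #1: build_tools_fileset_tool_test ......................... Passed 0.30 sec
    Start 2: build_tools_artifacts_test
    2/25 Test #2: build_tools_artifacts_test ............................ Passed 0.06 sec
    """
    for row in parse_ctest_log(sample):
        print(row)
```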
…s gtest folder (ROCm#1398)

This reverts commit 35444a3. See discussion at ROCm#1248 (comment).

We suspect this is causing flaky build failures on Windows gfx1151 like https://github.com/ROCm/TheRock/actions/runs/17471481134/job/49620818900?pr=1349#step:11:36751.

```
[MIOpen] [894/920] Building CXX object test/gtest/CMakeFiles/miopen_gtest.dir/smoke_solver_ConvAsmImplicitGemmGTCDynamicXdlopsNHWC_bf16.cpp.obj
[MIOpen] FAILED: test/gtest/CMakeFiles/miopen_gtest.dir/smoke_solver_ConvAsmImplicitGemmGTCDynamicXdlopsNHWC_bf16.cpp.obj
[MIOpen] ccache B:\build\core\clr\dist\lib\llvm\bin\clang++.exe -DBOOST_ALL_NO_LIB=1 -DBOOST_ATOMIC_NO_LIB -DBOOST_FILESYSTEM_NO_LIB -DBOOST_SYSTEM_NO_LIB -DHIP_COMPILER_FLAGS=" -x hip -D__HIP_PLATFORM_AMD__=1 -DUSE_PROF_API=1 C:/2E1C510A-F3EC-4287-AB5A-59025DAF1B15/build/core/clr/dist/lib/llvm/lib/clang/20/lib/windows/clang_rt.builtins-x86_64.lib --hip-link C:/2E1C510A-F3EC-4287-AB5A-59025DAF1B15/build/core/clr/dist/lib/llvm/lib/clang/20/lib/windows/clang_rt.builtins-x86_64.lib -fno-offload-uniform-block " -DMIOPEN_BETA_API=1 -DMIOPEN_BUILD_TESTING -DMIOPEN_TEST_DRIVER_MODE=1 -DNOMINMAX -DUSE_PROF_API=1 -D__HIP_PLATFORM_AMD__=1 -IB:/build/math-libs/BLAS/hipBLAS/stage/include -IB:/build/math-libs/BLAS/hipBLAS-common/stage/include -IB:/build/math-libs/rocRAND/stage/include -IC:/home/runner/_work/TheRock/TheRock/rocm-libraries/projects/miopen/test/gtest/..
-IC:/home/runner/_work/TheRock/TheRock/rocm-libraries/projects/miopen/test/gtest/../../src/kernels -IC:/home/runner/_work/TheRock/TheRock/rocm-libraries/projects/miopen/src/include -IB:/build/ml-libs/MIOpen/build/include -IC:/home/runner/_work/TheRock/TheRock/rocm-libraries/projects/miopen/include -isystem B:/build/base/half/stage/include -isystem B:/build/third-party/frugally-deep/dist/include -isystem B:/build/third-party/FunctionalPlus/dist/include -isystem B:/build/third-party/eigen/dist/include/eigen3 -isystem B:/build/third-party/nlohmann-json/dist/include -isystem B:/build/math-libs/BLAS/rocBLAS/dist/include -isystem B:/build/core/clr/dist/include -isystem B:/build/third-party/googletest/dist/include -isystem B:/build/compiler/amd-comgr/dist/include -isystem B:/build/third-party/boost/cmake_project/dist/include/boost-1_87 -isystem B:/build/third-party/sysdeps/windows/sqlite3/build/dist/lib/rocm_sysdeps/include -isystem B:/build/third-party/sysdeps/windows/bzip2/build/dist/lib/rocm_sysdeps/include -DWIN32 -DWIN32_LEAN_AND_MEAN -D_CRT_SECURE_NO_WARNINGS -DNOMINMAX -fms-extensions -fms-compatibility -D_ENABLE_EXTENDED_ALIGNED_STORAGE -Wno-documentation-unknown-command -Wno-documentation-pedantic -Wno-unused-command-line-argument -Wno-explicit-specialization-storage-class -Wno-ignored-attributes -Wno-unknown-attributes -Wno-duplicate-decl-specifier --hip-path=B:/build/core/clr/dist --hip-device-lib-path=B:/build/core/clr/dist/lib/llvm/amdgcn/bitcode -O3 -DNDEBUG -std=c++20 -D_DLL -D_MT -Xclang --dependent-lib=msvcrt -U__HCC__ -Wall -Wextra -Wcomment -Wendif-labels -Wformat -Winit-self -Wreturn-type -Wsequence-point -Wswitch -Wtrigraphs -Wundef -Wuninitialized -Wunreachable-code -Wunused -Wno-ignored-qualifiers -Wno-sign-compare -Weverything -Wno-c++98-compat -Wno-c++98-compat-pedantic -Wno-conversion -Wno-double-promotion -Wno-exit-time-destructors -Wno-extra-semi -Wno-extra-semi-stmt -Wno-float-conversion -Wno-gnu-anonymous-struct 
-Wno-gnu-zero-variadic-macro-arguments -Wno-missing-prototypes -Wno-nested-anon-types -Wno-option-ignored -Wno-padded -Wno-return-std-move-in-c++11 -Wno-shorten-64-to-32 -Wno-sign-conversion -Wno-unknown-warning-option -Wno-unused-command-line-argument -Wno-weak-vtables -Wno-covered-switch-default -Wno-unused-result -Wno-unsafe-buffer-usage -Wno-deprecated-declarations -Wno-shadow-uncaptured-local -Wno-global-constructors -Wno-reserved-identifier -Wno-zero-as-null-pointer-constant -Wno-ignored-attributes -Wno-deprecated -Wno-incompatible-pointer-types -Wno-old-style-cast -Wno-unknown-attributes -Wno-microsoft-cpp-macro -Wno-microsoft-enum-value -Wno-language-extension-token -Wno-c++11-narrowing -Wno-float-equal -Wno-redundant-parens -Wno-format-nonliteral -Wno-unused-template -Wno-comma -Wno-suggest-destructor-override -Wno-switch-enum -Wno-shift-sign-overflow -Wno-suggest-override -Wno-inconsistent-missing-destructor-override -Wno-cast-function-type -Wno-nonportable-system-include-path -Wno-documentation -Wno-deprecated-builtins -Wno-enum-constexpr-conversion -Wno-unused-value -Wno-unused-parameter -Wno-missing-noreturn -Wno-tautological-constant-out-of-range-compare -Wno-c++20-extensions -Wno-unique-object-duplication -Wno-switch-default -Wno-nontrivial-memcall -fms-extensions -fms-compatibility -Wno-undef -U__LP64__ -x hip --offload-arch=gfx1151 -MD -MT test/gtest/CMakeFiles/miopen_gtest.dir/smoke_solver_ConvAsmImplicitGemmGTCDynamicXdlopsNHWC_bf16.cpp.obj -MF test\gtest\CMakeFiles\miopen_gtest.dir\smoke_solver_ConvAsmImplicitGemmGTCDynamicXdlopsNHWC_bf16.cpp.obj.d -o test/gtest/CMakeFiles/miopen_gtest.dir/smoke_solver_ConvAsmImplicitGemmGTCDynamicXdlopsNHWC_bf16.cpp.obj -c C:/home/runner/_work/TheRock/TheRock/rocm-libraries/projects/miopen/test/gtest/smoke_solver_ConvAsmImplicitGemmGTCDynamicXdlopsNHWC_bf16.cpp [MIOpen] PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated 
run script. [MIOpen] Stack dump: [MIOpen] 0. Program arguments: C:\\2E1C510A-F3EC-4287-AB5A-59025DAF1B15\\build\\core\\clr\\dist\\lib\\llvm\\bin\\clang++.exe -cc1 -triple x86_64-pc-windows-msvc19.44.35215 -aux-triple amdgcn-amd-amdhsa -emit-obj -mincremental-linker-compatible -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name smoke_solver_ConvAsmImplicitGemmGTCDynamicXdlopsNHWC_bf16.cpp -mrelocation-model pic -pic-level 2 -mframe-pointer=none -relaxed-aliasing -fmath-errno -fno-rounding-math -mconstructor-aliases -funwind-tables=2 -target-cpu x86-64 -tune-cpu generic -fdebug-compilation-dir=B:\\build\\ml-libs\\MIOpen\\build -fcoverage-compilation-dir=B:\\build\\ml-libs\\MIOpen\\build -resource-dir C:\\2E1C510A-F3EC-4287-AB5A-59025DAF1B15\\build\\core\\clr\\dist\\lib\\llvm\\lib\\clang\\20 -dependency-file test\\gtest\\CMakeFiles\\miopen_gtest.dir\\smoke_solver_ConvAsmImplicitGemmGTCDynamicXdlopsNHWC_bf16.cpp.obj.d -MT test/gtest/CMakeFiles/miopen_gtest.dir/smoke_solver_ConvAsmImplicitGemmGTCDynamicXdlopsNHWC_bf16.cpp.obj -sys-header-deps -internal-isystem C:\\2E1C510A-F3EC-4287-AB5A-59025DAF1B15\\build\\core\\clr\\dist\\lib\\llvm\\lib\\clang\\20\\include\\cuda_wrappers -idirafter B:/build/core/clr/dist\\include -include __clang_hip_runtime_wrapper.h -isystem B:/build/base/half/stage/include -isystem B:/build/third-party/frugally-deep/dist/include -isystem B:/build/third-party/FunctionalPlus/dist/include -isystem B:/build/third-party/eigen/dist/include/eigen3 -isystem B:/build/third-party/nlohmann-json/dist/include -isystem B:/build/math-libs/BLAS/rocBLAS/dist/include -isystem B:/build/core/clr/dist/include -isystem B:/build/third-party/googletest/dist/include -isystem B:/build/compiler/amd-comgr/dist/include -isystem B:/build/third-party/boost/cmake_project/dist/include/boost-1_87 -isystem B:/build/third-party/sysdeps/windows/sqlite3/build/dist/lib/rocm_sysdeps/include -isystem 
B:/build/third-party/sysdeps/windows/bzip2/build/dist/lib/rocm_sysdeps/include -D BOOST_ALL_NO_LIB=1 -D BOOST_ATOMIC_NO_LIB -D BOOST_FILESYSTEM_NO_LIB -D BOOST_SYSTEM_NO_LIB -D "HIP_COMPILER_FLAGS= -x hip -D__HIP_PLATFORM_AMD__=1 -DUSE_PROF_API=1 C:/2E1C510A-F3EC-4287-AB5A-59025DAF1B15/build/core/clr/dist/lib/llvm/lib/clang/20/lib/windows/clang_rt.builtins-x86_64.lib --hip-link C:/2E1C510A-F3EC-4287-AB5A-59025DAF1B15/build/core/clr/dist/lib/llvm/lib/clang/20/lib/windows/clang_rt.builtins-x86_64.lib -fno-offload-uniform-block " -D MIOPEN_BETA_API=1 -D MIOPEN_BUILD_TESTING -D MIOPEN_TEST_DRIVER_MODE=1 -D NOMINMAX -D USE_PROF_API=1 -D __HIP_PLATFORM_AMD__=1 -I B:/build/math-libs/BLAS/hipBLAS/stage/include -I B:/build/math-libs/BLAS/hipBLAS-common/stage/include -I B:/build/math-libs/rocRAND/stage/include -I C:/home/runner/_work/TheRock/TheRock/rocm-libraries/projects/miopen/test/gtest/.. -I C:/home/runner/_work/TheRock/TheRock/rocm-libraries/projects/miopen/test/gtest/../../src/kernels -I C:/home/runner/_work/TheRock/TheRock/rocm-libraries/projects/miopen/src/include -I B:/build/ml-libs/MIOpen/build/include -I C:/home/runner/_work/TheRock/TheRock/rocm-libraries/projects/miopen/include -D WIN32 -D WIN32_LEAN_AND_MEAN -D _CRT_SECURE_NO_WARNINGS -D NOMINMAX -D _ENABLE_EXTENDED_ALIGNED_STORAGE -D NDEBUG -D _DLL -D _MT -U __HCC__ -U __LP64__ -internal-isystem C:\\2E1C510A-F3EC-4287-AB5A-59025DAF1B15\\build\\core\\clr\\dist\\lib\\llvm\\lib\\clang\\20\\include -internal-isystem "C:\\Program Files\\Microsoft Visual Studio\\2022\\Community\\VC\\Tools\\MSVC\\14.44.35207\\include" -internal-isystem "C:\\Program Files\\Microsoft Visual Studio\\2022\\Community\\VC\\Tools\\MSVC\\14.44.35207\\ATLMFC\\include" -internal-isystem "C:\\Program Files\\Microsoft Visual Studio\\2022\\Community\\VC\\Auxiliary\\VS\\include" -internal-isystem "C:\\Program Files (x86)\\Windows Kits\\10\\include\\10.0.26100.0\\ucrt" -internal-isystem "C:\\Program Files (x86)\\Windows 
Kits\\10\\\\include\\10.0.26100.0\\\\um" -internal-isystem "C:\\Program Files (x86)\\Windows Kits\\10\\\\include\\10.0.26100.0\\\\shared" -internal-isystem "C:\\Program Files (x86)\\Windows Kits\\10\\\\include\\10.0.26100.0\\\\winrt" -internal-isystem "C:\\Program Files (x86)\\Windows Kits\\10\\\\include\\10.0.26100.0\\\\cppwinrt" -internal-isystem "C:\\Program Files (x86)\\Windows Kits\\NETFXSDK\\4.8\\include\\um" -internal-isystem "C:\\Program Files\\Microsoft Visual Studio\\2022\\Community\\VC\\Tools\\MSVC\\14.44.35207\\include" -internal-isystem "C:\\Program Files\\Microsoft Visual Studio\\2022\\Community\\VC\\Tools\\MSVC\\14.44.35207\\ATLMFC\\include" -internal-isystem "C:\\Program Files\\Microsoft Visual Studio\\2022\\Community\\VC\\Auxiliary\\VS\\include" -internal-isystem "C:\\Program Files (x86)\\Windows Kits\\10\\include\\10.0.26100.0\\ucrt" -internal-isystem "C:\\Program Files (x86)\\Windows Kits\\10\\\\include\\10.0.26100.0\\\\um" -internal-isystem "C:\\Program Files (x86)\\Windows Kits\\10\\\\include\\10.0.26100.0\\\\shared" -internal-isystem "C:\\Program Files (x86)\\Windows Kits\\10\\\\include\\10.0.26100.0\\\\winrt" -internal-isystem "C:\\Program Files (x86)\\Windows Kits\\10\\\\include\\10.0.26100.0\\\\cppwinrt" -internal-isystem "C:\\Program Files (x86)\\Windows Kits\\NETFXSDK\\4.8\\include\\um" -internal-isystem C:\\2E1C510A-F3EC-4287-AB5A-59025DAF1B15\\build\\core\\clr\\dist\\lib\\llvm\\lib\\clang\\20\\include -internal-isystem "C:\\Program Files\\Microsoft Visual Studio\\2022\\Community\\VC\\Tools\\MSVC\\14.44.35207\\include" -internal-isystem "C:\\Program Files\\Microsoft Visual Studio\\2022\\Community\\VC\\Tools\\MSVC\\14.44.35207\\ATLMFC\\include" -internal-isystem "C:\\Program Files\\Microsoft Visual Studio\\2022\\Community\\VC\\Auxiliary\\VS\\include" -internal-isystem "C:\\Program Files (x86)\\Windows Kits\\10\\include\\10.0.26100.0\\ucrt" -internal-isystem "C:\\Program Files (x86)\\Windows Kits\\10\\\\include\\10.0.26100.0\\\\um" 
-internal-isystem "C:\\Program Files (x86)\\Windows Kits\\10\\\\include\\10.0.26100.0\\\\shared" -internal-isystem "C:\\Program Files (x86)\\Windows Kits\\10\\\\include\\10.0.26100.0\\\\winrt" -internal-isystem "C:\\Program Files (x86)\\Windows Kits\\10\\\\include\\10.0.26100.0\\\\cppwinrt" -internal-isystem "C:\\Program Files (x86)\\Windows Kits\\NETFXSDK\\4.8\\include\\um" -internal-isystem "C:\\Program Files\\Microsoft Visual Studio\\2022\\Community\\VC\\Tools\\MSVC\\14.44.35207\\include" -internal-isystem "C:\\Program Files\\Microsoft Visual Studio\\2022\\Community\\VC\\Tools\\MSVC\\14.44.35207\\ATLMFC\\include" -internal-isystem "C:\\Program Files\\Microsoft Visual Studio\\2022\\Community\\VC\\Auxiliary\\VS\\include" -internal-isystem "C:\\Program Files (x86)\\Windows Kits\\10\\include\\10.0.26100.0\\ucrt" -internal-isystem "C:\\Program Files (x86)\\Windows Kits\\10\\\\include\\10.0.26100.0\\\\um" -internal-isystem "C:\\Program Files (x86)\\Windows Kits\\10\\\\include\\10.0.26100.0\\\\shared" -internal-isystem "C:\\Program Files (x86)\\Windows Kits\\10\\\\include\\10.0.26100.0\\\\winrt" -internal-isystem "C:\\Program Files (x86)\\Windows Kits\\10\\\\include\\10.0.26100.0\\\\cppwinrt" -internal-isystem "C:\\Program Files (x86)\\Windows Kits\\NETFXSDK\\4.8\\include\\um" -O3 -Wno-documentation-unknown-command -Wno-documentation-pedantic -Wno-unused-command-line-argument -Wno-explicit-specialization-storage-class -Wno-ignored-attributes -Wno-unknown-attributes -Wno-duplicate-decl-specifier -Wall -Wextra -Wcomment -Wendif-labels -Wformat -Winit-self -Wreturn-type -Wsequence-point -Wswitch -Wtrigraphs -Wundef -Wuninitialized -Wunreachable-code -Wunused -Wno-ignored-qualifiers -Wno-sign-compare -Weverything -Wno-c++98-compat -Wno-c++98-compat-pedantic -Wno-conversion -Wno-double-promotion -Wno-exit-time-destructors -Wno-extra-semi -Wno-extra-semi-stmt -Wno-float-conversion -Wno-gnu-anonymous-struct -Wno-gnu-zero-variadic-macro-arguments -Wno-missing-prototypes 
-Wno-nested-anon-types -Wno-option-ignored -Wno-padded -Wno-return-std-move-in-c++11 -Wno-shorten-64-to-32 -Wno-sign-conversion -Wno-unknown-warning-option -Wno-unused-command-line-argument -Wno-weak-vtables -Wno-covered-switch-default -Wno-unused-result -Wno-unsafe-buffer-usage -Wno-deprecated-declarations -Wno-shadow-uncaptured-local -Wno-global-constructors -Wno-reserved-identifier -Wno-zero-as-null-pointer-constant -Wno-ignored-attributes -Wno-deprecated -Wno-incompatible-pointer-types -Wno-old-style-cast -Wno-unknown-attributes -Wno-microsoft-cpp-macro -Wno-microsoft-enum-value -Wno-language-extension-token -Wno-c++11-narrowing -Wno-float-equal -Wno-redundant-parens -Wno-format-nonliteral -Wno-unused-template -Wno-comma -Wno-suggest-destructor-override -Wno-switch-enum -Wno-shift-sign-overflow -Wno-suggest-override -Wno-inconsistent-missing-destructor-override -Wno-cast-function-type -Wno-nonportable-system-include-path -Wno-documentation -Wno-deprecated-builtins -Wno-enum-constexpr-conversion -Wno-unused-value -Wno-unused-parameter -Wno-missing-noreturn -Wno-tautological-constant-out-of-range-compare -Wno-c++20-extensions -Wno-unique-object-duplication -Wno-switch-default -Wno-nontrivial-memcall -Wno-undef -std=c++20 -ferror-limit 19 -fhip-new-launch-api -fno-use-cxa-atexit -fms-extensions -fms-compatibility -fms-compatibility-version=19.44.35215 -fno-implicit-modules -fskip-odr-check-in-gmf -fcxx-exceptions -fexceptions -fcolor-diagnostics -vectorize-loops -vectorize-slp --dependent-lib=msvcrt -fcuda-include-gpubinary C:\\Users\\ContainerAdministrator\\AppData\\Local\\Temp\\smoke_solver_ConvAsmImplicitGemmGTCDynamicXdlopsNHWC_bf16-809b30.hipfb -cuid=bf2425f0600af3e8 -fcuda-allow-variadic-functions -faddrsig -o test/gtest/CMakeFiles/miopen_gtest.dir/smoke_solver_ConvAsmImplicitGemmGTCDynamicXdlopsNHWC_bf16.cpp.obj -x hip C:/home/runner/_work/TheRock/TheRock/rocm-libraries/projects/miopen/test/gtest/smoke_solver_ConvAsmImplicitGemmGTCDynamicXdlopsNHWC_bf16.cpp 
[MIOpen] 1. <eof> parser at end of file [MIOpen] 2. C:/home/runner/_work/TheRock/TheRock/rocm-libraries/projects/miopen/test/gtest\gtest_common.hpp:279:6: instantiating function definition 'invoke_with_params<conv2d_driver, GPU_Conv2dTuning_BFP16, void (&)(const std::basic_string<char> &)>' [MIOpen] 3. C:/home/runner/_work/TheRock/TheRock/rocm-libraries/projects/miopen/test/gtest\../driver.hpp:1356:6: instantiating function definition 'test_drive<conv2d_driver>' [MIOpen] 4. C:/home/runner/_work/TheRock/TheRock/rocm-libraries/projects/miopen/test/gtest\../driver.hpp:1337:6: instantiating function definition 'test_drive_impl<conv2d_driver<double>>' [MIOpen] 5. C:/home/runner/_work/TheRock/TheRock/rocm-libraries/projects/miopen/test/gtest\../driver.hpp:1233:6: instantiating function definition 'test_drive_impl_1<conv2d_driver<double>>' [MIOpen] 6. C:/home/runner/_work/TheRock/TheRock/rocm-libraries/projects/miopen/test/gtest\../driver.hpp:938:10: instantiating function definition 'test_driver::base_run<conv2d_driver<double>>' [MIOpen] 7. C:/home/runner/_work/TheRock/TheRock/rocm-libraries/projects/miopen/test/gtest\..\conv_common.hpp:1962:10: instantiating function definition 'conv_driver<double>::run' [MIOpen] 8. C:/home/runner/_work/TheRock/TheRock/rocm-libraries/projects/miopen/test/gtest\../driver.hpp:910:10: instantiating function definition 'test_driver::verify<verify_backward_weights_conv<ConvApi::Find_1_0, double>>' [MIOpen] 9. C:/home/runner/_work/TheRock/TheRock/rocm-libraries/projects/miopen/test/gtest\../driver.hpp:798:10: instantiating function definition 'test_driver::verify_impl<(lambda at C:\home\runner\_work\TheRock\TheRock\rocm-libraries\projects\miopen\test\gtest\..\driver.hpp:913:13), verify_backward_weights_conv<ConvApi::Find_1_0, double> &>' [MIOpen] 10. 
C:/home/runner/_work/TheRock/TheRock/rocm-libraries/projects/miopen/test/gtest\../driver.hpp:746:10: instantiating function definition 'test_driver::run_cpu<verify_backward_weights_conv<ConvApi::Find_1_0, double>>' [MIOpen] 11. C:/home/runner/_work/TheRock/TheRock/rocm-libraries/projects/miopen/test/gtest\..\ford.hpp:56:6: instantiating function definition 'then<tensor<double>, (lambda at C:\home\runner\_work\TheRock\TheRock\rocm-libraries\projects\miopen\test\gtest\..\driver.hpp:768:46)>' [MIOpen] 12. C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.44.35207\include\future:1362:81: instantiating function definition 'std::async<(lambda at C:\home\runner\_work\TheRock\TheRock\rocm-libraries\projects\miopen\test\gtest\..\ford.hpp:59:23)>' [MIOpen] 13. C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.44.35207\include\future:1350:41: instantiating function definition 'std::_Get_associated_state<tensor<double>, std::_Fake_no_copy_callable_adapter<(lambda at C:\home\runner\_work\TheRock\TheRock\rocm-libraries\projects\miopen\test\gtest\..\ford.hpp:59:23)>>' [MIOpen] 14. C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.44.35207\include\future:597:5: instantiating function definition 'std::_Deferred_async_state<tensor<double>>::_Deferred_async_state<std::_Fake_no_copy_callable_adapter<(lambda at C:\home\runner\_work\TheRock\TheRock\rocm-libraries\projects\miopen\test\gtest\..\ford.hpp:59:23)>>' [MIOpen] 15. 
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.44.35207\include\type_traits:53:16: instantiating variable definition 'std::conjunction_v<std::negation<std::is_same<std::_Fake_no_copy_callable_adapter<(lambda at C:\home\runner\_work\TheRock\TheRock\rocm-libraries\projects\miopen\test\gtest\..\ford.hpp:59:23)>, std::function<tensor<double> ()>>>, std::_Is_invocable_r<tensor<double>, std::_Fake_no_copy_callable_adapter<(lambda at C:\home\runner\_work\TheRock\TheRock\rocm-libraries\projects\miopen\test\gtest\..\ford.hpp:59:23)> &>>' [MIOpen] 16. C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.44.35207\include\type_traits:45:8: instantiating class definition 'std::conjunction<std::negation<std::is_same<std::_Fake_no_copy_callable_adapter<(lambda at C:\home\runner\_work\TheRock\TheRock\rocm-libraries\projects\miopen\test\gtest\..\ford.hpp:59:23)>, std::function<tensor<double> ()>>>, std::_Is_invocable_r<tensor<double>, std::_Fake_no_copy_callable_adapter<(lambda at C:\home\runner\_work\TheRock\TheRock\rocm-libraries\projects\miopen\test\gtest\..\ford.hpp:59:23)> &>>' [MIOpen] 17. C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.44.35207\include\type_traits:35:8: instantiating class definition 'std::_Conjunction<true, std::negation<std::is_same<std::_Fake_no_copy_callable_adapter<(lambda at C:\home\runner\_work\TheRock\TheRock\rocm-libraries\projects\miopen\test\gtest\..\ford.hpp:59:23)>, std::function<tensor<double> ()>>>, std::_Is_invocable_r<tensor<double>, std::_Fake_no_copy_callable_adapter<(lambda at C:\home\runner\_work\TheRock\TheRock\rocm-libraries\projects\miopen\test\gtest\..\ford.hpp:59:23)> &>>' [MIOpen] 18. 
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.44.35207\include\type_traits:1827:8: instantiating class definition 'std::_Is_invocable_r<tensor<double>, std::_Fake_no_copy_callable_adapter<(lambda at C:\home\runner\_work\TheRock\TheRock\rocm-libraries\projects\miopen\test\gtest\..\ford.hpp:59:23)> &>'
[MIOpen] 19. C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.44.35207\include\future:1341:20: instantiating function definition 'std::_Fake_no_copy_callable_adapter<(lambda at C:\home\runner\_work\TheRock\TheRock\rocm-libraries\projects\miopen\test\gtest\..\ford.hpp:59:23)>::operator()'
[MIOpen] 20. C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.44.35207\include\future:1314:16: instantiating function definition 'std::_Invoke_stored<(lambda at C:\home\runner\_work\TheRock\TheRock\rocm-libraries\projects\miopen\test\gtest\..\ford.hpp:59:23)>'
[MIOpen] 21. C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.44.35207\include\future:1308:16: instantiating function definition 'std::_Invoke_stored_explicit<(lambda at C:\home\runner\_work\TheRock\TheRock\rocm-libraries\projects\miopen\test\gtest\..\ford.hpp:59:23), 0ULL>'
[MIOpen] Exception Code: 0xC0000005
[MIOpen] #0 0x00007ff64a1882be (C:\2E1C510A-F3EC-4287-AB5A-59025DAF1B15\build\core\clr\dist\lib\llvm\bin\clang++.exe+0x15082be)
[MIOpen] #1 0x00007ff64bb9af20 (C:\2E1C510A-F3EC-4287-AB5A-59025DAF1B15\build\core\clr\dist\lib\llvm\bin\clang++.exe+0x2f1af20)
```

It also increased MIOpen test times substantially:

Before: https://github.com/ROCm/TheRock/actions/runs/17447546026
* Linux mi325 50m
* Linux mi355 1h7m

After: https://github.com/ROCm/TheRock/actions/runs/17458318068
* Linux mi325 1h30m
* Linux mi355 1h54m (very close to a 2 hour timeout)
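To put numbers on the slowdown, here is a short sketch that converts the durations quoted above into minutes and reports the relative increase and the remaining headroom against the 2 hour timeout. The helper names `to_minutes` and `slowdown_table` are made up for this note; the input figures are the ones listed in the commit message.

```python
import re


def to_minutes(text):
    """Convert a duration such as '50m' or '1h7m' into whole minutes."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?", text)
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognised duration: {text!r}")
    hours, minutes = (int(g) if g else 0 for g in match.groups())
    return hours * 60 + minutes


def slowdown_table(runs, timeout_minutes=120):
    """Return (runner, before, after, percent_increase, headroom) rows."""
    rows = []
    for runner, (before, after) in runs.items():
        b, a = to_minutes(before), to_minutes(after)
        rows.append((runner, b, a, round(100 * (a - b) / b), timeout_minutes - a))
    return rows


if __name__ == "__main__":
    runs = {"mi325": ("50m", "1h30m"), "mi355": ("1h7m", "1h54m")}
    for runner, b, a, pct, headroom in slowdown_table(runs):
        print(f"{runner}: {b}m -> {a}m (+{pct}%), {headroom}m before timeout")
```

With the figures above this gives roughly +80% for mi325 and +70% for mi355, leaving the mi355 job only 6 minutes short of the timeout.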
|
DivLOGs.txt
GPU detected as amdgpu 0000:0a:00.0 (RDNA4 - gfx_v12_0)
The problem appears to be Thunderbolt PCIe limitations preventing proper ROCm compute operations, even though basic GPU initialization works fine.
This after installing, it BTW never worked before also with the wheel from here,
More digging:
Tell GitHub:
You can follow the story at |
The path
Seems now to be
Also, some questions:
I'm trying to replicate the Windows builds for gfx110x, etc. by @jammm, but it seems that upstream ROCm is too much of a moving target for the hipBLAS patches. I also wasn't sure which branch was recommended, so I tried checking out the exact commit the releases were made from, and then copying the .tar.gz sources over the top.
I've returned to building from this "main" (gfx1151) branch as that seems to be fine, though it only has external-build files for pytorch 2.6.
The other question I have is related to a number of bash commands/scripts that appear in the build process. Are you actually using bash (Cygwin, MSYS2, or something else)? I have no issue with doing likewise; it's just that I'm never sure where to draw the line: e.g. use Cygwin's Python or Windows Python? Cygwin's CMake or Windows CMake? It seems that eventually something is going to get upset about the pathing being incompatible.
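To make the pathing worry concrete: MSYS2 shells address `C:\` as `/c/`, while Cygwin uses `/cygdrive/c/`, so a path produced by one toolchain is not valid in the other. The sketch below shows the mapping for plain drive-letter paths. `to_posix_style` is a hypothetical helper written for this note; in practice the `cygpath` tool that ships with both environments does this conversion.

```python
from pathlib import PureWindowsPath


def to_posix_style(win_path, prefix=""):
    """Map 'C:\\dir\\file' to '<prefix>/c/dir/file'.

    Use prefix="" for MSYS2-style paths and prefix="/cygdrive" for
    Cygwin-style paths. Only absolute drive-letter paths are handled.
    """
    path = PureWindowsPath(win_path)
    if not path.is_absolute() or len(path.drive) != 2:
        raise ValueError(f"not an absolute drive-letter path: {win_path!r}")
    drive_letter = path.drive[0].lower()
    # parts[0] is the drive root (e.g. 'C:\\'); the rest are components.
    return prefix + "/" + "/".join([drive_letter, *path.parts[1:]])


if __name__ == "__main__":
    print(to_posix_style(r"C:\Users\me\TheRock"))               # MSYS2 style
    print(to_posix_style(r"C:\Users\me\TheRock", "/cygdrive"))  # Cygwin style
```

Native Windows Python and CMake understand neither form, which is one reason mixing tools from the two worlds tends to break eventually.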