GPU mixed krylov -- WIP #1423

oseikuffuor1 · 2025-11-14T10:18:05Z

The goal of this PR is to enable building and testing mixed-precision Krylov solvers on GPUs.

Fixed issues related to building with long double precision on GPUs.
Added utility functions, such as array copy and clone, to facilitate mixed-precision code development.
Added new mixed-precision matrix and vector functions: clone, copy, and convert.
Used hypre_double for the dnum_nonzero member in the hypre_parcsr_matrix struct and updated support for it within hypre.

To Do:

Complete multiprecision build on GPU


rfalgout · 2025-11-14T13:50:04Z

src/config/HYPRE_config.h.in

 #undef HAVE_INTTYPES_H

+/* Define to 1 if you have the <memory.h> header file. */
+#undef HAVE_MEMORY_H


Why is this needed?

This probably slipped in due to the usage of autoconf 2.69 instead of 2.71 (the version we've been using)

Thanks @rfalgout and @victorapm . I will look into this. You are right, I generated it with autoconf v2.69.

rfalgout · 2025-11-14T14:07:09Z

You may be aware of this already, but just in case... This doesn't build cleanly on my mac currently. There are a bunch of unused variable warnings, nothing major.

src/parcsr_mv/HYPRE_parcsr_mv_mp.h

src/parcsr_mv/parcsr_mv_mp.c

src/parcsr_mv/par_csr_matrix.c

oseikuffuor1 · 2025-11-14T16:35:20Z

> You may be aware of this already, but just in case... This doesn't build cleanly on my mac currently. There are a bunch of unused variable warnings, nothing major.

Thanks @rfalgout . Can you share which build options you are using?

rfalgout · 2025-11-14T16:39:28Z

> Thanks @rfalgout . Can you share which build options you are using?

configure --enable-debug --enable-mixed-precision

src/utilities/utilities.c

rfalgout · 2025-11-14T17:01:41Z

src/seq_mv/_hypre_seq_mv.h

 HYPRE_Int hypre_CSRMatrixResize( hypre_CSRMatrix *matrix, HYPRE_Int new_num_rows,
                                 HYPRE_Int new_num_cols, HYPRE_Int new_num_nonzeros );
 HYPRE_Int hypre_CSRMatrixEliminateRowsCols(hypre_CSRMatrix *A, HYPRE_Int nrows, HYPRE_Int *rows);
+HYPRE_Int hypre_CSRMatrixResetData(hypre_CSRMatrix  *matrix);


This is currently only called in one place. I am wondering if we really need it, but that is probably a discussion we need to have in the larger context of the ownership question.

Yes, currently it is only used in the mixed-precision convert function. It provides a workaround for the ownership question, so perhaps we can revisit it later. It may also be useful if one needs to reset matrix data while keeping the structure unchanged.

src/seq_mv/seq_mv_mp.c

rfalgout · 2025-11-14T17:06:09Z

src/utilities/HYPRE_utilities.h

 * mixed precision code. */
 typedef double       hypre_double;
 typedef float        hypre_float;
+#if defined (HYPRE_USING_GPU)


Just curious. Will this be true for all GPUs?

@rfalgout as far as I know, all GPUs that we support (NVIDIA, AMD, and Intel) represent long doubles as doubles (if the name of the type is supported at all). @victorapm @waynemitchell @liruipeng correct me if I'm wrong.

That's right for CUDA and HIP. With SYCL, it might be possible to get full 128-bit double support (I heard there's some hardware support on Intel PVC cards), but I'm not so sure about it. Wayne can probably comment better.
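A minimal sketch of the kind of guard discussed in this thread, assuming the `#if defined (HYPRE_USING_GPU)` branch aliases the extended-precision type to `double`. The names `mp_long_double` and `mp_long_double_is_double` are hypothetical and are not taken from hypre; only the `HYPRE_USING_GPU` macro comes from the diff above.

```c
#include <assert.h>
#include <float.h>

/* Assumption: on GPU builds the "long double" slot falls back to double,
 * because device code generally has no wider floating-point type. */
#if defined(HYPRE_USING_GPU)
typedef double mp_long_double;                  /* hypothetical name */
#define MP_LONG_DOUBLE_MANT_DIG DBL_MANT_DIG
#else
typedef long double mp_long_double;             /* hypothetical name */
#define MP_LONG_DOUBLE_MANT_DIG LDBL_MANT_DIG
#endif

/* The alias must never be narrower than double. */
static_assert(sizeof(mp_long_double) >= sizeof(double),
              "extended-precision alias is narrower than double");

/* Returns 1 when the extended-precision slot carries no more mantissa
 * bits than double, so callers can skip a redundant third precision. */
static int mp_long_double_is_double(void)
{
   return MP_LONG_DOUBLE_MANT_DIG == DBL_MANT_DIG;
}
```

A build could query `mp_long_double_is_double()` (or the equivalent macro comparison) to decide whether the long double variants of the solvers are worth compiling at all. Whether SYCL targets need a different branch is left open, as in the discussion above.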

src/utilities/_hypre_utilities.hpp

rfalgout · 2025-11-14T17:15:21Z

This looks good so far, Daniel! Thanks!

ulrikeyang · 2025-11-17T17:42:40Z

src/parcsr_mv/parcsr_mv_mp.c

 /*--------------------------------------------------------------------------
- * Mixed-Precision hypre_ParVectorAxpy
+ * Mixed-precision matrix conversion
+ * Note: This converts only the diag and offd matrices


What else could it convert? At first, I thought that statement meant that you don't reset the precision in the parcsr matrix, which would be bad, but you actually do.

@ulrikeyang this comment was just to specify that we only convert the main csr matrices (diag and offdiag) in the hypre_ParCSRMatrix_struct struct. There are others like diagT and offdT, which I suppose we could convert if they are set (since we are changing the precision afterwards), but I assume they are currently unset and would be generated later.

ulrikeyang · 2025-11-17T17:58:49Z

src/seq_mv/seq_mv_mp.c

+   {
+      hypre_CSRMatrixCopy_pre(precision_A, A, B, 1);
+   }
+


Is there an else statement or return missing?

It should be "return hypre_CSRMatrixCopy_pre()". Thanks for catching that.
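To illustrate why the missing `return` matters, here is a small self-contained sketch of a precision-dispatching copy. It is not the hypre code: `cp_precision`, `cp_copy_same`, and `cp_copy` are hypothetical names, and the arrays stand in for matrix data. Without the early `return` in the same-precision branch, execution would fall through into the conversion loops and corrupt the destination.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { CP_SINGLE, CP_DOUBLE } cp_precision;   /* hypothetical tag */

/* Same-precision path: a plain byte copy. Returns 0 on success. */
static int cp_copy_same(cp_precision p, const void *src, void *dst, size_t n)
{
   size_t elem = (p == CP_SINGLE) ? sizeof(float) : sizeof(double);
   memcpy(dst, src, n * elem);
   return 0;
}

/* Copy n entries from src to dst, converting when the precisions differ. */
static int cp_copy(cp_precision ps, const void *src,
                   cp_precision pd, void *dst, size_t n)
{
   size_t i;

   assert(src != NULL && dst != NULL);
   if (ps == pd)
   {
      /* The early return is essential: dropping it would let control
       * reach the conversion loops below even for matching precisions. */
      return cp_copy_same(ps, src, dst, n);
   }
   if (ps == CP_DOUBLE)   /* double -> float */
   {
      for (i = 0; i < n; i++)
      {
         ((float *) dst)[i] = (float) ((const double *) src)[i];
      }
   }
   else                   /* float -> double */
   {
      for (i = 0; i < n; i++)
      {
         ((double *) dst)[i] = (double) ((const float *) src)[i];
      }
   }
   return 0;
}
```

In this sketch, a double-to-double call that fell through would write `float` values into a `double` buffer, which is the kind of silent corruption the review comment guards against.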

ulrikeyang · 2025-11-17T18:06:45Z

src/seq_mv/seq_mv_mp.c

-   return hypre_error_flag;
+
+   /* Call mixed-precision axpy on vector data */
+   return hypre_RealArrayAxpyn_mp(hypre_VectorPrecision (x), xp, hypre_VectorPrecision (y), yp,


Is this really Axpyn ?

It should do y += alpha*x on the raw array data (in mixed precision). Are you seeing anything missing?
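For reference, a host-only sketch of a mixed-precision `y += alpha*x` on raw arrays. This is an illustration under assumptions, not the signature or implementation of `hypre_RealArrayAxpyn_mp`: the names `ax_precision` and `ax_axpy_mp` are hypothetical, only float and double are handled, and the update is accumulated in double before being rounded to the precision of `y`.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { AX_SINGLE, AX_DOUBLE } ax_precision;   /* hypothetical tag */

/* y[i] += alpha * x[i] for i in [0, n), where x and y may be stored in
 * different precisions. Returns 0 on success. */
static int ax_axpy_mp(ax_precision px, const void *x,
                      ax_precision py, void *y,
                      size_t n, double alpha)
{
   size_t i;

   assert(x != NULL && y != NULL);
   for (i = 0; i < n; i++)
   {
      /* Promote x[i] to double regardless of its storage precision. */
      double xi = (px == AX_SINGLE) ? (double) ((const float *) x)[i]
                                    : ((const double *) x)[i];
      if (py == AX_SINGLE)
      {
         float *yf = (float *) y;
         /* Accumulate in double, then round once to float. */
         yf[i] = (float) ((double) yf[i] + alpha * xi);
      }
      else
      {
         ((double *) y)[i] += alpha * xi;
      }
   }
   return 0;
}
```

Branching on the precision tags inside the loop keeps the sketch short; a production kernel would more likely hoist the dispatch outside the loop or generate one kernel per precision pair.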

src/utilities/device_utils.c

…gpu-mixed-krylov

…lists.

rfalgout · 2025-11-21T13:35:56Z

src/config/generate_function_list.sh

 # Use awk to avoid issues with spacing
-nm -P *.o *.obj | awk '$2 == "T" {print $1}' | sed -e 's/^_//' -e 's/_$//'
+# Demangle any c++ name mangling and filter _device_stub_ prefixes.
+nm -P *.o *.obj | awk '$2 == "T" {print $1}' | c++filt | sed -e 's/(.*$//' -e 's/^__device_stub__//' -e 's/^_//' -e 's/_$//'


Have you looked into the portability of c++filt?

Yes, I have thought a bit about this. From what I can tell, it is available in binutils, on macOS, on most BSD systems, and under MinGW/Cygwin, so it should be fairly portable. Ideally we would use nm's demangle option "-C", since it is built into nm, but unfortunately it does not work well. There is also --demangle for nm on GNU/Linux systems, but c++filt is more portable.

rfalgout · 2025-11-21T13:40:31Z

src/config/mup_check_dir.sh

-cat mup.fixed mup.functions mup.methods | sort | uniq  > mup_check.old
+
+if [ "$BUILD_TYPE" = "GPU" ]; then
+    cat mup.fixed mup.fixed.gpu \


You will probably need to modify this if we choose to only include the '.gpu' files in those directories that need them. Maybe a better way to do this is to create a 'FILES' variable that starts out having the standard three, then appends the '.gpu' files if needed, then runs only one 'cat' line at the end. This would also be easily extensible if something else comes up in the future.

Yes, that's correct. For now, I redirected stderr to /dev/null (2>/dev/null) to suppress the "file not found" messages so I don't have to check for the files in each directory, but your suggestion is probably a better approach.

….sh script

…eformatting.

oseikuffuor1 added 6 commits October 21, 2025 16:07

Merge branch 'master' into gpu-mixed-krylov

31928ae

Update autoconf build to prevent use of long double with GPUs.

a8835bb

Add utility functions for mixed-precision function development.

57d6ebb

Add new mixed-precision matrix and vector functionality.

542180e

Updates to support use of hypre_double for dnum_nonzeros in parcsr_ma…

cd69eba

…trix.

Minor updates to test drivers.

7781cad

oseikuffuor1 requested review from rfalgout and ulrikeyang November 14, 2025 10:18