Skip to content

Conversation

@oseikuffuor1
Copy link
Contributor

The goal of this PR is to enable building and testing mixed-precision Krylov solvers on GPUs.

  • Fixed issues related to building with long double precision on GPUs
  • Added utility functions such as array copy and clone to facilitate mixed-precision code development
  • Added some new mixed-precision matrix and vector functions: clone, copy, convert functions
  • Use hypre_double for dnum_nonzero member in hypre_parcsr_matrix struct and updated support for it within hypre.

To Do:

  • Complete multiprecision build on GPU

#undef HAVE_INTTYPES_H

/* Define to 1 if you have the <memory.h> header file. */
#undef HAVE_MEMORY_H
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably slipped in due to the usage of autoconf 2.69 instead of 2.71 (the version we've been using)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rfalgout and @victorapm . I will look into this. You are right, I generated it with autoconf v2.69.

@rfalgout
Copy link
Contributor

You may be aware of this already, but just in case... This doesn't build cleanly on my mac currently. There are a bunch of unused variable warnings, nothing major.

@oseikuffuor1
Copy link
Contributor Author

You may be aware of this already, but just in case... This doesn't build cleanly on my mac currently. There are a bunch of unused variable warnings, nothing major.

Thanks @rfalgout . Can you share which build options you are using?

@rfalgout
Copy link
Contributor

Thanks @rfalgout . Can you share which build options you are using?

configure --enable-debug --enable-mixed-precision

HYPRE_Int hypre_CSRMatrixResize( hypre_CSRMatrix *matrix, HYPRE_Int new_num_rows,
HYPRE_Int new_num_cols, HYPRE_Int new_num_nonzeros );
HYPRE_Int hypre_CSRMatrixEliminateRowsCols(hypre_CSRMatrix *A, HYPRE_Int nrows, HYPRE_Int *rows);
HYPRE_Int hypre_CSRMatrixResetData(hypre_CSRMatrix *matrix);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is currently only called in one place. I am wondering if we really need it, but that is probably a discussion we need to have in the larger context of the ownership question.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes currently it is only used in the mixed-precision convert function. It provides a workaround for the ownership question, so perhaps we can revisit it later. It may also be useful if one needs to reset matrix data while keeping the structure unchanged?

* mixed precision code. */
typedef double hypre_double;
typedef float hypre_float;
#if defined (HYPRE_USING_GPU)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious. Will this be true for all GPUs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rfalgout as far as I know, all GPUs that we support (NVIDIA, AMD and intel) represent long doubles as doubles (if the name of the type is supported). @victorapm @waynemitchell @liruipeng correct me if I'm wrong.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right for CUDA and HIP. With SYCL, it might be possible to get full 128-bit double support (I heard there's some hardware support on Intel PVC cards), but I'm not so sure about it. Wayne can probably comment better.

@rfalgout
Copy link
Contributor

This looks good so far, Daniel! Thanks!

/*--------------------------------------------------------------------------
* Mixed-Precision hypre_ParVectorAxpy
* Mixed-precision matrix conversion
* Note: This converts only the diag and offd matrices
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What else could it convert? At first, I thought that statement meant that you don't reset the precision in the parcsr matrix, which would be bad, but you actually do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ulrikeyang this comment was just to specify that we only convert the main csr matrices (diag and offdiag) in the hypre_ParCSRMatrix_struct struct. There are others like diagT and offdT, which I suppose we could convert if they are set (since we are changing the precision afterwards), but I assume they are currently unset and would be generated later.

{
hypre_CSRMatrixCopy_pre(precision_A, A, B, 1);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an else statement or return missing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be "return hypre_CSRMatrixCopy_pre()". Thanks for catching that.

return hypre_error_flag;

/* Call mixed-precision axpy on vector data */
return hypre_RealArrayAxpyn_mp(hypre_VectorPrecision (x), xp, hypre_VectorPrecision (y), yp,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really Axpyn ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should do y += alpha*x or raw array data (in mixed precision). Are you seeing anything missing?

# Use awk to avoid issues with spacing
nm -P *.o *.obj | awk '$2 == "T" {print $1}' | sed -e 's/^_//' -e 's/_$//'
# Demangle any c++ name mangling and filter _device_stub_ prefixes.
nm -P *.o *.obj | awk '$2 == "T" {print $1}' | c++filt | sed -e 's/(.*$//' -e 's/^__device_stub__//' -e 's/^_//' -e 's/_$//'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you looked into the portability of c++flt?

Copy link
Contributor Author

@oseikuffuor1 oseikuffuor1 Nov 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I have thought a bit about this. From what I can tell, it is supported in binutils, macOS, most BSD systems, MinGW/Cygwin and so should be fairly portable. Ideally we would use the demangle option the "-C" for nm, since it is built into the function, but unfortunately work well. There is also --demangle for nm on GNU/linux systems, but c++filt is more portable.

cat mup.fixed mup.functions mup.methods | sort | uniq > mup_check.old

if [ "$BUILD_TYPE" = "GPU" ]; then
cat mup.fixed mup.fixed.gpu \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You will probably need to modify this if we choose to only include the '.gpu' files in those directories that need them. Maybe a better way to do this is to create a 'FILES' variable that starts out having the standard three, then appends the '.gpu' files if needed, then runs only one 'cat' line at the end. This would also be easily extensible if something else comes up in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's correct. For now, I parsed it to 2>/dev/null to suppress the "file not found messages" so I don't have to check for the files each directory, but your suggestion is probably a better approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants