-
Notifications
You must be signed in to change notification settings - Fork 236
GPU mixed krylov -- WIP #1423
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
GPU mixed krylov -- WIP #1423
Conversation
src/config/HYPRE_config.h.in
Outdated
| #undef HAVE_INTTYPES_H | ||
|
|
||
| /* Define to 1 if you have the <memory.h> header file. */ | ||
| #undef HAVE_MEMORY_H |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This probably slipped in due to the usage of autoconf 2.69 instead of 2.71 (the version we've been using)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @rfalgout and @victorapm . I will look into this. You are right, I generated it with autoconf v2.69.
|
You may be aware of this already, but just in case... This doesn't build cleanly on my mac currently. There are a bunch of unused variable warnings, nothing major. |
Thanks @rfalgout . Can you share which build options you are using? |
|
| HYPRE_Int hypre_CSRMatrixResize( hypre_CSRMatrix *matrix, HYPRE_Int new_num_rows, | ||
| HYPRE_Int new_num_cols, HYPRE_Int new_num_nonzeros ); | ||
| HYPRE_Int hypre_CSRMatrixEliminateRowsCols(hypre_CSRMatrix *A, HYPRE_Int nrows, HYPRE_Int *rows); | ||
| HYPRE_Int hypre_CSRMatrixResetData(hypre_CSRMatrix *matrix); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is currently only called in one place. I am wondering if we really need it, but that is probably a discussion we need to have in the larger context of the ownership question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes currently it is only used in the mixed-precision convert function. It provides a workaround for the ownership question, so perhaps we can revisit it later. It may also be useful if one needs to reset matrix data while keeping the structure unchanged?
| * mixed precision code. */ | ||
| typedef double hypre_double; | ||
| typedef float hypre_float; | ||
| #if defined (HYPRE_USING_GPU) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious. Will this be true for all GPUs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rfalgout as far as I know, all GPUs that we support (NVIDIA, AMD and intel) represent long doubles as doubles (if the name of the type is supported). @victorapm @waynemitchell @liruipeng correct me if I'm wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right for CUDA and HIP. With SYCL, it might be possible to get full 128-bit double support (I heard there's some hardware support on Intel PVC cards), but I'm not so sure about it. Wayne can probably comment better.
|
This looks good so far, Daniel! Thanks! |
| /*-------------------------------------------------------------------------- | ||
| * Mixed-Precision hypre_ParVectorAxpy | ||
| * Mixed-precision matrix conversion | ||
| * Note: This converts only the diag and offd matrices |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What else could it convert? At first, I thought that statement meant that you don't reset the precision in the parcsr matrix, which would be bad, but you actually do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ulrikeyang this comment was just to specify that we only convert the main csr matrices (diag and offdiag) in the hypre_ParCSRMatrix_struct struct. There are others like diagT and offdT, which I suppose we could convert if they are set (since we are changing the precision afterwards), but I assume they are currently unset and would be generated later.
src/seq_mv/seq_mv_mp.c
Outdated
| { | ||
| hypre_CSRMatrixCopy_pre(precision_A, A, B, 1); | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an else statement or return missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be "return hypre_CSRMatrixCopy_pre()". Thanks for catching that.
| return hypre_error_flag; | ||
|
|
||
| /* Call mixed-precision axpy on vector data */ | ||
| return hypre_RealArrayAxpyn_mp(hypre_VectorPrecision (x), xp, hypre_VectorPrecision (y), yp, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really Axpyn ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should do y += alpha*x or raw array data (in mixed precision). Are you seeing anything missing?
| # Use awk to avoid issues with spacing | ||
| nm -P *.o *.obj | awk '$2 == "T" {print $1}' | sed -e 's/^_//' -e 's/_$//' | ||
| # Demangle any c++ name mangling and filter _device_stub_ prefixes. | ||
| nm -P *.o *.obj | awk '$2 == "T" {print $1}' | c++filt | sed -e 's/(.*$//' -e 's/^__device_stub__//' -e 's/^_//' -e 's/_$//' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you looked into the portability of c++flt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I have thought a bit about this. From what I can tell, it is supported in binutils, macOS, most BSD systems, MinGW/Cygwin and so should be fairly portable. Ideally we would use the demangle option the "-C" for nm, since it is built into the function, but unfortunately work well. There is also --demangle for nm on GNU/linux systems, but c++filt is more portable.
| cat mup.fixed mup.functions mup.methods | sort | uniq > mup_check.old | ||
|
|
||
| if [ "$BUILD_TYPE" = "GPU" ]; then | ||
| cat mup.fixed mup.fixed.gpu \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You will probably need to modify this if we choose to only include the '.gpu' files in those directories that need them. Maybe a better way to do this is to create a 'FILES' variable that starts out having the standard three, then appends the '.gpu' files if needed, then runs only one 'cat' line at the end. This would also be easily extensible if something else comes up in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's correct. For now, I parsed it to 2>/dev/null to suppress the "file not found messages" so I don't have to check for the files each directory, but your suggestion is probably a better approach.
The goal of this PR is to enable building and testing mixed-precision Krylov solvers on GPUs.
To Do: