Ifpack2: rbiluk test failures in kokkos integration builds #12981
Comments
Automatic mention of the @trilinos/ifpack2 team
@ndellingwood, I was not able to reproduce that test failure on my machine. Is there an easy way to see what the environment was for the integration test?
@jgfouca this is the link containing the reproducer CMake files etc. for the gcc/8.3.0 job set up by frameworks: https://trilinos-cdash.sandia.gov/build/1511033/files. That link comes from this job: https://trilinos-cdash.sandia.gov/build/1511033, and some reproducer instructions are here: https://github.com/trilinos/Trilinos/wiki/Reproducing-PR-Testing-Errors. If you have access to the ascic machines, I'm guessing you can follow the instructions pretty directly. Otherwise, I can send you some slightly modified configuration scripts to try to reproduce on kokkos-dev-2.
@jgfouca in case it is helpful, the gist is that genconfig (I believe) generates the
If you don't have access to ascic, you can try reproducing using these sems modules (I've done this in the past on kokkos-dev-2):
@ndellingwood, I made a mistake and this error is actually easy for me to reproduce. Here's what I've found so far:
So, this is pretty confusing. The block-spiluk impl file is the same for all 3 of the above tests.
@jgfouca that is odd. Though wouldn't the kokkos-kernels commit with the block spiluk fixes (3.) sit on top of other changes to kokkos-kernels made between 4.3.0 and the block spiluk fix?
@ndellingwood, @brian-kelley, I've traced the cause of the failure to this change:
Which came in through this commit:
I don't see what's wrong with Brian's change, but I've double-checked and that seems to be the cause. Since only RBILUK is failing, it must be the way I'm using Bsr in that file.
@ndellingwood It looks like
aren't included as flags in this build, but we need them to build Trilinos for now (that goes for all KK develop/Trilinos integration builds). Or maybe on the KK side, BTW, @jhux2 is now working on allowing offsets other than size_t in the Tpetra stack.
@brian-kelley, I've further isolated the problem to a hacky thing I needed to do to call Sequential::trsv on CUDA:
To sum up, L_Block and U_Block are device BSRs, but I need host BSRs in order to call trsv. I ran into all kinds of problems because the Views I was working with were not quite the right type for the BsrMatrix constructor, so I ended up doing the reinterpret_cast above. It looked to me like the problem was View<const T*> vs. View<T*>, so I thought I was just casting away const-ness, but maybe the types were mismatched too. Maybe you can think of a better way to do this.
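For context on what a host-side `trsv` on a BSR factor computes, here is a minimal sketch of a sequential block lower-triangular solve in plain C++. It is not the KokkosKernels implementation or API: `HostBsr` and `unit_lower_block_trsv` are hypothetical names, blocks are assumed to be stored row-major, and the diagonal blocks of L are assumed to be identity and not stored. The loop reads `row_map`, `entries`, and `values` directly on the CPU, so those arrays must be host-accessible. If they actually live in CUDA device memory, a host loop like this cannot safely read them, which is one way a cast between device and host View types can go wrong.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side BSR container (NOT KokkosSparse::Experimental::BsrMatrix).
struct HostBsr {
  int num_block_rows;
  int block_size;
  std::vector<int> row_map;    // size num_block_rows + 1; offsets into entries
  std::vector<int> entries;    // block column index of each stored block
  std::vector<double> values;  // block_size * block_size values per block, row-major (assumed)
};

// Solves L x = b, where L is block unit lower triangular: only blocks strictly
// below the diagonal are used, and diagonal blocks are implicitly the identity.
std::vector<double> unit_lower_block_trsv(const HostBsr& L,
                                          const std::vector<double>& b) {
  const int bs = L.block_size;
  assert(static_cast<int>(b.size()) == L.num_block_rows * bs);
  std::vector<double> x(b);
  // Process block rows in order so every x_j with j < i is already final.
  for (int i = 0; i < L.num_block_rows; ++i) {
    for (int k = L.row_map[i]; k < L.row_map[i + 1]; ++k) {
      const int j = L.entries[k];
      if (j >= i) continue;  // skip diagonal/upper blocks if any are stored
      const double* blk = &L.values[static_cast<std::size_t>(k) * bs * bs];
      // x_i -= L_ij * x_j (dense bs x bs block times a bs-vector)
      for (int r = 0; r < bs; ++r) {
        double sum = 0.0;
        for (int c = 0; c < bs; ++c) sum += blk[r * bs + c] * x[j * bs + c];
        x[i * bs + r] -= sum;
      }
    }
  }
  return x;
}
```

With `block_size = 2`, a single stored block `L_10 = [[1, 2], [3, 4]]`, and `b = (1, 1, 10, 20)`, the sketch returns `x = (1, 1, 7, 13)`.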
@jgfouca @brian-kelley thanks for looking into this and resolving it! The fix only needed to be applied to the kokkos-kernels@develop branch, not directly in Trilinos (Jim fixed it in kokkos/kokkos-kernels#2196), and it will be included in the 4.3.01 patch release.
Bug Report
@jgfouca @trilinos/ifpack2
Description
The Trilinos nightly Kokkos integration tests are failing in the Ifpack2_RBILUK_hb_belos_block_serial_MPI_1 test. @jgfouca, can you take a look following the recent block spiluk #12908 and ifpack2 rbiluk #12911 updates?
Here are links to the failing jobs:
The jobs run on cee ascic resources (related script files for the gcc/8.3.0 build https://trilinos-cdash.sandia.gov/build/1511033/files), but I can send some slightly modified scripts for kokkos-dev-2 if that is helpful
The Trilinos + Kokkos integration testing was down for a while (due to a kokkos_swap issue, kokkos/kokkos#6960, that was resolved yesterday), which is likely why this didn't show up earlier.
Steps to Reproduce