Amesos2_KLU2 loadA_impl widespread Nalu failures #13737
Comments
Some cases are complex (e.g., higher-order, overset), while others are simple flows: The following tests FAILED:
@spdomin, sorry for the issues. I am looking into it. Can you suggest how I could reproduce the errors, or are there any logs from the runs that I could look into? My intention was that, at least for the complex case, it would take the original code path.
I would be happy to help you set up a Nalu build and run the simplest case, 2d_quad4_channel. The most straightforward path is to follow: https://nalu.readthedocs.io/en/latest/source/user/build_manually.html#linux-and-osx. I wrapped this in a simple script and it should build fairly easily. The Trilinos config we use is at https://github.com/NaluCFD/Nalu/blob/master/build/do-configTrilinos_release, while the Nalu config is at https://github.com/NaluCFD/Nalu/blob/master/build/do-configNalu_release. Depending on your environment, you might be able to use all of my installations for TPLs. Write me at my work email address for more (Sandia or Stanford). In the meantime, it might be nice to revert so that we do not lose coverage while we sort this out.
Not surprisingly, 2d_quad4_channel passes with one MPI rank and fails for any higher count.
Thank you, @spdomin. I managed to reproduce the errors. Meanwhile, we created a PR to revert the changes.
Great. Let me know if I can help, especially if you find something that Nalu is doing that it probably should not.
Thank you, @spdomin. I fixed a bug in the PR, and now I see the same results when running the regression tests with and without the changes. Unfortunately, my Nalu build with the before-the-change (i.e.,
Sorry, I was in meetings all day and out of pocket. After your revert, last night was clean (see way below). The elemClosed** is a special case. For a low-Mach flow, when the domain is closed, the elliptic solve can be singular. However, when the low-Mach assumption is relaxed and new terms are added to allow for a low-speed compressible use case, the system should be fine (and result in a successful test, as seen in the last nightly test suite). Send me the error you are seeing and we can work through it. I should be free tomorrow :)
100% tests passed, 0 tests failed out of 84
Label Time Summary:
Total Test time (real) = 858.13 sec
Thank you, @spdomin. We merged a new PR with a fix yesterday. Can you let me know whether any issues remain?
We had one diff: FAILED: Luckily, this configuration has an analytical solution. I will run it out to full convergence tomorrow (Friday) and report back.
Bug Report
Widespread new throws across many tests in our Nalu test suite. No real pattern that I can see.
Description
[ascic0204:3563317] *** Process received signal ***
[ascic0204:3563317] Signal: Aborted (6)
[ascic0204:3563317] Signal code: (-6)
terminate called after throwing an instance of 'std::runtime_error'
what(): /fgs/spdomin/nightly/Trilinos/packages/amesos2/src/Amesos2_KLU2_def.hpp:488:
Throw number = 1
Throw test that evaluated to true: nnz_ret != as<local_ordinal_type>(this->globalNumNonZeros_)
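For readers unfamiliar with this kind of check, the sketch below illustrates the general shape of the test that fired: a count of entries returned by a copy step is compared against an expected global nonzero count, and a std::runtime_error is thrown on mismatch. This is not the Amesos2 source, and the thread does not state the root cause. RankBlock, global_nnz, check_nnz, and copy_local_only_throws are hypothetical names, and the idea that a code path copied only rank-local entries is an assumption chosen solely because it would be consistent with the report that runs pass on one MPI rank and fail on more.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for one MPI rank's share of a distributed sparse matrix.
struct RankBlock {
    std::size_t local_nnz;  // nonzeros stored on this rank
};

// Sum of nonzeros over all ranks (illustrative analogue of a global nnz count).
std::size_t global_nnz(const std::vector<RankBlock>& ranks) {
    std::size_t total = 0;
    for (const auto& r : ranks) {
        total += r.local_nnz;
    }
    return total;
}

// Same shape as the reported throw test: fail when the number of entries
// actually returned differs from the expected global count.
void check_nnz(std::size_t nnz_ret, std::size_t expected_global_nnz) {
    if (nnz_ret != expected_global_nnz) {
        throw std::runtime_error(
            "nnz_ret (" + std::to_string(nnz_ret) +
            ") != expected global nnz (" + std::to_string(expected_global_nnz) + ")");
    }
}

// ASSUMPTION for illustration only: a copy step that returns just the first
// rank's local entries. Precondition: ranks is non-empty.
// Returns true if the check throws for this configuration.
bool copy_local_only_throws(const std::vector<RankBlock>& ranks) {
    try {
        check_nnz(ranks.front().local_nnz, global_nnz(ranks));
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}
```

Under that assumption, a single block of 10 nonzeros passes the check, while the same 10 nonzeros split 6 and 4 across two blocks trip it, which mirrors the one-rank versus multi-rank behavior reported in the comments.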
Steps to Reproduce
Good:
NaluCFD/Nalu SHA1: aa35b4d3d1dd9cc2d63ea79e1a1d34c3970ed25e
Trilinos/develop SHA1: 0678446
Bad:
NaluCFD/Nalu SHA1: aa35b4d3d1dd9cc2d63ea79e1a1d34c3970ed25e
Trilinos/develop SHA1: 5b94adf