@wkdarko reported that his MPCD simulation failed in a skewed box for certain domain decompositions. He provided the script below, and I reproduced the issue running on as few as 4 processors with x and y decomposition. Not all decompositions trigger the error.
I suspect a precision / rounding issue in setting up the overlapping cells for the domain decomposition and in binning the particles. When generalizing the code to non-cubic cells, I found that the implementation was very sensitive to this. I thought I had devised a scheme that would avoid these issues, but apparently not.
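To make the suspected failure mode concrete, here is a minimal sketch of how a bounds check and a binning calculation can disagree near a boundary. It is a generic illustration only, not the HOOMD-blue implementation: the `cell_index` helper, the orthorhombic box, and all values are assumptions chosen for the example.

```python
import math


def cell_index(x, lo, length, n_cells):
    """Bin a coordinate into one of n_cells equal cells spanning [lo, lo + length).

    Hypothetical helper for illustration; not a HOOMD-blue function.
    """
    frac = (x - lo) / length  # fractional coordinate, nominally in [0, 1)
    return math.floor(frac * n_cells)


# Assumed 1D box [-10, 10) divided into 20 cells.
lo, length, n_cells = -10.0, 20.0, 20
hi = lo + length

# Largest double that is still strictly below the upper boundary.
x = math.nextafter(hi, lo)

inside_box = lo <= x < hi                   # the coordinate test says "inside"
index = cell_index(x, lo, length, n_cells)  # x - lo rounds up to exactly 20.0
in_grid = 0 <= index < n_cells              # the binning says "outside the grid"

print(inside_box, index, in_grid)  # True 20 False
```

The subtraction `x - lo` rounds up to the full box length, so the fractional coordinate becomes exactly 1 even though `x` passes the bounds check. Any scheme that decides ownership with one calculation and bins with another is exposed to this kind of disagreement.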
To fix this, I think the MPCD domain decomposition strategy should be reworked to follow the one in this paper: https://doi.org/10.1016/j.cpc.2024.109494. Particles would be communicated to the rank that owns the cell they are binned into for the entire collision step, ensuring we don't have this issue. The paper claims this strategy is actually more efficient than the overlapping cell scheme we currently use because it is a point-to-point pattern. However, this will require a substantial effort to implement because the way the cell properties are calculated and collisions are applied also needs to change. In the end, I think that effort is worthwhile because it may help close some of the performance gap noted in this paper too.
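As a rough sketch of the idea (not the paper's algorithm and not a HOOMD-blue API), each particle could be binned once into a global cell, with the owning rank then derived from that integer cell index alone. The function names, the block distribution of cells over ranks, the rank ordering, and the orthorhombic box are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def global_cell(pos, lo, cell_size, n_cells):
    """Bin a position into a global (i, j, k) cell, wrapping periodically.

    The modulo keeps every index valid even if rounding pushes it to n.
    """
    return tuple(
        math.floor((x - l) / cell_size) % n for x, l, n in zip(pos, lo, n_cells)
    )


def owner_rank(cell, n_cells, n_ranks):
    """Map a global cell to its owning rank using integer arithmetic only.

    Assumes a block distribution of cells and an x-fastest rank ordering.
    """
    rx, ry, rz = (c * r // n for c, n, r in zip(cell, n_cells, n_ranks))
    return (rz * n_ranks[1] + ry) * n_ranks[0] + rx


def group_by_owner(positions, lo, cell_size, n_cells, n_ranks):
    """Group particle indices by the rank that owns their cell."""
    outbox = defaultdict(list)
    for i, pos in enumerate(positions):
        cell = global_cell(pos, lo, cell_size, n_cells)
        outbox[owner_rank(cell, n_cells, n_ranks)].append(i)
    return dict(outbox)


# Assumed setup: 20 x 20 x 20 unit cells on a 2 x 2 x 1 rank grid.
lo = (-10.0, -10.0, -10.0)
n_cells = (20, 20, 20)
n_ranks = (2, 2, 1)
positions = [(-0.5, 9.5, -2.3), (0.5, 9.5, 0.0), (-9.5, -9.5, -9.5)]

outbox = group_by_owner(positions, lo, 1.0, n_cells, n_ranks)
print(outbox)  # {2: [0], 3: [1], 0: [2]}
```

Because the destination rank depends only on the integer cell index, the binning and the ownership decision cannot disagree: whichever cell a particle rounds into, exactly one rank owns it.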
I will prioritize working on this, and I will likely open a separate issue to flesh out the scope of work and track progress.
Output
(hoomd-dev) $ bash debug.sh
notice(2): Using domain decomposition: n_x = 2 n_y = 2 n_z = 1.
**ERROR**: (Rank 3): MPCD particle is no longer in the simulation box
Cartesian coordinates:
x: -0.481269 y: 9.48594 z: -2.29128
Grid shift:
x: 0.0157772 y: -0.0215342 z: -0.0098835
Traceback (most recent call last):
  File "/home/mphoward/Documents/projects/mpcd_noncubic_cells/setup/debug/debug.py", line 42, in <module>
    sim.run(10)
  File "/home/mphoward/Documents/code/glotzerlab/build/hoomd/hoomd/simulation.py", line 562, in run
    self._cpp_sys.run(steps_int, write_at_start)
RuntimeError: Error computing cell list
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 3 in communicator MPI_COMM_WORLD
with errorcode 1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
Expected output
No error should be generated.
Platform
Linux, CPU
Installation method
Compiled from source
HOOMD-blue version
5.1.0
Python version
3.12.3