
fix the race condition in lu factorization #1850

Open

yhmtsai wants to merge 1 commit into develop from fix_factorization_race

Conversation

yhmtsai (Member) commented May 23, 2025

This PR fixes a race condition in the LU factorization.

We need a sync before reading vals[lower_nz], because the warp can modify that entry in the previous iteration.
Only a warp-level sync is needed. If I understand it correctly, wait(dep) already implies one, so I simply moved the load to after wait(dep).
After the move, we need another sync before assigning to scale, to ensure every thread has read the data before it is modified.
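
For reference, here is a minimal sketch of the pattern being described, written with standard CUDA cooperative groups rather than the actual Ginkgo kernel code; the names vals, lower_nz, diag_idx, num_deps, and the omitted wait() are placeholders modeled on the discussion in this PR, and only the placement of the two warp syncs matters:

```cuda
#include <cooperative_groups.h>

namespace cg = cooperative_groups;

// Sketch of the per-row elimination loop (placeholder names, not the actual
// Ginkgo kernel). One warp handles one row; each outer-loop iteration reads
// entries of `vals` that the same warp may have written in the previous
// iteration, which is the race this PR fixes.
__device__ void lu_row_update_sketch(double* vals, const int* lower_nzs,
                                     const int* diag_idxs, int num_deps)
{
    const auto warp = cg::tiled_partition<32>(cg::this_thread_block());

    // one iteration per dependency row
    for (int i = 0; i < num_deps; i++) {
        // wait(dep) would go here; it implies (at least) a warp-level sync,
        // which is why loading vals[lower_nz] *after* it is sufficient.
        warp.sync();

        // Safe to read now: writes from the previous iteration of this warp
        // are visible to all of its threads.
        const auto lower_nz = lower_nzs[i];
        const auto val = vals[lower_nz];
        const auto diag = vals[diag_idxs[i]];

        // Second sync: every thread must have read val/diag before any
        // thread overwrites one of those locations when storing the scale.
        warp.sync();
        vals[lower_nz] = val / diag;  // the "assigning to scale" step

        // ... the warp then eliminates the rest of its row with this scale;
        // those writes are what the sync at the top of the next iteration
        // protects against.
    }
}
```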

@yhmtsai yhmtsai added this to the Ginkgo 1.10.0 milestone May 23, 2025
@yhmtsai yhmtsai requested review from upsj and a team May 23, 2025 16:13
@yhmtsai yhmtsai self-assigned this May 23, 2025
@yhmtsai yhmtsai added the 1:ST:ready-for-review (This PR is ready for review) and is:bugfix (This fixes a bug) labels May 23, 2025
@ginkgo-bot ginkgo-bot added the mod:cuda (This is related to the CUDA module.), mod:hip (This is related to the HIP module.) and type:factorization (This is related to the Factorizations) labels May 23, 2025
@yhmtsai yhmtsai force-pushed the fix_factorization_race branch from 4f684b5 to 84c1bbe May 23, 2025 16:42
Comment on lines +119 to +122
// We need to load vals after the synchronization.
// The entry at the next lower_nz might be modified if the dep row has the
// same col as the next lower_nz's col.
const auto val = vals[lower_nz];
Member
Not sure I follow - each warp only modifies memory locations belonging to its row, so there are only data races between threads of the same warp. So is this reordering actually necessary?

upsj (Member) commented May 25, 2025

What you essentially want to do is add another warp sync at the end of each iteration of the outer loop, to prevent the modifications from previous iterations racing with the reads in the following iterations? I would prefer having that sync happen explicitly at the end of the loop rather than hidden inside the scheduler wait function. I don't think warp syncs should be costly.
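
For illustration only, the suggested placement would look roughly like this in terms of the sketch in the PR description (same placeholder names; a sketch, not the proposed diff or the actual kernel):

```cuda
#include <cooperative_groups.h>

namespace cg = cooperative_groups;

// Same placeholder loop as the sketch above, but with the explicit warp sync
// at the end of every outer-loop iteration instead of before the load.
__device__ void lu_row_update_sketch_explicit(double* vals,
                                              const int* lower_nzs,
                                              const int* diag_idxs,
                                              int num_deps)
{
    const auto warp = cg::tiled_partition<32>(cg::this_thread_block());

    for (int i = 0; i < num_deps; i++) {
        // wait(dep) would go here.
        const auto lower_nz = lower_nzs[i];
        const auto val = vals[lower_nz];
        const auto diag = vals[diag_idxs[i]];
        warp.sync();  // all reads done before the write below
        vals[lower_nz] = val / diag;
        // ... eliminate the rest of the row ...

        // Explicit sync at the end of the iteration: the row updates above
        // can no longer race with the loads at the top of the next iteration.
        warp.sync();
    }
}
```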

yhmtsai (Member, Author)

Exactly. It is to ensure that the modifications by the same warp are visible to the other threads when they read the entry.
Yes, that is what I mean in the PR description: we only need a warp sync.
I thought wait(dep) should also imply that, since it makes the memory visible at least within the block, so I just moved the load rather than introducing a separate warp sync.

const auto diag = vals[diag_idx];
// We need a sync to ensure every thread gets the data before assigning to
// scale.
warp.sync();
Member

makes sense, good catch!

@yhmtsai yhmtsai requested a review from a team May 28, 2025 16:07
Labels
1:ST:ready-for-review (This PR is ready for review)
is:bugfix (This fixes a bug)
mod:cuda (This is related to the CUDA module.)
mod:hip (This is related to the HIP module.)
type:factorization (This is related to the Factorizations)
3 participants