[BUG FIX] Fix non-deterministic simulation on GPU (cont'd). by duburcqa · Pull Request #2907 · Genesis-Embodied-AI/genesis-world

duburcqa · 2026-06-06T09:14:16Z

Description

Follow-up to #2898. GPU simulation was still not reproducible from run to run because two contact-pipeline stages consumed contacts in physical-slot order, which is assigned by racing atomic_add calls:

the contact sort keyed on a group key read from a physical-order scan (group_key = x of whichever contact was physically first in a geom-pair run), so the logical order was racy;
contact pruning built each bucket's hull from the racy within-bucket order, passed through a non-transitive (u, v) tolerance sort whose output depends on input order, so the kept-contact set (and count) varied run-to-run.
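To illustrate why a tolerance-based sort is sensitive to input order, here is a minimal sketch. It is not the comparator from contact.py: `TOL`, `cmp_uv`, and the three sample points are hypothetical and chosen only to expose a comparison cycle.

```python
from functools import cmp_to_key
from itertools import permutations

TOL = 0.1  # hypothetical tolerance, for illustration only


def cmp_uv(p, q):
    """Compare (u, v) pairs, treating coordinates within TOL as equal."""
    if abs(p[0] - q[0]) > TOL:
        return -1 if p[0] < q[0] else 1
    if abs(p[1] - q[1]) > TOL:
        return -1 if p[1] < q[1] else 1
    return 0


a = (0.00, 2.0)
b = (0.08, 1.0)
c = (0.16, 0.0)

# a and b tie on u, so v decides: a sorts after b.
# b and c tie on u, so v decides: b sorts after c.
# a and c differ on u by more than TOL: a sorts before c.
# Together this is a cycle, so no consistent total order exists.
cycle = (cmp_uv(a, b), cmp_uv(b, c), cmp_uv(a, c))

# With an inconsistent comparator, the sorted result can depend on
# the order in which the elements arrive.
outcomes = {
    tuple(sorted(perm, key=cmp_to_key(cmp_uv)))
    for perm in permutations((a, b, c))
}
print(cycle, len(outcomes))
```

When the arrival order is itself decided by a race, as with slots handed out by atomic_add, any such order dependence shows up as run-to-run variation.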

Fix in contact.py (both the coop and serial func_clamp_prune_and_sort_contacts* kernels):

the sort key is now a total order on each contact's own position, (pos_x, geom_a, geom_b, pos_y, pos_z);
each prune bucket is deterministically pre-sorted by position before the coplanarity and hull steps, so the existing (tuned) (u, v) sort and monotone chain receive deterministic input.
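The idea behind the new sort key can be sketched in plain Python. This is an illustration under assumptions, not the GPU kernel: the dict layout, `sort_key`, and `logical_order` are hypothetical names, and only the key order (pos_x, geom_a, geom_b, pos_y, pos_z) comes from the description above. The key is a total order as long as no two contacts share all five values.

```python
import random


def sort_key(contact):
    """Key built only from the contact's own data, never from its slot."""
    x, y, z = contact["pos"]
    return (x, contact["geom_a"], contact["geom_b"], y, z)


def logical_order(physical):
    """Map any physical (arrival) order to one logical order."""
    return sorted(physical, key=sort_key)


contacts = [
    {"geom_a": 0, "geom_b": 1, "pos": (0.5, 0.0, 0.1)},
    {"geom_a": 0, "geom_b": 1, "pos": (0.5, 0.2, 0.1)},
    {"geom_a": 0, "geom_b": 2, "pos": (0.5, 0.0, 0.1)},
    {"geom_a": 1, "geom_b": 2, "pos": (-0.3, 0.4, 0.0)},
]

reference = logical_order(contacts)

# Emulate racy slot assignment by shuffling the arrival order.
rng = random.Random(0)
all_equal = True
for _ in range(20):
    shuffled = contacts[:]
    rng.shuffle(shuffled)
    all_equal = all_equal and logical_order(shuffled) == reference

print(all_equal)
```

The same canonicalization applies to a prune bucket: once its contacts are pre-sorted by a key like this, every later stage sees identical input regardless of which thread wrote which slot.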

Tuned tolerances and hull math are unchanged. The CPU backend is serial and deterministic, so it is unaffected. The perf-dispatch autotuner (monolith vs decomposed under prefer_decomposed_solver == -1) is a separate, timing-based source of non-determinism; it is already pinned per backend in the test harness and is out of scope here.

How Has This Been Tested?

Added a GPU-only test, tests/test_rigid_determinism.py. It spawns independent processes (in-process resets cannot observe cross-process races) on the authored-decomposition tower, parametrized over solve variant (monolith/decomposed) × pruning (on/off), and compares per-step fingerprints in pipeline order, so the first mismatch names the diverging stage (contact set -> narrowphase/pruning, order -> sort, velocity -> solve). The test fails on main and passes with this fix. Tower step time is within measurement noise (<0.2%), so there is no performance regression.
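The fingerprint comparison can be sketched as follows. This is a simplified stand-in for the real test, not its code: the stage names, `fingerprint`, `step_fingerprints`, and `first_divergence` are hypothetical, and process spawning is omitted, so the two runs are plain lists here rather than results collected from separate processes.

```python
import hashlib
import struct

# Hypothetical stage names in pipeline order, each mapped to the stage
# that gets blamed when that fingerprint is the first to differ.
STAGES = (
    ("contact_set", "narrowphase/pruning"),
    ("contact_order", "sort"),
    ("velocity", "solve"),
)


def fingerprint(values):
    """Hash the exact bit patterns of a sequence of floats."""
    payload = struct.pack(f"<{len(values)}d", *values)
    return hashlib.sha256(payload).hexdigest()


def step_fingerprints(contact_xs, velocities):
    return {
        # Order-independent: detects a change in which contacts exist.
        "contact_set": fingerprint(sorted(contact_xs)),
        # Order-dependent: detects a change in logical contact order.
        "contact_order": fingerprint(contact_xs),
        "velocity": fingerprint(velocities),
    }


def first_divergence(run_a, run_b):
    """Return (step, blamed_stage) for the first mismatch, else None."""
    for step, (fp_a, fp_b) in enumerate(zip(run_a, run_b)):
        for name, blamed in STAGES:
            if fp_a[name] != fp_b[name]:
                return step, blamed
    return None


run_a = [step_fingerprints([0.1, 0.2], [1.0]),
         step_fingerprints([0.1, 0.2], [0.9])]
run_b = [step_fingerprints([0.1, 0.2], [1.0]),
         step_fingerprints([0.2, 0.1], [0.9])]

result = first_divergence(run_a, run_b)
print(result)
```

In this toy input the two runs hold the same contacts at step 1 but in a different order, so the comparison blames the sort rather than the solve. Checking stages in pipeline order matters because a divergence upstream would otherwise be reported as a velocity mismatch.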

Checklist:

I tagged the title correctly (BUG FIX)
I tested my changes and added instructions on how to test it for reviewers.
I have added tests to cover my changes.
All new and existing tests passed.

duburcqa requested a review from YilingQiao as a code owner June 6, 2026 09:14

duburcqa force-pushed the fix_gpu_contact_determinism branch 3 times, most recently from 4ab7886 to 945276c on June 6, 2026 09:34

[BUG FIX] Fix non-deterministic simulation on GPU (cont'd).

8154a00

duburcqa force-pushed the fix_gpu_contact_determinism branch from 945276c to 8154a00 on June 6, 2026 09:39

duburcqa added 2 commits June 7, 2026 00:24

Fix GJK on GPU.

59c98d5

More comprehensive unit tests.

b75bdb4

duburcqa force-pushed the fix_gpu_contact_determinism branch from 51c4132 to 4944633 on June 6, 2026 22:24

Fix analytical capsule contact.

dada1c9

duburcqa force-pushed the fix_gpu_contact_determinism branch from 4944633 to dada1c9 on June 7, 2026 05:33

duburcqa wants to merge 4 commits into Genesis-Embodied-AI:main from duburcqa:fix_gpu_contact_determinism
