[BUG FIX] Fix non-deterministic simulation on GPU (cont'd). #2907
Open
duburcqa wants to merge 4 commits into
Description
Follow-up to #2898. GPU simulation was still not run-to-run reproducible because two contact-pipeline stages consumed contacts in the racy `atomic_add` physical-slot order:

- … `group_key` = x of whichever contact was physically first in a geom-pair run), so the logical order was racy;
- … `(u, v)` tolerance sort, so the kept-contact set (and count) varied run-to-run.

Fix in `contact.py` (both the coop and serial `func_clamp_prune_and_sort_contacts*` kernels):

- … `(pos_x, geom_a, geom_b, pos_y, pos_z)`;
- … `(u, v)` sort and monotone chain receive deterministic input.

Tuned tolerances and hull math are unchanged. CPU is serial/deterministic and unaffected. The perf-dispatch autotuner (monolith vs decomposed under `prefer_decomposed_solver == -1`) is a separate timing-based source, already pinned per backend in the test harness and out of scope here.

How Has This Been Tested?
New GPU-only `tests/test_rigid_determinism.py`: spawns independent processes (in-process resets cannot observe cross-process races) on the authored-decomposition tower, parametrized over solve variant (monolith/decomposed) x pruning (on/off), and compares per-step fingerprints in pipeline order so the first mismatch names the diverging stage (contact set -> narrowphase/pruning, order -> sort, velocity -> solve). Fails on `main`, passes with this fix. Tower step time is within measurement noise (<0.2%); no regression.

Checklist:
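For reviewers, here is a toy, CPU-only illustration of the two ideas in this PR: sorting on the full `(pos_x, geom_a, geom_b, pos_y, pos_z)` key and comparing fingerprints in pipeline order. It is not the kernel code or the actual test. The contact tuple layout, the helper names (`sort_key`, `fingerprint`, `stage_fingerprints`, `first_diverging_stage`) and the use of SHA-256 are assumptions made for the sketch, and two fixed input orderings stand in for the racy `atomic_add` slot order that the real test observes across separate processes.

```python
import hashlib
import struct

# Hypothetical contact record: (pos_x, pos_y, pos_z, geom_a, geom_b).
CONTACTS = [
    (0.25, 0.10, 0.00, 3, 7),
    (0.25, 0.05, 0.00, 3, 7),
    (0.25, 0.10, 0.00, 1, 2),
    (-0.50, 0.30, 0.20, 1, 2),
    (0.75, -0.40, 0.10, 4, 9),
]

# Two "runs" holding the same contacts in different physical slot order.
run_a = list(CONTACTS)
run_b = list(reversed(CONTACTS))


def sort_key(contact):
    """Full lexicographic key; unique per contact here, so no ties remain."""
    pos_x, pos_y, pos_z, geom_a, geom_b = contact
    return (pos_x, geom_a, geom_b, pos_y, pos_z)


def fingerprint(contacts):
    """Order-sensitive hash over the exact bit patterns of each contact."""
    h = hashlib.sha256()
    for pos_x, pos_y, pos_z, geom_a, geom_b in contacts:
        h.update(struct.pack("<dddii", pos_x, pos_y, pos_z, geom_a, geom_b))
    return h.hexdigest()


def stage_fingerprints(contacts):
    """Fingerprints in pipeline order: contact set first, then ordering."""
    return [
        ("contact set", fingerprint(sorted(contacts))),  # order-independent
        ("order", fingerprint(contacts)),                # order-sensitive
    ]


def first_diverging_stage(fp_a, fp_b):
    """Name of the first stage whose fingerprints differ, or None."""
    for (name, a), (_, b) in zip(fp_a, fp_b):
        if a != b:
            return name
    return None


def compare(a, b):
    return first_diverging_stage(stage_fingerprints(a), stage_fingerprints(b))


def by_x_only(contacts):
    # Stable sort on pos_x alone: tied contacts keep their input order.
    return sorted(contacts, key=lambda c: c[0])


def by_full_key(contacts):
    return sorted(contacts, key=sort_key)


print(compare(run_a, run_b))                            # unsorted input
print(compare(by_x_only(run_a), by_x_only(run_b)))      # partial key
print(compare(by_full_key(run_a), by_full_key(run_b)))  # full key
```

With a partial key, ties inherit whatever order the input arrived in, so the divergence survives the sort. With the full key the order is a function of the contact set alone, and the comparison reports no diverging stage.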