[FEATURE] Differentiable forward dynamics for rigid body sim. #1808

SonSang · 2025-10-07T23:16:24Z

Description

This PR makes the following changes:

Unify hibernation / non-hibernation code in rigid_solver_decomp.py
Unify dynamic and static inner loops based on is_backward (static when is_backward=True)
Differentiable formulations for functions in rigid_solver_decomp.py

We do not add an extra frame dimension to the rigid body simulation states, because doing so would require too many code changes.

Related Issue

Resolves Genesis-Embodied-AI/Genesis#

Motivation and Context

This is part of the effort to make the rigid body simulation differentiable.

How Has This Been / Can This Be Tested?

A unit test, tests/test_grad.py::test_differentiable_rigid, verifies the gradients by solving an optimization problem.

Screenshots (if appropriate):

Checklist:

I read the CONTRIBUTING document.
I followed the Submitting Code Changes section of CONTRIBUTING document.
I tagged the title correctly (including BUG FIX/FEATURE/MISC/BREAKING)
I updated the documentation accordingly or no change is needed.
I tested my changes and added instructions on how to test it for reviewers.

I have added tests to cover my changes.
All new and existing tests passed.

github-actions · 2025-10-19T07:58:03Z

⚠️ Benchmark Regression Detected
Baselines considered: 5 commits

Commit 1: 948d10d
Commit 2: c413fde
Commit 3: ac29e41
Commit 4: dd405a3
Commit 5: 42a65a2

Thresholds: runtime ≤ −10%, compile ≥ +10%

Runtime FPS

status	benchmark_id	current FPS	baseline FPS	Δ FPS
✅	`batch_size=2048-constraint_solver=Newton-enable_mujoco_compatibility=False-env=box_pyramid#5-gjk_collision=False-use_contact_island=False`	45,712	30,105	+51.84%
✅	`batch_size=2048-constraint_solver=Newton-enable_mujoco_compatibility=False-env=box_pyramid#5-gjk_collision=True-use_contact_island=False`	19,600	20,663	-5.14%
✅	`batch_size=2048-constraint_solver=Newton-enable_mujoco_compatibility=True-env=box_pyramid#5-gjk_collision=False-use_contact_island=False`	16,159	16,819	-3.92%
🔴	`batch_size=30000-constraint_solver=CG-env=anymal_c-gjk_collision=False-use_contact_island=False`	2,356,744	11,879,586	-80.16%
🔴	`batch_size=30000-constraint_solver=CG-env=anymal_c-gjk_collision=True-use_contact_island=False`	2,373,289	11,974,315	-80.18%
🔴	`batch_size=30000-constraint_solver=CG-env=batched_franka-gjk_collision=False-use_contact_island=False`	3,650,183	11,971,721	-69.51%
🔴	`batch_size=30000-constraint_solver=CG-env=batched_franka-gjk_collision=True-use_contact_island=False`	3,560,228	12,006,226	-70.35%
🔴	`batch_size=30000-constraint_solver=Newton-env=anymal_c-gjk_collision=False-use_contact_island=False`	2,353,910	11,966,930	-80.33%
🔴	`batch_size=30000-constraint_solver=Newton-env=anymal_c-gjk_collision=True-use_contact_island=False`	2,331,798	11,929,539	-80.45%
🔴	`batch_size=30000-constraint_solver=Newton-env=batched_franka-gjk_collision=False-use_contact_island=False`	3,576,319	12,001,647	-70.20%
🔴	`batch_size=30000-constraint_solver=Newton-env=batched_franka-gjk_collision=True-use_contact_island=False`	3,486,764	11,965,350	-70.86%
🔴	`batch_size=8192-constraint_solver=CG-env=cube#10-gjk_collision=False-use_contact_island=False`	519,700	1,597,992	-67.48%
🔴	`batch_size=8192-constraint_solver=CG-env=cube#10-gjk_collision=True-use_contact_island=False`	519,931	1,596,952	-67.44%
🔴	`batch_size=8192-constraint_solver=Newton-env=cube#10-gjk_collision=False-use_contact_island=False`	282,152	449,266	-37.20%
🔴	`batch_size=8192-constraint_solver=Newton-env=cube#10-gjk_collision=True-use_contact_island=False`	276,196	448,622	-38.43%

Compile Time

status	benchmark_id	current compile	baseline compile	Δ compile
✅	`batch_size=2048-constraint_solver=Newton-enable_mujoco_compatibility=False-env=box_pyramid#5-gjk_collision=False-use_contact_island=False`	32	33	-3.03%
✅	`batch_size=2048-constraint_solver=Newton-enable_mujoco_compatibility=False-env=box_pyramid#5-gjk_collision=True-use_contact_island=False`	37	36	+2.78%
🔴	`batch_size=2048-constraint_solver=Newton-enable_mujoco_compatibility=True-env=box_pyramid#5-gjk_collision=False-use_contact_island=False`	31	28	+10.71%
✅	`batch_size=30000-constraint_solver=CG-env=anymal_c-gjk_collision=False-use_contact_island=False`	41	38	+7.89%
✅	`batch_size=30000-constraint_solver=CG-env=anymal_c-gjk_collision=True-use_contact_island=False`	41	40	+2.50%
✅	`batch_size=30000-constraint_solver=CG-env=batched_franka-gjk_collision=False-use_contact_island=False`	41	39	+5.13%
✅	`batch_size=30000-constraint_solver=CG-env=batched_franka-gjk_collision=True-use_contact_island=False`	39	41	-4.88%
✅	`batch_size=30000-constraint_solver=Newton-env=anymal_c-gjk_collision=False-use_contact_island=False`	41	39	+5.13%
✅	`batch_size=30000-constraint_solver=Newton-env=anymal_c-gjk_collision=True-use_contact_island=False`	42	40	+5.00%
🔴	`batch_size=30000-constraint_solver=Newton-env=batched_franka-gjk_collision=False-use_contact_island=False`	42	38	+10.53%
✅	`batch_size=30000-constraint_solver=Newton-env=batched_franka-gjk_collision=True-use_contact_island=False`	42	41	+2.44%
✅	`batch_size=8192-constraint_solver=CG-env=cube#10-gjk_collision=False-use_contact_island=False`	42	41	+2.44%
✅	`batch_size=8192-constraint_solver=CG-env=cube#10-gjk_collision=True-use_contact_island=False`	41	39	+5.13%
✅	`batch_size=8192-constraint_solver=Newton-env=cube#10-gjk_collision=False-use_contact_island=False`	43	41	+4.88%
✅	`batch_size=8192-constraint_solver=Newton-env=cube#10-gjk_collision=True-use_contact_island=False`	40	41	-2.44%
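The Δ columns above appear to be the relative change of the current value against the baseline, with a runtime row flagged when Δ FPS is at or below −10% and a compile row flagged when Δ compile is at or above +10%. The sketch below reproduces that arithmetic for two rows of the report. The function names are hypothetical and this is not the benchmark bot's actual code.

```python
def relative_delta_percent(current, baseline):
    """Relative change of `current` with respect to `baseline`, in percent."""
    return (current - baseline) / baseline * 100.0


def is_runtime_regression(delta_percent, threshold=-10.0):
    """Assumed rule: FPS dropping by 10% or more counts as a regression."""
    return delta_percent <= threshold


def is_compile_regression(delta_percent, threshold=10.0):
    """Assumed rule: compile time growing by 10% or more counts as a regression."""
    return delta_percent >= threshold


# anymal_c, CG solver, gjk_collision=False: 2,356,744 FPS against 11,879,586 FPS.
fps_delta = relative_delta_percent(2_356_744, 11_879_586)
# box_pyramid with enable_mujoco_compatibility=True: compile 31 against 28.
compile_delta = relative_delta_percent(31, 28)

print(f"runtime: {fps_delta:+.2f}% regression={is_runtime_regression(fps_delta)}")
print(f"compile: {compile_delta:+.2f}% regression={is_compile_regression(compile_delta)}")
```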

examples/collision/pyramid.py

duburcqa · 2025-10-08T07:34:49Z

genesis/engine/entities/rigid_entity/rigid_entity.py

                self._solver.entities_info,
                self._solver._rigid_global_info,
                self._solver._static_rigid_sim_config,
+                False,


Use keyword argument when passing unnamed variables.
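As a minimal illustration of the request, a bare `False` at the end of a long argument list says nothing about what it controls, while a keyword makes the call self-documenting. The function below is a hypothetical stand-in, not a Genesis API; declaring the flag keyword-only is one way to enforce the convention.

```python
def forward_kinematics(links_state, static_config, *, is_backward):
    """Hypothetical stand-in for a solver routine (not a Genesis API).

    The bare `*` makes `is_backward` keyword-only, so callers cannot pass
    an unnamed True/False positionally.
    """
    mode = "static" if is_backward else "dynamic"
    return f"{len(links_state)} links, {mode} inner loops"


# The reader sees immediately what the boolean means at the call site.
result = forward_kinematics([0, 1, 2], {}, is_backward=False)
print(result)
```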

duburcqa · 2025-10-08T07:34:55Z

genesis/engine/entities/rigid_entity/rigid_entity.py

                self._solver.entities_info,
                self._solver._rigid_global_info,
                self._solver._static_rigid_sim_config,
+                False,


Use keyword argument when passing unnamed variables.

duburcqa · 2025-10-08T07:36:00Z

genesis/utils/path_planning.py

                        entities_info,
                        rigid_global_info,
                        self._solver._static_rigid_sim_config,
+                        False,


Use keyword argument when passing unnamed variables.

duburcqa · 2025-10-20T15:49:14Z

genesis/utils/geom.py

+    # We need to use [norm_sqr] instead of [norm] to avoid nan gradients in the backward pass. Even when theta = 0,
+    # the gradient of [norm] operation is computed and used (it is multiplied by 0 but still results in nan gradients).
+    thetasq = rotvec.norm_sqr()
+    if thetasq > (gs.EPS * gs.EPS):


Just use gs.EPS ** 2

duburcqa · 2025-10-20T15:49:59Z

genesis/utils/geom.py

+    # We need to use [norm_sqr] instead of [norm] to avoid nan gradients in the backward pass. Even when theta = 0,
+    # the gradient of [norm] operation is computed and used (it is multiplied by 0 but still results in nan gradients).


Could you elaborate? Do you have a reproduction script? This is weird...

Yes, here is the reproduction code.

    import gstaichi as ti
    import genesis as gs
    import numpy as np

    gs.init(precision="32", debug=True, backend=gs.cpu)

    a = ti.field(dtype=gs.ti_vec3, shape=(), needs_grad=True)
    b = ti.field(dtype=gs.ti_float, shape=(), needs_grad=True)


    @ti.kernel
    def foo(use_norm_sqr: ti.template()) -> None:
        if ti.static(use_norm_sqr):
            a_norm_sqr = a[None].norm_sqr()
            if a_norm_sqr < gs.EPS ** 2:
                b[None] = 0.0
            else:
                b[None] = ti.sqrt(a_norm_sqr)
        else:
            a_norm = a[None].norm()
            # When a_norm is 0, it does not affect [b] and thus the gradient of [a] should be 0. However, it turns out that
            # the gradient of [a] becomes NaN when a_norm is 0.
            if a_norm < gs.EPS:
                b[None] = 0.0
            else:
                b[None] = a_norm


    for use_norm_sqr in [True, False]:
        foo(use_norm_sqr)
        b.grad[None] = 1.0
        foo.grad(use_norm_sqr)
        a_grad_isnan = np.isnan(a.grad.to_numpy()).any()
        if a_grad_isnan and not use_norm_sqr:
            print("a.grad is nan when use_norm_sqr is False, maybe because the gradient of the norm operation is NaN when the norm is 0.")
            print("Even though a_norm does not affect [b] when a_norm is 0, the gradient of the norm operation is seemingly still computed and used.")

As noted in the comments of the script, I assume the gradient becomes NaN because of the norm operation. Even when the result of the norm operation does not affect the final value, its gradient still seems to be computed and folded into the final gradient (perhaps the NaN gradient is simply multiplied by 0, which is still NaN).
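The suspected mechanism can be checked by hand without Taichi. The gradient of ||x|| is x / ||x||, which is 0/0 at the origin, and under IEEE 754 arithmetic 0 × NaN is still NaN, whereas the gradient of ||x||² is 2x, which is exactly 0 there. The sketch below is a plain-Python illustration of that arithmetic only. The helper names are hypothetical, and it does not claim to mirror how Taichi's autodiff is implemented.

```python
import math


def ieee_div(a, b):
    """Divide like IEEE 754 hardware: 0/0 gives NaN instead of raising."""
    if b == 0.0:
        return math.nan if a == 0.0 else math.copysign(math.inf, a)
    return a / b


def grad_through_norm(x, upstream):
    """Chain rule through n = ||x||, using dn/dx_i = x_i / n."""
    n = math.sqrt(sum(v * v for v in x))
    return [upstream * ieee_div(v, n) for v in x]


def grad_through_norm_sqr(x, upstream):
    """Chain rule through s = ||x||^2, using ds/dx_i = 2 * x_i."""
    return [upstream * 2.0 * v for v in x]


zero = [0.0, 0.0, 0.0]
# The upstream gradient is 0 because the branch result is not used,
# yet the local derivative of the norm is NaN and 0 * NaN stays NaN.
bad = grad_through_norm(zero, upstream=0.0)
good = grad_through_norm_sqr(zero, upstream=0.0)
print("via norm:    ", bad)
print("via norm_sqr:", good)
```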

duburcqa · 2025-10-20T15:50:30Z

tests/test_grad.py

+@pytest.mark.precision("32")
+@pytest.mark.parametrize("backend", [gs.cpu])
+def test_differentiable_rigid(show_viewer):
+


Do not skip space here.

duburcqa · 2025-10-20T15:52:37Z

tests/test_grad.py

+
+@pytest.mark.required
+@pytest.mark.precision("32")
+@pytest.mark.parametrize("backend", [gs.cpu])


Why are you enforcing both precision and backend? I guess it should work just fine without enforcing either of them? Is it strictly harder to pass on CPU (because it would implicitly check more things, related to some optional features being automatically disabled or something)?

duburcqa · 2025-10-20T15:53:47Z

tests/test_grad.py

+    if show_viewer:
+        target = scene.add_entity(
+            gs.morphs.Box(
+                pos=goal_pos.cpu().tolist(),


It should be possible to pass goal_pos directly (same for goal_quat). If not, it is rather gs.morphs.Box that must be fixed to avoid forcing casting to list.

duburcqa · 2025-10-20T16:18:29Z

genesis/engine/solvers/rigid/constraint_noslip.py

Is noslip feature supported when enabling gradient computation? If not, you should raise an exception at init.

duburcqa · 2025-10-20T16:22:29Z

genesis/engine/solvers/rigid/constraint_noslip.py

                constraint_state.Mgrad,
+                constraint_state.Mgrad,  # this will not be used anyway because is_backward is False


After checking, it seems that another variable is specified when is_backward=True, so if I understand correctly you are setting it to an arbitrary value here because it will not be used in practice. This does work, but I don't like it much... What about defining an extra free variable PLACEHOLDER in array_class (a 0D taichi tensor of type array_class.V) that we could use everywhere an argument is not used? This would clarify the intent and avoid mistakes, because you cannot do much with such a tensor.
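A plain-Python analogue of the idea, for illustration only: the comment proposes a 0D Taichi tensor, while the sketch below uses an ordinary sentinel object that raises as soon as it is read, so an accidental use on the wrong code path fails loudly. All names are hypothetical.

```python
class _UnusedArgument:
    """Sentinel for arguments that a given code path must never read."""

    def __repr__(self):
        return "PLACEHOLDER"

    def __getattr__(self, name):
        raise RuntimeError(f"PLACEHOLDER was used (attribute {name!r})")

    def __getitem__(self, key):
        raise RuntimeError(f"PLACEHOLDER was used (index {key!r})")


PLACEHOLDER = _UnusedArgument()


def accumulate(mgrad, mgrad_backward, is_backward):
    """Hypothetical routine that reads exactly one of its two buffers."""
    source = mgrad_backward if is_backward else mgrad
    return sum(source)


# Forward pass: the backward-only buffer is explicitly marked as unused.
forward_total = accumulate([1.0, 2.0], PLACEHOLDER, is_backward=False)
print(forward_total)
```

Compared with passing `constraint_state.Mgrad` twice, the call site states the intent directly, and a mistake in the branch logic raises instead of silently reading the wrong data.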

duburcqa · 2025-10-20T16:44:02Z

genesis/utils/array_class.py


+# =========================================== RigidAdjointCache ===========================================
+@dataclasses.dataclass
+class StructRigidAdjointCache:


Could you document in which context this is used and what these quantities store?

I know it was not documented for the other data structures, but that was a bad decision. It is never too late to raise the bar of our standards.

duburcqa · 2025-10-20T17:14:56Z

genesis/engine/solvers/rigid/rigid_solver_decomp.py

+from genesis.engine.states.cache import QueriedStates
 from genesis.engine.states.solvers import RigidSolverState


What about exposing all *State(s) classes at the top level of the states submodule? That would be more convenient than having to go through the hierarchy to import each State. To do this, just add lines like this to genesis/engine/states/__init__.py:

from .solvers import *

duburcqa · 2025-10-20T17:18:21Z

genesis/engine/solvers/rigid/rigid_solver_decomp.py

+        self.init_ckpt()
+
+    def init_ckpt(self):
+        self._ckpt = dict()
+


This is a bad pattern. All the attributes of a class should be unconditionally defined (set to None by default for optional attributes is acceptable), in the __init__ function ONLY. We are violating these basic rules everywhere unfortunately.

If init_ckpt may be called outside init, it should rather be called something like clear_* or reset_*. If it is only called at init, then it should be part of __init__.
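A minimal sketch of the pattern being asked for, using a hypothetical class rather than the actual solver: every attribute is created unconditionally in `__init__`, optional ones default to None, and the method that may be called later only resets existing state and is named accordingly.

```python
class CheckpointedSolver:
    """Hypothetical skeleton; only the attribute pattern matters here."""

    def __init__(self):
        # Every attribute is defined here, unconditionally.
        self._ckpt = {}
        self._queried_states = None  # optional, filled in later if needed

    def reset_ckpt(self):
        """Clear stored checkpoints without creating new attributes."""
        self._ckpt.clear()

    def save_ckpt(self, name, state):
        self._ckpt[name] = state


solver = CheckpointedSolver()
solver.save_ckpt("step_0", {"qpos": [0.0, 0.0]})
solver.reset_ckpt()
print(solver._ckpt)
```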

github-actions · 2025-10-29T10:06:39Z

⚠️ Benchmark Regression Detected
Baselines considered: 5 commits

Commit 1: 87a8e0b
Commit 2: c3adf9c
Commit 3: fe67d4b
Commit 4: 839b544
Commit 5: 839e214

Thresholds: runtime ≤ −10%, compile ≥ +10%

Runtime FPS

status	benchmark_id	current FPS	baseline FPS	Δ FPS
✅	`batch_size=2048-constraint_solver=Newton-enable_mujoco_compatibility=False-env=box_pyramid#5-gjk_collision=False-use_contact_island=False`	43,946	30,086	+46.07%
✅	`batch_size=2048-constraint_solver=Newton-enable_mujoco_compatibility=False-env=box_pyramid#5-gjk_collision=True-use_contact_island=False`	19,992	20,513	-2.54%
✅	`batch_size=2048-constraint_solver=Newton-enable_mujoco_compatibility=True-env=box_pyramid#5-gjk_collision=False-use_contact_island=False`	16,605	16,756	-0.90%
🔴	`batch_size=30000-constraint_solver=CG-env=anymal_c-gjk_collision=False-use_contact_island=False`	2,059,965	13,411,484	-84.64%
🔴	`batch_size=30000-constraint_solver=CG-env=anymal_c-gjk_collision=True-use_contact_island=False`	2,331,107	13,402,335	-82.61%
🔴	`batch_size=30000-constraint_solver=CG-env=batched_franka-gjk_collision=False-use_contact_island=False`	3,304,798	18,408,303	-82.05%
🔴	`batch_size=30000-constraint_solver=CG-env=batched_franka-gjk_collision=True-use_contact_island=False`	4,030,974	18,336,977	-78.02%
ℹ️	`batch_size=30000-constraint_solver=CG-env=random-gjk_collision=False-use_contact_island=False`	946,802	nan	+nan%
ℹ️	`batch_size=30000-constraint_solver=CG-env=random-gjk_collision=True-use_contact_island=False`	843,972	nan	+nan%
🔴	`batch_size=30000-constraint_solver=Newton-env=anymal_c-gjk_collision=False-use_contact_island=False`	1,999,001	13,036,483	-84.67%
🔴	`batch_size=30000-constraint_solver=Newton-env=anymal_c-gjk_collision=True-use_contact_island=False`	2,081,023	13,004,375	-84.00%
🔴	`batch_size=30000-constraint_solver=Newton-env=batched_franka-gjk_collision=False-use_contact_island=False`	3,109,363	18,417,110	-83.12%
🔴	`batch_size=30000-constraint_solver=Newton-env=batched_franka-gjk_collision=True-use_contact_island=False`	3,256,759	18,607,265	-82.50%
ℹ️	`batch_size=30000-constraint_solver=Newton-env=random-gjk_collision=False-use_contact_island=False`	1,084,131	nan	+nan%
ℹ️	`batch_size=30000-constraint_solver=Newton-env=random-gjk_collision=True-use_contact_island=False`	1,030,587	nan	+nan%
🔴	`batch_size=8192-constraint_solver=CG-env=cube#10-gjk_collision=False-use_contact_island=False`	539,180	1,614,643	-66.61%
🔴	`batch_size=8192-constraint_solver=CG-env=cube#10-gjk_collision=True-use_contact_island=False`	537,193	1,617,820	-66.80%
🔴	`batch_size=8192-constraint_solver=Newton-env=cube#10-gjk_collision=False-use_contact_island=False`	287,082	450,107	-36.22%
🔴	`batch_size=8192-constraint_solver=Newton-env=cube#10-gjk_collision=True-use_contact_island=False`	289,511	449,772	-35.63%

Compile Time

status	benchmark_id	current compile	baseline compile	Δ compile
✅	`batch_size=2048-constraint_solver=Newton-enable_mujoco_compatibility=False-env=box_pyramid#5-gjk_collision=False-use_contact_island=False`	35	37	-5.41%
✅	`batch_size=2048-constraint_solver=Newton-enable_mujoco_compatibility=False-env=box_pyramid#5-gjk_collision=True-use_contact_island=False`	35	37	-5.41%
✅	`batch_size=2048-constraint_solver=Newton-enable_mujoco_compatibility=True-env=box_pyramid#5-gjk_collision=False-use_contact_island=False`	31	31	+0.00%
✅	`batch_size=30000-constraint_solver=CG-env=anymal_c-gjk_collision=False-use_contact_island=False`	42	41	+2.44%
✅	`batch_size=30000-constraint_solver=CG-env=anymal_c-gjk_collision=True-use_contact_island=False`	41	40	+2.50%
✅	`batch_size=30000-constraint_solver=CG-env=batched_franka-gjk_collision=False-use_contact_island=False`	42	42	+0.00%
✅	`batch_size=30000-constraint_solver=CG-env=batched_franka-gjk_collision=True-use_contact_island=False`	42	42	+0.00%
ℹ️	`batch_size=30000-constraint_solver=CG-env=random-gjk_collision=False-use_contact_island=False`	42	nan	+nan%
ℹ️	`batch_size=30000-constraint_solver=CG-env=random-gjk_collision=True-use_contact_island=False`	43	nan	+nan%
✅	`batch_size=30000-constraint_solver=Newton-env=anymal_c-gjk_collision=False-use_contact_island=False`	42	39	+7.69%
✅	`batch_size=30000-constraint_solver=Newton-env=anymal_c-gjk_collision=True-use_contact_island=False`	42	40	+5.00%
✅	`batch_size=30000-constraint_solver=Newton-env=batched_franka-gjk_collision=False-use_contact_island=False`	42	40	+5.00%
✅	`batch_size=30000-constraint_solver=Newton-env=batched_franka-gjk_collision=True-use_contact_island=False`	43	42	+2.38%
ℹ️	`batch_size=30000-constraint_solver=Newton-env=random-gjk_collision=False-use_contact_island=False`	42	nan	+nan%
ℹ️	`batch_size=30000-constraint_solver=Newton-env=random-gjk_collision=True-use_contact_island=False`	43	nan	+nan%
✅	`batch_size=8192-constraint_solver=CG-env=cube#10-gjk_collision=False-use_contact_island=False`	41	39	+5.13%
✅	`batch_size=8192-constraint_solver=CG-env=cube#10-gjk_collision=True-use_contact_island=False`	40	39	+2.56%
✅	`batch_size=8192-constraint_solver=Newton-env=cube#10-gjk_collision=False-use_contact_island=False`	39	39	+0.00%
✅	`batch_size=8192-constraint_solver=Newton-env=cube#10-gjk_collision=True-use_contact_island=False`	41	40	+2.50%

Satvik1701 · 2025-11-08T04:11:58Z

Hey! I was just wondering if this PR is close to being merged soon, since the Heterogeneous Simulation PR is dependent on this. Thanks! PR: #1589

SonSang · 2025-11-08T04:22:14Z

Hey, sorry for the delay. This PR needs some polishing to pass some benchmark tests, and I'm working on it. I'll try to wrap up as soon as possible.

Satvik1701 · 2025-11-08T04:41:52Z

Thanks so much! Sorry for being pushy, but do you have a rough timeline for this? I'm looking to train with the heterogeneous simulation and might reprioritize some things.

SonSang · 2025-11-08T04:47:43Z

It's hard to say, but I think at least a week is needed (because code review is needed again for merging). I'll let you know if I have a better estimate.

SonSang requested review from YilingQiao and duburcqa as code owners October 7, 2025 23:16

finalize differentiable pass

2a8c057

SonSang force-pushed the easydiffrigid branch from 9f033bb to 2a8c057 Compare October 18, 2025 10:31

SonSang changed the title ~~[FEATURE] WIP: Differentiable forward dynamics for rigid body simulation (minimal fix)~~ [FEATURE] Differentiable forward dynamics for rigid body simulation (minimal fix) Oct 18, 2025

SonSang added 3 commits October 18, 2025 06:36

Merge branch 'main' into easydiffrigid

48b00ce

minor fix

6e9f65f

fixed bug

dadece9

duburcqa reviewed Oct 20, 2025

duburcqa changed the title ~~[FEATURE] Differentiable forward dynamics for rigid body simulation (minimal fix)~~ [FEATURE] Differentiable forward dynamics for rigid body sim. Oct 23, 2025

YilingQiao mentioned this pull request Oct 24, 2025

[FEATURE] Heterogeneous Simulation #1589

Open

7 tasks

SonSang added 6 commits October 28, 2025 16:37

Merge branch 'main' into easydiffrigid

66d71f6

fixed bug

89fa76c

fixing pr

bc38956

fixing pr

bdac4fe

fixed pr

61a9902

Merge branch 'main' into easydiffrigid

82d2251
