@SonSang
Collaborator

@SonSang SonSang commented Oct 7, 2025

Description

This PR makes the following changes:

  • Unify hibernation / non-hibernation code in rigid_solver_decomp.py
  • Unify dynamic and static inner loops based on is_backward (static when is_backward=True)
  • Differentiable formulations for functions in rigid_solver_decomp.py

We do not add an extra dimension for frames to the rigid body simulation states, because that would require too many code changes.

Related Issue

Resolves Genesis-Embodied-AI/Genesis#

Motivation and Context

This is part of the effort to make the rigid body simulation differentiable.

How Has This Been / Can This Be Tested?

A unit test, tests/test_grad.py::test_differentiable_rigid, verifies the change by solving an optimization problem.

Screenshots (if appropriate):

Checklist:

  • I read the CONTRIBUTING document.
  • I followed the Submitting Code Changes section of the CONTRIBUTING document.
  • I tagged the title correctly (including BUG FIX/FEATURE/MISC/BREAKING).
  • I updated the documentation accordingly or no change is needed.
  • I tested my changes and added instructions on how to test it for reviewers.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

SonSang changed the title from "[FEATURE] WIP: Differentiable forward dynamics for rigid body simulation (minimal fix)" to "[FEATURE] Differentiable forward dynamics for rigid body simulation (minimal fix)" on Oct 18, 2025
@github-actions

⚠️ Benchmark Regression Detected
Baselines considered: 5 commits

Thresholds: runtime ≤ −10%, compile ≥ +10%

Runtime FPS

| status | benchmark_id | current FPS | baseline FPS | Δ FPS |
| --- | --- | --- | --- | --- |
|  | batch_size=2048-constraint_solver=Newton-enable_mujoco_compatibility=False-env=box_pyramid#5-gjk_collision=False-use_contact_island=False | 45,712 | 30,105 | +51.84% |
|  | batch_size=2048-constraint_solver=Newton-enable_mujoco_compatibility=False-env=box_pyramid#5-gjk_collision=True-use_contact_island=False | 19,600 | 20,663 | -5.14% |
|  | batch_size=2048-constraint_solver=Newton-enable_mujoco_compatibility=True-env=box_pyramid#5-gjk_collision=False-use_contact_island=False | 16,159 | 16,819 | -3.92% |
| 🔴 | batch_size=30000-constraint_solver=CG-env=anymal_c-gjk_collision=False-use_contact_island=False | 2,356,744 | 11,879,586 | -80.16% |
| 🔴 | batch_size=30000-constraint_solver=CG-env=anymal_c-gjk_collision=True-use_contact_island=False | 2,373,289 | 11,974,315 | -80.18% |
| 🔴 | batch_size=30000-constraint_solver=CG-env=batched_franka-gjk_collision=False-use_contact_island=False | 3,650,183 | 11,971,721 | -69.51% |
| 🔴 | batch_size=30000-constraint_solver=CG-env=batched_franka-gjk_collision=True-use_contact_island=False | 3,560,228 | 12,006,226 | -70.35% |
| 🔴 | batch_size=30000-constraint_solver=Newton-env=anymal_c-gjk_collision=False-use_contact_island=False | 2,353,910 | 11,966,930 | -80.33% |
| 🔴 | batch_size=30000-constraint_solver=Newton-env=anymal_c-gjk_collision=True-use_contact_island=False | 2,331,798 | 11,929,539 | -80.45% |
| 🔴 | batch_size=30000-constraint_solver=Newton-env=batched_franka-gjk_collision=False-use_contact_island=False | 3,576,319 | 12,001,647 | -70.20% |
| 🔴 | batch_size=30000-constraint_solver=Newton-env=batched_franka-gjk_collision=True-use_contact_island=False | 3,486,764 | 11,965,350 | -70.86% |
| 🔴 | batch_size=8192-constraint_solver=CG-env=cube#10-gjk_collision=False-use_contact_island=False | 519,700 | 1,597,992 | -67.48% |
| 🔴 | batch_size=8192-constraint_solver=CG-env=cube#10-gjk_collision=True-use_contact_island=False | 519,931 | 1,596,952 | -67.44% |
| 🔴 | batch_size=8192-constraint_solver=Newton-env=cube#10-gjk_collision=False-use_contact_island=False | 282,152 | 449,266 | -37.20% |
| 🔴 | batch_size=8192-constraint_solver=Newton-env=cube#10-gjk_collision=True-use_contact_island=False | 276,196 | 448,622 | -38.43% |

Compile Time

| status | benchmark_id | current compile | baseline compile | Δ compile |
| --- | --- | --- | --- | --- |
|  | batch_size=2048-constraint_solver=Newton-enable_mujoco_compatibility=False-env=box_pyramid#5-gjk_collision=False-use_contact_island=False | 32 | 33 | -3.03% |
|  | batch_size=2048-constraint_solver=Newton-enable_mujoco_compatibility=False-env=box_pyramid#5-gjk_collision=True-use_contact_island=False | 37 | 36 | +2.78% |
| 🔴 | batch_size=2048-constraint_solver=Newton-enable_mujoco_compatibility=True-env=box_pyramid#5-gjk_collision=False-use_contact_island=False | 31 | 28 | +10.71% |
|  | batch_size=30000-constraint_solver=CG-env=anymal_c-gjk_collision=False-use_contact_island=False | 41 | 38 | +7.89% |
|  | batch_size=30000-constraint_solver=CG-env=anymal_c-gjk_collision=True-use_contact_island=False | 41 | 40 | +2.50% |
|  | batch_size=30000-constraint_solver=CG-env=batched_franka-gjk_collision=False-use_contact_island=False | 41 | 39 | +5.13% |
|  | batch_size=30000-constraint_solver=CG-env=batched_franka-gjk_collision=True-use_contact_island=False | 39 | 41 | -4.88% |
|  | batch_size=30000-constraint_solver=Newton-env=anymal_c-gjk_collision=False-use_contact_island=False | 41 | 39 | +5.13% |
|  | batch_size=30000-constraint_solver=Newton-env=anymal_c-gjk_collision=True-use_contact_island=False | 42 | 40 | +5.00% |
| 🔴 | batch_size=30000-constraint_solver=Newton-env=batched_franka-gjk_collision=False-use_contact_island=False | 42 | 38 | +10.53% |
|  | batch_size=30000-constraint_solver=Newton-env=batched_franka-gjk_collision=True-use_contact_island=False | 42 | 41 | +2.44% |
|  | batch_size=8192-constraint_solver=CG-env=cube#10-gjk_collision=False-use_contact_island=False | 42 | 41 | +2.44% |
|  | batch_size=8192-constraint_solver=CG-env=cube#10-gjk_collision=True-use_contact_island=False | 41 | 39 | +5.13% |
|  | batch_size=8192-constraint_solver=Newton-env=cube#10-gjk_collision=False-use_contact_island=False | 43 | 41 | +4.88% |
|  | batch_size=8192-constraint_solver=Newton-env=cube#10-gjk_collision=True-use_contact_island=False | 40 | 41 | -2.44% |
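
The Δ columns and 🔴 flags in the report are consistent with a plain relative change against the baseline, compared to the quoted thresholds. The sketch below reproduces that arithmetic for a few rows. The function names are hypothetical, and the bot's actual implementation (including how it aggregates the 5 baseline commits) is not shown in this thread.

```python
def relative_change_pct(current: float, baseline: float) -> float:
    """Relative change of `current` against `baseline`, in percent."""
    return (current - baseline) / baseline * 100.0


def is_regression(delta_pct: float, kind: str) -> bool:
    """Apply the thresholds quoted in the report: runtime <= -10%, compile >= +10%."""
    if kind == "runtime":
        return delta_pct <= -10.0
    if kind == "compile":
        return delta_pct >= 10.0
    raise ValueError(f"unknown benchmark kind: {kind!r}")


# Values copied from two rows of the report.
fps_delta = relative_change_pct(2_356_744, 11_879_586)  # anymal_c, CG, gjk_collision=False
compile_delta = relative_change_pct(31, 28)             # box_pyramid#5, mujoco compatibility on

print(f"runtime: {fps_delta:+.2f}%  flagged={is_regression(fps_delta, 'runtime')}")
print(f"compile: {compile_delta:+.2f}%  flagged={is_regression(compile_delta, 'compile')}")
```

Note the asymmetry: a large FPS gain (such as the first box_pyramid#5 row) is never flagged, and a compile-time change is flagged only when it grows.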

self._solver.entities_info,
self._solver._rigid_global_info,
self._solver._static_rigid_sim_config,
False,
Collaborator

Use keyword arguments when passing unnamed values (such as the bare False here).
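
A minimal sketch of what this request amounts to. The function below is a hypothetical stand-in, and the parameter name is_backward is an assumption borrowed from the PR description; the quoted snippet does not show which parameter the trailing False binds to.

```python
def update_geoms(entities_info, rigid_global_info, static_rigid_sim_config, is_backward):
    """Hypothetical stand-in for a solver function that takes a trailing boolean flag."""
    return "backward" if is_backward else "forward"


entities_info, rigid_global_info, config = {}, {}, {}

# Positional: the reader must look up the signature to learn what `False` means.
mode_positional = update_geoms(entities_info, rigid_global_info, config, False)

# Keyword: the call site documents itself, and reordering parameters cannot silently break it.
mode_keyword = update_geoms(entities_info, rigid_global_info, config, is_backward=False)
```

Both calls behave identically; only the readability of the call site changes.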

self._solver.entities_info,
self._solver._rigid_global_info,
self._solver._static_rigid_sim_config,
False,
Collaborator

Use keyword arguments when passing unnamed values (such as the bare False here).

entities_info,
rigid_global_info,
self._solver._static_rigid_sim_config,
False,
Collaborator

Use keyword arguments when passing unnamed values (such as the bare False here).

# We need to use [norm_sqr] instead of [norm] to avoid nan gradients in the backward pass. Even when theta = 0,
# the gradient of [norm] operation is computed and used (it is multiplied by 0 but still results in nan gradients).
thetasq = rotvec.norm_sqr()
if thetasq > (gs.EPS * gs.EPS):
Collaborator

Just use gs.EPS ** 2

Comment on lines 86 to 87
# We need to use [norm_sqr] instead of [norm] to avoid nan gradients in the backward pass. Even when theta = 0,
# the gradient of [norm] operation is computed and used (it is multiplied by 0 but still results in nan gradients).
Collaborator

Could you elaborate? Do you have a reproduction script? This is weird...

Collaborator Author

Yes, here is the reproduction code.

import gstaichi as ti
import genesis as gs
import numpy as np

gs.init(precision="32", debug=True, backend=gs.cpu)

a = ti.field(dtype=gs.ti_vec3, shape=(), needs_grad=True)
b = ti.field(dtype=gs.ti_float, shape=(), needs_grad=True)

@ti.kernel
def foo(use_norm_sqr: ti.template()) -> None:
    if ti.static(use_norm_sqr):
        a_norm_sqr = a[None].norm_sqr()
        if a_norm_sqr < gs.EPS ** 2:
            b[None] = 0.0
        else:
            b[None] = ti.sqrt(a_norm_sqr)
    else:
        a_norm = a[None].norm()
        # When a_norm is 0, it does not affect [b] and thus the gradient of [a] should be 0. However, it turns out that
        # the gradient of [a] becomes NaN when a_norm is 0.
        if a_norm < gs.EPS:
            b[None] = 0.0
        else:
            b[None] = a_norm

for use_norm_sqr in [True, False]:
    foo(use_norm_sqr)
    b.grad[None] = 1.0
    foo.grad(use_norm_sqr)
    
    a_grad_isnan = np.isnan(a.grad.to_numpy()).any()
    if a_grad_isnan and not use_norm_sqr:
        print("a.grad is nan when use_norm_sqr is False, maybe because the gradient of the norm operation is NaN when the norm is 0.")
        print("Even though a_norm does not affect [b] when a_norm is 0, the gradient of the norm operation is seemingly still computed and used.")

As written in the comments above, I assume the gradient becomes NaN because of the norm operation. Even when the result of the norm operation does not affect the final value, its gradient still seems to be computed and accumulated into the final gradient (maybe the NaN gradient is simply multiplied by 0, which is still NaN).
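
The hypothesis above can be modelled in plain Python, without Taichi. The sketch below is a hand-written reverse pass for the two variants of the reproduction script, under the assumption that an operation placed before the branch always has its reverse rule executed, only with a zero incoming adjoint when the branch discards its result. It does not show what the Taichi autodiff actually emits; the function names and the eps value are illustrative.

```python
import math


def ieee_div(x: float, y: float) -> float:
    """Float division returning NaN/inf on a zero divisor, as IEEE-754 hardware does.

    Plain Python raises ZeroDivisionError instead, so this helper emulates a compiled kernel.
    """
    try:
        return x / y
    except ZeroDivisionError:
        return math.nan if x == 0.0 else math.copysign(math.inf, x)


def grad_via_norm(a, upstream=1.0, eps=1e-12):
    """Reverse pass for: n = norm(a); b = 0 if n < eps else n."""
    n = math.sqrt(sum(x * x for x in a))
    # The branch decides how much of the upstream adjoint reaches n.
    adj_n = 0.0 if n < eps else upstream
    # norm() sits before the branch, so its reverse rule a / n runs regardless.
    return [adj_n * ieee_div(x, n) for x in a]


def grad_via_norm_sqr(a, upstream=1.0, eps=1e-12):
    """Reverse pass for: s = norm_sqr(a); b = 0 if s < eps**2 else sqrt(s)."""
    s = sum(x * x for x in a)
    # sqrt() lives inside the branch, so its reverse rule is skipped when s is tiny.
    adj_s = 0.0 if s < eps**2 else upstream / (2.0 * math.sqrt(s))
    # The reverse rule of norm_sqr() is 2 * a, which is finite everywhere.
    return [adj_s * 2.0 * x for x in a]


print(grad_via_norm([0.0, 0.0, 0.0]))      # each entry is 0.0 * (0.0 / 0.0)
print(grad_via_norm_sqr([0.0, 0.0, 0.0]))  # each entry is 0.0 * 2.0 * 0.0
```

In this model the norm variant yields NaN at a = 0 because 0 × NaN is NaN under IEEE-754, while the norm_sqr variant stays finite. Away from zero the two variants agree.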

@pytest.mark.precision("32")
@pytest.mark.parametrize("backend", [gs.cpu])
def test_differentiable_rigid(show_viewer):

Collaborator

Do not leave a blank line here.


@pytest.mark.required
@pytest.mark.precision("32")
@pytest.mark.parametrize("backend", [gs.cpu])
Collaborator

Why are you enforcing both precision and backend? I guess it should work just fine without enforcing either of them? Or is it strictly harder to pass on CPU (because it would implicitly check more things, such as some optional features being automatically disabled)?

if show_viewer:
target = scene.add_entity(
gs.morphs.Box(
pos=goal_pos.cpu().tolist(),
Collaborator

It should be possible to pass goal_pos directly (same for goal_quat). If not, gs.morphs.Box is what should be fixed, so that callers are not forced to cast to a list.

Collaborator

Is the noslip feature supported when gradient computation is enabled? If not, you should raise an exception at init.
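
A minimal sketch of the fail-fast check being asked for, assuming the answer turns out to be "not supported" (the thread does not settle this). The class and field names are hypothetical and do not reflect the Genesis options API.

```python
from dataclasses import dataclass


@dataclass
class SolverOptions:
    """Hypothetical option bundle; the field names are illustrative only."""

    requires_grad: bool = False
    noslip_iterations: int = 0  # 0 means the noslip pass is disabled

    def __post_init__(self) -> None:
        # Fail at construction time instead of silently producing wrong gradients later.
        if self.requires_grad and self.noslip_iterations > 0:
            raise ValueError(
                "noslip is not supported together with gradient computation; "
                "set noslip_iterations=0 or requires_grad=False."
            )
```

Raising during construction keeps the unsupported combination from ever reaching a simulation step.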

Comment on lines 38 to 39
constraint_state.Mgrad,
constraint_state.Mgrad, # this will not be used anyway because is_backward is False
Collaborator

After checking, it seems that a different variable is passed when is_backward=True, so if I understand correctly you are passing an arbitrary value here because it will not be used in practice. This does work, but I don't like it much... What about defining an extra free variable PLACEHOLDER in array_class (a 0D taichi tensor of type array_class.V) that we could use wherever an argument is unused? This would clarify the intent and avoid mistakes, because you cannot do much with such a tensor.
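
The intent of the suggestion can be illustrated in plain Python with a sentinel object. This is only an analogy: in the actual code the placeholder would have to be a 0D Taichi tensor, as the comment says, so that it is accepted as a kernel argument. The solve function and its parameters are hypothetical.

```python
class _Placeholder:
    """Sentinel for arguments that a given code path never reads."""

    __slots__ = ()  # no instance attributes, so nothing can be stored on it by accident

    def __repr__(self) -> str:
        return "PLACEHOLDER"


PLACEHOLDER = _Placeholder()


def solve(mgrad, mgrad_backward, is_backward):
    """Hypothetical function: `mgrad_backward` is read only when is_backward is True."""
    return mgrad_backward if is_backward else mgrad


# The call site now states explicitly that the second argument is unused on this path.
result = solve(mgrad=[1.0, 2.0], mgrad_backward=PLACEHOLDER, is_backward=False)
```

Compared with passing the same real buffer twice, a dedicated placeholder makes accidental reads or writes easy to spot.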


# =========================================== RigidAdjointCache ===========================================
@dataclasses.dataclass
class StructRigidAdjointCache:
Collaborator

Could you document in which context this is used and what these quantities store?

I know it was not documented for the other data structures, but that was a bad decision. It is never too late to raise the bar on our standards.

Comment on lines 13 to 14
from genesis.engine.states.cache import QueriedStates
from genesis.engine.states.solvers import RigidSolverState
Collaborator

What about exposing all *State(s) classes at the top level of the states submodule? That would be more convenient than having to go through the hierarchy to import each State. To do this, just add lines like this to genesis/engine/states/__init__.py:

from .solvers import *

Comment on lines 137 to 141
self.init_ckpt()

def init_ckpt(self):
self._ckpt = dict()

Collaborator

This is a bad pattern. All the attributes of a class should be defined unconditionally (setting optional attributes to None by default is acceptable), in the __init__ function ONLY. Unfortunately, we are violating these basic rules everywhere.

If init_ckpt may be called outside __init__, it should rather be called something like clear_* or reset_*. If it is only called at init, then it should be part of __init__.
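
A minimal sketch of the pattern described above, using a hypothetical class rather than the one under review: every attribute is declared in __init__, and the method that may run later is named as a reset.

```python
from typing import Any, Dict, Optional


class Simulator:
    """Hypothetical class; only the attribute-handling pattern matters."""

    def __init__(self) -> None:
        # Every attribute is declared here, unconditionally.
        self._ckpt: Dict[str, Any] = {}
        self._viewer: Optional[object] = None  # optional attribute, None by default

    def reset_ckpt(self) -> None:
        """Drop all stored checkpoints; safe to call at any time after construction."""
        self._ckpt.clear()
```

One design difference from the quoted code: clear() empties the existing dictionary in place, whereas assigning a new dict() would leave any outside reference pointing at the old contents.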

duburcqa changed the title from "[FEATURE] Differentiable forward dynamics for rigid body simulation (minimal fix)" to "[FEATURE] Differentiable forward dynamics for rigid body sim." on Oct 23, 2025
@github-actions

⚠️ Benchmark Regression Detected
Baselines considered: 5 commits

Thresholds: runtime ≤ −10%, compile ≥ +10%

Runtime FPS

| status | benchmark_id | current FPS | baseline FPS | Δ FPS |
| --- | --- | --- | --- | --- |
|  | batch_size=2048-constraint_solver=Newton-enable_mujoco_compatibility=False-env=box_pyramid#5-gjk_collision=False-use_contact_island=False | 43,946 | 30,086 | +46.07% |
|  | batch_size=2048-constraint_solver=Newton-enable_mujoco_compatibility=False-env=box_pyramid#5-gjk_collision=True-use_contact_island=False | 19,992 | 20,513 | -2.54% |
|  | batch_size=2048-constraint_solver=Newton-enable_mujoco_compatibility=True-env=box_pyramid#5-gjk_collision=False-use_contact_island=False | 16,605 | 16,756 | -0.90% |
| 🔴 | batch_size=30000-constraint_solver=CG-env=anymal_c-gjk_collision=False-use_contact_island=False | 2,059,965 | 13,411,484 | -84.64% |
| 🔴 | batch_size=30000-constraint_solver=CG-env=anymal_c-gjk_collision=True-use_contact_island=False | 2,331,107 | 13,402,335 | -82.61% |
| 🔴 | batch_size=30000-constraint_solver=CG-env=batched_franka-gjk_collision=False-use_contact_island=False | 3,304,798 | 18,408,303 | -82.05% |
| 🔴 | batch_size=30000-constraint_solver=CG-env=batched_franka-gjk_collision=True-use_contact_island=False | 4,030,974 | 18,336,977 | -78.02% |
| ℹ️ | batch_size=30000-constraint_solver=CG-env=random-gjk_collision=False-use_contact_island=False | 946,802 | nan | +nan% |
| ℹ️ | batch_size=30000-constraint_solver=CG-env=random-gjk_collision=True-use_contact_island=False | 843,972 | nan | +nan% |
| 🔴 | batch_size=30000-constraint_solver=Newton-env=anymal_c-gjk_collision=False-use_contact_island=False | 1,999,001 | 13,036,483 | -84.67% |
| 🔴 | batch_size=30000-constraint_solver=Newton-env=anymal_c-gjk_collision=True-use_contact_island=False | 2,081,023 | 13,004,375 | -84.00% |
| 🔴 | batch_size=30000-constraint_solver=Newton-env=batched_franka-gjk_collision=False-use_contact_island=False | 3,109,363 | 18,417,110 | -83.12% |
| 🔴 | batch_size=30000-constraint_solver=Newton-env=batched_franka-gjk_collision=True-use_contact_island=False | 3,256,759 | 18,607,265 | -82.50% |
| ℹ️ | batch_size=30000-constraint_solver=Newton-env=random-gjk_collision=False-use_contact_island=False | 1,084,131 | nan | +nan% |
| ℹ️ | batch_size=30000-constraint_solver=Newton-env=random-gjk_collision=True-use_contact_island=False | 1,030,587 | nan | +nan% |
| 🔴 | batch_size=8192-constraint_solver=CG-env=cube#10-gjk_collision=False-use_contact_island=False | 539,180 | 1,614,643 | -66.61% |
| 🔴 | batch_size=8192-constraint_solver=CG-env=cube#10-gjk_collision=True-use_contact_island=False | 537,193 | 1,617,820 | -66.80% |
| 🔴 | batch_size=8192-constraint_solver=Newton-env=cube#10-gjk_collision=False-use_contact_island=False | 287,082 | 450,107 | -36.22% |
| 🔴 | batch_size=8192-constraint_solver=Newton-env=cube#10-gjk_collision=True-use_contact_island=False | 289,511 | 449,772 | -35.63% |

Compile Time

| status | benchmark_id | current compile | baseline compile | Δ compile |
| --- | --- | --- | --- | --- |
|  | batch_size=2048-constraint_solver=Newton-enable_mujoco_compatibility=False-env=box_pyramid#5-gjk_collision=False-use_contact_island=False | 35 | 37 | -5.41% |
|  | batch_size=2048-constraint_solver=Newton-enable_mujoco_compatibility=False-env=box_pyramid#5-gjk_collision=True-use_contact_island=False | 35 | 37 | -5.41% |
|  | batch_size=2048-constraint_solver=Newton-enable_mujoco_compatibility=True-env=box_pyramid#5-gjk_collision=False-use_contact_island=False | 31 | 31 | +0.00% |
|  | batch_size=30000-constraint_solver=CG-env=anymal_c-gjk_collision=False-use_contact_island=False | 42 | 41 | +2.44% |
|  | batch_size=30000-constraint_solver=CG-env=anymal_c-gjk_collision=True-use_contact_island=False | 41 | 40 | +2.50% |
|  | batch_size=30000-constraint_solver=CG-env=batched_franka-gjk_collision=False-use_contact_island=False | 42 | 42 | +0.00% |
|  | batch_size=30000-constraint_solver=CG-env=batched_franka-gjk_collision=True-use_contact_island=False | 42 | 42 | +0.00% |
| ℹ️ | batch_size=30000-constraint_solver=CG-env=random-gjk_collision=False-use_contact_island=False | 42 | nan | +nan% |
| ℹ️ | batch_size=30000-constraint_solver=CG-env=random-gjk_collision=True-use_contact_island=False | 43 | nan | +nan% |
|  | batch_size=30000-constraint_solver=Newton-env=anymal_c-gjk_collision=False-use_contact_island=False | 42 | 39 | +7.69% |
|  | batch_size=30000-constraint_solver=Newton-env=anymal_c-gjk_collision=True-use_contact_island=False | 42 | 40 | +5.00% |
|  | batch_size=30000-constraint_solver=Newton-env=batched_franka-gjk_collision=False-use_contact_island=False | 42 | 40 | +5.00% |
|  | batch_size=30000-constraint_solver=Newton-env=batched_franka-gjk_collision=True-use_contact_island=False | 43 | 42 | +2.38% |
| ℹ️ | batch_size=30000-constraint_solver=Newton-env=random-gjk_collision=False-use_contact_island=False | 42 | nan | +nan% |
| ℹ️ | batch_size=30000-constraint_solver=Newton-env=random-gjk_collision=True-use_contact_island=False | 43 | nan | +nan% |
|  | batch_size=8192-constraint_solver=CG-env=cube#10-gjk_collision=False-use_contact_island=False | 41 | 39 | +5.13% |
|  | batch_size=8192-constraint_solver=CG-env=cube#10-gjk_collision=True-use_contact_island=False | 40 | 39 | +2.56% |
|  | batch_size=8192-constraint_solver=Newton-env=cube#10-gjk_collision=False-use_contact_island=False | 39 | 39 | +0.00% |
|  | batch_size=8192-constraint_solver=Newton-env=cube#10-gjk_collision=True-use_contact_island=False | 41 | 40 | +2.50% |

@Satvik1701

Satvik1701 commented Nov 8, 2025

Hey! I was just wondering if this PR is close to being merged soon, since the Heterogeneous Simulation PR is dependent on this. Thanks! PR: #1589

@SonSang
Collaborator Author

SonSang commented Nov 8, 2025

> Hey! I was just wondering if this PR is close to being merged soon, since the Heterogeneous Simulation PR is dependent on this. Thanks! PR: #1589

Hey, sorry for being late. This PR needs some polishing to pass some benchmark tests, and I'm working on it. Sorry for being late, I'll try to wrap up as soon as possible.

@Satvik1701

> > Hey! I was just wondering if this PR is close to being merged soon, since the Heterogeneous Simulation PR is dependent on this. Thanks! PR: #1589
>
> Hey, sorry for being late. This PR needs some polishing to pass some benchmark tests, and I'm working on it. Sorry for being late, I'll try to wrap up as soon as possible.

Thanks so much! Yeah, sorry for being pushy, but do you have a rough timeline for this? I'm looking to train with the heterogeneous simulation and might reprioritize some things.

@SonSang
Collaborator Author

SonSang commented Nov 8, 2025

> > > Hey! I was just wondering if this PR is close to being merged soon, since the Heterogeneous Simulation PR is dependent on this. Thanks! PR: #1589
> >
> > Hey, sorry for being late. This PR needs some polishing to pass some benchmark tests, and I'm working on it. Sorry for being late, I'll try to wrap up as soon as possible.
>
> Thanks so much! Yeah, sorry for being pushy, but do you have a rough timeline for this? I'm looking to train with the heterogeneous simulation and might reprioritize some things.

It's hard to say, but I think at least a week is needed (because code review is needed again for merging). I'll let you know if I have a better estimate.
