[FEATURE] Differentiable forward dynamics for rigid body sim. #1808
Conversation
Force-pushed from 9f033bb to 2a8c057
Benchmark thresholds: runtime ≤ −10%, compile ≥ +10% (metrics: Runtime FPS, Compile Time)
```python
self._solver.entities_info,
self._solver._rigid_global_info,
self._solver._static_rigid_sim_config,
False,
```
Use keyword argument when passing unnamed variables.
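As a sketch of the suggestion (using a hypothetical stand-in function, not the actual solver kernel), naming the bare boolean at the call site documents what it controls:

```python
def forward_dynamics(entities_info, rigid_global_info, static_sim_config, is_backward):
    # hypothetical stand-in for the kernel called in the diff above
    return is_backward

# instead of a bare positional `False`, name the flag at the call site
result = forward_dynamics("entities", "globals", "config", is_backward=False)
```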
```python
self._solver.entities_info,
self._solver._rigid_global_info,
self._solver._static_rigid_sim_config,
False,
```
Use keyword argument when passing unnamed variables.
genesis/utils/path_planning.py (outdated)
```python
entities_info,
rigid_global_info,
self._solver._static_rigid_sim_config,
False,
```
Use keyword argument when passing unnamed variables.
genesis/utils/geom.py (outdated)
```python
# We need to use [norm_sqr] instead of [norm] to avoid nan gradients in the backward pass. Even when theta = 0,
# the gradient of [norm] operation is computed and used (it is multiplied by 0 but still results in nan gradients).
thetasq = rotvec.norm_sqr()
if thetasq > (gs.EPS * gs.EPS):
```
Just use `gs.EPS ** 2`.
genesis/utils/geom.py (outdated)
```python
# We need to use [norm_sqr] instead of [norm] to avoid nan gradients in the backward pass. Even when theta = 0,
# the gradient of [norm] operation is computed and used (it is multiplied by 0 but still results in nan gradients).
```
Could you elaborate? Do you have a reproduction script? This is weird...
Yes, here is the reproduction code.
```python
import gstaichi as ti
import genesis as gs
import numpy as np

gs.init(precision="32", debug=True, backend=gs.cpu)

a = ti.field(dtype=gs.ti_vec3, shape=(), needs_grad=True)
b = ti.field(dtype=gs.ti_float, shape=(), needs_grad=True)


@ti.kernel
def foo(use_norm_sqr: ti.template()) -> None:
    if ti.static(use_norm_sqr):
        a_norm_sqr = a[None].norm_sqr()
        if a_norm_sqr < gs.EPS ** 2:
            b[None] = 0.0
        else:
            b[None] = ti.sqrt(a_norm_sqr)
    else:
        a_norm = a[None].norm()
        # When a_norm is 0, it does not affect [b] and thus the gradient of [a] should be 0. However, it turns
        # out that the gradient of [a] becomes NaN when a_norm is 0.
        if a_norm < gs.EPS:
            b[None] = 0.0
        else:
            b[None] = a_norm


for use_norm_sqr in [True, False]:
    foo(use_norm_sqr)
    b.grad[None] = 1.0
    foo.grad(use_norm_sqr)
    a_grad_isnan = np.isnan(a.grad.to_numpy()).any()
    if a_grad_isnan and not use_norm_sqr:
        print("a.grad is nan when use_norm_sqr is False, maybe because the gradient of the norm operation is NaN when the norm is 0.")
        print("Even though a_norm does not affect [b] when a_norm is 0, the gradient of the norm operation is seemingly still computed and used.")
```
As written in that comment, I assume the gradient becomes NaN because of the norm operation: even when the result of the norm does not affect the final value, its gradient is still computed and folded into the final gradient (presumably the NaN gradient is multiplied by 0, which is still NaN).
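The mechanism can be sketched outside taichi with plain numpy: the local gradient of the norm at zero is 0/0 (NaN), and multiplying it by a zero upstream gradient still yields NaN.

```python
import numpy as np

x = np.zeros(3)
norm = np.sqrt((x ** 2).sum())        # 0.0
with np.errstate(invalid="ignore"):
    dnorm_dx = x / norm               # local gradient of norm at 0 is 0/0 -> nan
upstream = 0.0                        # the branch result is unused, so the upstream gradient is 0
grad = upstream * dnorm_dx            # 0 * nan is still nan
print(grad)                           # [nan nan nan]
```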
tests/test_grad.py (outdated)
```python
@pytest.mark.precision("32")
@pytest.mark.parametrize("backend", [gs.cpu])
def test_differentiable_rigid(show_viewer):
```
Do not skip space here.
tests/test_grad.py (outdated)
```python
@pytest.mark.required
@pytest.mark.precision("32")
@pytest.mark.parametrize("backend", [gs.cpu])
```
Why are you enforcing both precision and backend? I would expect this to work without enforcing either. If anything, it is strictly harder to pass on CPU, because CPU implicitly checks more things (some optional features are automatically disabled there, or something along those lines).
tests/test_grad.py (outdated)
```python
if show_viewer:
    target = scene.add_entity(
        gs.morphs.Box(
            pos=goal_pos.cpu().tolist(),
```
It should be possible to pass goal_pos directly (same for goal_quat). If not, it is rather gs.morphs.Box that must be fixed, to avoid forcing the cast to a list.
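A minimal sketch of how a morph constructor could accept array-likes directly (`as_pos3` is a hypothetical helper, not an existing Genesis function):

```python
import numpy as np

def as_pos3(pos):
    # hypothetical helper: accept lists, tuples, numpy arrays, or cpu torch
    # tensors, so callers never have to call .cpu().tolist() themselves
    arr = np.asarray(pos, dtype=np.float64).reshape(3)
    return tuple(arr.tolist())

print(as_pos3([1, 2, 3]))       # (1.0, 2.0, 3.0)
print(as_pos3(np.zeros(3)))     # (0.0, 0.0, 0.0)
```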
Is the noslip feature supported when gradient computation is enabled? If not, you should raise an exception at init.
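A minimal sketch of the requested init-time guard, with hypothetical option names (the real solver options differ):

```python
class SimOptions:
    # hypothetical minimal options object, not the actual Genesis options class
    def __init__(self, requires_grad=False, use_noslip=False):
        self.requires_grad = requires_grad
        self.use_noslip = use_noslip

def validate_options(opts):
    # fail fast at init instead of silently producing wrong gradients later
    if opts.requires_grad and opts.use_noslip:
        raise NotImplementedError("noslip is not supported when requires_grad=True")

validate_options(SimOptions(requires_grad=True))  # ok
try:
    validate_options(SimOptions(requires_grad=True, use_noslip=True))
except NotImplementedError as e:
    print(e)
```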
```python
constraint_state.Mgrad,
constraint_state.Mgrad,  # this will not be used anyway because is_backward is False
```
After checking, it seems that another variable is specified when is_backward=True, so here, if I understand correctly, you are setting it to anything because it will not be used in practice. This does work, but I don't like it much... What about defining some extra free variable PLACEHOLDER in array_class (a 0-D taichi tensor of type array_class.V) that we could use everywhere an argument is unused? This would clarify the intent and avoid mistakes, because you cannot do much with such a tensor.
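A rough numpy sketch of the PLACEHOLDER idea (the real version would be a 0-D taichi tensor in array_class, as suggested; these names are hypothetical):

```python
import numpy as np

# hypothetical module-level sentinel: a 0-d array passed wherever an argument
# is required by the signature but never read on the current code path
PLACEHOLDER = np.zeros(())

def solve(rhs, Mgrad_backward=PLACEHOLDER, is_backward=False):
    # Mgrad_backward is only read on the backward pass
    if is_backward:
        return Mgrad_backward
    return rhs

print(solve(np.ones(3)))  # forward pass ignores the placeholder entirely
```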
genesis/utils/array_class.py (outdated)
```python
# =========================================== RigidAdjointCache ===========================================
@dataclasses.dataclass
class StructRigidAdjointCache:
```
Could you document in which context this is used and what these quantities are storing?
I know it was not documented for the other data structures, but that was a bad decision. Never too late to raise the bar of our standards.
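A sketch of the level of documentation being requested, using numpy and hypothetical field names rather than the actual StructRigidAdjointCache contents:

```python
import dataclasses
import numpy as np

@dataclasses.dataclass
class RigidAdjointCache:
    """Hypothetical sketch: quantities saved during the forward pass and
    re-read when accumulating adjoints on the backward pass."""

    # Generalized positions saved at each forward substep, replayed in
    # reverse order during backpropagation.
    qpos: np.ndarray
    # Adjoint (gradient of the loss) with respect to the saved positions.
    qpos_adj: np.ndarray

cache = RigidAdjointCache(qpos=np.zeros((4, 7)), qpos_adj=np.zeros((4, 7)))
print(cache.qpos.shape)  # (4, 7)
```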
```python
from genesis.engine.states.cache import QueriedStates
from genesis.engine.states.solvers import RigidSolverState
```
What about exposing all *State(s) classes at the top level of the states submodule? That would be more convenient than having to go through the hierarchy for each State import. To do this, just add this kind of line to genesis/engine/states/__init__.py:

```python
from .solvers import *
```

```python
self.init_ckpt()

def init_ckpt(self):
    self._ckpt = dict()
```
This is a bad pattern. All the attributes of a class should be unconditionally defined, in the __init__ function ONLY (setting optional attributes to None by default is acceptable). We are violating these basic rules everywhere, unfortunately.
If init_ckpt may be called outside __init__, it should rather be named something like clear_* or reset_*. If it is only called at init, then it should be part of __init__.
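A minimal sketch of the recommended pattern, assuming the checkpoint store is a plain dict:

```python
class Solver:
    def __init__(self):
        # every attribute is defined unconditionally here, in __init__ only
        self._ckpt = {}

    def reset_ckpt(self):
        # named reset_* because it may legitimately be called again later
        self._ckpt.clear()

s = Solver()
s._ckpt["step0"] = [1, 2, 3]
s.reset_ckpt()
print(s._ckpt)  # {}
```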
Hey! I was just wondering if this PR is close to being merged soon, since the Heterogeneous Simulation PR is dependent on this. Thanks! PR: #1589

Hey, sorry for being late. This PR needs some polishing to pass some benchmark tests, and I'm working on it. I'll try to wrap it up as soon as possible.

Thanks so much! Sorry for being pushy, but do you have a rough timeline for this? I'm looking to train with the heterogeneous simulation and might reprioritize some things.

It's hard to say, but I think at least a week is needed (code review is needed again before merging). I'll let you know when I have a better estimate.
Description
This PR has the following changes:
- is_backward (static when is_backward=True)
- We do not add an additional dimension for frames to the states in the rigid body simulation, because it would incur too many code changes.
Related Issue
Resolves Genesis-Embodied-AI/Genesis#
Motivation and Context
This is part of the process of making the rigid body simulation differentiable.
How Has This Been / Can This Be Tested?
There is a unit test, tests/test_grad.py::test_differentiable_rigid, that verifies the gradients by solving an optimization problem.

Screenshots (if appropriate):
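As a rough, self-contained analogy (not the actual test) for what such an optimization-based check does: optimize the initial velocity of a ballistic point mass so it reaches a goal, using analytic gradients through the dynamics.

```python
import numpy as np

DT, STEPS = 0.01, 100
GRAVITY = np.array([0.0, 0.0, -9.81])

def simulate(v0):
    # explicit-Euler "forward dynamics" of a free point mass
    pos, vel = np.zeros(3), v0.copy()
    for _ in range(STEPS):
        vel = vel + GRAVITY * DT
        pos = pos + vel * DT
    return pos

def grad_loss(v0, goal):
    # d(final pos)/d(v0) = STEPS * DT * I for these linear dynamics,
    # chained with the gradient of 0.5 * ||pos - goal||^2
    return (simulate(v0) - goal) * (STEPS * DT)

goal = np.array([1.0, 0.5, 0.0])
v0 = np.zeros(3)
for _ in range(200):
    v0 -= 0.5 * grad_loss(v0, goal)  # plain gradient descent

print(np.linalg.norm(simulate(v0) - goal))  # residual shrinks toward 0
```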
Checklist:
I have read the Submitting Code Changes section of the CONTRIBUTING document.